5 November 2012

Smooth world from block data

Ephenation is a voxel based world. Almost everything is made of blocks that have a given address. Graphics based on square blocks don't look very good. There are ways to make them look better, e.g. the Marching Cubes algorithm.
However, this algorithm has some drawbacks. Every created triangle has a texture, and it is not trivial to decide which texture to use for triangles that span from one block type to another. Another problem is that some cubes in the world shall still be shown as cubes, which leads to difficulties in the transition between smooth terrain and cubistic blocks.

Use of a filter

In Ephenation, it was decided to use another algorithm, with similarities to a low pass filter. The basic idea is to take every coordinate and add a specific delta to each of the three dimensions. The magnitude of the delta is determined as a function of neighbor blocks. A two dimensional example of this could be:
Delta in X dimension
Next, if the same delta is applied in 'y', we get:
Delta in Y dimension
The principle is simple; the delta is computed in every dimension and can be applied independently.

Three dimensions

When computing the delta for 3 dimensions, neighbor vertices have to be taken into account in all 3 dimensions.

In the figure, there are 8 cubes. They are initially uniformly distributed. The content of these 8 cubes is used to determine the delta of the point P, in the middle, for each of the X, Y and Z dimensions. Only one filter is defined, for one dimension, and the transformation is then rotated in three different directions. The algorithm uses a small matrix, of size 2x2x2, that is initialized with the content of the 8 cubes. The delta computation uses that matrix to determine the delta in one dimension. Then the small matrix is filled with the same content, but rotated for the other dimensions, and the computation is repeated.
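As a concrete illustration, the delta computation for one dimension could look like the sketch below. The exact weighting is not given in the text, so this is an assumption: the point P is simply pulled toward the side that has more solid blocks, scaled by a calibration constant.

```cpp
// Hypothetical sketch of the one-dimensional delta filter. 'solid' holds the
// content of the 8 cubes surrounding P (true = solid block). The delta is
// proportional to the imbalance between the two halves along the chosen axis;
// for the other two dimensions the same function is used on a rotated copy of
// the matrix.
float DeltaOneDimension(const bool solid[2][2][2], float calib) {
    int low = 0, high = 0; // Number of solid blocks on each side of P
    for (int y = 0; y < 2; y++) {
        for (int z = 0; z < 2; z++) {
            if (solid[0][y][z]) low++;
            if (solid[1][y][z]) high++;
        }
    }
    // P is pulled toward the side with more solid blocks.
    return calib * (high - low);
}
```

With a symmetric neighborhood the delta is 0, so flat ground and completely solid regions are left unchanged.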

Merging normals

After applying the delta, it is possible to change the appearance without changing the geometry. If the normals of all triangles that meet at the same vertex are replaced by their average, the world will look even smoother. To speed up this process, the vertex data is sorted on vertex position, using a std::multiset of pointers to the vertex data. An open question is whether normals from different materials shall also be averaged; this is currently not done.
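The merging pass could be sketched as follows. This is not the actual Ephenation code (which uses a std::multiset of pointers sorted on position); a std::map keyed on the position is used here for brevity, and the Vertex layout is hypothetical.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Vertex { float px, py, pz; float nx, ny, nz; };

// Group vertices by position, then replace every normal in a group by the
// normalized sum of the group's normals. If the sum is the null vector, the
// normals are left unchanged (the special case mentioned in the text).
void MergeNormals(std::vector<Vertex> &verts) {
    std::map<std::tuple<float, float, float>, std::vector<Vertex*>> groups;
    for (auto &v : verts)
        groups[std::make_tuple(v.px, v.py, v.pz)].push_back(&v);
    for (auto &g : groups) {
        float sx = 0, sy = 0, sz = 0;
        for (Vertex *v : g.second) { sx += v->nx; sy += v->ny; sz += v->nz; }
        float len = std::sqrt(sx*sx + sy*sy + sz*sz);
        if (len == 0)
            continue; // Null vector: leave the normals unchanged
        for (Vertex *v : g.second) {
            v->nx = sx/len; v->ny = sy/len; v->nz = sz/len;
        }
    }
}
```

In practice the positions may need to be quantized before being used as keys, so that vertices that are only almost equal still merge.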

There are some special cases that need to be taken care of. For example, it may be that the sum of the normals is a null vector. When that happens, the normals are simply left unchanged.
Non modified normals
And with normals merged, the same geometry will be:
Merged normals


The amount of the delta has to be calibrated. See the video clip below, where the delta goes from 0 to 0.17.

There is a middle point where the slope is 45 degrees, which corresponds to a calibration constant of 0.125. There is no smooth transition between bitmaps of different kinds, which can be seen in this video clip as a checker pattern. Getting a smooth transition between different textures is outside the scope here, and it is not currently implemented in Ephenation.

Texture mapping

Texture UV mapping in a tiled world is trivial. For example, the front face of a cube can be mapped as shown in the picture.
Default UV mapping

When a smoothing filter is applied (coordinates are modified with a delta), it would seem trivial to compute a new UV mapping. If the height is decreased from 1 to 0.8 because the lower left corner is raised by 0.2, then the upper left corner would now be mapped to (0, 0.8) instead.
Delta applied on left side
However, there may also be a delta added to the Z component. Different deltas can be added to the top and bottom. For the left border in the figure above, the height would still be 0.6 units, but the total length of the left border may be longer, as it could be leaning forward in Z. There are extreme cases where the total height approaches 0, while the difference in Z between the left side corners grows dominant. If only X and Y are used for UV mapping, the bitmap will appear stretched. This has not been taken care of yet.
The red material is stretched

Chunk borders

In Ephenation, all blocks are organized into chunks. A chunk consists of 32x32x32 blocks organized in a matrix. Within a given chunk, it is easy to find individual blocks, as well as adjacent blocks. But when analyzing blocks at the chunk border, data has to be fetched from another chunk. To simplify this process, a list of neighbor chunks is first prepared. This part of the algorithm is not complete yet, and the border between chunks can be seen as having no delta applied.
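The neighbor lookup can be sketched as below. The names and layout are hypothetical (the actual Ephenation data structures are not shown in the text); the point is only how an out-of-range coordinate redirects to one of the prepared neighbor chunks.

```cpp
const int CHUNK_SIZE = 32;

struct Chunk {
    unsigned char block[CHUNK_SIZE][CHUNK_SIZE][CHUNK_SIZE];
    // Neighbor chunks prepared before the border analysis, indexed by the
    // sign of the overflow in each dimension. neighbor[1][1][1] is the chunk
    // itself; nullptr means the neighbor is not loaded.
    Chunk *neighbor[3][3][3];
};

// Fetch a block by a coordinate that may fall just outside the 32x32x32
// matrix, redirecting to the prepared neighbor chunk when needed.
unsigned char GetBlock(const Chunk *c, int x, int y, int z) {
    int dx = (x < 0) ? -1 : (x >= CHUNK_SIZE) ? 1 : 0;
    int dy = (y < 0) ? -1 : (y >= CHUNK_SIZE) ? 1 : 0;
    int dz = (z < 0) ? -1 : (z >= CHUNK_SIZE) ? 1 : 0;
    const Chunk *src = c->neighbor[dx+1][dy+1][dz+1];
    if (src == nullptr)
        return 0; // Neighbor not loaded; treat as air (no delta applied)
    return src->block[x - dx*CHUNK_SIZE][y - dy*CHUNK_SIZE][z - dz*CHUNK_SIZE];
}
```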

Random noise

There is yet another way to improve the realism, and that is to add a random value to the delta. 2D simplex noise is used to add random noise to the height. The drawback with 2D noise, which does not use the height component, is that it will generate the same height delta for all coordinates with the same horizontal position. But that is an acceptable simplification, as it won't be noticeable unless floors above each other are compared. The drawback with 3D simplex noise is that it is more costly.

Special care has to be taken for some materials, like water. It would not look natural to have permanent slopes and hills in the water.


Altogether, quite a lot of computation is done for every chunk. With a long viewing distance, a lot of chunks are needed. A thread pool is used for this, using the available cores on the CPU. Still, with a 1 or 2 core CPU, the cost can be too high.

Update history

2012-11-11 Use video clip from YouTube instead, to get higher resolution.

26 October 2012

Deferred shader part 2

This is a description of how deferred shading is used in Ephenation. The algorithm has been separated into a number of stages, each of which updates one or more data buffers. The stages and the buffers used by them are explained below. Most of the links refer to the actual source code.


Green box: Shader execution
Blue box: Data stored in a bitmap (of varying formats)
Solid arrow: Indicates data is used
Wide orange arrow: Data is updated


There is a depth, diffuse, normal and position buffer allocated. These are common in many deferred shaders, usually called a G buffer, which is described in more detail in part 1.

Light buffer

All lighting effects are added up in a separate channel of type R16F. It is a bitmap texture allocated as follows:
glTexImage2D(GL_TEXTURE_2D, 0, GL_R16F, w, h, 0, GL_RED, GL_FLOAT, nullptr);
The advantage of having only one channel is that less memory is needed. The disadvantage is that light contributions cannot have separate effects for the different colors. The values will typically range from 0 to 1, but can exceed 1 if there is more than one light source (e.g. sun and lamp). Players can create any number of light sources, so it is important that this is displayed satisfactorily (see the section about tone mapping below).

The buffer is initialized to 0, which would give a completely dark world unless some light is added.
Light map, using red channel

Blend data

This is a GL_RGBA buffer for separately managing blending. See more details about the blending stage below.

Shadow map

A depth buffer with information about the distance from the sun to the world. This is not described in detail in this article. More information can be found at www.opengl-tutorial.org.
Shadow map
The projection matrix is an orthogonal projection, not a perspective projection.

Note that the shadow map uses a variable resolution (defined as a shader function), which explains the distortion at the edges of the picture. It has high resolution in the middle, near the player, and lower resolution far from the player. Even though the sun is incoming from an angle, matrix shearing is used to transform height values to normalized values in the Y dimension. Otherwise, height values at the upper right would have oversaturated into white and values at the lower left into black.


Point lights using tile based rendering

The effect from the point lights does not take into account shadows from objects. This is a shortcoming, but there can be many lamps and the cost of computing shadows for them would be too high. The fall-off function is a simple linear function, with max intensity at the lamp position, and 0 at a distance depending on the lamp size. The physically more correct function, giving an intensity proportional to 1/r^2, would give infinite intensity at r=0 and would never reach 0.
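The linear fall-off can be written as a small function. In Ephenation this lives in the fragment shader; below is a plain C++ sketch with hypothetical names.

```cpp
// Linear point light fall-off: maximum intensity at the lamp position,
// reaching 0 at a radius given by the lamp size. Unlike the physically
// correct 1/r^2 curve, this is finite at r=0 and actually reaches 0.
float PointLightIntensity(float dist, float lampRadius, float maxIntensity) {
    if (dist >= lampRadius)
        return 0.0f;
    return maxIntensity * (1.0f - dist / lampRadius);
}
```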

Each point light is rendered separately. A bounding 2D quad is positioned at the place of the point light. The fragment shader will thus be activated only for pixels that are visible (highlighted below). Some of these pixels will then be affected by the light. The position buffer is used to compute the distance from the point light to the pixel, and the normal buffer is used to find the angle.
The high-lighted square is used for lighting calculation
As the quad is drawn at the position of the point light, it may be that all pixels are culled because they fail the depth test. This is a great advantage, and will speed up drawing considerably, as lamps are frequently hidden behind walls or other objects. Two adjustments are done. The first is that the quad is not positioned exactly at the point light, but in front of it. The other concerns the case when the camera is inside the light sphere, in which case the quad has to be moved away from the player, or it would be drawn behind the camera and culled completely.


Blending is usually a problem with deferred shading. If the blending is done before light effects are applied, it will look bad. In Ephenation, drawing of semi-transparent objects is done separately from the opaque objects. But it is done using the same FBO, so as to have access to the depth buffer. Because of that, the result is saved in a special blend buffer that is applied in the deferred stage.

Textures used for the opaque objects use the alpha component to indicate either full transparency or full opaqueness. That is handled by the shader, which will discard transparent fragments completely. This will prevent updates of the depth buffer.

Deferred shading

All drawing until now has been done with a Frame Buffer Object as a target. The deferred stage is the one that combines the results from this FBO into a single output, which is the default frame buffer (the screen).

Gamma correction

The colors sent to the screen are clamped to the interval [0,1]. 0 is black, and 1 is as white as you can get. The value can be seen as an energy, where more energy gives more light. However, 0.5 is not half the energy of 1. The reason is that the monitor transforms its input with a gamma function, approximately output = C^2.2. The constant 2.2 is called the gamma constant. To get a value halfway between black and white, 0.5^(1/2.2) = 0.73 should be used, to compensate for the non-linear behavior of the monitor.
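The compensation can be verified numerically; a minimal sketch (the function name is mine):

```cpp
#include <cmath>

// The monitor darkens its input roughly as out = in^2.2. To get a value that
// the display will render at half the energy of white, the application must
// output 0.5^(1/2.2) ~= 0.73, which the monitor's gamma maps back to 0.5.
float GammaCompensate(float linear) {
    const float gamma = 2.2f;
    return std::pow(linear, 1.0f / gamma);
}
```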

SRGB input

The exact algorithm is defined by the sRGB format. LCD displays use the sRGB coding automatically. If all bitmaps are in the sRGB format, then the final output will automatically be correct. Or rather, it could be correct, but there are important cases where it is not. As sRGB is not linear, you can't simply add two values. For example, using the average of 0 and 1, which is 0.5, would not give the average energy on the final display. So if there are pixel color manipulations, the final colors can be wrong or there can be artifacts.
 if (srgb <= 0.04045)  
     linear = srgb / 12.92;  
 else  
     linear = pow((srgb + 0.055)/1.055, 2.4);  
If this transformation is done on an 8-bit color, the special case of values less than 0.04045 will all be rounded to 0 or 1 when divided by 12.92.

When you edit a bitmap in an editor, what you see is what you get. That means that the monitor will interpret the colors as being sRGB. OpenGL has built-in support for conversion from the sRGB format. If the format is specified for textures, OpenGL will automatically convert to linear color space. If sRGB is not specified, the transform has to be done manually in the shader. In Ephenation, bitmaps are specified as sRGB to get the automatic transformation, which means the equation above isn't needed.

SRGB output

In the last phase, when pixels are written to the default frame buffer, the value has to be manually transformed to non-linear (sRGB). There is actually automatic support for this in OpenGL if using a Frame Buffer Object with a texture target object in format sRGB. However, the final output is usually to the default frame buffer, which has no such automatic transformation. Regardless, it may be a good idea to implement it in the shader, to make it possible to calibrate and control by the end user.
if (linear <= 0.0031308)
    CRT = linear * 12.92;
else
    CRT = 1.055 * pow(linear, 1.0/2.4) - 0.055;
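For reference, the two conversions collected into complete functions (plain C++ versions of the shader code; the function names are mine). They are inverses of each other, so a round trip returns the original value.

```cpp
#include <cmath>

// sRGB (non-linear) to linear color space.
float SrgbToLinear(float srgb) {
    if (srgb <= 0.04045f)
        return srgb / 12.92f;
    return std::pow((srgb + 0.055f) / 1.055f, 2.4f);
}

// Linear to sRGB (non-linear) color space, for the final output.
float LinearToSrgb(float linear) {
    if (linear <= 0.0031308f)
        return linear * 12.92f;
    return 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}
```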


Colors are limited to the range [0,1], but there is no such limitation in the real world. The energy of a color is unlimited. But the limitation is needed, as it represents the maximum intensity of the display hardware. When doing light manipulations, it is easy to get values bigger than 1. One way would be to start with low values, and then make sure there can never be a situation where the final value will saturate. However, that could mean that the normal case will turn out to be too dark.

HDR is short for High Dynamic Range. It is basically images where the dynamic range (difference between the lowest and highest intensity) is bigger than can be shown on the display. Eventually, when the image is going to be shown, some mechanism is required to compress the range to something that will not saturate. A simple way would be to down scale the values, but then the lower ranges would again disappear. There are various techniques to prevent this from happening. In the case of gaming, we don't want the high values to saturate too much, and so a more simple algorithm can be used to compress the range.

Tone mapping

There are several ways to do tone mapping; in Ephenation the Reinhard transformation is used. This transforms almost all values into the range [0,1]. If it is done separately for each color channel, it can give color shifts if one of the components R, G or B is much bigger than the others. Because of that, the transformation is done on the luminance. It can be computed as follows in the deferred shader:
float lightIntensity;
vec3 diffuse;
float Lwhite2; // The square of the intensity that shall map to white
vec3 rgb = diffuse * lightIntensity;
float L = 0.2126 * rgb.r + 0.7152 * rgb.g + 0.0722 * rgb.b;
float Lfact = (1+L/Lwhite2)/(1+L);
vec3 mapped = rgb * Lfact;
'rgb' is the color when lighting has been applied. This is the value that need to be adjusted by tone mapping.

One simple solution, sometimes used, is to transform each channel with x/(1+x). That would take away much of the white from the picture, as almost no values would get close to 1. The solution used above is instead to compute the luminance L of the pixel. This luminance is transformed with tone mapping, and used to scale the RGB value. The idea is to set Lwhite to the intensity that shall be interpreted as white. Suppose we set Lwhite to 3.0. The tone mapping filter will then transform everything below 3.0 into the range [0,1], and values above 3.0 will saturate.
The formula using white compensation will saturate at 3.0
Note how the transformation x/(1+x) asymptotically approaches 1. Without tone mapping, everything above 1.0 would have saturated, but now saturation happens first at 3.0.
Tone mapping disabled

Tone mapping enabled
The transformation using Lwhite can also be applied to each channel individually. That gives slightly different results with many lights, as the final result will be almost pure white. Which variant is best is not well defined; it varies from application to application.

Tone mapping enabled per channel

For reference, diffuse data with no lighting applied

Monster selection

After the deferred shader, data from the G buffer can still be used. In Ephenation, there is a red marker around selected monsters. This is a color added to pixels that are inside a limited distance to the monster.
Red selection marker
The same technique is also used to make simple shadows if the dynamic shadow map is disabled.

21 June 2012

Doing animations in OpenGL

This document explains how to do animations in OpenGL based on skeletal animation. The basic idea is to define the skin mesh once, and then only update the bones position. I will not show how to create the buffers (VBO) and uniforms, which is readily available elsewhere. Instead, I concentrate on how to interpret and prepare the animation data. In principle, animation is implemented in four steps:
  1. Use a tool, e.g. Blender, to create an animation.
  2. Load the data in the initialization phase of the application; transform and pre-compute as much as possible.
  3. For every frame to be drawn, use interpolation to compute a transformation matrix for each joint.
  4. Let the shader do the final transformation of each vertex (skin section), depending on the joint matrices.
Step one is only needed once, of course. Step two can conveniently be done by a custom conversion tool, and saved in a special file. Blender was used for creating the models. There are lots of tutorials about this, so I am not going to go into many details. For some background to animation and skinning, see Animation in video games by Jason Gregory.

Any comments are welcome, I will try to correct or improve.

Model file format

I use Assimp to load the model files. There are many possible formats that can be used, and it is not obvious which one is best. In a commercial project, consider using a custom format. This has the advantage that loading will be quick, and the files will be harder to copy. Also, the main application doesn't need to know about file formats of 3D modeling applications.

The easiest format is probably the .obj format, but it does not support animations and bones. I use the Collada (.dae) file format. Make sure not to use the pre transformation flag for vertices (aiProcess_PreTransformVertices), as this will remove the bones data.


This is a list of definitions used below:
bone matrix: The resulting skinning transformation matrix you'd upload to the vertex shader.
offset matrix: The matrix transforming from mesh space to bone space, also called the inverse bind pose transform in Assimp.
node matrix: a node's transformation matrix in relation to its parent node.
bind pose and rest position: The original position of a model.
frame: One complete picture rendering.
The words "bone" and "joint" are used now and then, but really mean the same in this text.

Bind pose and current pose

The bind pose is the rest position; the position where no animation has been applied. This is the position the meshes get when the influence of the bones is ignored. The current pose is one frame in an animation. The bones information in the node tree (pointed at from the aiNode) defines the bind pose of the skeleton.

Assimp data structure

Arrows represents pointers, and the blue dashed arrows represent references by name or index.

Mesh dependency of bones

In rest position, each mesh has a transformation matrix that is relative to its parent (as defined by the node tree aiNode). However, when doing animations, there is instead a list of bones that the mesh depends on. The offset matrix (in aiBone) defines how to get the mesh position in relation to these bones. When the animation bones are in rest position, the resulting transformation matrix will be the same as the mesh transformation matrix (in aiNode). If there is more than one mesh, a bone may be used more than once, with different offset matrices and weight tables for each mesh.

Every vertex in a mesh can depend on several joints. This is defined by the aiBone list in aiMesh. This list is a subset of all bones, restricted to those that have an effect on the mesh. To make the shader program efficient, there has to be a reasonable limit on the number of joints. In my case, I want to limit this to at most three joints. Assimp has support for this, using the flag aiProcess_LimitBoneWeights with

importer.SetPropertyInteger(AI_CONFIG_PP_LBW_MAX_WEIGHTS, 3);

Key frames and interpolation

An animation is like a movie; there are a number of frames every second. Using 24 frames every second would require a lot of data. Instead, only key frames are used, with interpolation in between. The key frames can be defined at irregular time intervals. A movement of a bone consists of three parts: scaling, rotation, and translation. The scaling is usually not needed, but rotation and translation are. Interpolating the translation is trivial, as it is linear. To convert key frame data to a transformation matrix, I use the following code. Scaling, rotation and translation are values copied from the scaling key, quaternion key, and position key, respectively, coded as the corresponding glm types.

aiVector3D ScalingKey;
aiQuaternion RotationKey;
aiVector3D PositionKey;

glm::vec3 s(
ScalingKey.x, ScalingKey.y, ScalingKey.z);
glm::quat q(
RotationKey.w, RotationKey.x, RotationKey.y, RotationKey.z);
glm::vec3 t(
PositionKey.x, PositionKey.y, PositionKey.z);

glm::mat4 S = glm::scale(glm::mat4(1), s);
glm::mat4 R = glm::mat4_cast(q);
glm::mat4 T = glm::translate(glm::mat4(1), t);

glm::mat4 M = T * R * S;

Rotation is coded as quaternions, which means that interpolation is efficient and of high precision. However, OpenGL uses 4x4 matrices for transformations. Interpolation with matrices (also called linear blend skinning) works well for scaling and translation, but not for rotation. For example, interpolating a rotation that is only given by two points 180 degrees from each other will cut a straight line through the origin instead of following the arc. To avoid this problem, the interpolation of the rotation needs to be done before the quaternion is converted to a matrix.
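The quaternion interpolation itself can be done with glm::slerp; the stand-alone sketch below only shows the principle, using a minimal quaternion type of my own.

```cpp
#include <cmath>

struct Quat { float w, x, y, z; };

// Spherical linear interpolation between two key frame rotations. This is
// done on the quaternions, before converting to a matrix, so the rotation
// follows the arc instead of cutting through the origin.
Quat Slerp(Quat a, Quat b, float t) {
    float dot = a.w*b.w + a.x*b.x + a.y*b.y + a.z*b.z;
    if (dot < 0) { // Negate one side to take the shorter arc
        b = {-b.w, -b.x, -b.y, -b.z};
        dot = -dot;
    }
    if (dot > 0.9995f) { // Nearly parallel: fall back to linear interpolation
        return {a.w + t*(b.w-a.w), a.x + t*(b.x-a.x),
                a.y + t*(b.y-a.y), a.z + t*(b.z-a.z)};
    }
    float theta = std::acos(dot);
    float sa = std::sin((1-t)*theta) / std::sin(theta);
    float sb = std::sin(t*theta) / std::sin(theta);
    return {sa*a.w + sb*b.w, sa*a.x + sb*b.x,
            sa*a.y + sb*b.y, sa*a.z + sb*b.z};
}
```

Interpolating halfway between the identity and a 90 degree rotation gives exactly a 45 degree rotation, which matrix interpolation would not.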

There is a performance problem with using interpolation on quaternions between key frames. The interpolation itself is very quick, but the problem is the bone parent/child dependency. The interpolation has to be done for every bone. When combined with the scaling and translation, it will generate a new transformation matrix that is relative to the parent node. To get the final transformation matrix (the bone matrix), the result has to be multiplied with the parent node, etc., all the way up to the top node. Finally, the offset matrix has to be applied to each of them. This is a lot of work to do on the CPU for every frame that is going to be drawn. If interpolation is done only on transformation matrices, it is possible to pre calculate each matrix (from aiNodeAnim), including the offset matrix. It is a simplification I am using, which adds the requirement on the models to have a sufficient number of key frames when describing rotations.

Animation preparation

For a frame in an animation sequence, the bone (and mesh) positions defined in the node tree (aiNode) are not used. However, the information about parent/child relations is still needed. Instead, new positions are defined by aiNodeAnim. For every bone (called channel in aiAnimation), there are a couple of key frames. The problem is that this bone depends on the parent bone. That is, a bone defined in aiNodeAnim has a position defined relative to the parent node. As every bone can have a different number of key frames, at independent times, a bone position may depend on a parent bone that does not have a defined position for the same key frame. To simplify, it was decided that all bones shall use the same number of key frames, at the same times.
When exporting animation from Blender, make sure the model is in rest position. Otherwise, the mesh offset matrices in the node tree (aiNode) will be set to the current bone position, instead of the rest position of the bone. You will want to toggle this mode back when working with the animations. It doesn't change the result of the animation, but it helps when debugging to be able to compare with the rest position.
Rest position

Blender and bones

Blender has the 'z' axis pointing upward. Bones in Blender have their own coordinate system, with 'y' pointing in the direction of the bone. That means that when an upright bone is added, as seen from the 'z' axis of Blender, the bone will have a local coordinate system where 'y' is up. This corresponds to a rotation of -pi/2 around the 'x' axis to get to Blender space. A rotation transformation is therefore needed when using bones for animations. This is done automatically, and included in the export file from Blender. A typical result is the transformation matrix:

1  0  0  0
0  0  1  0
0 -1  0  0
0  0  0  1

This matrix will set the y value to the z value, and the z value to -y. It is possible to enable the display of the bone's local coordinate system in Blender in the Armature tab, "Axis" checkbox. These rotations, and counter rotations, unfortunately make it a little harder to debug and understand the matrix transformations.

Notice that OpenGL doesn't have the same coordinate system ('z' is by default pointing out of the screen) as Blender, which means that you eventually will have to make a model rotation of your own. If you don't, your models will lay down on the side.

Matrix multiplication

Exporting to Collada format from Blender usually gives a node tree (aiNode) as follows:


Mesh matrices are relative to the Scene, and have to be computed just like the bones. If that isn't done, all meshes will be drawn on top of each other, at the same position.

Each node in the tree has a matrix that transforms to the parent node. To get the final transformation matrix of Bone2, a matrix multiplication is needed: Scene*Armature*Bone1*Bone2. This is true for the bind pose, as well as for animations of bones. But when computing animation matrices, data from aiNodeAnim is used and replaces the data from aiNode. When testing that animation works, start with defining an animation with the same rotation, location and scaling as the bind pose. That gives bone replacement matrices that are the same as those originally defined in aiNode.

The above matrix multiplication gives the final matrices for each bone. But that can't be used to transform the mesh vertices yet, as it will give the animated locations of the bones. The mesh absolute rest position is Scene*Mesh. Instead of using the mesh transformation matrix from the node tree, a new mesh matrix is computed based on the bones and an offset. There is a matrix that is meant for exactly that, and it is the offset matrix in aiBone. The new mesh matrix is Scene*Armature*Bone1*Bone2*Offs. This is the bone matrix that shall be sent to the shader.

Animation shader

This is the animation vertex shader, with functions irrelevant to animation removed.

uniform mat4 projectionMatrix;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
uniform mat4 bonesMatrix[64];
in vec4 vertex;
in vec3 weights;
in vec3 joints;
void main(void){
  mat4 animationMatrix =
    weights[0] * bonesMatrix[int(joints[0])] +
    weights[1] * bonesMatrix[int(joints[1])] +
    weights[2] * bonesMatrix[int(joints[2])];
  gl_Position = projectionMatrix*viewMatrix*modelMatrix*animationMatrix*vertex;
}

bonesMatrix: Up to 64 joints can be used in a model. It is a uniform, as the same list of bones is used for all vertices.
vertex: This is a vertex from the mesh that is going to be animated by 0 to 3 bones.
joints: The index of three joints for the current vertex.
weights: The weights to be used for the three joints. There is one set of weights for each vertex.


To debug the application, you can do as follows
  • Change the shader so as to use the identity matrix instead of bones matrix. That should draw the mesh in bind pose.
  • Do the same thing, but use bone indices to make a color in the fragment shader. That way, you can verify that the right bones are selected by the indices.
  • Instead, use weight information to make a color, that way you can test that the weights are correctly transferred.
To help debug an animation application, there are tools where matrix multiplication can easily be tested. I use Octave for this.

Column major and row major

The expressions column major and row major denote how a matrix is stored in memory. OpenGL and glm use column major; DirectX and Assimp use row major. glm is the math library used in the Ephenation project. This isn't much of a problem, except when a conversion from one to the other is needed. The most efficient conversion would have been to simply copy 16 consecutive floats for a 4x4 matrix when converting from Assimp aiMatrix4x4 to glm::mat4, but that won't work because of the different memory layouts. I used the following:

void CopyaiMat(const aiMatrix4x4 *from, glm::mat4 &to) {
    to[0][0] = from->a1; to[1][0] = from->a2;
    to[2][0] = from->a3; to[3][0] = from->a4;
    to[0][1] = from->b1; to[1][1] = from->b2;
    to[2][1] = from->b3; to[3][1] = from->b4;
    to[0][2] = from->c1; to[1][2] = from->c2;
    to[2][2] = from->c3; to[3][2] = from->c4;
    to[0][3] = from->d1; to[1][3] = from->d2;
    to[2][3] = from->d3; to[3][3] = from->d4;
}

Edit history

2012-06-21: Added suggestions on how to debug.
2012-06-29: Added information about creating transformation matrix from assimp key frame.
2012-07-25: Correction of "offset matrix" definition. Correction of matrix order in key frames interpolation, improved example and improved explanation of the LBS problem. Clarification of Blender bone orientations and matrix multiplications. Thanks to Dark Helmet for pointing these out!

13 May 2012

Making caves from simplex noise

In Ephenation, we want underground caves. The requirements on these caves, and their construction, are:

  1. Any part of the underground shall be possible to create without knowledge of neighboring regions.
  2. The caves shall be long and winding.
  3. They shall split and join randomly, sometimes ending in a dead end.
  4. Most of them shall be of a size to allow a player to pass through.
  5. The algorithm shall be based on 3D simplex noise.
The description below does not really depend on OpenGL. Anyway, path finding algorithms are out of the question. The first problem is the simplex noise. I use the simplex algorithm defined by Stefan Gustavson, normalized to the interval 0 to 1. A 3D simplex noise produces a density function. The underground is created as empty space wherever this density is below a certain threshold, and you will get some kind of caves. But the simplex noise is spherical in nature, and not at all long and winding.

To demonstrate the result, I show pictures of inverted caves. That is, ground where the space should be, and vice versa. This makes it easier to visualize.
density > 0.85
These caves are not very nice. They are too round, and most of them are not connected to each other. One reason for this is the limit set on the density. With a lower density limit, the caves (that is the floating blobs in the picture) will grow, and start to connect.
density > 0.7
This is better. But the caves are starting to dominate the world. That is, there are caves almost everywhere. And they are very wide and spacey, with no feeling of a cramped cave. The question then is if another algorithm than simplex noise should be used.

There is a way to continue, based on this. The principle is that an intersection between two planes is a line. If the planes have a certain thickness, then the line will get a height and width. Thus, the next step is to change the above into curved planes instead of massive objects. An easy way to do this is to have the condition "make stone if density > 0.7 and less than 0.8". That will make most of them hollow. The inside will have no opening to the outside, making it difficult to visualize. But using the Ephenation X-ray view, it will look as follows:
density > 0.7 && density < 0.8
This is now curved planes, sometimes looping around into spheres. If used inverted as caves, you would run around inside these walls, which can be adjusted to an appropriate size. But they are still rather unnatural caves. The trick is to make two such worlds, based on different random seeds. That will give two worlds, each looking a little like a bottle full of soap bubbles with thick membranes. Now create a third world as stone, but with the condition that a coordinate is air if it is air in both the first and the second world. That is an intersection, looking as follows.
dens1 > 0.7 && dens1 < 0.8 && dens2 > 0.7 && dens2 < 0.8
It is easy to adjust how long the caves shall be. In my example, I am using the interval 0.7 to 0.8. Changing this to 0.45 to 0.55 increases the chance of tunnels forming, while keeping them at approximately the same size, and gives the following, based on the same view.
dens1 > 0.45 && dens1 < 0.55 && dens2 > 0.45 && dens2 < 0.55
I should mention that I scale the y argument (the height) to the simplex function by a factor of 2 compared to x and z. That way, the caves get more elongated in the horizontal plane.
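The combined condition can be sketched like this (again with a placeholder noise stand-in; the two worlds only differ by their seed, and the y argument is scaled by 2 as described):

```cpp
#include <cassert>
#include <cmath>

// Placeholder for seeded 3D simplex noise in [0,1); only a stand-in so the
// sketch is self-contained.
static double noise(int seed, double x, double y, double z) {
    double v = std::sin(x*12.9898 + y*78.233 + z*37.719 + seed*141.421) * 43758.5453;
    return v - std::floor(v);
}

// A block is air (part of a cave tunnel) when both worlds put the density
// inside the thin shell interval (lo, hi). Scaling the y argument by 2
// stretches the tunnels horizontally.
bool IsCaveAir(double x, double y, double z, double lo, double hi) {
    double d1 = noise(1, x, y*2, z);
    double d2 = noise(2, x, y*2, z);
    return d1 > lo && d1 < hi && d2 > lo && d2 < hi;
}
```

Widening the interval (lo, hi) makes the tunnels more likely, since either density condition is easier to satisfy.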

15 January 2012

Measuring graphics performance

If you want to measure the render time, it doesn't work very well with a standard OS timer function. The reason for this is that OpenGL will do some of the work in the background, which means your timer function can return a value close to zero. There is support in OpenGL 3.3 to request the actual render time, using queries. This is done in a couple of steps:
  1. Request OpenGL to begin the query.
  2. Do the draw operation.
  3. Request OpenGL to stop the query.
  4. Read out the result of the query.
The problem is that you obviously can't read the result until the drawing is done. And as already mentioned, the actual drawing may be done in the background and still not be complete when you ask for the result. That means that OpenGL will have to wait until the drawing is actually done, and then return the result. This can severely degrade the performance. It could be okay if you only do this during development, but it will screw up the timing of other functions, and be less helpful.

The result of a query is available until you start the next query on the same query ID. As long as the result isn't requested too early, the pipeline will not be disturbed. The trick I am using is to read out the result one frame later, instead of in the current frame. The drawback is that the result will be one frame old, which is not a problem for statistics. That is why, in the pseudo code below, I read out the previous result first, and then start a new query.

GLuint queries[3];      // The unique query ids
GLuint queryResults[3]; // Elapsed time, in nanoseconds

void Init() {
    glGenQueries(3, queries);
}

The main loop is as follows:

bool firstFrame = true;
while(1) {
    if (!firstFrame)
        glGetQueryObjectuiv(queries[0], GL_QUERY_RESULT, &queryResults[0]);
    glBeginQuery(GL_TIME_ELAPSED, queries[0]);
    DrawTerrain(); // Placeholder for whatever the first query shall measure
    glEndQuery(GL_TIME_ELAPSED); // Only one GL_TIME_ELAPSED query can be active at a time

    if (!firstFrame)
        glGetQueryObjectuiv(queries[1], GL_QUERY_RESULT, &queryResults[1]);
    glBeginQuery(GL_TIME_ELAPSED, queries[1]);
    DrawTransparent();
    glEndQuery(GL_TIME_ELAPSED);

    if (!firstFrame)
        glGetQueryObjectuiv(queries[2], GL_QUERY_RESULT, &queryResults[2]);
    glBeginQuery(GL_TIME_ELAPSED, queries[2]);
    DrawMonsters();
    glEndQuery(GL_TIME_ELAPSED);

    printf("Terrain: %.2f ms, Transparent: %.2f ms, Monsters: %.2f ms\n",
        queryResults[0]*0.000001, queryResults[1]*0.000001, queryResults[2]*0.000001);
    firstFrame = false;
}

C++ Implementation

For a C++ class automating the measurement, see Ephenation TimeMeasure header file and implementation file.


If you have a graphics card from AMD, there is a tool available that will give very detailed timing reports: GPUPerfAPI.

Revision history

2012-06-13 Added reference to AMD tool.
2013-02-07 Added reference to class implementation.

8 January 2012

Fog sphere implemented in shader

Fog effects are commonly used at the far horizon (the far cut-off plane of the frustum). But local fog effects can also be used for atmosphere. This article is about fogs defined by a centre and a radius, and how to implement them in the fragment shader. It may seem that fogs are similar to lamps, but there are important differences. Lamps will have a local effect on the near vicinity, while fog will change the view of every ray that passes through the fog cloud. That means different parts of the scene will change, depending on where the camera is.

I am using the fog effect as a transparent object with varying alpha, where the alpha is a function of the amount of fog that a ray passes through. The amount of fog thus depends on the entry point of the ray into the sphere and the exit point, which gives the total inside distance. To simplify, it is assumed that the density is the same everywhere in the sphere. There are 4 parameters needed: the position of the camera V, the position of the pixel that shall be transformed P, the centre of the fog sphere C, and the radius of the sphere, r. All coordinates are in world coordinates, not screen coordinates. For the mathematical background, see line-sphere intersection in Wikipedia. The task is to find the distance that a ray is inside the sphere, and use this to compute an alpha for fog blending.

Using a normalized vector l, for the line from the camera V to the pixel P, the distances from the camera to the two intersections are:

d = l · (C − V) ± √( (l · (C − V))² − |C − V|² + r² )

If the value inside the square root is negative, then there is no solution; the line is outside of the sphere, and no fog effects shall be applied.

There are 4 cases that need to be considered:

  1. Camera and pixel are both inside the sphere.
  2. The camera is outside, but the pixel is inside.
  3. The camera is inside, but the pixel is outside.
  4. Both camera and pixel are outside of the sphere.
For the first case, it is trivial to compute the fog covered distance from camera to pixel: "distance(V,P)".

For the last case, with both camera and pixel outside of the sphere, the distance will be the difference between the two intersections. This is the same as twice the value of the square root. There are two non-obvious exceptions that need to be taken care of. If the pixel is on the same side of the sphere as the camera, there shall be no fog effect; the fog is then occluded for the given pixel. The other special case is when you turn around: there would again be a fog cloud behind you if you don't exclude it by requiring l · (C − V) to be positive.

For the two other cases, there is a point inside the sphere, and a distance to one intersection with the sphere. The entry or exit point E can be found by multiplying the unit vector l with the near or the far value of d, and adding this to the camera position V. Given E, the effective distance can easily be computed to either P or V. The final fragment shader function looks as follows:

// r: Fog sphere radius
// V: Camera position
// C: Fog sphere centre
// P: Pixel position
// Return alpha to be used for the fog blending.
float fog(float r, vec3 V, vec3 C, vec3 P) {
    float dist = 0; // The distance of the ray inside the fog sphere
    float cameraToPixelDist = distance(V, P);
    float cameraToFogDist = distance(V, C);
    float pixelToFogDist = distance(P, C);
    if (cameraToFogDist < r && pixelToFogDist < r) {
        dist = cameraToPixelDist; // Camera and pixel completely inside fog
    } else {
        vec3 l = normalize(P-V);
        float ldotc = dot(l, C-V);
        float tmp = ldotc*ldotc - cameraToFogDist*cameraToFogDist + r*r;
        if (cameraToFogDist > r && pixelToFogDist > r && ldotc > 0 && tmp > 0) {
            // Both camera and pixel outside the fog. The fog is in front of
            // the camera, and the ray is going through the fog.
            float sqrttmp = sqrt(tmp);
            vec3 entrance = V + l*(ldotc-sqrttmp);
            if (cameraToPixelDist > distance(V, entrance)) dist = sqrttmp*2;
        } else if (cameraToFogDist > r && pixelToFogDist < r) {
            // Outside of fog, looking at a pixel inside. Thus tmp > 0.
            vec3 entrance = V + l*(ldotc-sqrt(tmp));
            dist = distance(entrance, P);
        } else if (cameraToFogDist < r && pixelToFogDist > r) {
            // Camera inside fog, looking at a pixel on the outside
            vec3 exit = V + l*(ldotc+sqrt(tmp));
            dist = distance(exit, V);
        }
    }
    // Maximum value of 'dist' will be the diameter of the sphere.
    return dist/(r*2);
}
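Since a shader is hard to unit test, a CPU-side port of the same geometry can be useful. This is a sketch in plain C++ with a minimal vector type; it mirrors the shader logic above (using distances along the ray instead of constructing the entry/exit points), rather than being the code actually used in Ephenation:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static double distance(Vec3 a, Vec3 b) { return std::sqrt(dot(a-b, a-b)); }

// Fraction of the sphere diameter that the ray from camera V to pixel P
// travels inside the fog sphere (centre C, radius r).
double fog(double r, Vec3 V, Vec3 C, Vec3 P) {
    double dist = 0;
    double camToPix = distance(V, P);
    double camToFog = distance(V, C);
    double pixToFog = distance(P, C);
    if (camToFog < r && pixToFog < r) {
        dist = camToPix; // Camera and pixel both inside the sphere
    } else {
        Vec3 d = P - V;
        Vec3 l = {d.x/camToPix, d.y/camToPix, d.z/camToPix}; // Unit direction
        double ldotc = dot(l, C - V);
        double tmp = ldotc*ldotc - camToFog*camToFog + r*r;
        if (camToFog > r && pixToFog > r && ldotc > 0 && tmp > 0) {
            double s = std::sqrt(tmp);
            if (camToPix > ldotc - s) dist = 2*s; // Ray passes through, not occluded
        } else if (camToFog > r && pixToFog < r) {
            dist = camToPix - (ldotc - std::sqrt(tmp)); // Entry point to pixel
        } else if (camToFog < r && pixToFog > r) {
            dist = ldotc + std::sqrt(tmp); // Camera to exit point
        }
    }
    return dist / (2*r); // Normalize by the diameter
}
```

This makes the four cases, including the occlusion and behind-the-camera exceptions, easy to verify with concrete numbers.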

A test of using a fog sphere. It is clear that rays going through a lot of fog have a bigger fog effect.

Another example, using two fog spheres under ground. The colour of the fog needs to be adapted, depending on how dark the surroundings are. It isn't shown above, but when there are overlapping fogs I use the most dominant alpha, not an accumulated value.

The GPU performance cost for fogs can get high if there are many of them. If so, it can be an advantage to use a deferred shader, where fog is computed only for pixels that will actually be shown.

7 January 2012

Setting up a deferred shader

See also the part 2 about deferred rendering.

The idea with a deferred shader is to use two (or more) shader stages. The first stage will render to internal buffers, but with typically more information than is usually shown on screen. The second stage will use the internal buffers to create the final screen image.

Notice the difference between deferred shading and deferred lighting. Deferred lighting does only the lighting in the second (deferred) stage. Information about the geometry is not saved, and so it needs to be rendered again. It can still be efficient, as the depth buffer is reused.

If there are a lot of effects that are added, like lighting, and other pixel transformations, then it may be a disadvantage to do this in a single render stage (forward renderer). The reason is that a lot of GPU processing power can be used for computing effects of pixels that are thrown away because they were found to be occluded. One advantage of using a deferred shader is that all drawn objects will have light effects added from the same algorithms, even if they use separate first stage shaders (as long as the correct data for the second stage are created).

A disadvantage of a deferred shader is that transparent objects are more difficult to handle. One way is to simply draw the transparent objects after the deferred stage. In my case, I draw the transparent objects also in the deferred stage.

In the following, I will show an example of how it can be implemented. I am using one FBO (frame buffer object), one depth buffer as a render buffer, and four colour buffers. The FBO is not a buffer on its own; it is a container object, much like vertex array objects. When an FBO is bound, all drawing will go to the attached buffers of the FBO instead of the visible screen. There are two types of buffers that can be attached: textures and render buffers. A texture buffer is used when the result of the operation shall be used as a texture in another render stage. A render buffer, on the other hand, can't be read by a shader. A way to use the result from a render buffer after the draw operation is glReadPixels() or glBlitFramebuffer().

Setting up the FBO
This has to be done again if the screen size changes. As the depth buffer isn't used again after the FBO drawing, it is allocated in a render buffer.

glGenFramebuffers(1, &fboName);
glGenRenderbuffers(1, &fDepthBuffer);

// Bind the depth buffer
glBindRenderbuffer(GL_RENDERBUFFER, fDepthBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);

// Generate and bind the texture for diffuse
glGenTextures(1, &fDiffuseTexture);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// Generate and bind the texture for positions
glGenTextures(1, &fPositionTexture);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, 0);

// Generate and bind the texture for normals
glGenTextures(1, &fNormalsTexture);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, 0);

// Generate and bind the texture for blending data
glGenTextures(1, &fBlendTexture);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);

Now the buffers have been allocated, and have to be attached to the FBO.

// Bind the FBO so that the next operations will be bound to it.
glBindFramebuffer(GL_FRAMEBUFFER, fboName);
// Attach the depth render buffer to the FBO
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, fDepthBuffer);
// Attach the textures to the FBO
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, fDiffuseTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, fPositionTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, fNormalsTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, fBlendTexture, 0);

GLenum fboStatus = glCheckFramebufferStatus(GL_FRAMEBUFFER);
if (fboStatus != GL_FRAMEBUFFER_COMPLETE)
    printf("DeferredLighting::Init: FrameBuffer incomplete: 0x%x\n", fboStatus);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

As can be seen, the colour buffers are texture buffers. They have an initialized size, but no initialized data. The GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER don't really matter, as the final screen will have the same size as the internal buffers. So there will be no magnification or minification, but the filters still have to be defined, as the default for minification is GL_NEAREST_MIPMAP_LINEAR, which would require mipmaps. The default for magnification is GL_LINEAR, though.

The FBO is bound using glBindFramebuffer. There are three possible targets: GL_DRAW_FRAMEBUFFER, GL_READ_FRAMEBUFFER and GL_FRAMEBUFFER. It is recommended that GL_FRAMEBUFFER is used when the FBO is defined, and that GL_DRAW_FRAMEBUFFER or GL_READ_FRAMEBUFFER is bound when the FBO is used.

Some explanation is needed of why I use 4 colour buffers. These buffers will consume many megabytes of GPU memory, and should be kept to a minimum, although with modern graphics cards the problem is smaller. The fDiffuseTexture will contain the colour of the material. As the original textures are of type GL_RGBA, this buffer can as well be GL_RGBA. The fPositionTexture will store the world coordinates of the pixel. For this, we need higher precision (GL_RGBA32F). The coordinates are needed in the deferred shader to compute distances to lamps and other objects. The fNormalsTexture buffer stores the normals. In this case, a limited precision is good enough (GL_RGBA16F). The normals are needed to compute effects of directional light and lamps. Finally, there is also a fBlendTexture buffer. The blending could also be done in a separate render stage after the deferred shader (remember to reuse the depth buffer in that case), but I use the blending data for some special effects in the deferred shader.
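As a rough worked example of the memory consumption (assuming GL_DEPTH_COMPONENT24 is padded to 32 bits per pixel, which is common but implementation dependent):

```cpp
#include <cassert>

// Bytes per pixel for the buffers used here: GL_RGBA (4), GL_RGBA32F (16),
// GL_RGBA16F (8), GL_RGBA (4), plus the depth buffer assumed at 4 bytes.
long long DeferredBufferBytes(long long width, long long height) {
    return width * height * (4 + 16 + 8 + 4 + 4);
}
```

At 1920×1080 this comes to about 71 MB, which is why the number and precision of the colour buffers should be kept down.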

First stage shader
The first stage vertex shader looks like this:

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform mat4 projectionMatrix;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
in vec3 normal;
in vec2 texCoord;
in vec4 vertex;
in float intensity; // sun light
in float ambientLight;
out vec3 fragmentNormal;
out vec2 fragmentTexCoord;
out float extIntensity;
out float extAmbientLight;
out vec3 position;
void main(void)
{
   fragmentTexCoord = texCoord;
   fragmentNormal = normalize((modelMatrix*vec4(normal, 0.0)).xyz);
   gl_Position = projectionMatrix * viewMatrix * modelMatrix * vertex;
   position = vec3(modelMatrix * vertex); // Copy position to the fragment shader
   extIntensity = intensity/255.0;        // Scale the intensity from [0..255] to [0..1].
   extAmbientLight = ambientLight/255.0;
}

To map output from the first fragment shader stage, I do as follows. This has to be done before the shader program is linked.

glBindFragDataLocation(prg, 0, "diffuseOutput");
glBindFragDataLocation(prg, 1, "posOutput");
glBindFragDataLocation(prg, 2, "normOutput");
glBindFragDataLocation(prg, 3, "blendOutput");

The names are the output names of the first stage fragment shader, which looks as follows. A layout qualifier could also have been used, but it is not available in OpenGL 3.0. The shader is executed twice: first for opaque materials, and then for transparent materials. The second pass will only have the blendOutput target enabled. The blending uses premultiplied alpha, which makes the operation associative. For this example, the same shader is used for opaque and transparent objects, but eventually they should be split into two.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D firstTexture;
in vec3 fragmentNormal;
in vec2 fragmentTexCoord;
in vec3 position;       // The model coordinate, as given by the vertex shader
out vec4 diffuseOutput; // layout(location = 0)
out vec4 posOutput;     // layout(location = 1)
out vec4 normOutput;    // layout(location = 2)
out vec4 blendOutput;   // layout(location = 3)
void main(void)
{
   posOutput.xyz = position;   // Position given by the vertex shader
   normOutput = vec4(fragmentNormal, 0);
   vec4 clr = texture(firstTexture, fragmentTexCoord);
   float alpha = clr.a;
   if (alpha < 0.1)
       discard;   // Optimization that will not change the depth buffer
   blendOutput.rgb = clr.rgb * clr.a; // Premultiplied alpha
   blendOutput.a = clr.a;
   diffuseOutput = clr;
}
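Why premultiplied alpha makes blending associative can be checked with a small standalone sketch of the "over" operator (plain C++, not code from the project):

```cpp
#include <cassert>
#include <cmath>

// A premultiplied colour: r, g and b have already been multiplied by a.
struct RGBA { double r, g, b, a; };

// The "over" operator for premultiplied alpha: out = src + dst*(1 - src.a).
// The same formula applies to all four channels, including alpha.
static RGBA over(RGBA src, RGBA dst) {
    double k = 1 - src.a;
    return { src.r + dst.r*k, src.g + dst.g*k, src.b + dst.b*k, src.a + dst.a*k };
}
```

Associativity means the transparent layers can be composited in groups, in any grouping, which is what allows blending into a separate buffer first and merging with the opaque result later.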

Deferred stage shader
The vertex shader is very simple. It is only used to draw two triangles covering the whole window; the main work is done in the fragment shader. The default projection of OpenGL has x and y in the range [-1,+1]. The position forwarded to the fragment shader has to be in the range [0,1], as it is used to interpolate in the textures. The triangles are therefore defined in the range [0,1], which I transform to the range [-1,+1]. This is a simple operation with no need for a transformation matrix.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
in vec4 vertex;
out vec2 position;
void main(void)
{
   gl_Position = vertex*2-1;
   gl_Position.z = 0.0;
   // Copy position to the fragment shader. Only x and y are needed.
   position = vertex.xy;
}
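A small sketch of the mapping on the CPU side (the triangle winding shown is one plausible layout, not necessarily the one used in Ephenation):

```cpp
#include <array>
#include <cassert>

struct V2 { double x, y; };

// Two triangles covering the unit square [0,1]x[0,1].
static const std::array<V2, 6> fullscreenQuad = {{
    {0,0}, {1,0}, {1,1},
    {0,0}, {1,1}, {0,1},
}};

// The vertex shader's transform: gl_Position = vertex*2-1.
static V2 toClip(V2 v) { return { v.x*2 - 1, v.y*2 - 1 }; }
```

The untransformed [0,1] coordinates double as the texture lookup coordinates in the fragment shader, which is why no separate texture coordinate attribute is needed.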

The fragment shader for the deferred stage looks as follows. Some simplifications have been done to keep the listing short. Other lighting effects are easy to add, e.g. material properties for reflection; the specular glare should not be the same for all materials. Other things that can be added are information about ambient light and sun light, which would also need to be prepared in the first render stage. More texture buffers could be allocated for this, but there is unused space available already in the current buffers (i.e. the alpha channels). The input textures are the ones generated by the FBO.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D diffuseTex; // The color information
uniform sampler2D posTex;     // World position
uniform sampler2D normalTex;  // Normals
uniform sampler2D blendTex;   // A bitmap with colors to blend with.
uniform vec3 camera;          // The coordinate of the camera
in vec2 position;             // The world position
out vec4 fragColor;           // layout(location = 0)
void main(void)
{
   // Load data, stored in textures, from the first stage rendering.
   vec4 diffuse = texture(diffuseTex, position.xy);
   vec4 blend = texture(blendTex, position.xy);
   vec4 worldPos = texture(posTex, position.xy);
   vec4 normal = texture(normalTex, position.xy);
   // Use information about the lamp coordinate (not shown here), the pixel
   // coordinate (worldPos.xyz) and the normal of this pixel (normal.xyz)
   // to compute the lighting factors 'lamp' and 'specularGlare'.
   vec4 preBlend = diffuse * lamp + specularGlare;
   // Manual blending, using premultiplied alpha.
   fragColor = blend + preBlend*(1-blend.a);
   // Some debug features. Enable any of them to get a visual representation
   // of an internal buffer.
   // fragColor = (normal+1)/2;
   // fragColor = diffuse;
   // fragColor = blend;
   // fragColor = worldPos; // Scaling may be needed to range [0,1]
   // fragColor = lamp*vec4(1,1,1,1);
}

Execute the drawing every frame
Now everything has been prepared, and can be used for every frame update. First, clear the FBO buffers from the previous frame:

GLenum windowBuffClear[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,
                             GL_COLOR_ATTACHMENT2, GL_COLOR_ATTACHMENT3 };
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboName);
glDrawBuffers(4, windowBuffClear); // Select all buffers
glClearColor(0.0f, 0.0f, 0.0f, 0.0f); // Set everything to zero.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

Execute the first render stage, which will fill out the internal buffers with data:

// Do not produce any blending data on the 4:th render target.
GLenum windowBuffOpaque[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,
                              GL_COLOR_ATTACHMENT2, GL_NONE };
glDrawBuffers(4, windowBuffOpaque);
DrawTheWorld(); // Will also produce depth data in the depth buffer

GLenum windowBuffTransp[] = { GL_NONE, GL_NONE, GL_NONE, GL_COLOR_ATTACHMENT3 };
glDrawBuffers(4, windowBuffTransp); // Only update the blending buffer
// Use alpha 1 for the source, as the colours are premultiplied by the alpha.
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
glDepthMask(GL_FALSE); // The depth buffer shall not be updated.
DrawTransparentObjects(); // Placeholder for the transparent draw calls
glDepthMask(GL_TRUE);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); // Restore to default
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);

The output from the first render stage is now available in the texture buffers. Execute the second render stage, the deferred shader.

// The depth buffer from stage 1 is not used now, as the FBO is disabled.
glActiveTexture(GL_TEXTURE3);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);
// Draw the two triangles covering the whole screen, using the deferred shader program.

The result

The material colour information.

Positional data.


Final result. In this picture, blending data, lamps, fog effects and ambient light are also used.

Update history

2012-09-13 Added reference to deferred lighting. Clarified the distinction between using GL_FRAMEBUFFER, GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER. Cleaned up the fragment shader of the first stage.
2012-10-26 Added reference to part 2.