15 January 2012

Measuring graphics performance

If you want to measure the render time, it doesn't work very well with a standard OS timer function. The reason for this is that OpenGL will do some of the work in the background, which means your timer function can return a value close to zero. There is support in OpenGL 3.3 to request the actual render time, using queries. This is done in a couple of steps:
  1. Request OpenGL to begin the query.
  2. Do the draw operation.
  3. Request OpenGL to stop the query.
  4. Read out the result of the query.
The problem is that you obviously can't read the result until the drawing is done. And as already mentioned, the actual drawing may be done in the background and still not be complete when you ask for the result. That means that OpenGL will have to wait until the drawing is actually done, and then return the result. This can severely degrade the performance. It could be okay if you only do this during development, but it will screw up the timing of other functions, and be less helpful.

The result of a query is available until you start the next query on the same query ID. As long as the result isn't requested too early, the pipeline will not be disturbed. The trick I am using is to read out the result the frame after, instead of in the current frame. The drawback is that the result will be one frame old, which is not a problem for doing statistics. That is why, in the pseudocode below, I read out the result first, and then request a new query to be set up.

GLuint queries[3]; // The unique query id
GLuint queryResults[3]; // Save the time, in nanoseconds

void Init() {
    glGenQueries(3, queries);
}

The main loop is as follows:

bool firstFrame = true;
while(1) {
    if (!firstFrame)
        glGetQueryObjectuiv(queries[0], GL_QUERY_RESULT, &queryResults[0]);
    glBeginQuery(GL_TIME_ELAPSED, queries[0]);
    // ... draw the terrain here ...
    glEndQuery(GL_TIME_ELAPSED);

    if (!firstFrame)
        glGetQueryObjectuiv(queries[1], GL_QUERY_RESULT, &queryResults[1]);
    glBeginQuery(GL_TIME_ELAPSED, queries[1]);
    // ... draw the transparent objects here ...
    glEndQuery(GL_TIME_ELAPSED);

    if (!firstFrame)
        glGetQueryObjectuiv(queries[2], GL_QUERY_RESULT, &queryResults[2]);
    glBeginQuery(GL_TIME_ELAPSED, queries[2]);
    // ... draw the monsters here ...
    glEndQuery(GL_TIME_ELAPSED);

    // The results are in nanoseconds; scale them to milliseconds.
    printf("Terrain: %.2f ms, Transparent: %.2f ms, Monsters: %.2f ms\n",
        queryResults[0]*0.000001, queryResults[1]*0.000001, queryResults[2]*0.000001);
    firstFrame = false;
}
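
Since the results are meant for statistics, they are usually easier to read when smoothed over several frames. The following is a minimal sketch of that, not taken from the Ephenation source: the class name TimeAverage and the choice of an exponential moving average are my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the original code: keeps an exponential
// moving average of GL_TIME_ELAPSED results, converted to milliseconds.
class TimeAverage {
public:
    // 'weight' is the share given to each new sample, in the range (0, 1].
    explicit TimeAverage(double weight = 0.1) : fWeight(weight) {
        assert(weight > 0.0 && weight <= 1.0);
    }

    // Feed one query result, in nanoseconds.
    void Add(std::uint32_t nanoseconds) {
        double ms = nanoseconds * 1e-6;
        if (fFirst) {
            fAverage = ms; // Start from the first sample instead of zero
            fFirst = false;
        } else {
            fAverage = fWeight * ms + (1.0 - fWeight) * fAverage;
        }
    }

    double Milliseconds() const { return fAverage; }

private:
    double fWeight;
    double fAverage = 0.0;
    bool fFirst = true;
};
```

One such object per query ID would be fed with queryResults[n] every frame, and Milliseconds() printed instead of the raw value.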

C++ Implementation

For a C++ class automating the measurement, see Ephenation TimeMeasure header file and implementation file.


If you have a graphics card from AMD, there is a tool available that will give very detailed timing reports: GPUPerfAPI.

Revision history

2012-06-13 Added reference to AMD tool.
2013-02-07 Added reference to class implementation.

8 January 2012

Fog sphere implemented in shader

Fog effects are commonly used at the far horizon (far cut-off plane of the frustum). But local fog effects can also be used for atmosphere. This article is about using fogs defined by a centre and a radius, and how to implement that in the fragment shader. It may seem that fogs are similar to lamps, but there are important differences. Lamps will have a local effect on the near vicinity, while fog will change the view of every ray that passes through the fog cloud. That means different parts of the scene will change, depending on where the camera is.

I am using the fog effect as a transparent object with varying alpha, where the alpha is a function of the amount of fog that a ray passes through. The amount of fog thus depends on the entry point of the ray into the sphere and the exit point, which gives the total inside distance. To simplify, it is assumed that the density is the same everywhere in the sphere. There are 4 parameters needed: the position of the camera V, the position of the pixel that shall be transformed P, the centre of the fog sphere C and the radius of the sphere, r. All coordinates are in the world model, not screen coordinates. For the mathematical background, see line-sphere intersection in Wikipedia. The task is to find the distance that a ray is inside the sphere, and use this to compute an alpha for fog blending.

Using a normalized vector l, for the line from the camera V to the pixel P, the distances d from the camera to the two intersections are:

$$d = l \cdot (C - V) \pm \sqrt{\left(l \cdot (C - V)\right)^2 - \lVert C - V \rVert^2 + r^2}$$

If the value inside the square root is negative, then there is no solution; the line is outside of the sphere, and no fog effects shall be applied.
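
The intersection test can be tried out on the CPU before it goes into a shader. The following is a minimal sketch under my own assumptions: the names Vec3, Intersections and sphereIntersections are hypothetical, and l is expected to be normalized already.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Result of the line-sphere test. The distances are only valid if 'hit' is true.
struct Intersections {
    bool hit;
    double nearDist; // Distance from the camera to the entry point
    double farDist;  // Distance from the camera to the exit point
};

// Distances from the camera V, along the unit vector l, to the two
// intersections with the sphere of centre C and radius r.
Intersections sphereIntersections(Vec3 V, Vec3 l, Vec3 C, double r) {
    assert(r > 0.0);
    Vec3 toCentre = sub(C, V);
    double ldotc = dot(l, toCentre);
    // The value inside the square root
    double tmp = ldotc * ldotc - dot(toCentre, toCentre) + r * r;
    Intersections result = {false, 0.0, 0.0};
    if (tmp < 0.0)
        return result; // The line is outside of the sphere
    double root = std::sqrt(tmp);
    result.hit = true;
    result.nearDist = ldotc - root;
    result.farDist = ldotc + root;
    return result;
}
```

A negative distance means the intersection is behind the camera, which is why the direction of l has to be checked separately.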

There are 4 cases that need to be considered:

  1. Camera and pixel are both inside the sphere.
  2. The camera is outside, but the pixel is inside.
  3. The camera is inside, but the pixel is outside.
  4. Both camera and pixel are outside of the sphere.
For the first case, it is trivial to compute the fog covered distance from camera to pixel: "distance(V,P)".

For the last case, with both camera and pixel outside of the sphere, the distance will be the difference between the two intersections. This is the same as twice the value of the square root. There are two non-obvious exceptions that need to be taken care of. If the pixel is on the same side of the sphere as the camera, there shall be no fog effect. That means that the fog, for the given pixel, is occluded. The other special case is when you turn around. There would again be a fog cloud if you don't add a condition for it (l·(C-V) being negative).

For the two other cases, there is a point inside the sphere, and a distance to one intersection with the sphere. The entry or exit point E can be found by multiplying the unit vector l with the near or the far value of d, and adding this to the camera position V. Given E, the effective distance can easily be computed to either P or V. The final fragment shader function looks as follows:

// r: Fog sphere radius
// V: Camera position
// C: Fog sphere centre
// P: Pixel position
// Return alpha to be used for the fog blending.
float fog(float r, vec4 V, vec4 C, vec4 P) {
    float dist = 0.0; // The distance of the ray inside the fog sphere
    float cameraToPixelDist = distance(V, P);
    float cameraToFogDist = distance(V, C);
    float pixelToFogDist = distance(P, C);
    if (cameraToFogDist < r && pixelToFogDist < r) {
        dist = cameraToPixelDist; // Camera and pixel completely inside fog
    } else {
        vec3 l = normalize(P.xyz-V.xyz);
        float ldotc = dot(l, C.xyz-V.xyz);
        float tmp = ldotc*ldotc - cameraToFogDist*cameraToFogDist + r*r;
        if (cameraToFogDist > r && pixelToFogDist > r && ldotc > 0.0 && tmp > 0.0) {
            // Both camera and pixel outside the fog. The fog is in front of
            // the camera, and the ray is going through the fog.
            float sqrttmp = sqrt(tmp);
            vec3 entrance = V.xyz + l*(ldotc-sqrttmp);
            // No fog if the pixel is in front of the sphere.
            if (cameraToPixelDist > distance(V.xyz, entrance)) dist = sqrttmp*2.0;
        } else if (cameraToFogDist > r && pixelToFogDist < r) {
            // Outside of fog, looking at pixel inside. Thus tmp>0.
            vec3 entrance = V.xyz + l*(ldotc-sqrt(tmp));
            dist = distance(entrance, P.xyz);
        } else if (cameraToFogDist < r && pixelToFogDist > r) {
            // Camera inside fog, looking at pixel on the outside
            vec3 exit = V.xyz + l*(ldotc+sqrt(tmp));
            dist = distance(exit, V.xyz);
        }
    }
    // Maximum value of 'dist' will be the diameter of the sphere.
    return dist/(r*2.0);
}
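
A shader is awkward to debug, so it can help to check the four cases on the CPU first. The following is a rough C++ port for testing, not the code used in the game: the names Vec3 and fogAlpha are my own, and because all points are on the same ray, the entry and exit points are replaced by their distances along the ray.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double length(Vec3 a) { return std::sqrt(dot(a, a)); }

// CPU port of the fog() shader function. Returns the alpha for fog blending.
double fogAlpha(double r, Vec3 V, Vec3 C, Vec3 P) {
    assert(r > 0.0);
    double dist = 0.0; // The distance of the ray inside the fog sphere
    double cameraToPixelDist = length(sub(P, V));
    double cameraToFogDist = length(sub(C, V));
    double pixelToFogDist = length(sub(C, P));
    if (cameraToFogDist < r && pixelToFogDist < r) {
        dist = cameraToPixelDist; // Camera and pixel both inside
    } else {
        if (cameraToPixelDist == 0.0)
            return 0.0; // No ray direction to work with
        Vec3 d = sub(P, V);
        Vec3 l = {d.x / cameraToPixelDist, d.y / cameraToPixelDist, d.z / cameraToPixelDist};
        double ldotc = dot(l, sub(C, V));
        double tmp = ldotc * ldotc - cameraToFogDist * cameraToFogDist + r * r;
        if (cameraToFogDist > r && pixelToFogDist > r && ldotc > 0.0 && tmp > 0.0) {
            // Both outside, fog in front of the camera and the ray hits it.
            double root = std::sqrt(tmp);
            // Only count the fog if the pixel is beyond the entry point.
            if (cameraToPixelDist > ldotc - root) dist = 2.0 * root;
        } else if (cameraToFogDist > r && pixelToFogDist < r) {
            // Camera outside, pixel inside: from the entry point to the pixel.
            dist = cameraToPixelDist - (ldotc - std::sqrt(tmp));
        } else if (cameraToFogDist < r && pixelToFogDist > r) {
            // Camera inside, pixel outside: from the camera to the exit point.
            dist = ldotc + std::sqrt(tmp);
        }
    }
    return dist / (2.0 * r);
}
```

With a unit sphere at the origin and a camera at x = -3, a pixel at x = 3 gets the full alpha, while pixels at x = -2 (in front of the sphere) and x = -5 (behind the camera) get none.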

A test of using a fog sphere. It is clear that rays going through a lot of fog have a bigger fog effect.

Another example, using two fog spheres under ground. The colour of the fog needs to be adapted, depending on how dark the surroundings are. It isn't shown above, but when there are overlapping fogs I use the most dominant alpha, not an accumulated value.

The GPU performance cost for a fog can get high if there are many of them. If so, it can be an advantage to use a deferred shader, where the fog is computed only for pixels that will be shown.

7 January 2012

Setting up a deferred shader

See also the part 2 about deferred rendering.

The idea with a deferred shader is to use two (or more) shader stages. The first stage will render to internal buffers, but with typically more information than is usually shown on screen. The second stage will use the internal buffers to create the final screen image.

Notice the difference between deferred shading and deferred lighting. Deferred lighting only does the lighting in the second (deferred) stage. Information about the geometry is not saved, and so it needs to be rendered again. It can still be efficient, as the depth buffer is reused.

If there are a lot of effects that are added, like lighting, and other pixel transformations, then it may be a disadvantage to do this in a single render stage (forward renderer). The reason is that a lot of GPU processing power can be used for computing effects of pixels that are thrown away because they were found to be occluded. One advantage of using a deferred shader is that all drawn objects will have light effects added from the same algorithms, even if they use separate first stage shaders (as long as the correct data for the second stage are created).

A disadvantage of a deferred shader is that transparent objects are more difficult to handle. One way is to simply draw the transparent objects after the deferred stage. In my case, I draw the transparent objects also in the deferred stage.

In the following, I will show an example of how it can be implemented. I am using one FBO (frame buffer object), one depth buffer as a render buffer and four colour buffers. The FBO is not a buffer on its own. It is a container object, much like vertex array objects. When an FBO is bound, all drawing will go to the attached buffers of the FBO instead of the visible screen. There are two different types of buffers that can be attached: textures and render buffers. The texture buffer is used when the result of the operation shall be used as a texture in another rendering stage. A render buffer, on the other hand, can't be used by a shader. A way to use the result from a render buffer after the draw operation is glReadPixels() or glBlitFramebuffer().

Setting up the FBO
This has to be done again if the screen size changes. As the depth buffer isn't used again after the FBO drawing, it is allocated in a render buffer.

glGenFramebuffers(1, &fboName);
glGenRenderbuffers(1, &fDepthBuffer);

// Bind the depth buffer
glBindRenderbuffer(GL_RENDERBUFFER, fDepthBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);

// Generate and bind the texture for diffuse
glGenTextures(1, &fDiffuseTexture);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);
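// No initial data is given. GL_UNSIGNED_BYTE is assumed as the component type.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);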

// Generate and bind the texture for positions
glGenTextures(1, &fPositionTexture);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL); // No initial data

// Generate and bind the texture for normals
glGenTextures(1, &fNormalsTexture);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, NULL); // No initial data

// Generate and bind the texture for blending data
glGenTextures(1, &fBlendTexture);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);
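// No initial data is given. GL_UNSIGNED_BYTE is assumed as the component type.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);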

Now the buffers have been allocated, and have to be attached to the FBO.

// Bind the FBO so that the next operations will be bound to it.
glBindFramebuffer(GL_FRAMEBUFFER, fboName);
// Attach the depth buffer to the FBO
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, fDepthBuffer);
// Attach the textures to the FBO
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, fDiffuseTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, fPositionTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, fNormalsTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, fBlendTexture, 0);

// Only report the status if something went wrong.
GLenum fboStatus = glCheckFramebufferStatus(GL_FRAMEBUFFER);
if (fboStatus != GL_FRAMEBUFFER_COMPLETE)
    printf("DeferredLighting::Init: FrameBuffer incomplete: 0x%x\n", fboStatus);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

As can be seen, the colour buffers are texture buffers. They have an initialized size, but no initialized data. The GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER settings don't really matter, as the final screen will have the same size as the internal buffers, so there will be no magnification or reduction. The reduction filter still has to be set, though, as its default is GL_NEAREST_MIPMAP_LINEAR, which requires mipmaps that these textures don't have. The default for magnification is GL_LINEAR.

The FBO is bound using glBindFramebuffer. There are three possible targets: GL_DRAW_FRAMEBUFFER, GL_READ_FRAMEBUFFER and GL_FRAMEBUFFER. It is recommended that GL_FRAMEBUFFER is used when the FBO is defined, and that GL_DRAW_FRAMEBUFFER or GL_READ_FRAMEBUFFER is bound when the FBO is used.

Some explanation is needed why I use 4 colour buffers. These buffers will consume many megabytes of GPU memory, and should be kept to a minimum. However, with modern graphics cards, the problem is smaller. The fDiffuseTexture will contain the colour of the material. As the original textures are of type GL_RGBA, this buffer can as well be GL_RGBA. The fPositionTexture will store the world coordinates of the pixel. For this, we need higher precision (GL_RGBA32F). The coordinates are needed in the deferred shader to compute distances to lamps and other objects. The fNormalsTexture buffer stores the normals. In this case, a limited precision is good enough (GL_RGBA16F). The normals are needed to compute effects of directional light and lamps. Finally, there is also a fBlendTexture buffer. The blending can also be done in a separate render stage after the deferred shader (remember to reuse the depth buffer if that is the case). But I use the blending data for some special effects in the deferred shader.
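
To get a feeling for the memory cost, the size of the four colour buffers can be estimated from the formats. This is a rough sketch under assumptions: the function name colourBufferBytes is hypothetical, the GL_RGBA buffers are assumed to be stored with 8 bits per channel (the driver is free to choose otherwise), and the depth buffer and any padding are left out.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per pixel, from 4 channels of 8, 32 or 16 bits each.
constexpr std::size_t kBytesRGBA8   = 4;  // GL_RGBA, assumed to be 8 bits per channel
constexpr std::size_t kBytesRGBA32F = 16; // GL_RGBA32F
constexpr std::size_t kBytesRGBA16F = 8;  // GL_RGBA16F

// Approximate number of bytes used by the four colour buffers.
std::size_t colourBufferBytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    std::size_t perPixel = kBytesRGBA8     // fDiffuseTexture
                         + kBytesRGBA32F   // fPositionTexture
                         + kBytesRGBA16F   // fNormalsTexture
                         + kBytesRGBA8;    // fBlendTexture
    return width * height * perPixel;
}
```

With these assumptions, a 1920x1080 window needs 32 bytes per pixel, or about 66 million bytes. Half of that is the position buffer, so that is the first place to look if memory gets tight.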

First stage shader
The first stage vertex shader looks like this:

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform mat4 projectionMatrix;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
in vec3 normal;
in vec2 texCoord;
in vec4 vertex;
in float intensity; // sun light
in float ambientLight;
out vec3 fragmentNormal;
out vec2 fragmentTexCoord;
out float extIntensity;
out float extAmbientLight;
out vec3 position;
void main(void)
{
   fragmentTexCoord = texCoord;
   fragmentNormal = normalize((modelMatrix*vec4(normal, 0.0)).xyz);
   gl_Position = projectionMatrix * viewMatrix * modelMatrix * vertex;
   position = vec3(modelMatrix * vertex); // Copy position to the fragment shader
   extIntensity = intensity/255.0;        // Scale the intensity from [0..255] to [0..1].
   extAmbientLight = ambientLight/255.0;
}

To map output from the first fragment shader stage, I do as follows. This has to be done before the shader program is linked.

glBindFragDataLocation(prg, 0, "diffuseOutput");
glBindFragDataLocation(prg, 1, "posOutput");
glBindFragDataLocation(prg, 2, "normOutput");
glBindFragDataLocation(prg, 3, "blendOutput");

The names are the output names of the fragment shader. A layout command could also have been used, but it is not available in OpenGL 3.0. The shader is executed twice; first for normal materials, and second for transparent materials. The second time will only have the blendOutput enabled. The blending uses premultiplied alpha, which makes the operation associative. For this example, the same shader is used for opaque objects and transparent objects, but eventually they should be split into two. The first stage fragment shader looks as follows:

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D firstTexture;
in vec3 fragmentNormal;
in vec2 fragmentTexCoord;
in vec3 position;       // The world coordinate, as given by the vertex shader
out vec4 diffuseOutput; // layout(location = 0)
out vec4 posOutput;     // layout(location = 1)
out vec4 normOutput;    // layout(location = 2)
out vec4 blendOutput;   // layout(location = 3)
void main(void)
{
   posOutput.xyz = position;   // Position given by the vertex shader
   normOutput = vec4(fragmentNormal, 0);
   vec4 clr = texture(firstTexture, fragmentTexCoord);
   float alpha = clr.a;
   if (alpha < 0.1)
       discard;   // Optimization that will not change the depth buffer
   blendOutput.rgb = clr.rgb * clr.a; // Premultiplied alpha
   blendOutput.a = clr.a;
   diffuseOutput = clr;
}
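
That premultiplied alpha makes the blending associative can be checked with a few lines on the CPU. This is only a sketch for illustration: the names Rgba, premultiply and over are my own, and 'over' is the usual blend of a premultiplied source on top of a destination.

```cpp
#include <cassert>
#include <cmath>

struct Rgba { double r, g, b, a; }; // Colour with premultiplied alpha

// Convert a straight-alpha colour, as in 'blendOutput.rgb = clr.rgb * clr.a'.
Rgba premultiply(double r, double g, double b, double a) {
    assert(a >= 0.0 && a <= 1.0);
    return {r * a, g * a, b * a, a};
}

// Blend 'src' on top of 'dst': result = src + dst*(1 - src.a).
Rgba over(Rgba src, Rgba dst) {
    double k = 1.0 - src.a;
    return {src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k};
}

// Compare two colours, allowing for rounding errors.
bool nearlyEqual(Rgba x, Rgba y) {
    const double eps = 1e-12;
    return std::fabs(x.r - y.r) < eps && std::fabs(x.g - y.g) < eps &&
           std::fabs(x.b - y.b) < eps && std::fabs(x.a - y.a) < eps;
}
```

For two transparent layers A and B and an opaque background D, over(A, over(B, D)) gives the same result as over(over(A, B), D). That is what allows the transparent objects to be collected in a separate buffer first, and combined with the opaque colour later.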

Deferred stage shader
The vertex shader is very simple. It is used only to draw two triangles covering the whole window. The main work will be done in the fragment shader. The default projection of OpenGL is x and y in the range [-1,+1]. The position information forwarded to the fragment shader has to be in the range [0,1] as it is used to interpolate in the textures. The triangles are defined in the range [0,1], which I transform to the range [-1,+1]. This is a simple operation with no need for a transformation matrix.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
in vec4 vertex;
out vec2 position;
void main(void)
{
   gl_Position = vertex*2-1;
   gl_Position.z = 0.0;
   // Copy position to the fragment shader. Only x and y is needed.
   position = vertex.xy;
}
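
The mapping from [0,1] to [-1,+1] is applied to all four components, which only works because w is 1 (and 1*2-1 is again 1). The following sketch mirrors the shader on the CPU to show that. The names Vec4 and toClipSpace are hypothetical.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };

// Mirrors 'gl_Position = vertex*2-1; gl_Position.z = 0.0;' from the shader.
Vec4 toClipSpace(Vec4 vertex) {
    // The trick relies on w being 1, as w has to stay 1 after the mapping.
    assert(vertex.w == 1.0);
    return {vertex.x * 2.0 - 1.0, vertex.y * 2.0 - 1.0, 0.0, vertex.w * 2.0 - 1.0};
}
```

The corner (0,0) ends up at (-1,-1), the corner (1,1) at (+1,+1), and w stays 1 in both cases.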

The fragment shader for the deferred stage looks as follows. Some simplifications have been done to keep the listing short. Other lighting effects are easy to add, e.g. material properties for reflection. The specular glare should not be the same for all materials. Other things that can be added are information about ambient light and sun light, which would also need to be prepared in the first render stage. More texture buffers can be allocated for this, but there is unused space available already in the current buffers (i.e. the alpha channels). The input textures are the ones generated by the FBO.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D diffuseTex; // The color information
uniform sampler2D posTex;     // World position
uniform sampler2D normalTex;  // Normals
uniform sampler2D blendTex;   // A bitmap with colors to blend with.
uniform vec3 camera;          // The coordinate of the camera
in vec2 position;             // The screen position, in the range [0,1]
out vec4 fragColor;           // layout(location = 0)
void main(void)
{
   // Load data, stored in textures, from the first stage rendering.
   vec4 diffuse = texture(diffuseTex, position.xy);
   vec4 blend = texture(blendTex, position.xy);
   vec4 worldPos = texture(posTex, position.xy);
   vec4 normal = texture(normalTex, position.xy);
   // Use information about lamp coordinate (not shown here), the pixel
   // coordinate (worldPos.xyz), the normal of this pixel (normal.xyz)
   // to compute a lighting effect.
   // Use this lighting effect to update 'diffuse'
   vec4 preBlend = diffuse * lamp + specularGlare;
   // manual blending, using premultiplied alpha.
   fragColor = blend + preBlend*(1-blend.a);
   // Some debug features. Enable any of them to get a visual representation
   // of an internal buffer.
   // fragColor = (normal+1)/2;
   // fragColor = diffuse;
   // fragColor = blend;
   // fragColor = worldPos; // Scaling may be needed to range [0,1]
   // fragColor = lamp*vec4(1,1,1,1);
}

Execute the drawing every frame
Now everything has been prepared, and can be used for every frame update. Clear the FBO buffers from the previous frame:

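GLenum windowBuffClear[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2, GL_COLOR_ATTACHMENT3 };
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboName);
glDrawBuffers(4, windowBuffClear); // Select all buffers
glClearColor(0.0f, 0.0f, 0.0f, 0.0f); // Set everything to zero.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);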

Execute the first render stage, which will fill out the internal buffers with data:

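// Do not produce any blending data on the 4th render target.
GLenum windowBuffOpaque[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2, GL_NONE };
glDrawBuffers(4, windowBuffOpaque);
DrawTheWorld(); // Will also produce depth data in the depth buffer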

GLenum windowBuffTransp[] = { GL_NONE, GL_NONE, GL_NONE, GL_COLOR_ATTACHMENT3 };
glDrawBuffers(4, windowBuffTransp); // Only update blending buffer
// Use alpha 1 for source, as the colours are premultiplied by the alpha.
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
// The depth buffer shall not be updated.
glDepthMask(GL_FALSE);
// ... draw the transparent objects here ...
glDepthMask(GL_TRUE);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); // Restore to default
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);

The output from the first render stage is now available in the texture buffers. Execute the second render stage, the deferred shader.

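// The depth buffer from stage 1 is not used now as the FBO is disabled.
// Texture units are assumed; they must match the sampler uniforms.
glActiveTexture(GL_TEXTURE3);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);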



The result

The material colour information.

Positional data.


Final result. In this picture, blending data, lamps, fog effects and ambient light are also used.

Update history

2012-09-13 Added reference to deferred lighting. Clarified some distinction between using GL_FRAMEBUFFER, GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER. Cleaned up the fragment shader of the first stage.
2012-10-26 Add reference to part 2.

6 January 2012

Light saturation and transformation

Lights are coded in the range 0 to 1, which is not realistic. In real life, light intensity ranges from 0 to infinity. A maximum of 1 is a necessary trade-off, though, as a monitor can only offer a limited amount of light. This may lead to problems in the fragment shader, when light effects are added. Because of this, it is usually easier to remove light than to add light. That is one of the reasons that some of the early games were dark and gloomy.

One way to avoid this is to transform the colour channel range from [0,1] to the range [0,infinity]. This is one form of HDR, High Dynamic Range. After doing light manipulations in this new range, the colour is transformed back again. This transformation is called tone mapping. In my case, I first use a reverse tone mapping to expand the range: Y/(1-Y). Lighting effects can then be applied, using additive or multiplicative transformations. Afterwards, a normal tone mapping is used, Y/(1+Y), to get the range back to [0,1].

The results of adding 0.2 or multiplying with 1.2 to a colour in the normal range [0,1]:
When instead doing the same to the transformed range, the result, after transforming it back again, is:
There are some interesting differences between these.
  • A colour can no longer be saturated (as it would be on the right side in the first diagram).
  • The colour change is no longer linear. This is more natural in my opinion. If you light a lamp in full sunshine (in the real world), you may not notice the difference.
  • Adding the same constant to R, G and B channels that differ will add more to the darker channels. This will have the effect of making the reflection more white (or only grey). When used for specular glare, for example, I think it is less interesting to preserve the original colour balance.
I tried this technique in Ephenation. The game contains sun light, lamps, ambient lights, specular glare, and other light effects. It is now possible to calibrate each of them, one at a time, and then simply combine all of them together, without any risk of saturated effects. I no longer have to estimate the worst case to avoid saturation. The pseudo code for the shader looks as follows:

vec4 hdr;
hdr.r = diffuse.r/(1-diffuse.r);
hdr.g = diffuse.g/(1-diffuse.g);
hdr.b = diffuse.b/(1-diffuse.b);
float lampAdd = 0;
for (int i=0; i<numLamps; i++) {
    lampAdd = lampAdd + ... // Add light depending on distance to lamp
}
// The constants on the next line are separately calibrated on their own.
float fact = lampAdd*1.5 + ambient*0.3 + sun*1.5;
// 'fact' can be both smaller and bigger than 1. 'refl' is the
// reflection attribute of the material
vec4 step1 = fact*hdr + pow(max(dot(normal.xyz,vHalfVector),0.0), 100.0) * refl;
fragColour.r = step1.r/(1+step1.r);
fragColour.g = step1.g/(1+step1.g);
fragColour.b = step1.b/(1+step1.b);

A disadvantage of this technique is that colours that are already saturated (equal to 1) cannot be mapped to the expanded range, so I had to clamp them down a little.
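
The two transformations, including the clamping, can be tried out for a single colour channel on the CPU. This is a minimal sketch, not the shader used in the game: the function names are hypothetical, and the clamp limit of 0.99 is an assumed value picked for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Reverse tone mapping: expand [0,1) to [0,infinity).
// A saturated input would divide by zero, so it is clamped down a little.
// The limit 0.99 is an assumption for this example.
double expand(double y) {
    assert(y >= 0.0);
    y = std::min(y, 0.99);
    return y / (1.0 - y);
}

// Tone mapping: compress [0,infinity) back to [0,1).
double compress(double x) {
    assert(x >= 0.0);
    return x / (1.0 + x);
}

// Apply 'factor*colour + add' in the expanded range, then map back.
double light(double y, double factor, double add) {
    return compress(factor * expand(y) + add);
}
```

A channel value of 0.5 expands to 1.0. Adding 0.2 there gives 1.2, which maps back to 1.2/2.2, or about 0.545. However much light is added, the result stays below 1, so nothing saturates.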