Today's GPUs are very powerful, but it's important to understand the hardware's limitations. For example, branching in GLSL is very expensive due to the way the stream processors on GPUs work. In many cases both branches are executed and the correct result is picked afterwards.
A general tip when coding shaders is to use the built-in functions as much as possible. They are almost always faster than doing the calculations manually.
[list]
[li]Never normalize a vector manually by computing its length with a square root and dividing by it. Always use normalize().[/li]
[li]Don't use branching to clamp values; use min(), max() and clamp() instead.[/li]
[li]Linear blending is a very common operation, and there's a built-in function mix() for it.[/li]
[/list]
Generating random numbers
Generating random numbers on a GPU the traditional way is impossible since we can't use a global seed (well, we can in OGL4+ using atomic counters, but I wouldn't count on good performance). Random numbers are useful for introducing noise to counter banding in algorithms like HBAO (randomly rotate the sampling ray) or volumetric lighting (random offsets), trading banding for noise, which is much harder to spot and looks better when blurred. This one-liner is a pretty simple noise function seeded with a 2D position (you can use the screen position as the seed).
[code]
float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * 43758.5453);
}
[/code]
Dot products
The dot() function calculates the dot product of two vectors, which is the same as multiplying the vectors component-wise and summing the products. For a 3D vector, that means that [icode]dot(v1, v2) = v1.x*v2.x + v1.y*v2.y + v1.z*v2.z[/icode]. This is a very useful function for doing many things. For example, calculating the distance between two points using Pythagoras' theorem:
[code]
vec3 p1;
vec3 p2;
//...
vec3 delta = p1-p2;
float distSqrd = dot(delta, delta); //Distance^2, can be useful for lighting which saves you the square root
float dist = sqrt(distSqrd);
[/code]
Converting a color to grayscale:
[code]
vec3 color;
//...
float grayscale = dot(color, vec3(0.21, 0.71, 0.07));
[/code]
Shadow mapping
Shadow mapping is basically a software depth test against a shadow map. The shadow map coordinates are interpolated as a vec4, so we need to do a w-divide per pixel, fetch the shadow map depth at that coordinate and compare it to the pixel's depth. A simple implementation does this:
[code]
uniform sampler2D shadowMap;
float shadow(){
    vec3 wDivShadowCoord = shadowCoord.xyz / shadowCoord.w; //w-divide
    float distanceFromLight = texture(shadowMap, wDivShadowCoord.xy).z;
    return distanceFromLight < wDivShadowCoord.z ? 0.0 : 1.0;
}
[/code]
This is not optimal. Using step() we can eliminate the branch by writing [icode]return step(wDivShadowCoord.z, distanceFromLight);[/icode] instead.
Even better, the GPU can do the shadow test for us in hardware, with some basic shadow filtering, if we use a sampler2DShadow instead of a normal sampler2D. That way we just feed the w-divided xyz shadow coordinates into it. On the shadow map, set the following parameter to enable hardware shadow testing: [icode]GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL14.GL_TEXTURE_COMPARE_MODE, GL14.GL_COMPARE_R_TO_TEXTURE);[/icode] and change the sampler type to sampler2DShadow. It's also possible to enable GL_LINEAR as the texture filter and get 4-tap bilinear PCF filtering.
There is one final optimization. Not only can the GPU do the shadow test in hardware with filtering, it can also do the w-divide in hardware using [icode]textureProj()[/icode]! It can&#039;t get better than that!
[code]
uniform sampler2DShadow shadowMap;
float shadow(){
    return textureProj(shadowMap, shadowCoord);
}
[/code]
We get better performance, better image quality thanks to the PCF filtering AND a simpler shader. However, the first shader is extremely fast anyway, so why optimize it this much? Shadow filtering. To get smoother shadow edges you do lots of shadow tests on nearby pixels in the shadow map, usually 8 to 16 of them. In that case we would&#039;ve gotten 16 branches, not just one, so eliminating them means a lot here. Using hardware filtering also gives you 4 samples per texture lookup instead of just one, allowing you to sample a bigger area.