Today's GPUs are very powerful, but it's important to understand the limitations of the hardware. For example, branching in GLSL can be very expensive due to the way the stream processors on GPUs work: in many cases both branches are executed and the correct result is picked afterwards.
A general tip when coding shaders is to use the built-in functions as much as possible. They are almost always faster than doing the calculations manually.
[list]
[li]Never normalize a vector manually by computing its length with a square root and dividing by it. Use normalize() instead.[/li]
[li]Don't use branching to clamp values. Use min(), max() and clamp() for that.[/li]
[li]Linear blending is very common, and there's a built-in function for it: mix().[/li]
[/list]
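If you want to sanity-check the branch-free style these built-ins enable, here's a quick CPU-side sketch in Python; the function names mirror the GLSL built-ins (their semantics here follow the GLSL spec), while the select helpers are made up for illustration:

```python
# CPU-side sketch of GLSL's branch-free building blocks.
def mix(a, b, t):
    # GLSL mix(): linear blend, a*(1-t) + b*t
    return a * (1.0 - t) + b * t

def step(edge, x):
    # GLSL step(): 0.0 where x < edge, else 1.0
    return 0.0 if x < edge else 1.0

def clamp(x, lo, hi):
    # GLSL clamp(): min(max(x, lo), hi)
    return min(max(x, lo), hi)

# A branchy select such as "x >= 0.0 ? a : b" can be rewritten
# without a branch as mix(b, a, step(0.0, x)):
def select_branchy(x, a, b):
    return a if x >= 0.0 else b

def select_branchless(x, a, b):
    return mix(b, a, step(0.0, x))
```

On the GPU the branchless form maps to cheap ALU instructions instead of divergent control flow.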
[b]Generating random numbers[/b]
Generating random numbers on a GPU in the traditional way is impossible since we can't use a global seed (well, we can in OGL4+ using atomic counters, but I wouldn't count on good performance). Random numbers can be useful for introducing noise to counter banding in algorithms like HBAO (randomly rotate the sampling ray) or volumetric lighting (random offsets), trading banding for noise, which is much harder to spot and looks better when blurred. The one-line function below is a pretty simple noise function seeded with a 2D position (you can use the screen position or texture coordinates as the seed).
[i]Generating random numbers on the GPU presents a couple of challenges. The first is that, from a practical standpoint, you start with some non-random data (say a texture coordinate) which needs to be hashed to give a "random" starting value. The second is that most GPUs currently in use are very slow at integer computations, which are invaluable in hashing and generating PRNGs. The result is that you need to perform fairly hacky hashing and random number generation entirely in floating point until your low-end targets have full-speed integer support.[/i]
[code]
float rand(vec2 co){
    return fract(sin(dot(co.xy, vec2(12.9898, 78.233))) * 43758.5453);
}
[/code]
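You can check the math of that one-liner on the CPU. This Python sketch reproduces the float arithmetic (dot, sin, fract) and shows the result always lands in [0, 1); note that CPU double-precision sin will not bit-match a GPU's, so treat this as an illustration of the pattern, not of the exact values:

```python
import math

def rand(co):
    # Python port of the classic GLSL one-liner:
    # fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453)
    d = co[0] * 12.9898 + co[1] * 78.233  # dot(co, vec2(...))
    s = math.sin(d) * 43758.5453
    return s - math.floor(s)              # fract()

# Seed with something that varies per pixel, e.g. texture coordinates:
samples = [rand((x * 0.01, y * 0.01)) for x in range(10) for y in range(10)]
```

Every sample is in [0, 1), and nearby seeds give wildly different values, which is exactly what you want for dithering noise.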
Permutation polynomials. Examples in use: ([url=http://www.javagaming.org/topics/valuenoise2d3djavaglsl/28020/view.html]local value noise[/url], [url=https://github.com/ashima/webglnoise]gradient and simplex noise[/url])
[code]
// repeated for other types (float, vec3, vec4)
vec2 mod289(vec2 x) { return x - floor(x * (1.0 / 289.0)) * 289.0; }
vec2 permute(vec2 x) { return mod289(((x*34.0)+1.0)*x); }
// 'uniform' is a reserved word in GLSL, so use another name
vec2 toUnit(vec2 x) { return fract(x*1.0/41.0); }
// example usage
float bar(vec2 x)
{
    float h, r;
    vec2 m = mod289(x); // values must be bound to (-289,289) for precision
    h = permute(permute(m.x)+m.y); // hash the coordinates together
    r = toUnit(h); // first random number
    //...
    h = permute(h); // hash the hash
    r = toUnit(h); // second random number
    return r;
}
[/code]
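The permutation polynomial is just ((34x + 1)x) mod 289, kept in floating point. A scalar Python sketch of the same pipeline (names mirror the GLSL above; `rand2` is a hypothetical helper that chains the hash for a 2D seed):

```python
import math

def fract(x):
    return x - math.floor(x)

def mod289(x):
    # float-friendly mod: x - floor(x/289)*289
    return x - math.floor(x * (1.0 / 289.0)) * 289.0

def permute(x):
    # permutation polynomial: ((34x + 1) * x) mod 289
    return mod289(((x * 34.0) + 1.0) * x)

def to_unit(x):
    # map the hash to [0, 1)
    return fract(x * (1.0 / 41.0))

def rand2(x, y):
    # hash two coordinates into one random number in [0, 1)
    h = permute(permute(mod289(x)) + mod289(y))
    return to_unit(h)
```

Because every intermediate stays below 289 * 289, the math fits comfortably in mediump/float precision, which is the whole point of the 289 constant.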
[b]Dot products[/b]
The dot() function calculates the dot product of two vectors, which is the same as multiplying the vectors component-wise and then adding the results together. For a 3D vector, that means that [icode]dot(v1, v2) = v1.x*v2.x + v1.y*v2.y + v1.z*v2.z[/icode]. This is a very useful function for many things. For example, calculating the distance between two points using Pythagoras' theorem:
[code]
vec3 p1;
vec3 p2;
//...
vec3 delta = p1 - p2;
float distSqrd = dot(delta, delta); //Distance^2, can be useful for lighting which saves you the square root
float dist = sqrt(distSqrd);
[/code]
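The same trick, checked on the CPU with made-up points (a minimal Python sketch; `dot` mirrors the GLSL built-in):

```python
import math

def dot(a, b):
    # component-wise multiply, then sum -- same as GLSL dot()
    return sum(x * y for x, y in zip(a, b))

p1 = (1.0, 2.0, 3.0)
p2 = (4.0, 6.0, 3.0)
delta = tuple(a - b for a, b in zip(p1, p2))
dist_sqrd = dot(delta, delta)   # squared distance; often enough for lighting
dist = math.sqrt(dist_sqrd)     # only pay for the sqrt when you need it
```

Comparing squared distances (e.g. against a squared light radius) lets you skip the sqrt entirely.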
Converting a color to grayscale:
[code]
vec3 color;
//...
float grayscale = dot(color, vec3(0.21, 0.71, 0.07));
[/code]
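Those weights are rounded Rec. 709 luma weights (the exact coefficients are 0.2126, 0.7152, 0.0722), so green dominates the perceived brightness. A quick CPU-side check in Python:

```python
# Rounded Rec. 709 luma weights, as in the GLSL snippet above.
WEIGHTS = (0.21, 0.71, 0.07)

def grayscale(color):
    # dot(color, weights): weighted sum of the RGB channels
    return sum(c * w for c, w in zip(color, WEIGHTS))
```

Since the rounded weights sum to 0.99 rather than 1.0, pure white maps to 0.99; use the exact Rec. 709 values if that matters.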
[b]Shadow mapping[/b]
Shadow mapping is basically a software depth test against a shadow map. The shadow map coordinates are interpolated as a vec4, so we need to do a w-divide per pixel, get the shadow map depth at that coordinate and compare it to the pixel's depth. A simple implementation does this:
[code]
uniform sampler2D shadowMap;
float shadow(){
    vec3 wDivShadowCoord = shadowCoord.xyz / shadowCoord.w; // perspective divide by w
    float distanceFromLight = texture(shadowMap, wDivShadowCoord.xy).z;
    return distanceFromLight < wDivShadowCoord.z ? 0.0 : 1.0;
}
[/code]
This is not optimal. By using the built-in step() function we can eliminate the branch entirely, writing [icode]return step(wDivShadowCoord.z, distanceFromLight);[/icode] instead.
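It's easy to convince yourself the two forms agree; this Python sketch emulates GLSL's step() and compares it against the branchy ternary over a few hypothetical depth values:

```python
def step(edge, x):
    # GLSL step(): 0.0 where x < edge, else 1.0
    return 0.0 if x < edge else 1.0

def shadow_branchy(pixel_depth, map_depth):
    # the ternary from the shader: in shadow (0.0) if the map depth
    # is closer to the light than this pixel
    return 0.0 if map_depth < pixel_depth else 1.0

def shadow_branchless(pixel_depth, map_depth):
    # the step() rewrite: step(wDivShadowCoord.z, distanceFromLight)
    return step(pixel_depth, map_depth)
```

Note both forms return lit (1.0) when the depths are exactly equal, so the rewrite is a drop-in replacement.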
Even better, the GPU can do the shadow test for us in hardware, with some basic shadow filtering, if we use a sampler2DShadow instead of a normal sampler2D. That way we just feed the w-divided xyz shadow coordinates into it. On the shadow map, set the following parameter to enable hardware shadow testing: [icode]GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL14.GL_TEXTURE_COMPARE_MODE, GL14.GL_COMPARE_R_TO_TEXTURE);[/icode] and change the sampler type in the shader to sampler2DShadow. It's also possible to enable GL_LINEAR as the texture filter and get 4-tap bilinear PCF filtering.
There is one final optimization. Not only can the GPU do the shadow test in hardware with filtering, it can also do the w-divide in hardware using [icode]textureProj()[/icode]! It can't get better than that!
[code]
uniform sampler2DShadow shadowMap;
float shadow(){
    return textureProj(shadowMap, shadowCoord); // compare + w-divide in hardware
}
[/code]
We get better performance, better image quality thanks to the PCF filtering AND a simpler shader. However, the first shader is extremely fast anyway, so why optimize it this much? Shadow filtering. To get smoother shadow edges you do lots of shadow tests on nearby pixels in the shadow map, usually 8 to 16 of them. In that case we would've gotten 16 branches, not just one, so eliminating them means a lot here. Using hardware filtering also gives you 4 samples per texture lookup instead of just one, allowing you to sample a bigger area.
[b]GLSL Gotchas[/b]
[list][*]Array declaration is broken on Mac Snow Leopard[url=http://openradar.appspot.com/6121615][1][/url][/list]