Java-Gaming.org    
GLSL tricks
Offline Wiki Duke






« Posted 2012-12-07 05:44:28 »

Note: you are watching revision 3 of this wiki entry.
Today's GPUs are very powerful, but it's important to understand the limitations of the hardware. For example, branching in GLSL can be very expensive due to the way the stream processors on GPUs work: threads run in lockstep groups, so when threads in a group take different paths, both branches are often executed and the correct result is picked afterwards.

A general tip when coding shaders is to use the built-in functions as much as possible. They are usually at least as fast as doing the calculations manually, and they are easier to read.

  • Never manually normalize a vector by calculating its length with a square root and dividing the vector by it. Always use normalize().
  • Don't use branching to clamp values. Use min(), max() and clamp() for that.
  • Linear blending is very common, and there's a built-in function called mix() for it.
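
As a rough illustration of the three tips above, here is a minimal fragment shader sketch. The input names (normal, colorA, colorB, blend) and the fixed light direction are made up for the example; the manual alternatives are only shown in comments.

```glsl
#version 330 core

// Hypothetical inputs, named only for this example
in vec3 normal;
in vec3 colorA;
in vec3 colorB;
in float blend;

out vec4 fragColor;

void main() {
    // Instead of: normal / sqrt(normal.x*normal.x + normal.y*normal.y + normal.z*normal.z)
    vec3 n = normalize(normal);

    // Instead of an if/else chain that limits the value to [0, 1]
    float t = clamp(blend, 0.0, 1.0);

    // Instead of: colorA * (1.0 - t) + colorB * t
    vec3 color = mix(colorA, colorB, t);

    // max() instead of "if (nDotL < 0.0) nDotL = 0.0;"
    // The light direction is an arbitrary constant for the sketch.
    float diffuse = max(dot(n, vec3(0.0, 0.0, 1.0)), 0.0);

    fragColor = vec4(color * diffuse, 1.0);
}
```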


Generating random numbers

Generating random numbers on a GPU in the traditional way is impossible since we can't use a global seed (well, we can in OGL4+ using atomic counters, but I wouldn't count on good performance). Random numbers are useful for introducing noise to counter banding in algorithms like HBAO (randomly rotate the sampling ray) or volumetric lighting (random offsets). This trades banding for noise, which is much harder to spot and looks better when blurred. This one-line function is a pretty simple noise function seeded with a 2D position (you can use the screen position or texture coordinates as the seed).

float rand(vec2 co){
    //Hash the 2D seed into a pseudo-random value in the range [0, 1)
    return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}
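
A minimal usage sketch, assuming the screen position is used as the seed: each pixel gets its own random angle, which is turned into a 2D direction of the kind you could use to rotate a sampling ray. The visualization in main() is only there to make the shader complete; it is not part of any particular HBAO implementation.

```glsl
#version 330 core

out vec4 fragColor;

// Same one-line hash as above, seeded with a 2D position
float rand(vec2 co) {
    return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453);
}

void main() {
    // One pseudo-random angle in [0, 2*pi) per pixel
    float angle = rand(gl_FragCoord.xy) * 6.2831853;

    // Unit-length direction for that angle
    vec2 dir = vec2(cos(angle), sin(angle));

    // Remap from [-1, 1] to [0, 1] so the directions can be viewed as colors
    fragColor = vec4(dir * 0.5 + 0.5, 0.0, 1.0);
}
```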




Dot products

The dot() function calculates the dot product of two vectors, which is the same as multiplying the vectors component-wise and then adding the products together. For a 3D vector, that means that dot(v1, v2) = v1.x*v2.x + v1.y*v2.y + v1.z*v2.z. This is useful for many things. For example, calculating the distance between two points using Pythagoras' theorem:
vec3 p1;
vec3 p2;

//...

vec3 delta = p1-p2;
float distSqrd = dot(delta, delta); //Distance^2, can be useful for lighting which saves you the square root
float dist = sqrt(distSqrd);
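
As a sketch of how the squared distance can be used directly in lighting, the shader below does a light range check and a simple falloff without ever taking a square root. The uniform and input names (lightPos, lightRadius, worldPos) and the falloff formula are assumptions made for this example.

```glsl
#version 330 core

// Hypothetical inputs, named only for this example
uniform vec3 lightPos;
uniform float lightRadius;
in vec3 worldPos;

out vec4 fragColor;

void main() {
    vec3 delta = lightPos - worldPos;
    float distSqrd = dot(delta, delta);

    // Compare squared values: 1.0 if the point is within the radius, else 0.0
    float inRange = step(distSqrd, lightRadius * lightRadius);

    // A simple inverse-square style falloff only needs distance^2
    float attenuation = inRange / (1.0 + distSqrd);

    fragColor = vec4(vec3(attenuation), 1.0);
}
```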


Converting a color to grayscale:
vec3 color;

//...

//Rec. 709 luma weights, which sum to 1.0
float grayscale = dot(color, vec3(0.2126, 0.7152, 0.0722));





Shadow mapping

Shadow mapping is basically a software depth test against a shadow map. The shadow map coordinates are interpolated as a vec4, so we need to do a w-divide per pixel, get the shadow map depth at that coordinate and compare it to the pixel's depth. A simple implementation does this:

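uniform sampler2D shadowMap;
in vec4 shadowCoord; //Interpolated shadow map coordinates from the vertex shader

float shadow(){
    vec3 wDivShadowCoord = shadowCoord.xyz / shadowCoord.w; //w-divide

    //A depth texture returns its depth value in the first (red) channel
    float distanceFromLight = texture(shadowMap, wDivShadowCoord.xy).r;

    //0.0 = in shadow, 1.0 = lit
    return distanceFromLight < wDivShadowCoord.z ? 0.0 : 1.0;
}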


This is not optimal. step(edge, x) returns 0.0 if x is less than edge and 1.0 otherwise, so we can eliminate the branch by just writing
return step(wDivShadowCoord.z, distanceFromLight);
instead.

Even better, the GPU can do the shadow test for us in hardware, with some basic shadow filtering, if we use a sampler2DShadow instead of a normal sampler2D. That way we just feed the w-divided xyz shadow coordinates into it. On the shadow map texture, set the following parameter to enable hardware shadow testing:
GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL14.GL_TEXTURE_COMPARE_MODE, GL14.GL_COMPARE_R_TO_TEXTURE);
and change the sampler type to sampler2DShadow. It's also possible to set GL_LINEAR as the texture filter and get 4-tap bilinear PCF filtering on most hardware.
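
A minimal sketch of the shader side of the hardware shadow test, assuming the compare mode above has been set on the texture and the compare function is left at its default (GL_LEQUAL). The main() function only visualizes the result.

```glsl
#version 330 core

uniform sampler2DShadow shadowMap; // depth texture with compare mode enabled
in vec4 shadowCoord;               // interpolated shadow map coordinates

out vec4 fragColor;

float shadow() {
    vec3 wDivShadowCoord = shadowCoord.xyz / shadowCoord.w; // w-divide

    // xy selects the texel, z is the reference depth for the comparison.
    // Returns 1.0 for lit and 0.0 for shadowed, or a value in between
    // when GL_LINEAR filtering is enabled.
    return texture(shadowMap, wDivShadowCoord);
}

void main() {
    fragColor = vec4(vec3(shadow()), 1.0);
}
```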

There is one final optimization. Not only can the GPU do the shadow test in hardware with filtering, it can also do the w-divide in hardware using textureProj()! It can't get better than that!

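uniform sampler2DShadow shadowMap;
in vec4 shadowCoord;

float shadow(){
    return textureProj(shadowMap, shadowCoord); //w-divide, depth test and filtering in one call
}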


We get better performance, better image quality thanks to the PCF filtering AND a simpler shader. However, the first shader is extremely fast anyway, so why optimize it this much? Shadow filtering. To get smoother shadow edges you do lots of shadow tests on nearby texels in the shadow map, usually 8 to 16 of them. In that case we would've gotten 16 branches, not just one, so eliminating them means a lot here. Using hardware filtering also gives you 4 samples per texture lookup instead of just one, allowing you to sample a bigger area.
This wiki entry has had 8 revisions with contributions from 3 members.
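
To illustrate the filtering case, here is a rough 3x3 PCF sketch built on hardware shadow tests. The kernel size and equal weighting are arbitrary choices for the example; with GL_LINEAR enabled each of the 9 lookups is itself a filtered 4-tap test.

```glsl
#version 330 core

uniform sampler2DShadow shadowMap; // depth texture with compare mode enabled
in vec4 shadowCoord;

out vec4 fragColor;

float shadowPCF() {
    vec3 coord = shadowCoord.xyz / shadowCoord.w; // w-divide once, reuse for every tap

    // Size of one shadow map texel in texture coordinates
    vec2 texelSize = 1.0 / vec2(textureSize(shadowMap, 0));

    float sum = 0.0;
    for (int y = -1; y <= 1; y++) {
        for (int x = -1; x <= 1; x++) {
            vec2 offset = vec2(x, y) * texelSize;
            // Branch-free hardware shadow test at the offset position
            sum += texture(shadowMap, vec3(coord.xy + offset, coord.z));
        }
    }
    return sum / 9.0; // average of the 9 tests
}

void main() {
    fragColor = vec4(vec3(shadowPCF()), 1.0);
}
```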
Offline Roquen
« Reply #1 - Posted 2012-12-07 08:22:40 »

I'll add some more RNGs, but in my opinion atomic counters are a massive red herring.  The current problem is that integer performance on most people's cards is awful.  Once integer performance is good enough you can do the same thing as on the CPU for procedural content: use a hash either for the whole local RNG, or hash to seed a standard generator.
Offline Roquen
« Reply #2 - Posted 2012-12-07 15:29:03 »

OK, added a permutation polynomial RNG and some references (including a value noise in shared source). I'll additionally add a Weyl generator (probably on the PRNG page, as it's interesting there as well).  Also stuck some alternate verbiage on RNG (in italics) that I'll let the original author do with as they wish.
Offline theagentd
« Reply #3 - Posted 2012-12-07 17:02:12 »

Quote
I'll add some more RNGs, but in my opinion atomic counters are a massive red herring.  The current problem is that integer performance on most people's cards is awful.  Once integer performance is good enough you can do the same thing as on the CPU for procedural content: use a hash either for the whole local RNG, or hash to seed a standard generator.
Atomic counters are only supported by OpenGL 4.2 cards. Are you sure integer performance is bad for that kind of hardware? I'm pretty sure OGL4 hardware can at least get a seed for a random number generator rather quickly. I think the main problem is the atomic operation, not integer performance, but I haven't tried it out yet (my main PC only has OGL 3).

Quote
OK, added a permutation polynomial RNG and some references (including a value noise in shared source). I'll additionally add a Weyl generator (probably on the PRNG page, as it's interesting there as well).  Also stuck some alternate verbiage on RNG (in italics) that I'll let the original author do with as they wish.
I was the original poster, but I think you already knew that. =S Your point on random numbers and integer performance on low-end hardware is good. However, I'm a bit confused by your example code since it starts with
// bad example
float bar(vec2 x){
    ...

I assume you mean that that exact use case is a bad example, not the emulated hashing part?

Offline Roquen
« Reply #4 - Posted 2012-12-07 17:35:46 »

Quote
Atomic counters are only supported by OpenGL 4.2 cards. Are you sure integer performance is bad for that kind of hardware? I'm pretty sure OGL4 hardware can at least get a seed for a random number generator rather quickly. I think the main problem is the atomic operation, not integer performance, but I haven't tried it out yet (my main PC only has OGL 3).
I'm not clear at what point integer performance became reasonable. Since it isn't on any of my cards I haven't worried about it.  I'm assuming the problem is still pretty widespread, since FlipCode recently had a link to a blog from an engineer at either NVIDIA or AMD about emulating some integer ops in float.  I'd have some concern about accessing an atomic counter: it seems like it must be a serializing instruction, which (again) seems like it must bring a bunch of processors to a halt and therefore should only be used sparingly.  With full-performance integer operations, however, hashing isn't too bad and neither are PRNGs of OK to excellent statistical quality.  Purely in float, they pretty much all blow chunks, with the exception (to my knowledge) of nested-shifted-Weyl generators, which are a bit expensive.  (But let's not place undue emphasis on statistical quality.)

Quote
However I'm a bit confused by your example code since it starts with (//bad example) ...I assume you mean that that exact use case is a bad example, not the emulated hashing part?
I intended to say "this is a bad example...but hopefully good enough that you can use this."  It is a bad hashing function, but again sufficient for a number of purposes.
Offline theagentd
« Reply #5 - Posted 2012-12-07 18:18:17 »

Well, the simple random function I posted turned out to be good enough for me.

Here's volumetric lighting done by ray-marching with 32 samples.


That banding is horrifying! Adding a per-pixel offset to the sampling depths trades the banding for much less noticeable noise. It works much like near-perfect dithering, making it much harder to see that only 32 samples are used.


I plan on adding a blur to the effect too, which will completely hide the noise: a Gaussian blur with a radius of just 2 (3 optimized taps, two passes). The 15% performance hit of the good version is not due to the random function per se, but due to the lower cache coherency of the texture samples. Lowering the shadow map resolution reveals that the actual shader arithmetic cost is small: around 985 FPS versus 1000 FPS for the non-offset version at 720p with 32 samples, obviously well worth it considering the gains.
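
The per-pixel offset described above could be sketched roughly like this. This is not the shader from the screenshots: the names viewToShadow and viewRay, the view-space setup and the plain averaging of shadow tests are all assumptions made to keep the example self-contained.

```glsl
#version 330 core

const int NUM_SAMPLES = 32;

// Hypothetical inputs, named only for this example
uniform sampler2DShadow shadowMap; // depth texture with compare mode enabled
uniform mat4 viewToShadow;         // view space -> shadow map coordinates (before the w-divide)
in vec3 viewRay;                   // view-space position of the surface behind this pixel

out vec4 fragColor;

float rand(vec2 co) {
    return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453);
}

void main() {
    // March from the camera (view-space origin) towards the surface
    vec3 stepVec = viewRay / float(NUM_SAMPLES);

    // Shift every sample of this pixel by a random fraction of one step.
    // Neighboring pixels get different shifts, which turns banding into noise.
    vec3 pos = stepVec * rand(gl_FragCoord.xy);

    float lit = 0.0;
    for (int i = 0; i < NUM_SAMPLES; i++) {
        // Hardware w-divide and shadow test at this point along the ray
        lit += textureProj(shadowMap, viewToShadow * vec4(pos, 1.0));
        pos += stepVec;
    }

    // Fraction of the ray that is lit
    fragColor = vec4(vec3(lit / float(NUM_SAMPLES)), 1.0);
}
```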

Offline Roquen
« Reply #6 - Posted 2012-12-07 18:22:43 »

Yeah hash/PRNG quality is frequently a red herring.  Speed is a different story.
Offline ra4king



« Reply #7 - Posted 2012-12-09 06:03:13 »

