Java-Gaming.org    
  Multiple shader passes LWJGL  (Read 2632 times)
Offline RobinB

JGO Ninja


Medals: 44
Projects: 1
Exp: 3 years


Spacegame in progress


« Posted 2013-05-14 17:03:15 »

Hello,

I have a simple question that I can hardly find any info on.
How do I use multiple shader passes on some rendered data?

Usually I can just bind one shader program (one vertex and one fragment shader) and do the drawing.
But how does this work when I want to use two vertex shaders and one fragment shader for, say, a Gaussian blur?
And what should I do if I want to do another pass to post-process the blurred data?

So I want to do this:
- one horizontal blur pass
- one vertical blur pass
- one post-processing pass

Is it possible to do this all at once, or do I need to render the scene to an FBO for each pass?
Thanks in advance :)
Offline davedes
« Reply #1 - Posted 2013-05-14 19:47:41 »

You need to use FBOs. See my tutorials here on the subject:
https://github.com/mattdesl/lwjgl-basics/wiki/FrameBufferObjects
https://github.com/mattdesl/lwjgl-basics/wiki/ShaderLesson5
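
For readers who want to see the pass structure without the OpenGL plumbing, here is a minimal CPU-side sketch in Java. It is not LWJGL code and it is not taken from the tutorials linked above: the class `PingPongPasses` and its methods are hypothetical names, the float arrays only stand in for FBO color textures, and a 3-tap box blur stands in for the Gaussian. It is meant to show why each pass needs its own render target: the vertical blur has to read the finished output of the horizontal blur, and the post-process pass has to read the finished blur.

```java
/** CPU stand-in for multi-pass rendering: each pass reads one buffer and writes another. */
final class PingPongPasses {

    /** 3-tap box blur along one axis; (dx, dy) is (1, 0) for horizontal or (0, 1) for vertical. */
    static float[] blur(float[] src, int w, int h, int dx, int dy) {
        float[] dst = new float[src.length];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                float sum = 0f;
                for (int k = -1; k <= 1; k++) {
                    // Clamp to the edge, similar to GL_CLAMP_TO_EDGE on a texture.
                    int sx = Math.min(w - 1, Math.max(0, x + k * dx));
                    int sy = Math.min(h - 1, Math.max(0, y + k * dy));
                    sum += src[sy * w + sx];
                }
                dst[y * w + x] = sum / 3f;
            }
        }
        return dst;
    }

    /** Example post-process pass (an assumption for illustration): a brightness scale. */
    static float[] brighten(float[] src, float factor) {
        float[] dst = new float[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] * factor;
        }
        return dst;
    }

    /** Runs scene -> horizontal blur -> vertical blur -> post process. */
    static float[] render(float[] scene, int w, int h) {
        float[] a = blur(scene, w, h, 1, 0); // pass 1 writes "FBO A"
        float[] b = blur(a, w, h, 0, 1);     // pass 2 samples A, writes "FBO B"
        return brighten(b, 2f);              // pass 3 samples B, would go to the screen
    }
}
```

In a real renderer each `float[]` would be a texture attached to an FBO, and each method call would be "bind target, bind shader, draw a fullscreen quad".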

Offline RobinB


« Reply #2 - Posted 2013-05-14 20:08:09 »

Nice tutorials, thanks.

It seems really slow to use multiple FBOs just for a little blur :(
Offline pitbuller
« Reply #3 - Posted 2013-05-14 21:16:02 »

Quote from: RobinB
Nice tutorials, thanks.
It seems really slow to use multiple FBOs just for a little blur :(

Slow in development time or in rendering time? Never assume; measure.

My box2dLights uses a two-pass Gaussian blur, and it runs fine even on Android devices that are a couple of years old.

The reason to use a two-pass blur instead of a single pass is that an N x N kernel needs only 2N samples per pixel instead of N^2. For bigger kernels this is a really big saving.

http://www.unrealengine.com/files/downloads/Smedberg_Niklas_Bringing_AAA_Graphics.pdf
In that presentation they show a 6-pass god ray post-process effect that runs fine on an iPad 2.
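
To put numbers on the 2N versus N^2 argument, here is a tiny Java sketch. The class and method names are made up for illustration; it only counts texture samples per pixel and says nothing about actual frame times, which still need to be measured.

```java
final class BlurSampleCount {

    /** Texture samples per pixel for a single-pass N x N kernel. */
    static int singlePass(int n) {
        return n * n;
    }

    /** Texture samples per pixel for one horizontal pass plus one vertical pass. */
    static int twoPass(int n) {
        return 2 * n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 9, 15}) {
            System.out.println(n + "x" + n + ": single pass = " + singlePass(n)
                    + ", two pass = " + twoPass(n));
        }
    }
}
```

For N = 3 the difference is only 9 versus 6 samples, so the second pass mostly pays off for larger kernels.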
Offline RobinB


« Reply #4 - Posted 2013-05-14 21:27:07 »

I meant rendering time, and it does affect some devices.
My laptop with an HD 3000 or something needs 1 ms to render one FBO pass, but maybe that's an exception.

I understand why I need two passes; that's why I created this thread.
Also, your tutorial explains this pretty clearly =D

Thanks for all of your info, it really helps.
Now I only need to implement some liquid behaviour, and then I can show the result :)

Amazing presentation btw, this stuff is informative =D
Offline theagentd
« Reply #5 - Posted 2013-05-14 23:21:37 »

I've never seen a graphics card that renders slower when rendering to an FBO instead of directly to the window. The slow part usually depends more on how many pixels you process and how expensive the fragment shader is. It's mostly independent of what you render to as long as the render target has the same bit depth.

Quote from: pitbuller
The reason to use a two-pass blur instead of a single pass is that you need only 2N samples instead of N^2. For bigger kernels this saving is a really big factor.

You can optimize it even further by exploiting bilinear filtering to get a correctly weighted average of two texels per texture sample. You can achieve a 9x9 Gaussian blur using only 5+5 texture samples, which means you'll go from 81 samples down to 10. Another trick is that you can do a 3x3 Gaussian blur using only 4 texture samples and a single pass.
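
Here is a hedged sketch of how the merged sample positions for the bilinear trick could be computed on the CPU. The post above does not give any weights, so this assumes binomial weights (row 8 of Pascal's triangle) as the 9-tap Gaussian approximation; the class and method names are hypothetical. Two neighbouring taps with weights w1 and w2 at texel distances t1 and t2 are replaced by one sample with weight w1 + w2 at distance (t1*w1 + t2*w2) / (w1 + w2), which only works if the source texture uses linear filtering.

```java
final class LinearSampledGaussian {

    /** Row n of Pascal's triangle, normalized so the weights sum to 1 (assumed kernel). */
    static double[] binomialWeights(int n) {
        double[] w = new double[n + 1];
        w[0] = 1.0;
        for (int k = 1; k <= n; k++) {
            w[k] = w[k - 1] * (n - k + 1) / k;
        }
        double sum = Math.pow(2.0, n);
        for (int k = 0; k <= n; k++) {
            w[k] /= sum;
        }
        return w;
    }

    /**
     * Merges the taps on one side of the center pairwise.
     * Returns {offsetInTexels, weight} entries; index 0 is the center tap.
     * Assumes an even number of taps per side (4 for a 9-tap kernel).
     */
    static double[][] merge(double[] weights) {
        int center = weights.length / 2;
        int pairs = center / 2;
        double[][] out = new double[pairs + 1][];
        out[0] = new double[] {0.0, weights[center]};
        for (int p = 0; p < pairs; p++) {
            int t1 = 2 * p + 1;
            int t2 = 2 * p + 2;
            double w1 = weights[center + t1];
            double w2 = weights[center + t2];
            double w = w1 + w2;
            out[p + 1] = new double[] {(t1 * w1 + t2 * w2) / w, w};
        }
        return out;
    }

    public static void main(String[] args) {
        for (double[] s : merge(binomialWeights(8))) {
            System.out.println("offset " + s[0] + " texels, weight " + s[1]);
        }
    }
}
```

The non-center entries are used twice per pass (once with a positive and once with a negative offset), which gives 1 + 2 + 2 = 5 samples per pass. The offsets are in texels and still have to be divided by the texture width or height before being used as texture coordinates.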

Myomyomyo.
Offline pitbuller
« Reply #6 - Posted 2013-05-15 08:18:22 »


Quote from: theagentd
You can optimize it even further by exploiting bilinear filtering to get a correctly weighted average of two texels per texture sample. You can achieve a 9x9 Gaussian blur using only 5+5 texture samples, which means you'll go from 81 samples down to 10. Another trick is that you can do a 3x3 Gaussian blur using only 4 texture samples and a single pass.

I'm already doing the linear sampling trick.
I'm also calculating the texture coordinates in the vertex shader and passing them as varyings, which gets rid of all "dependent" texture lookups on mobile hardware as well as some fragment shader math. That last part gives over twice the performance of the traditional approach. I still don't think it would do any good on PC hardware.
Offline Danny02
« Reply #7 - Posted 2013-05-15 09:04:26 »

What do you mean by calculating texcoords in the vertex shader?

Just use simple constants in the fragment shader.
Offline RobinB


« Reply #8 - Posted 2013-05-15 11:27:54 »

Noo, calculating texture coordinates for every pixel is expensive.
The best way is to precalculate these variables in the vertex shader:

/* HBlurVertexShader.glsl */
attribute vec4 a_position;
attribute vec2 a_texCoord;
 
varying vec2 v_texCoord;
varying vec2 v_blurTexCoords[14];
 
void main()
{
    gl_Position = a_position;
    v_texCoord = a_texCoord;
    v_blurTexCoords[ 0] = v_texCoord + vec2(-0.028, 0.0);
    v_blurTexCoords[ 1] = v_texCoord + vec2(-0.024, 0.0);
    v_blurTexCoords[ 2] = v_texCoord + vec2(-0.020, 0.0);
    v_blurTexCoords[ 3] = v_texCoord + vec2(-0.016, 0.0);
    v_blurTexCoords[ 4] = v_texCoord + vec2(-0.012, 0.0);
    v_blurTexCoords[ 5] = v_texCoord + vec2(-0.008, 0.0);
    v_blurTexCoords[ 6] = v_texCoord + vec2(-0.004, 0.0);
    v_blurTexCoords[ 7] = v_texCoord + vec2( 0.004, 0.0);
    v_blurTexCoords[ 8] = v_texCoord + vec2( 0.008, 0.0);
    v_blurTexCoords[ 9] = v_texCoord + vec2( 0.012, 0.0);
    v_blurTexCoords[10] = v_texCoord + vec2( 0.016, 0.0);
    v_blurTexCoords[11] = v_texCoord + vec2( 0.020, 0.0);
    v_blurTexCoords[12] = v_texCoord + vec2( 0.024, 0.0);
    v_blurTexCoords[13] = v_texCoord + vec2( 0.028, 0.0);
}
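
The 14 offsets in the shader above are just multiples of 0.004 on either side of the center, so they could also be generated rather than typed out, for example to build the vertical-blur variant or to try a different spacing. The Java helper below is a sketch with made-up names, not part of the original shader. Note that the offsets are in normalized texture coordinates, so a step of 0.004 is one texel only on a texture that is 250 pixels wide; for other sizes a step of 1.0 / size would be the usual choice.

```java
final class BlurOffsets {

    /**
     * Returns {x, y} offsets for a one-dimensional blur:
     * tapsPerSide taps on each side of the center, spaced by step, center skipped.
     */
    static float[][] offsets(int tapsPerSide, float step, boolean horizontal) {
        float[][] out = new float[2 * tapsPerSide][2];
        int i = 0;
        for (int k = -tapsPerSide; k <= tapsPerSide; k++) {
            if (k == 0) {
                continue; // the center sample uses v_texCoord itself
            }
            float d = k * step;
            out[i][0] = horizontal ? d : 0f;
            out[i][1] = horizontal ? 0f : d;
            i++;
        }
        return out;
    }
}
```

`BlurOffsets.offsets(7, 0.004f, true)` reproduces the horizontal offsets listed in the shader; passing `false` puts the same values into the y component for the vertical pass.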
Offline theagentd
« Reply #9 - Posted 2013-05-15 13:16:35 »

Quote from: RobinB
Noo, calculating texture coordinates for every pixel is expensive.

This is false, at least for AMD hardware. The problem is that interpolating vertex attributes for each pixel is done by specialized hardware, and with that many vertex attributes to interpolate you run into a bottleneck there instead. High-end hardware has no problem with this, but low-end hardware can hit a huge bottleneck here. I compared these two shaders against each other:

Interpolated texture coordinates: http://www.java-gaming.org/?action=pastebin&id=582
Calculate coordinates per pixel: http://www.java-gaming.org/?action=pastebin&id=583

Using AMD's ShaderAnalyzer I checked the (theoretical) performance of those two shaders. On newer high-end and mid-end cards the performance was the same since they're bottlenecked by the texture fetches, but for all low-end and most older cards performance was much worse for the interpolated one.

Name              Throughput(Bi) interpolated   Throughput(Bi) calculated
                  (MPixels/sec)                 (MPixels/sec)
Radeon HD 2400     200                           160
Radeon HD 2600     200                           213
Radeon HD 2900     791                           791
Radeon HD 3870     827                           827
Radeon HD 4550     300                           320
Radeon HD 4670     750                           800
Radeon HD 4770    1500                          1600
Radeon HD 4870    1500                          2000
Radeon HD 4890    1700                          2267
Radeon HD 5450     179                           306
Radeon HD 5670    1033                          1033
Radeon HD 5770    2267                          2267
Radeon HD 5870    2267                          2267
Radeon HD 6450     828                          1412
Radeon HD 6670    2560                          2560
Radeon HD 6870    1680                          1680
Radeon HD 6970    2816                          2816

With the exception of the HD2400, calculating texture coordinates is always equally fast or faster.
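
As a quick cross-check of that summary, the figures from the table can be tallied with a few lines of Java. The class name is made up and the numbers are simply copied from the table (theoretical ShaderAnalyzer throughput, not measured frame times).

```java
final class ThroughputComparison {

    // MPixels/sec from the table, in table order (Radeon HD 2400 first, HD 6970 last).
    static final int[] INTERPOLATED = {
        200, 200, 791, 827, 300, 750, 1500, 1500, 1700,
        179, 1033, 2267, 2267, 828, 2560, 1680, 2816
    };
    static final int[] CALCULATED = {
        160, 213, 791, 827, 320, 800, 1600, 2000, 2267,
        306, 1033, 2267, 2267, 1412, 2560, 1680, 2816
    };

    /** Returns {slower, equal, faster} card counts for the calculated variant. */
    static int[] tally() {
        int[] t = new int[3];
        for (int i = 0; i < INTERPOLATED.length; i++) {
            t[Integer.signum(CALCULATED[i] - INTERPOLATED[i]) + 1]++;
        }
        return t;
    }

    /** Largest calculated/interpolated throughput ratio over all cards. */
    static double bestSpeedup() {
        double best = 0.0;
        for (int i = 0; i < INTERPOLATED.length; i++) {
            best = Math.max(best, (double) CALCULATED[i] / INTERPOLATED[i]);
        }
        return best;
    }
}
```

With these numbers the calculated variant is slower on one card (the HD 2400), equal on eight, and faster on eight, with the largest gain on the HD 5450 (306 versus 179 MPixels/sec).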

Offline Riven
« League of Dukes »

JGO Overlord


Medals: 801
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #10 - Posted 2013-05-15 13:45:12 »

Quote from: RobinB
Noo, calculating texture coordinates for every pixel is expensive.

Quote from: theagentd
This is false, at least for AMD hardware.

Please remember that this was a performance trick in the context of mobile GPUs, as described in the link posted earlier in this thread:
Quote
Texture Lookups
● Don’t perform texture lookups in the pixel shader!
   ● Let the “pre-shader” queue them up ahead of time
   ● I.e. avoid dependent texture lookups
● Don’t manipulate texture coordinate with math
   ● Move all math to vertex shader and pass down
● Don't use .zw components for texture coordinates
   ● Will be handled as a dependent texture lookup
   ● Only use .xy and pass other data in .zw

Offline theagentd
« Reply #11 - Posted 2013-05-15 15:17:36 »

Ouch, I missed that. I was actually going to note that I read some similar trick for mobile GPUs for blurring, but I decided not to in the end. xd

Offline RobinB


« Reply #12 - Posted 2013-05-15 16:44:02 »

Ah, thanks for the info :)
The table is interesting; I'm curious how this works on other cards.
Guess I should test more stuff before assuming it's true.
Offline pitbuller
« Reply #13 - Posted 2013-05-15 17:06:13 »

Moving the texcoord math into the vertex shader is just a special case for some mobile chips, notably PowerVR. If you don't touch the texture coordinate in any way, the fragment shader unit may prefetch the texels before the shader even runs. Also, this is not just a theoretical performance gain but a battle-tested method.
A simple example: today I switched from shadow2DProjEXT to shadow2DProj and gained 1.5 ms of render time in a fullscreen shadow mapping pass on an iPhone 4S. This was because the proj and bias variants always cause "dependent" texture reads.