Optimizing Performance
Offline CopyableCougar4
« Posted 2017-06-09 04:53:53 »

Recently I've been using deferred rendering and learning about a lot of post-processing effects and applying them. However, the game runs (on a 5-year-old laptop) at 60 fps with a somewhat simple scene.

Right now I am using bloom, depth of field, fog, FXAA, HDR, and deferred lighting on a scene with skeletal animation, a skybox, and basic SAT collision detection. Is the current performance to be expected, or is it likely that the shaders need to be optimized?

Either wandering the forum or programming. Most likely the latter :)

Github: http://github.com/CopyableCougar4
Offline Archive
« Reply #1 - Posted 2017-06-09 05:00:16 »

What I've learned about deferred shading (and what theAgentD told me) is that deferred shading makes the engine run at a more constant framerate. So the low FPS areas are brought up, and the higher FPS areas are brought down. This is compared to forward rendering, of course.

Though, it won't hurt to optimize your shaders :p

Offline CopyableCougar4
« Reply #2 - Posted 2017-06-09 05:08:24 »

Thanks, I just wanted to make sure that I wasn't screwing anything up. I'm probably going to time the shader passes and see if any are taking up a noticeable amount of time per frame.

Offline h.pernpeintner

« Reply #3 - Posted 2017-06-09 06:48:07 »

Seems that everyone is bugging theagentd with his problems xD

Whether your engine, or rather your rendering, runs efficiently depends on many things: for example the scene complexity, the G-buffer resolution, the instruction count of your post effects, etc. My (subjective) feeling is that if you are talking about a GPU like the 540M, rendering at 720p with animations and lots of post effects, 60 fps is okay. You did turn off framerate limiters such as vsync, didn't you? Nonetheless, measuring is the only thing you can do: without measuring, you can't optimize anything, as you probably know.
Offline theagentd
« Reply #4 - Posted 2017-06-09 09:37:42 »

Your first step should be getting basic GPU profiling working. See this thread: http://www.java-gaming.org/index.php?topic=33135.0. Using that, you can get the exact time your different render passes and postprocessing effects take, which will allow you to see where you should focus your efforts.

Once you've figured out your bottleneck, you can start optimizing it. If you find anything that stands out, I can help you diagnose what's making that particular part slow.
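
As a rough illustration of what such a per-pass report can look like on the Java side, here is a minimal sketch of a timing tree. The class name ProfileNode and its methods are hypothetical and are not taken from the linked thread; the millisecond values are assumed to be supplied by whatever GPU timing mechanism you use, since this sketch does not measure anything itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** One timed section of a frame; children are nested sub-sections. (Hypothetical helper.) */
final class ProfileNode {
    final String name;
    final double millis; // assumed to come from a GPU timer; not measured here
    final List<ProfileNode> children = new ArrayList<>();

    ProfileNode(String name, double millis) {
        this.name = name;
        this.millis = millis;
    }

    /** Adds a nested section and returns this node so calls can be chained. */
    ProfileNode add(ProfileNode child) {
        children.add(child);
        return this;
    }

    /** Time spent in this section that none of its children account for. */
    double unaccountedMillis() {
        double sum = 0.0;
        for (ProfileNode c : children) {
            sum += c.millis;
        }
        return millis - sum;
    }

    /** Renders the tree as text, four spaces of indentation per nesting level. */
    String report() {
        StringBuilder sb = new StringBuilder();
        append(sb, 0);
        return sb.toString();
    }

    private void append(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("    ");
        }
        sb.append(String.format(Locale.ROOT, "%s : %.3fms", name, millis)).append('\n');
        for (ProfileNode c : children) {
            c.append(sb, depth + 1);
        }
    }
}
```

The unaccountedMillis() value is useful for spotting time that is spent inside a pass but outside any of the sub-sections you instrumented.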

Myomyomyo.
Offline CopyableCougar4
« Reply #5 - Posted 2017-06-10 03:48:25 »

I ran the profiler and got the following results:
Quote
Frame 1063 : 13.555ms
    Geometry : 6.223ms
        Skybox : 0.648ms
        Terrain : 5.439ms
    Lighting : 2.362ms
    Bloom and HDR : 1.113ms
    FXAA : 1.957ms
    Depth of Field : 0.491ms
    Fog : 0.911ms
    Final Render : 0.487ms

It seems the most noticeable slowdowns are for terrain (a 2000x2000 mesh broken into triangles every 10 units) and lighting (that 2ms is one point light and ambient light).
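
To put the numbers above in proportion, here is a small sketch that converts a frame time to frames per second and each pass time to a share of the frame. The class FrameBreakdown and its method names are made up for illustration; only the millisecond values come from the profiler output. With the figures quoted above, the terrain alone accounts for roughly 40% of the 13.555 ms frame.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for reading a per-pass timing report. */
final class FrameBreakdown {
    /** Frames per second implied by a frame time in milliseconds. */
    static double fps(double frameMillis) {
        return 1000.0 / frameMillis;
    }

    /** Percentage of the frame spent in each pass, in the same order as the input. */
    static Map<String, Double> shares(Map<String, Double> passMillis, double frameMillis) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : passMillis.entrySet()) {
            out.put(e.getKey(), 100.0 * e.getValue() / frameMillis);
        }
        return out;
    }

    /** Name of the most expensive pass, or null if the map is empty. */
    static String slowest(Map<String, Double> passMillis) {
        String best = null;
        double bestMillis = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : passMillis.entrySet()) {
            if (e.getValue() > bestMillis) {
                bestMillis = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```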

I know one optimization would be using light volumes as opposed to fullscreen passes, but are there any other obvious slowdowns or mistakes in my lighting shader? (The shader code is based on JMonkeyEngine lighting code)
#version 330

uniform sampler2D diffuseTexture;
uniform sampler2D normalTexture;
uniform sampler2D depthTexture;

uniform vec3 lightColor;
uniform vec4 lightPos;
uniform vec3 viewPos;
uniform vec4 lightDirPacked;
uniform float lightRadius;
uniform int directional;

uniform mat4 invProjectionMatrix;
uniform mat4 invViewMatrix;
uniform float near;
uniform float far;

in vec3 pass_Position;
in vec2 pass_TextureCoord;

out vec4 out_Color;

const float kPi = 3.14159265;
const float kShininess = 16.0;
const float kEnergyConservation = (8.0 + kShininess) / (8.0 * kPi);

float getAttenuation(float distance) {
   if (distance > lightRadius) {
      return 0;
   }
   float x = distance / lightRadius;
   return 1 / (1 + x * x);
}

vec3 reconstructPosition() {
   vec4 clipSpaceLocation;
   clipSpaceLocation.xy = pass_TextureCoord * 2.0 - 1.0;
   clipSpaceLocation.z = texture2D(depthTexture, pass_TextureCoord).r * 2.0 - 1.0;
   clipSpaceLocation.w = 1.0;
   vec4 homogenousLocation = invViewMatrix * invProjectionMatrix * clipSpaceLocation;
   return homogenousLocation.xyz / homogenousLocation.w;
}

float computeSpecular(vec3 normal, vec3 viewDir, vec3 lightDir, float shininess) {
   vec3 halfwayDir = (viewDir + lightDir) * vec3(0.5);
   return pow(max(dot(halfwayDir, normal), 0.0), shininess);
}

float computeDiffuse(vec3 normal, vec3 viewDir, vec3 lightDir) {
   return max(0.0, dot(normal, lightDir));
}

vec2 computeLighting(vec3 position, vec3 normal, vec3 viewDir, vec4 lightDir, float shininess) {
   float diffuseFactor = computeDiffuse(normal, viewDir, lightDir.xyz);
   float specularFactor = computeSpecular(normal, viewDir, lightDir.xyz, shininess);
   return vec2(diffuseFactor, specularFactor) * vec2(lightDir.w);
}

float computeSpotFalloff(vec4 lightDir, vec3 lightVec) {
   vec3 L = normalize(lightVec);
   vec3 spotDir = normalize(lightDir.xyz);
   float curAngleCos = dot(-L, spotDir);
   float innerAngleCos = floor(lightDir.w) * 0.0001;
   float outerAngleCos = fract(lightDir.w);
   float angle = (curAngleCos - outerAngleCos) / (innerAngleCos - outerAngleCos);
   float falloff = clamp(angle, step(lightDir.w, 0.001), 1.0);
   return pow(clamp(angle, 0.0, 1.0), 4.0);
}

vec4 lightComputeDir(vec3 worldPos, vec4 color, vec4 position, vec4 spotDir) {
   if (directional == 0) {
      return vec4(-position.xyz, 1.0);
   }
   vec3 lightVec = position.xyz - worldPos.xyz;
   vec4 lightDir = vec4(0.0);
   lightDir.xyz = lightVec;
   float dist = length(lightDir.xyz);
   lightDir.w = clamp(1.0 - position.w * dist, 0.0, 1.0);
   lightDir.xyz /= dist;
   if (directional == 2) {
      lightDir.w = computeSpotFalloff(spotDir, lightVec) * lightDir.w;
   }
   return lightDir;
}

float computeOcclusion(vec3 worldPos, vec3 lightPos, vec3 cameraPos) {
   //float distanceToLight = length(lightPos - cameraPos);
   //float distanceToFragment = length(worldPos - cameraPos);
   //return distanceToLight <= distanceToFragment ? 1.0 : 0.0;
   return 1.0;
}

void main(void) {
   vec4 diffuseColor = texture2D(diffuseTexture, pass_TextureCoord);
   if (diffuseColor.a == 0.0) {
      discard;
   }

   vec3 normal = normalize(texture2D(normalTexture, pass_TextureCoord).rgb * 2.0 - 1.0);
   vec3 position = reconstructPosition();

   vec3 viewDir = normalize(viewPos - position);
   vec4 lightDir = lightComputeDir(position, vec4(lightColor, 1.0), lightPos, lightDirPacked);
   vec2 light = computeLighting(position, normal, viewDir, lightDir, 32.0);
   vec4 color = vec4(light.x * diffuseColor.xyz + light.y * vec3(1.0), 1.0);
   out_Color = color * vec4(lightColor, 1.0) * computeOcclusion(position, lightPos.xyz, viewPos);
}

Offline theagentd
« Reply #6 - Posted 2017-06-10 10:59:42 »

I strongly recommend that you first of all try to optimize the terrain rendering. It's by far the slowest part, so optimizing it should be a priority. You should try to figure out what's making it slow. Either you're drawing too many triangles and/or your vertex shader is too expensive, so processing the geometry is the slow part, or the slow part is processing the pixels. You can diagnose this by changing the resolution you render at. If you halve the resolution (width/2, height/2), does the timing of the terrain rendering stay the same?
> Performance is ~4x faster ---> The slow part is either the fragment shader and/or the writing of the data to the G-buffer textures, so we should take a look at the fragment shader of the terrain to investigate further.
> Performance roughly stays the same ---> The slow part is the sheer number of vertices and/or the vertex shader. Take a look at the vertex shader and consider adding a LOD system to reduce the number of vertices of distant terrain, if you don't already have that.
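
The half-resolution test above can be written down as a simple decision rule. The sketch below is only an illustration: the class name and the two thresholds (3.0 and 1.3) are assumptions chosen to leave room for measurement noise, not values from this thread.

```java
/** Hypothetical helper that interprets the half-resolution timing test. */
final class ResolutionTest {
    enum Bottleneck { FRAGMENT_BOUND, VERTEX_BOUND, MIXED }

    /**
     * @param fullMillis pass time at full resolution
     * @param halfMillis pass time at (width/2, height/2), i.e. a quarter of the pixels
     */
    static Bottleneck classify(double fullMillis, double halfMillis) {
        if (fullMillis <= 0.0 || halfMillis <= 0.0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        double speedup = fullMillis / halfMillis;
        // Close to 4x: cost scales with the pixel count, so per-pixel work dominates.
        if (speedup >= 3.0) {
            return Bottleneck.FRAGMENT_BOUND;
        }
        // Close to 1x: cost ignores the pixel count, so vertex work dominates.
        if (speedup <= 1.3) {
            return Bottleneck.VERTEX_BOUND;
        }
        // Anything in between suggests both stages contribute.
        return Bottleneck.MIXED;
    }
}
```

Take several frames and average the timings before classifying, since single-frame GPU timings tend to be noisy.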



Concerning your lighting shader...

 - Be careful! Some compilers don't accept automatic int-->float conversion. Lines 30 and 33 (the "return 0;" and "return 1 / (1 + x * x);" statements in getAttenuation) seem to cause issues on at least some AMD hardware. Make sure those are float literals.

 - I'd recommend having different shaders for different types of lights. If you want, you can inject #defines into the source code to specialize the shader for different lights instead of relying on runtime branching on uniform variables. Although branching is not usually very expensive anymore (especially branching on uniforms as that means all shader invocations will take the same branch), it still forces the GPU to allocate enough registers for the worst case branch for all invocations, which can negatively impact texture read performance.

 - Consider using a signed texture format for your normalTexture (GL_RGB8_SNORM for example). They'll be more accurate and are automatically normalized to (-1, +1) instead of (0, 1), so you won't have to do that conversion yourself.

 - You seem to be doing lighting in world space instead of view space which is more common. Doing it in view space has the advantage of placing the camera at (0, 0, 0), which simplifies some of the math you have.

 - Even if you choose to do the lighting in world space, precompute the inverse projection view matrix and do a single matrix multiply. Currently, around half of the assembly instructions in your shader come from this single line:
vec4 homogenousLocation = invViewMatrix * invProjectionMatrix * clipSpaceLocation;

This line first calculates a mat4*mat4 operation, which you might recognize requires computing a 4D dot product for every single element of the resulting matrix. This requires 4 operations per element, so that's 64 operations right there. The resulting matrix is then used to do a mat4*vec4 operation, which is much cheaper; this only requires 4 dot products = 16 operations. That means that changing it to the following code is 60% faster as it avoids the mat4*mat4 operation:
vec4 homogenousLocation = invViewMatrix * (invProjectionMatrix * clipSpaceLocation);

but the fastest will always be
vec4 homogenousLocation = invViewProjectionMatrix * clipSpaceLocation;

which will make the entire shader around 80% faster in total. In other words, doing that instead gets rid of 44% of the instructions in your entire shader. As your lighting shader is ALU-bound (lots of math instructions), you're likely to see a very significant performance increase from that optimization alone.

Together, all the optimizations above (signed normal texture + view space lighting + precomputed matrix) should theoretically yield an 87% increase in performance of the lighting.
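
For the precomputed matrix, the mat4*mat4 product moves to the CPU and is done once per frame instead of once per pixel. The sketch below assumes column-major float[16] matrices (the layout OpenGL expects); the class name Mat4 and its methods are illustrative stand-ins for whatever math library you already use. The three constants count multiplications for that one line only: 64 + 16 = 80 for the original form, 16 + 16 = 32 for the parenthesized form, and 16 with a precomputed matrix.

```java
/** Minimal column-major 4x4 matrix helpers. (Illustrative; a math library would normally do this.) */
final class Mat4 {
    static final int MULS_MAT_MAT = 4 * 4 * 4; // 16 elements, one 4D dot product each
    static final int MULS_MAT_VEC = 4 * 4;     // 4 elements, one 4D dot product each

    /** Returns a * b. Element (row r, column c) is stored at index c * 4 + r. */
    static float[] mul(float[] a, float[] b) {
        float[] out = new float[16];
        for (int c = 0; c < 4; c++) {
            for (int r = 0; r < 4; r++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + r] * b[c * 4 + k];
                }
                out[c * 4 + r] = sum;
            }
        }
        return out;
    }

    /** Returns m * v for a 4-component column vector v. */
    static float[] transform(float[] m, float[] v) {
        float[] out = new float[4];
        for (int r = 0; r < 4; r++) {
            float sum = 0f;
            for (int k = 0; k < 4; k++) {
                sum += m[k * 4 + r] * v[k];
            }
            out[r] = sum;
        }
        return out;
    }
}
```

On the host you would compute Mat4.mul(invViewMatrix, invProjectionMatrix) once per frame and upload the result as the invViewProjectionMatrix uniform. Because matrix multiplication is associative, the shader gets the same position either way, up to floating-point rounding.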



In addition:

 - Doing a fullscreen pass for the ambient light is extremely inefficient. That requires you to do an entire fullscreen pass just to add the ambientLight*diffuseColor to the lighting computation. This requires your GPU to read in the entire diffuse texture and blend with the entire lighting buffer, which is going to involve gigabytes of memory moved around and millions of pixels filled just so you can do three math instructions per pixel. I can see that you're doing bloom/HDR right after the lighting. See if you can pack the ambient light calculation into one of those shaders instead. Adding 3 math instructions to a different shader is always going to be faster than doing an entire fullscreen pass.

 - I get the impression that fog could be merged into another shader as well to save the overhead of a fullscreen pass (like the DoF).

 - Your FXAA shader looks a bit expensive for some reason. Are you using a custom one?

 - You shouldn't be doing fullscreen passes for local lights either. There are a number of different techniques for making sure you're not rendering too many unnecessary pixels.
 > Render an actual sphere and only compute lighting for the pixels covered by the sphere (pretty simple to implement).
 > Use the scissor test and the depth bounds test to only process pixels in a rectangular area around the sphere that are within the depth bounds of the sphere (best and fastest, simple but some complicated math to calculate everything).
Consider implementing one of those.
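
As a starting point for the scissor approach, here is a deliberately simple and conservative sketch rather than the tight version: it projects the eight corners of the light sphere's view-space bounding box and takes their screen-space bounding rectangle. It assumes a standard OpenGL perspective projection (camera looking down -z, column-major matrix), falls back to the full screen when the sphere reaches the near plane, and ignores the depth bounds test entirely. The class and method names are hypothetical.

```java
/** Conservative scissor rectangle for a point light. (Illustrative sketch, not a tight bound.) */
final class LightScissor {
    /** Standard OpenGL perspective matrix, column-major. */
    static float[] perspective(float fovYRadians, float aspect, float near, float far) {
        float f = (float) (1.0 / Math.tan(fovYRadians / 2.0));
        float[] m = new float[16];
        m[0] = f / aspect;
        m[5] = f;
        m[10] = (far + near) / (near - far);
        m[11] = -1f;
        m[14] = 2f * far * near / (near - far);
        return m;
    }

    /**
     * Returns {x, y, width, height} in pixels with a lower-left origin, as glScissor expects.
     * (cx, cy, cz) is the light centre in view space.
     */
    static int[] scissor(float[] proj, float near, float cx, float cy, float cz,
                         float radius, int screenW, int screenH) {
        // Sphere touches or crosses the near plane: projecting corners is unsafe, use the full screen.
        if (cz + radius >= -near) {
            return new int[] {0, 0, screenW, screenH};
        }
        float minX = Float.POSITIVE_INFINITY, maxX = Float.NEGATIVE_INFINITY;
        float minY = Float.POSITIVE_INFINITY, maxY = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < 8; i++) {
            float x = cx + ((i & 1) == 0 ? -radius : radius);
            float y = cy + ((i & 2) == 0 ? -radius : radius);
            float z = cz + ((i & 4) == 0 ? -radius : radius);
            float clipX = proj[0] * x + proj[4] * y + proj[8] * z + proj[12];
            float clipY = proj[1] * x + proj[5] * y + proj[9] * z + proj[13];
            float clipW = proj[3] * x + proj[7] * y + proj[11] * z + proj[15];
            minX = Math.min(minX, clipX / clipW);
            maxX = Math.max(maxX, clipX / clipW);
            minY = Math.min(minY, clipY / clipW);
            maxY = Math.max(maxY, clipY / clipW);
        }
        // Clamp to the visible part of normalized device coordinates.
        minX = Math.max(minX, -1f);
        maxX = Math.min(maxX, 1f);
        minY = Math.max(minY, -1f);
        maxY = Math.min(maxY, 1f);
        if (minX >= maxX || minY >= maxY) {
            return new int[] {0, 0, 0, 0}; // light is entirely off-screen
        }
        int x0 = (int) Math.floor((minX * 0.5f + 0.5f) * screenW);
        int x1 = (int) Math.ceil((maxX * 0.5f + 0.5f) * screenW);
        int y0 = (int) Math.floor((minY * 0.5f + 0.5f) * screenH);
        int y1 = (int) Math.ceil((maxY * 0.5f + 0.5f) * screenH);
        return new int[] {x0, y0, x1 - x0, y1 - y0};
    }
}
```

The rectangle is larger than strictly necessary because a box is used in place of the sphere, but it never cuts off lit pixels as long as the whole box is in front of the near plane. A zero-size result means the light can be skipped for that frame.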


Offline CopyableCougar4
« Reply #7 - Posted 2017-06-12 02:41:00 »

Thanks for the notes. I have already implemented some of them and will post some updated timings when I'm done implementing the rest. But so far the scene renders at 75 fps :)

Offline theagentd
« Reply #8 - Posted 2017-06-12 03:28:38 »

Cool, let me know if you have any questions. I'd love to hear about your results when you're ready, too. =P

Offline CopyableCougar4
« Reply #9 - Posted 2017-06-12 04:25:24 »

I've been working on my terrain shader, and I've encountered an interesting situation.

This code runs at 80fps:
void main(void) {
   vec4 blendSample = texture2D(blendMap, pass_TextureCoord);
   vec2 terrainTextureCoord = pass_TextureCoord * 75.0;
   vec4 rSample = texture2D(rTexture, terrainTextureCoord) * blendSample.r;
   vec4 gSample = texture2D(gTexture, terrainTextureCoord) * blendSample.g;
   vec4 bSample = texture2D(bTexture, terrainTextureCoord) * blendSample.b;
   vec4 aSample = texture2D(aTexture, terrainTextureCoord) * (1.0 - (blendSample.r + blendSample.g + blendSample.b));
   out_Color = rSample + gSample + bSample + aSample;
}

While this code runs at 125fps:
void main(void) {
   vec4 blendSample = texture2D(blendMap, pass_TextureCoord);
   vec2 terrainTextureCoord = pass_TextureCoord;
   vec4 rSample = texture2D(rTexture, terrainTextureCoord) * blendSample.r;
   vec4 gSample = texture2D(gTexture, terrainTextureCoord) * blendSample.g;
   vec4 bSample = texture2D(bTexture, terrainTextureCoord) * blendSample.b;
   vec4 aSample = texture2D(aTexture, terrainTextureCoord) * (1.0 - (blendSample.r + blendSample.g + blendSample.b));
   out_Color = rSample + gSample + bSample + aSample;
}

Offline theagentd
« Reply #10 - Posted 2017-06-12 12:57:48 »

Are you on mobile? I've heard about that being a major issue on mobile, but never on desktop. Mobile likes to prefetch texture data before the shader starts, which isn't possible if you need to run the shader to figure out the texture coordinates.

Offline CopyableCougar4
« Reply #11 - Posted 2017-06-12 18:04:03 »

No, I'm testing this on a laptop, although the laptop is somewhat old.

I tried moving the texture coord calculation to the vertex shader, but that didn't affect the time the shader takes.

Although I'm also curious as to why the timings change for each chunk of terrain rendered. The first chunk takes 42x as long to render as the fourth chunk even though they are all the same size.
Quote
Frame 414 : 7.737ms
    Geometry : 2.094ms
        Skybox : 0.709ms
        Terrain : 1.268ms
            Terrain Chunk 0 : 0.845ms
                Texture Pack : 0.001ms
            Terrain Chunk 1 : 0.2ms
            Terrain Chunk 2 : 0.194ms
            Terrain Chunk 3 : 0.023ms
    Lighting : 1.968ms
    Bloom and HDR : 1.32ms
    FXAA : 1.083ms
    Environment : 0.782ms
    Final Render : 0.48ms

Offline Riven
« Reply #12 - Posted 2017-06-12 19:09:29 »

Quote
Although I'm also curious as to why the timings change for each chunk of terrain rendered. The first chunk takes 42x as long to render as the fourth chunk even though they are all the same size.
Are they also the same size on the framebuffer?

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline CopyableCougar4
« Reply #13 - Posted 2017-06-12 20:17:50 »

They're the same size, with the same distribution of vertices, and the same matrices applied to transform each point. I also don't do any culling or other mesh optimizations.

Offline Riven
« Reply #14 - Posted 2017-06-12 20:37:47 »

I mean that you may be fragment-shader bound, so chunk 0 may be responsible for many more rendered fragments than the other chunks.

What happens to the stats when you look up to the sky, without any pixel showing terrain?

Offline CopyableCougar4
« Reply #15 - Posted 2017-06-12 21:09:29 »

Yeah, I just checked and it is fragment shader bound. I rotated the camera and chunk 3 became the most expensive chunk.

Offline CopyableCougar4
« Reply #16 - Posted 2017-06-15 01:01:45 »

I've been rewriting my lighting shaders, and in the new shader the diffuse factor, dot(N, L), only produces diffuse light in a semicircle.

#version 330

uniform mat4 invProjectionMatrix;

uniform sampler2D diffuseTexture;
uniform sampler2D depthTexture;
uniform sampler2D normalTexture;

uniform vec3 lightColor;
uniform vec3 eyePosition;
uniform float radius;

in vec3 pass_LightPos;
in vec2 pass_TextureCoord;
in mat3 pass_NormalMatrix;

out vec4 out_Color;

void main(void) {
   vec4 clipPos = vec4(vec3(pass_TextureCoord, texture2D(depthTexture, pass_TextureCoord).r) * 2.0 - 1.0, 1.0);
   vec4 eyeSpace = invProjectionMatrix * clipPos;
   eyeSpace.xyz /= eyeSpace.w;
   
   vec3 distanceToLight = pass_LightPos - eyeSpace.xyz;
   float distance = length(distanceToLight);
   vec3 lightDir = normalize(distanceToLight);
   vec3 normal = texture2D(normalTexture, pass_TextureCoord).rgb * 2.0 - 1.0;
   vec3 albedo = texture2D(diffuseTexture, pass_TextureCoord).rgb;
   float attenuation = 1.0 - clamp(distance / radius, 0.0, 1.0);
   
   float diffuseFactor = max(dot(normal, lightDir), 0.0);
   if (diffuseFactor == 0) {
      discard;
   }
   vec3 diffuse = diffuseFactor * albedo * lightColor * attenuation;
   
   out_Color = vec4(diffuse, 1.0);
}


Offline Archive
« Reply #17 - Posted 2017-06-15 01:56:17 »

I'm sure it would be better to write this line:
   float diffuseFactor = max(dot(normal, lightDir), 0.0);
   if (diffuseFactor == 0) {
      discard;
   }


as
   float diffuseFactor = dot(normal, lightDir);
   if (diffuseFactor <= 0.0) {
      discard;
   }
