  LibGDX Depth Testing is horrendously slow  (Read 3749 times)
Offline Ecumene

« Posted 2016-09-28 03:15:02 »

In the render method we do two things:
 1. Render all entities to an FBO
 2. Scale FBO and draw it to the screen

In the FBO, depth testing is enabled and the resolution is very low
entityBuffer = new FrameBuffer(Format.RGBA8888, (int) camera.viewportWidth, (int) camera.viewportHeight, true);

When drawing to the FBO, there seems to be a pretty big performance difference on what we depth test.
 - The game runs slower the more we depth-test
 - The depth buffer is cleared with 0.0 every frame
 - The framebuffer IS bound, and when we render it depth-testing is disabled
 - We're using gl_FragDepth in a shader that renders every entity to the framebuffer to set the depth; disabling it does nothing performance-wise
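
For reference, the fragment shader boils down to something like this (a simplified sketch with made-up identifiers, not our exact code):

```glsl
// Simplified sketch: a libGDX-style ES2 fragment shader that sets the
// depth manually per fragment. Writing gl_FragDepth is the suspect part.
varying vec2 v_texCoords;
uniform sampler2D u_texture;
uniform float u_depth; // per-entity depth, assumed to be passed in

void main() {
    vec4 color = texture2D(u_texture, v_texCoords);
    if (color.a < 0.5) discard;   // alpha-tested sprite edges
    gl_FragColor = color;
    gl_FragDepth = u_depth;       // manual per-fragment depth write
}
```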

It seems like when I enable the depth buffer while drawing a large group of objects it becomes very slow on my laptop, but my PC with its dedicated video card does fine.

Is there any way to speed up depth testing? It seems like that's what is slowing the game down.

Offline theagentd
« Reply #1 - Posted 2016-09-28 05:20:08 »

- The game runs slower the more we depth-test
= "the more overdraw we add"? That would of course be expected, as more pixels = more work, especially for a shitty integrated laptop GPU.

That being said, depth testing with 3D geometry usually improves performance as the depth test can run before the fragment shader, allowing the GPU to avoid running the fragment shader for occluded pixels. This depends on two things to work:

 - If the shader writes to gl_FragDepth, the shader implicitly has to run before the depth test, since it determines the value used in the comparison. This has VERY significant performance implications.
 - If discard; is used, the early depth test is severely limited: it cannot update the value in the depth buffer until the shader has executed, or it would write depth for discarded pixels. This can also hurt performance, though usually not as badly, since an early depth test can still be run against previously drawn geometry and potentially avoid shader execution; it just has to be much more conservative.

You should never write to gl_FragDepth if you can avoid it, since it disables so many important optimizations. If your geometry is flat, then simply outputting the depth from the vertices will give you the same result but allow all the optimizations to work as expected.

If you do need non-linear per-pixel depth for some reason, there are still things you can do to improve performance. If you can calculate the minimum depth (the depth value closest to the camera), you can output that as a conservative depth value from the vertex shader. You can then specify in the fragment shader how exactly you will modify the depth value of gl_FragDepth, which allows the GPU to run a conservative depth test against the hardware computed depth (the one you outputted from the vertex shader). You always want to modify the depth in the OPPOSITE direction of the test. Example:

 - You use GL_LESS for depth testing and the depth is cleared to 1.0.
 - You output the MINIMUM depth that the polygon can possibly have from the vertex shader.
 - In the fragment shader, you specify that the depth value will always be GREATER than the hardware computed value using
layout (depth_greater) out float gl_FragDepth;

This will allow your GPU to run a conservative depth test using the hardware computed depth value, at least giving it a chance (similar to when discard; is used) of culling fragments before running the fragment shader. This feature requires hardware support, but GL_ARB_conservative_depth is available as an extension on all OGL3 GPUs, even Intel, plus OGL2 Nvidia GPUs. Additionally, it can be queried and enabled from within the GLSL shader, and it does no harm when unavailable (as long as you also skip outputting the minimum depth from the vertex shader in that case).
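
In GLSL that query-and-enable pattern can look something like this (a sketch; the offset uniform is just a placeholder for whatever per-pixel term you actually compute):

```glsl
#version 330
// Enable conservative depth only where the extension exists; when it is
// enabled, the GL_ARB_conservative_depth macro is defined by the compiler.
#extension GL_ARB_conservative_depth : enable

#ifdef GL_ARB_conservative_depth
// Promise the driver we only ever move depth AWAY from the camera, so an
// early test against the interpolated (minimum) depth stays valid.
layout (depth_greater) out float gl_FragDepth;
#endif

in vec2 texCoords;
out vec4 fragColor;
uniform sampler2D diffuse;
uniform float depthOffset; // assumed >= 0, placeholder for the real term

void main() {
    fragColor = texture(diffuse, texCoords);
    // gl_FragCoord.z is the hardware-interpolated depth (the conservative
    // minimum output from the vertex shader); we only ever add to it.
    gl_FragDepth = gl_FragCoord.z + depthOffset;
}
```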

Clearing the depth buffer to 0.0 would cause nothing to ever pass the depth test if you use standard GL_LESS depth testing. I'd strongly suggest using GL_LESS and clearing to 1.0 instead, as that is the standard way of using a depth buffer and in some cases could be faster in hardware.

If you could specify some more information about your use case, I could give you better advice and more information.

Offline purenickery
« Reply #2 - Posted 2016-09-28 17:32:40 »

Thanks for the in-depth reply! Here's some more information:

Based on the testing and timings that we've done, I figure it *has* to be something other than writing to the depth buffer itself that is causing the problem. Just to test it out (and I might end up keeping it this way), I had a free channel in my shadow frame buffer that I wasn't using, so I manually wrote the depth of all the objects to the free channel in the shadow buffer. Then, in the main rendering shader, I checked the depth against the channel in the shadow buffer and drew/didn't draw the pixel accordingly. On my laptop with integrated GPU, it ran at around 250 FPS this way, while with the standard depth buffer it ran at 30-40 FPS. On my PC with dedicated graphics it has no real impact compared to standard depth testing.

There is no way the code I scraped together is 5 times more efficient than the built-in depth testing, so I figure there has to be something else we were doing wrong.
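
For anyone curious, the manual test amounts to roughly this in the main rendering shader (simplified, with made-up names; our actual code differs):

```glsl
// Sketch of a hand-rolled depth test against a spare color channel.
varying vec2 v_texCoords;
uniform sampler2D u_texture;
uniform sampler2D u_shadowBuffer; // its free .b channel stores scene depth
uniform vec2 u_resolution;        // framebuffer size in pixels
uniform float u_depth;            // this entity's depth in [0, 1]

void main() {
    vec2 screenUV = gl_FragCoord.xy / u_resolution;
    float storedDepth = texture2D(u_shadowBuffer, screenUV).b;
    if (u_depth > storedDepth) discard; // behind something already drawn
    gl_FragColor = texture2D(u_texture, v_texCoords);
}
```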

Working on Questica!
Twitter: @purenickery
Offline theagentd
« Reply #3 - Posted 2016-09-29 02:03:44 »

Thanks for the in-depth reply!

I'm drawing the same conclusion as you. You're most likely hitting a slow path on Intel for some reason. Some possibilities to explore:
 - gl_FragDepth disables hierarchical depth buffers, depth compression, etc., making it slow.
 - gl_FragDepth may just be inherently slow in hardware on Intel cards.
 - Clearing the depth buffer to 0.0 and using GL_GREATER depth testing may be slow and/or disable hardware optimizations.

Offline theagentd
« Reply #4 - Posted 2016-10-02 04:57:38 »

I'm a bit interested in a follow-up on this, if you have time sometime. =P

Offline purenickery
« Reply #5 - Posted 2016-10-05 05:02:58 »

I tried clearing the depth buffer to 1.0 and using GL_LEQUAL, but it remained just as slow. I believe the problem was writing gl_FragDepth for every single fragment. I sped it up by going *full 3D mode* and having OpenGL calculate depth for me so I don't have to write gl_FragDepth; consequently I basically have a 3D game now.
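
In shader terms the change is just letting the projected position carry the depth, so the fixed-function depth test can do its job (a sketch with made-up names):

```glsl
// Vertex shader sketch: depth comes from the projected z, so gl_FragDepth
// is never written and early depth testing can work.
attribute vec3 a_position;  // z now carries the entity's depth
attribute vec2 a_texCoord0;
uniform mat4 u_projTrans;   // orthographic projection with a real z range
varying vec2 v_texCoords;

void main() {
    v_texCoords = a_texCoord0;
    gl_Position = u_projTrans * vec4(a_position, 1.0);
}
```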

Offline Hydroque

« Reply #6 - Posted 2016-10-05 06:50:17 »

I found one potential issue: make sure you aren't reading the depth buffer back as floats. Because the depth format is a normalized integer format, the driver will have to use the CPU to convert the normalized integer data into floating-point values. This is slow.

The preferred way to handle this is with this code:

  if (depth_buffer_precision == 16) {
    GLushort mypixels[width * height];
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_SHORT, mypixels);
  } else if (depth_buffer_precision == 24) {
    GLuint mypixels[width * height];  // there is no 24-bit type, so we'll have to settle for 32-bit
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_INT_24_8, mypixels);  // no upconversion
  } else if (depth_buffer_precision == 32) {
    GLuint mypixels[width * height];
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, mypixels);
  }

Not sure if that's it, but trying to help. Also make sure you allocate all your buffers, and check whether depth testing is happening more than once per frame, which would be more work than necessary.

Offline theagentd
« Reply #7 - Posted 2016-10-05 07:49:01 »

@Hydroque: Completely irrelevant to the thread. The problem here is depth TESTING.
