Java-Gaming.org
  What I did today  (Read 2892993 times)
Offline Sickan
« Reply #6150 - Posted 2018-10-10 18:55:44 »

Today I finished the 3D camera; now on to adding height to the map terrain.

Offline KaiHH

« Reply #6151 - Posted 2018-10-12 18:06:01 »

On my way to do hierarchical occlusion culling on the GPU using compute shaders, guided by the scene's kd-tree.
Here is a video showing only hierarchical frustum culling:
http://www.youtube.com/v/FNuqvgHlrz8?version=3&hl=en_US&start=
(notice how the culled areas become larger the farther away the camera is from a particular node)
Also, the frustum the culling uses is smaller than the camera's frustum to show the effect.
So, how does it work?
- The CPU (Java, that is) initially assembles a list of kd-tree nodes at level N (where N is really small) very quickly and submits it to the compute shader via an SSBO
- At every pass, the compute shader reads that list, performs culling, and writes the visible nodes into an output SSBO
- Those two buffer bindings ping-pong between passes
- An atomic counter buffer records the actual number of nodes written in each step, which becomes the input count for the next step (stream compaction with parallel prefix sums is absolute overkill for such a small number of nodes). I also implemented warp-aggregated atomics in GLSL, only to see that the GLSL compiler seemingly does this automatically already (as is mentioned in the article for nvcc)
- The last pass writes MultiDrawArraysIndirect structs for the voxels in the visible nodes to a separate SSBO (the voxels SSBO containing all voxels is built in such a way that all voxels within any kd-tree node are contiguous, so it is easy to generate a MultiDrawArraysIndirect struct from just a kd-tree node, which contains the index of the first voxel it or its descendants contain, along with the number of voxels)
- This last SSBO is then used as the draw indirect buffer for a glMultiDrawArraysIndirectCount() call, together with the number of draw call structs stored in the atomic counter buffer, which becomes the indirect parameter buffer
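The passes listed above can be mimicked on the CPU to make the data flow easier to follow. The following is only a minimal Java sketch, not the actual compute shader or host code: the class, the `Node` layout, and the rule that a visible inner node is replaced by its two children are assumptions made for illustration. Two `int` arrays stand in for the ping-ponged SSBOs and an `AtomicInteger` stands in for the atomic counter buffer.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** CPU-side illustration of the ping-pong culling passes. All names are hypothetical. */
final class PingPongCullingSketch {

    /** Flattened kd-tree node: AABB, children (-1 marks a leaf), contiguous voxel range. */
    static final class Node {
        final float[] min, max;
        final int left, right, firstVoxel, voxelCount;
        Node(float[] min, float[] max, int left, int right, int firstVoxel, int voxelCount) {
            this.min = min; this.max = max;
            this.left = left; this.right = right;
            this.firstVoxel = firstVoxel; this.voxelCount = voxelCount;
        }
    }

    /** Same four fields as OpenGL's DrawArraysIndirectCommand. */
    static final class DrawCommand {
        final int count, instanceCount, first, baseInstance;
        DrawCommand(int count, int instanceCount, int first, int baseInstance) {
            this.count = count; this.instanceCount = instanceCount;
            this.first = first; this.baseInstance = baseInstance;
        }
    }

    /** Planes are (a, b, c, d); a point is inside when a*x + b*y + c*z + d >= 0. */
    static boolean visible(float[][] planes, Node n) {
        for (float[] p : planes) {
            // Pick the AABB corner that lies farthest along the plane normal.
            float x = p[0] >= 0 ? n.max[0] : n.min[0];
            float y = p[1] >= 0 ? n.max[1] : n.min[1];
            float z = p[2] >= 0 ? n.max[2] : n.min[2];
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0) return false; // fully outside
        }
        return true;
    }

    static DrawCommand[] cull(Node[] nodes, int[] startNodes, float[][] planes, int refinePasses) {
        int[] in = Arrays.copyOf(startNodes, nodes.length); // stands in for the input SSBO
        int[] out = new int[nodes.length];                  // stands in for the output SSBO
        int inCount = startNodes.length;
        AtomicInteger counter = new AtomicInteger();        // stands in for the atomic counter
        for (int pass = 0; pass < refinePasses; pass++) {
            counter.set(0);
            for (int i = 0; i < inCount; i++) {             // one shader invocation per node
                Node n = nodes[in[i]];
                if (!visible(planes, n)) continue;
                if (n.left < 0) {
                    out[counter.getAndIncrement()] = in[i]; // visible leaf stays in the list
                } else {                                    // assumption: refine to children
                    out[counter.getAndIncrement()] = n.left;
                    out[counter.getAndIncrement()] = n.right;
                }
            }
            int[] tmp = in; in = out; out = tmp;            // ping-pong the two buffers
            inCount = counter.get();                        // written count feeds the next pass
        }
        // Last pass: emit one draw command per node that is still visible.
        DrawCommand[] commands = new DrawCommand[inCount];
        counter.set(0);
        for (int i = 0; i < inCount; i++) {
            Node n = nodes[in[i]];
            if (visible(planes, n)) {
                // Assumption: one GL_POINTS vertex per voxel, contiguous per node.
                commands[counter.getAndIncrement()] = new DrawCommand(n.voxelCount, 1, n.firstVoxel, 0);
            }
        }
        return Arrays.copyOf(commands, counter.get());
    }
}
```

On the GPU the returned array would stay in the draw indirect buffer and the final counter value would be read from the parameter buffer, so nothing has to travel back to the CPU.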

Here is the compute shader doing the culling:
https://gist.github.com/httpdigest/15399efe2b60a2b31d1c2cbe414ce5cf
and here is some portion of the host code driving the culling:

Next up is what will bring the most benefit for this heavily fragment-shader- and ROP-bound rendering: combining Hi-Z occlusion culling with the hierarchical frustum culling. This means that Hi-Z occlusion culling will also be done hierarchically, starting at a coarse kd-tree node level and refining nodes when they are visible. The reason I am doing frustum culling on the GPU is that Hi-Z culling has* to be done on the GPU, and doing it hierarchically through the kd-tree means fewer nodes need to be tested.

*that's not entirely true, since there are games out there using a software rasterizer to cull on the CPU
Offline KaiHH

« Reply #6152 - Posted 2018-10-13 21:45:42 »

Today was an interesting day, as I witnessed two parts of the rendering pipeline compete for being the major bottleneck when applying two slightly different techniques of voxel rendering.
The first technique was rendering point sprite quads covering the screen-space projection of the voxel, as presented by: http://www.jcgt.org/published/0007/03/04/
The second technique I came up with was to use the geometry shader to compute and generate the convex hull of the projected voxel with the help of this nice paper: https://pdfs.semanticscholar.org/1f59/8266e387cf367702d16acf5a4e02cc72cb99.pdf
While the first technique puts a very low load on vertex transform/primitive assembly, it suffers from many additional fragments being generated for close voxels, where the quad enclosing the screen-space projected voxel contains a large margin/oversize in order to a) still be a quad and b) cover the voxel entirely. This puts a higher load on fragment operations (the fragment shader doing the final ray/AABB intersection and, likely more importantly, the ROPs reading/comparing/writing depth and writing color).
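To get a feel for how large that margin can become, here is a rough Java sketch that projects a cube with a simple pinhole camera and compares the area of the enclosing axis-aligned rectangle with the area of the exact convex hull. Everything in it (class name, camera at the origin looking down -z, no clipping) is an assumption for illustration and is unrelated to the actual shaders. A square point sprite would waste at least as much as the rectangle used here.

```java
import java.util.Arrays;

/** Compares the bounding rectangle of a projected cube with its convex hull. Hypothetical names. */
final class QuadOversizeSketch {

    /** Projects the 8 cube corners; eye at the origin looking down -z, so every z must be < 0. */
    static double[][] projectCube(double cx, double cy, double cz, double half) {
        double[][] pts = new double[8][];
        for (int i = 0; i < 8; i++) {
            double x = cx + ((i & 1) == 0 ? -half : half);
            double y = cy + ((i & 2) == 0 ? -half : half);
            double z = cz + ((i & 4) == 0 ? -half : half);
            pts[i] = new double[] { x / -z, y / -z }; // pinhole projection
        }
        return pts;
    }

    static double cross(double[] o, double[] a, double[] b) {
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]);
    }

    /** Andrew's monotone chain; returns the hull in counter-clockwise order. */
    static double[][] convexHull(double[][] points) {
        double[][] p = points.clone();
        Arrays.sort(p, (a, b) -> a[0] != b[0] ? Double.compare(a[0], b[0]) : Double.compare(a[1], b[1]));
        double[][] h = new double[2 * p.length][];
        int k = 0;
        for (int i = 0; i < p.length; i++) {                     // lower hull
            while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) k--;
            h[k++] = p[i];
        }
        for (int i = p.length - 2, t = k + 1; i >= 0; i--) {     // upper hull
            while (k >= t && cross(h[k - 2], h[k - 1], p[i]) <= 0) k--;
            h[k++] = p[i];
        }
        return Arrays.copyOf(h, k - 1);                          // last point repeats the first
    }

    /** Shoelace formula. */
    static double polygonArea(double[][] poly) {
        double sum = 0;
        for (int i = 0; i < poly.length; i++) {
            double[] a = poly[i], b = poly[(i + 1) % poly.length];
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(sum) * 0.5;
    }

    static double boundingRectArea(double[][] pts) {
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (double[] p : pts) {
            minX = Math.min(minX, p[0]); maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]); maxY = Math.max(maxY, p[1]);
        }
        return (maxX - minX) * (maxY - minY);
    }

    /** Ratio of rectangle area to hull area; 1.0 means the quad wastes no fragments. */
    static double oversize(double cx, double cy, double cz, double half) {
        double[][] pts = projectCube(cx, cy, cz, half);
        return boundingRectArea(pts) / polygonArea(convexHull(pts));
    }
}
```

A cube seen exactly face-on yields a ratio of 1, while a cube off to the side of the view axis yields a ratio above 1 because two corners of the rectangle lie outside the hexagonal silhouette.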
Now my idea was to reduce fragment operation costs by reducing the number of excess fragments the quad produces, by not making it a quad anymore but a tightly fitting convex hull comprising either 4 or 6 vertices.
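Why 4 or 6: under perspective projection the silhouette of a box is a quadrilateral when only one face is visible and a hexagon when two or three faces are visible. A minimal Java sketch of that classification follows, assuming an axis-aligned box and an eye position given in the same space. The names are made up for illustration, and the geometry shader linked in this post may determine the hull differently.

```java
/** Classifies how many vertices the projected convex hull of an AABB has. Hypothetical names. */
final class HullVertexCountSketch {

    /** Returns 4 or 6, or 0 when the eye is inside the box and there is no silhouette. */
    static int hullVertexCount(double[] eye, double[] min, double[] max) {
        int visibleFaces = 0;
        for (int axis = 0; axis < 3; axis++) {
            // Per axis, at most one of the two opposing faces can face the eye,
            // and it does so only when the eye lies outside the box's slab on that axis.
            if (eye[axis] < min[axis] || eye[axis] > max[axis]) {
                visibleFaces++;
            }
        }
        if (visibleFaces == 0) return 0;   // eye inside the box
        return visibleFaces == 1 ? 4 : 6;  // one face: quad, two or three faces: hexagon
    }
}
```

For a box spanning -1 to 1 on every axis, an eye at (0, 0, 5) sees a single face and gets 4 vertices, while an eye at (5, 0, 5) or (5, 5, 5) gets 6.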
Having heard many bad stories about how geometry shaders perform, I still gave it a try and was positively surprised by an improvement of roughly 21% in total frame time when generating the fragments with the convex hull for close voxels.
Here, the cost of fragment operations was reduced to the point where they were no longer the bottleneck, but vertex operations (passing the GL_POINTS to the geometry shader and emitting a single triangle strip of either 4 or 6 vertices there) now were. One could literally see how, for moderately far away voxels where the screen-space quad had little oversize/margin, the quad rendering solution overtook the convex hull geometry shader solution.
The latter however was ideal for close voxels. So, it's going to be a hybrid solution in the end.
Here are some images and a video showing the overlap/margin of the point sprite quad rendering and the (missing) overlap of the convex hull rendering:

http://www.youtube.com/v/7TFKwAUZ0qE?version=3&hl=en_US&start=

Convex hull generation geometry shader: https://gist.github.com/httpdigest/fa7e071b87ca24b0a54fe0821e3c1a60