Java-Gaming.org Hi !
  Global Illumination via Voxel Cone Tracing in LWJGL  (Read 60902 times)
Offline Kefwar
« Reply #30 - Posted 2016-06-02 09:04:54 »

sorry, can't say much about it right now, because my gpu got busted and I haven't had time yet to buy a new pc - I'm currently working on my notebook with a geforce 730m, which is so sad...needless to say my framerate is not above 20.
If you need other testing hardware, I don't think I'm the only one willing to help you out with that. I can run it on a GeForce GTX 980M for you.

Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #31 - Posted 2016-06-15 16:22:45 »

Okay, for everyone interested in more info about my implementation, which in fact is still very naive, here are some performance facts:

Running on my good old GTX770, it takes:

0.4ms to reset all voxels with glClearTexImage (no distinction between static and dynamic voxels yet)
12.7ms to voxelize sponza completely (no atomic average used)
15.8ms for vct fullscreen post-process (1.9ms for diffuse cone trace only, 8ms for specular tracing only (straaaange), 4 diffuse cones, 1 specular)
3.34ms for mipmap generation with custom compute shader

So when revoxelization has to be done (an object moves, a light moves), my implementation no longer reaches 30fps for Sponza on my card. I'm currently working on a solution that distinguishes between static and dynamic objects... and a version with unlimited GI bounces, of course :)
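To put those timings in perspective, here is a quick tally in Java. The pass labels are just names for the numbers listed above; that all four passes run in the same frame, and that the 30fps budget is simply 1000 ms / 30, are assumptions of this sketch and not something measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VctFrameBudget {
    /** Timings quoted in the post (GTX 770, Sponza), in milliseconds. */
    static Map<String, Double> measuredPasses() {
        Map<String, Double> passes = new LinkedHashMap<>();
        passes.put("clear voxels (glClearTexImage)", 0.4);
        passes.put("voxelize scene", 12.7);
        passes.put("fullscreen cone tracing", 15.8);
        passes.put("mipmap generation", 3.34);
        return passes;
    }

    /** Sums the per-pass GPU times. */
    static double totalMs(Map<String, Double> passes) {
        double sum = 0.0;
        for (double ms : passes.values()) {
            sum += ms;
        }
        return sum;
    }

    public static void main(String[] args) {
        double vctMs = totalMs(measuredPasses());
        double budgetMs = 1000.0 / 30.0; // ASSUMPTION: target is exactly 30 fps
        System.out.printf("VCT passes: %.2f ms of a %.2f ms frame, %.2f ms left%n",
                vctMs, budgetMs, budgetMs - vctMs);
    }
}
```

The four passes alone add up to roughly 32.2 ms, which leaves only about a millisecond for everything else in the frame. That fits the observation that a full revoxelization frame drops below 30fps.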
Offline theagentd
« Reply #32 - Posted 2016-06-15 16:26:07 »

15.8ms for vct fullscreen post-process (1.9ms for diffuse cone trace only, 8ms for specular tracing only (straaaange), 4 diffuse cones, 1 specular)
Not strange at all. A specular cone is thinner and therefore requires more iterations. Since it also reads from a larger mipmap, you're probably completely thrashing the texture cache, further screwing up performance.
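A rough way to see the iteration argument, sketched in Java. The marching rule (advance by the current cone diameter, never less than one voxel), the apertures of 60 and 10 degrees, and the 256-voxel range are all assumptions chosen for illustration; they are not taken from the implementation discussed here.

```java
final class ConeStepCount {
    /** Cone diameter at distance t, clamped to at least one voxel. */
    static double diameter(double apertureDegrees, double voxelSize, double t) {
        double tanHalf = Math.tan(Math.toRadians(apertureDegrees) / 2.0);
        return Math.max(voxelSize, 2.0 * tanHalf * t);
    }

    /** Counts marching steps when each step advances by the current diameter. */
    static int steps(double apertureDegrees, double voxelSize, double maxDistance) {
        double t = voxelSize; // start one voxel away to avoid self-sampling
        int count = 0;
        while (t < maxDistance) {
            t += diameter(apertureDegrees, voxelSize, t);
            count++;
        }
        return count;
    }

    /** Mip level whose voxel size matches the cone diameter (0 = finest). */
    static double mipLevel(double apertureDegrees, double voxelSize, double t) {
        return Math.log(diameter(apertureDegrees, voxelSize, t) / voxelSize) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical apertures for a diffuse and a specular cone
        System.out.println("diffuse (60 deg) steps:  " + steps(60.0, 1.0, 256.0));
        System.out.println("specular (10 deg) steps: " + steps(10.0, 1.0, 256.0));
        System.out.println("mip at t=64, diffuse:  " + mipLevel(60.0, 1.0, 64.0));
        System.out.println("mip at t=64, specular: " + mipLevel(10.0, 1.0, 64.0));
    }
}
```

Under these assumptions the narrow cone needs several times more steps over the same distance, and at any given distance it samples a finer (larger) mip level, so its fetches are spread over more texels.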

Myomyomyo.
Offline elect

JGO Knight


Medals: 60



« Reply #33 - Posted 2016-06-15 16:38:58 »

Gpu timers, right?

Primitive and mesh count?
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #34 - Posted 2016-06-15 16:55:50 »

15.8ms for vct fullscreen post-process (1.9ms for diffuse cone trace only, 8ms for specular tracing only (straaaange), 4 diffuse cones, 1 specular)
Not strange at all. A specular cone is thinner and therefore requires more iterations. Since it also reads from a larger mipmap, you're probably completely thrashing the texture cache, further screwing up performance.

That's not what I found strange - it's strange that tracing diffuse and specular together (15.8ms) takes more time than the sum of both traced separately (1.9ms + 8ms).

@elect: Of course. Triangle count is ~260k. I'm drawing 393 entities, meaning I use 393 vertex and index buffers to draw the scene, because there's no global buffer.
Offline theagentd
« Reply #35 - Posted 2016-06-15 18:37:20 »

That's not what I found strange - it's strange that tracing diffuse and specular together (15.8ms) takes more time than the sum of both traced separately (1.9ms + 8ms).
This can be explained with registers. Your shaders require a certain amount of registers for each shader invocation. GPUs rely on having multiple shader invocations in registers at the same time to quickly be able to switch to another invocation if one stalls due to a texture cache miss. Merging two shaders into one can cause the register usage to increase, reducing the number of invocations your GPU can keep in registers at the same time, hence reducing texture performance if you're thrashing the cache, which you are. Keeping them separate and using blending to combine the result is probably a good idea in this case.
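The register argument can be made concrete with a little arithmetic, sketched here in Java. Every number is a hypothetical placeholder: the register file size, the hardware cap on resident invocations and the per-shader register counts are assumptions for illustration, and real GPUs allocate registers in coarser units than this simple division suggests.

```java
final class RegisterOccupancy {
    /**
     * How many shader invocations can be resident at once, limited either by
     * the register file or by a fixed hardware cap.
     */
    static int residentInvocations(int registerFileSize, int registersPerInvocation, int hardwareCap) {
        return Math.min(hardwareCap, registerFileSize / registersPerInvocation);
    }

    public static void main(String[] args) {
        int registerFile = 65536; // ASSUMPTION: registers per multiprocessor
        int hardwareCap = 2048;   // ASSUMPTION: max resident invocations
        // ASSUMPTION: made-up register counts for the three shader variants
        int diffuseOnly = residentInvocations(registerFile, 24, hardwareCap);
        int specularOnly = residentInvocations(registerFile, 32, hardwareCap);
        int merged = residentInvocations(registerFile, 64, hardwareCap);
        System.out.println("diffuse only:  " + diffuseOnly);
        System.out.println("specular only: " + specularOnly);
        System.out.println("merged shader: " + merged);
    }
}
```

With these made-up counts both separate shaders stay at the cap, while the merged shader keeps only half as many invocations resident, so there are fewer invocations to switch to when one stalls on a texture cache miss.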

Myomyomyo.
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #36 - Posted 2016-06-15 18:44:45 »

Thank you for your explanation, that's what I had already guessed :)
Offline theagentd
« Reply #37 - Posted 2016-06-15 20:17:04 »

There's a bug in the Nvidia shader compiler that sometimes leads to suboptimal register usage. They're refusing to acknowledge the problem, so I've given up on reporting it. Basically, it sometimes decides to store all your texture samples in temporary registers first and only then sum them up.

Example shader to reproduce the bug:
#version 150


//We definitely want the loop unrolled or performance is horrible
#pragma optionNV(unroll all)


//Disabling inlining makes the register count constant (2 registers) but has lots of overhead
//#pragma optionNV(inline 0)


/*
Register usage scales linearly with samples if inlining is on. Examples:
 -  64 samples: 34 registers
 - 128 samples: 66 registers
 - 256 samples: 130 registers
THIS SHOULD NOT HAPPEN. The shader is easily executed unrolled and inlined with only
2-3 registers regardless of sample count, and the extra registers make the shader
take 100-1000x longer to run at higher sample counts.
 */

#define SAMPLES 256



uniform sampler2D tex1;
uniform sampler2D tex2;

out vec4 fragColor;

void sample(inout vec3 sum1, inout float sum2, vec2 sampleCoords){
   vec3 v1 = texture(tex1, sampleCoords).rgb;
   float v2 = texture(tex2, sampleCoords).r;
   
   sum1 += v1 * v2;
   sum2 += v2;
}

void main(){

   vec3 sum1 = vec3(0);
   float sum2 = 0;
   
   for(float i = 0; i < SAMPLES; i++){
      sample(sum1, sum2, gl_FragCoord.xy + float(i));
   }
   
   fragColor = vec4(sum1 + sum2, 1.0);
}

Myomyomyo.
Offline Hydroque

JGO Coder


Medals: 25
Exp: 5 years


I'm always inspiring a good time.


« Reply #38 - Posted 2016-06-18 08:21:18 »

Quick question: in your void sample function, is that vec2 parameter in, out, or inout?

You think I haven't been monitoring the chat? http://pastebin.java-gaming.org/c47d35366491f Here is a compilation <3
Offline theagentd
« Reply #39 - Posted 2016-06-18 10:25:13 »

The default is in.

Myomyomyo.
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #40 - Posted 2016-07-16 20:10:37 »

Currently playing around with static mesh caching and multiple bounces via grid shading. Still some incorrect calculations inside and as always far from perfect, but it's on its way. Lighting is solely from the windows, which are configured as objects with emissive materials. The sphere is a perfect mirror, where you can see the maximum level of detail reflections can have with the given grid resolution.


Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #41 - Posted 2016-07-18 07:51:49 »

Does anybody have experience with smarter solutions for smoother diffuse lighting? I'm thinking of better filtering for mipmap generation, better cone tracing etc. Since I've already tried increasing the number of diffuse cones traced, any experience with better 3D mipmapping filters/techniques would be helpful, thanks.
Offline theagentd
« Reply #42 - Posted 2016-07-18 09:29:56 »

You could try doing something like "tricubic" filtering manually, but other than that there's not much you can do. You can do tricubic filtering with "only" 8 samples (instead of 64).
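A sketch of why 8 samples can be enough, written in Java and reduced to one dimension. It assumes the "tricubic" filter meant here is the cubic B-spline variant, where two adjacent taps are folded into one linear fetch placed between them; the post does not say which variant is intended. `linearFetch` is a software stand-in for a hardware linear texture read, and all names are hypothetical.

```java
final class CubicBSpline1D {
    /** Clamp-to-edge texel read. */
    static double texel(double[] data, int i) {
        return data[Math.max(0, Math.min(data.length - 1, i))];
    }

    /** Software stand-in for one linearly filtered fetch at position p (texel units). */
    static double linearFetch(double[] data, double p) {
        int i = (int) Math.floor(p);
        double f = p - i;
        return (1.0 - f) * texel(data, i) + f * texel(data, i + 1);
    }

    /** Cubic B-spline weights for taps i-1, i, i+1, i+2 at fractional offset f. */
    static double[] weights(double f) {
        double f2 = f * f;
        double f3 = f2 * f;
        return new double[] {
            (1.0 - f) * (1.0 - f) * (1.0 - f) / 6.0,
            (3.0 * f3 - 6.0 * f2 + 4.0) / 6.0,
            (-3.0 * f3 + 3.0 * f2 + 3.0 * f + 1.0) / 6.0,
            f3 / 6.0
        };
    }

    /** Reference filter: four point samples. */
    static double fourTaps(double[] data, double x) {
        int i = (int) Math.floor(x);
        double[] w = weights(x - i);
        return w[0] * texel(data, i - 1) + w[1] * texel(data, i)
             + w[2] * texel(data, i + 1) + w[3] * texel(data, i + 2);
    }

    /** Same filter using two linear fetches placed between tap pairs. */
    static double twoFetches(double[] data, double x) {
        int i = (int) Math.floor(x);
        double[] w = weights(x - i);
        double g0 = w[0] + w[1]; // combined weight of taps i-1 and i
        double g1 = w[2] + w[3]; // combined weight of taps i+1 and i+2
        double p0 = (i - 1) + w[1] / g0;
        double p1 = (i + 1) + w[3] / g1;
        return g0 * linearFetch(data, p0) + g1 * linearFetch(data, p1);
    }

    public static void main(String[] args) {
        double[] data = {0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0};
        System.out.println(fourTaps(data, 2.3) + " vs " + twoFetches(data, 2.3));
    }
}
```

Applying the same folding along each of the three axes turns 4 x 4 x 4 = 64 point samples into 2 x 2 x 2 = 8 trilinear fetches.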

Myomyomyo.
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #43 - Posted 2016-07-18 10:53:47 »

Thanks for the hint. I think with 8 samples, it doesn't differ much from what I'm doing right now. Instead of assuming the weight is uniform, I have an alpha value and normalize afterwards.

I think I will try to increase my kernel to a 3*3*3 kernel, compare it with a kernel that only takes corner samples, and maybe test a modified 3*3*3 kernel whose name I don't know.
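One possible reading of "an alpha value and normalize afterwards", sketched in Java for a single 2*2*2 mipmap reduction. The RGBA layout, the class name and the choice to store the average alpha in the parent voxel are assumptions of this sketch, not details confirmed in the post.

```java
final class AlphaWeightedDownsample {
    /**
     * Reduces child voxels (each {r, g, b, a}) to one parent voxel.
     * Colors are weighted by alpha and normalized by the alpha sum, so empty
     * voxels do not darken the result.
     */
    static float[] downsample(float[][] children) {
        float r = 0f, g = 0f, b = 0f, alphaSum = 0f;
        for (float[] c : children) {
            r += c[0] * c[3];
            g += c[1] * c[3];
            b += c[2] * c[3];
            alphaSum += c[3];
        }
        if (alphaSum == 0f) {
            return new float[] {0f, 0f, 0f, 0f}; // nothing but empty space
        }
        // ASSUMPTION: parent alpha is the plain average, i.e. the coverage
        return new float[] {r / alphaSum, g / alphaSum, b / alphaSum, alphaSum / children.length};
    }

    public static void main(String[] args) {
        float[][] children = new float[8][];
        children[0] = new float[] {1f, 0f, 0f, 1f}; // one opaque red voxel
        for (int i = 1; i < 8; i++) {
            children[i] = new float[] {0f, 0f, 0f, 0f}; // seven empty voxels
        }
        float[] parent = downsample(children);
        System.out.println(parent[0] + " " + parent[1] + " " + parent[2] + " " + parent[3]);
    }
}
```

With uniform weights the single red voxel would be averaged down to one eighth of its brightness. With alpha weighting the parent keeps the full red color and only its alpha records that one eighth of the volume is occupied.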
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #44 - Posted 2016-08-08 18:48:04 »

images are so boring!

http://www.youtube.com/v/9UGc6gn6sXA?version=3&hl=en_US&start=
Offline orange451

JGO Kernel


Medals: 444
Projects: 7
Exp: 7 years


Your face? Your ass? What's the difference?


« Reply #45 - Posted 2016-08-10 16:57:35 »

Stunning...

First Recon. A java made online first person shooter!
Offline h.pernpeintner

JGO Ninja


Medals: 106



« Reply #46 - Posted 2018-11-07 21:50:46 »

I finally managed to implement a new Voxel Cone Tracing method that uses multiple grids - but not the regular cascade approach, which suffers from being view-centric (permanent revoxelization, less caching, unstable voxel values).
My volumes are AABBs, and my currently still very naive tracing traces against these boxes and, within them, does the regular 3D texture sampling. So it's a mixture of ray tracing and cone tracing. This gives you voxel global illumination for more (all?) kinds of level scales: volumes can be of arbitrary resolution, which gives you more detail where you need it and less where you don't. The indirect illumination can be completely cached or updated with n grids per frame, giving the freedom to update volumes near the camera more often than others. One could also have only n volumes that are precalculated and streamed from textures, or even a resource pool of n grids around the player.
Last but not least, my method is not the regular direct voxel lighting, but a deferred approach that treats voxel grids as 3D G-buffers. So while only moving objects have to be revoxelized at all, calculating lighting works without the need to voxelize anything. This way, global illumination with two (or more) bounces can be recalculated when the lighting changes.
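The "trace against the boxes, then sample inside them" idea could look roughly like this in Java. This is a generic slab test plus a world-to-volume coordinate mapping, not the code from this project; the class and method names are hypothetical, and the actual 3D texture sampling between entry and exit is left out.

```java
final class RayAabb {
    /**
     * Slab test. Returns {tEnter, tExit} along the ray, or null on a miss.
     * tEnter is clamped to 0 so hits behind the origin are ignored.
     */
    static double[] intersect(double[] origin, double[] dir, double[] boxMin, double[] boxMax) {
        double tEnter = 0.0;
        double tExit = Double.POSITIVE_INFINITY;
        for (int axis = 0; axis < 3; axis++) {
            if (dir[axis] == 0.0) {
                // Ray is parallel to this slab: it must already be inside it.
                if (origin[axis] < boxMin[axis] || origin[axis] > boxMax[axis]) {
                    return null;
                }
                continue;
            }
            double inv = 1.0 / dir[axis];
            double t0 = (boxMin[axis] - origin[axis]) * inv;
            double t1 = (boxMax[axis] - origin[axis]) * inv;
            if (t0 > t1) {
                double tmp = t0;
                t0 = t1;
                t1 = tmp;
            }
            tEnter = Math.max(tEnter, t0);
            tExit = Math.min(tExit, t1);
            if (tEnter > tExit) {
                return null;
            }
        }
        return new double[] {tEnter, tExit};
    }

    /** Maps a world-space point to [0,1]^3 coordinates of the volume's 3D texture. */
    static double[] toVolumeUvw(double[] p, double[] boxMin, double[] boxMax) {
        double[] uvw = new double[3];
        for (int axis = 0; axis < 3; axis++) {
            uvw[axis] = (p[axis] - boxMin[axis]) / (boxMax[axis] - boxMin[axis]);
        }
        return uvw;
    }

    public static void main(String[] args) {
        double[] hit = intersect(new double[] {-5, 0.5, 0.5}, new double[] {1, 0, 0},
                new double[] {0, 0, 0}, new double[] {1, 1, 1});
        System.out.println(hit == null ? "miss" : hit[0] + " .. " + hit[1]);
    }
}
```

A tracer built on this would march the cone only between tEnter and tExit of each volume it hits, converting sample positions with toVolumeUvw, so each volume's resolution matters only inside its own box.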

Here's a comparison of the voxel quality of two 256³ grids, one containing the whole Sponza scene, the other containing less than half the space.



And here's the difference between a single (bottom) and a second (top) bounce of lighting. Please don't look too closely: my equations aren't correct yet, and in the comparison I disabled the second bounce for diffuse tracing only, which is why specular lighting looks boosted in the bottom picture.
