  Show Posts
1  Game Development / Newbie & Debugging Questions / Re: Multiple shaders or if statements? on: 2016-09-28 02:28:13
I don't have any performance problems at the moment, but I'm thinking about the problem with further development in mind. For example, if I implement bump mapping and material maps (specular maps, gloss maps, etc.), it will multiply the number of shaders for each combination. Thinking even further, when I get to the stage where I want to implement my idea for skeletal animations, it would lead to even more shaders. At that stage, adding more light types or making other improvements would be a lot of work, because I would have to change tons of shaders.
I'm using deferred shading, so for me the bottleneck is almost always the write to the massive G-buffer (4 render targets!). Due to this, I can get away with always doing normal mapping and reading all 4 optional texture maps I support. The texture units and ALU cores are otherwise simply idle waiting for the ROP writes, so doing some extra reads from empty 4x4 textures won't affect performance at all. Draw calls that are too small are also very bad for GPU performance in general, small being something like under ~100 triangles or so.

Normal mapping doesn't actually have that much overhead, so you should probably be able to get away with just using a pass-through 4x4 texture and always applying it.
2  Game Development / Newbie & Debugging Questions / Re: Multiple shaders or if statements? on: 2016-09-26 18:57:14
It all depends on what you're doing.

First of all, the choice between many shaders and a few uber-shaders depends on where your bottleneck is. Each shader bind has a fairly big CPU cost, so if you have 1000 shader switches per frame, this could easily be your bottleneck. In that case, switching to an uber-shader will improve performance, as it trades a big reduction in CPU load for some GPU overhead. If your GPU is already the bottleneck, then increasing the GPU cost to eliminate a couple of shader switches will just pile more work onto the GPU, reducing performance.

Some shader tips:
 - An if-statement is not inherently slow in shaders. It all depends on divergence and the size of the if/else blocks. If all shader invocations in a group (= all vertex shader invocations in a certain batch, or all fragment shader invocations in a certain pixel area) take the same path in the if-statement (either all true or all false), then the if-statement will be cheap and only one of the two paths will be executed. If the shader invocations diverge, both sides will have to be executed for all invocations, as the group runs in lockstep. In the end, none of this matters much if neither the if block nor the else block contains a lot of code.

 - Simple conditional assignments, like
x = condition ? a : b;
generally compile to conditional assignment instructions that don't require any branching at all. You can use this to your advantage.
3  Java Game APIs & Engines / Java Sound & OpenAL / Re: Should sound be on its own thread? on: 2016-09-26 17:41:28
In addition to the points that people have brought up here, you should never place disk access in your game loop. Disk access is both slow and unpredictable. For normal hard drives, the access time is on the order of 10 ms. In addition, they're shared with all the other processes on your computer, so if your anti-virus software decides that there's no better time than right now to start a full virus scan, a 0.25-second sound effect can suddenly take 1 second to load.
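One simple way to keep disk access out of the loop is to either preload everything at startup, or hand loads off to a dedicated I/O thread and poll for completion. Here's a minimal sketch of the latter (class and file names are just illustrative, not from any particular engine):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncLoader {
    // One background thread dedicated to disk I/O, so the game loop never blocks on it.
    private static final ExecutorService IO = Executors.newSingleThreadExecutor();

    // Kick off the load; the game loop polls Future.isDone() instead of blocking.
    public static Future<byte[]> loadAsync(Path path) {
        return IO.submit(() -> Files.readAllBytes(path));
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("sfx", ".wav"); // stand-in for a real sound file
        Files.write(tmp, new byte[]{1, 2, 3, 4});
        Future<byte[]> pending = loadAsync(tmp);
        // ... the game loop keeps running; later, once pending.isDone():
        System.out.println(pending.get().length); // 4
        IO.shutdown();
    }
}
```

The key point is that the game loop only ever touches the Future, never the disk.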
4  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-26 03:57:56
Accidentally threw together this scene while messing with a new terrain system, so I decided to pose the guy up a bit and take a screenshot.

DoF is still broken due to massive register usage on Nvidia, causing it to be extremely slow. It also isn't compatible with like 95% of my other post-processing at the moment, causing ghosting and artifacts in motion. For still shots it looks good though, as you can see.
5  Game Development / Shared Code / Re: Converting floats/doubles to 10/11/16/N bit floats on: 2016-09-23 14:24:42
Just a quick question: did you learn all of this at university? I haven't gotten to university yet, so I wouldn't know. Where did you learn all of this? Lol I'm getting desperate Tongue
We did have a lecture or two on how floating point numbers work at uni, but I just looked up the specifications of the different formats on Wikipedia.

princec: memory footprint.
Yeah, the point here is to halve the bandwidth and memory usage.
6  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-23 00:15:41
me vs. unsigned long long int - 1:0


Did you know that an
unsigned long long triple double short char
can be between 1 and 573.5 bytes depending on the hardware?
7  Game Development / Shared Code / Converting floats/doubles to 10/11/16/N bit floats on: 2016-09-23 00:10:34
GPUs often use floats smaller than 32 bits to avoid having to use a full 4 bytes per color channel. There are a number of common formats on GPUs, with 16-bit floats being the most common, but 10- and 11-bit formats are fairly common too. See this page for more info:

There's no native support for <32-bit floats in Java, but it can be really useful to be able to work with smaller float values. Here are some use case examples:
 - You can store vertex attributes as 16-bit floats to save a lot of space, especially normals and many other attributes that don't need a full 32-bit float.
 - You can create 16-bit float texture data, or even R11F_G11F_B10F texture data offline and save it to a file without an OpenGL context, or something similar.
 - You can avoid some wasted memory bandwidth by reading back a 16-bit float texture in its native format and doing the unpacking on the CPU, although the driver may be faster at converting to 32-bit than my code...
 - Generally save memory when writing float binary data to files, as you can choose exactly how many bits to use for the exponent, the mantissa and even if you need a sign bit at all.

Storytime, the code is at the bottom =P
I first wrote a function to convert a 32-bit float to a 16-bit float and back again using the Wikipedia specification, but then I realized that there are other float formats out there, so I decided to rework it a bit. I instead made two generic converter functions that take in a double value and convert it to a given number of exponent and mantissa bits, with the sign being optional. Additionally, this allowed me to test the system by using my functions to convert 64-bit floats to 32-bit floats and comparing that to a simple cast. So now I have a generic function that can handle any number of bits <=32, with a varying-size mantissa and exponent for whatever needs you have.

 - Denormals handled correctly for all bit counts.
 - Infinity/NaN preserved.
 - Clamps negative values to zero if the output value has no sign.
 - Values too big for the small format are rounded to infinity.
 - Values too small for the small format are rounded to 0.
 - Positive/negative zeroes preserved.
 - No dependencies.
 - Static functions for everything.
 - Shortcut methods for halves, 11-bit and 10-bit floats.
 - Good performance (~50-100 million conversions per second).

Accuracy test
From my tests, converting doubles to 32-bit floats using my conversion function (and back again) produces results 100% identical to a simple double-->float cast in Java (and back again). This test consisted of converting 18 253 611 008 random double bit patterns to floats and back again, with 100% identical results to just casting. This should mean that the conversion is 100% accurate for 16-bit values as well, but that is harder to test.

Comments and suggestions are welcome.

public class FloatConversion {

   private static final int DOUBLE_EXPONENT_BITS = 11;
   private static final long DOUBLE_EXPONENT_MASK = (1L << DOUBLE_EXPONENT_BITS) - 1;
   private static final long DOUBLE_EXPONENT_BIAS = 1023;
   private static final long DOUBLE_MANTISSA_MASK = (1L << 52) - 1;

   public static long doubleToSmallFloat(double d, boolean hasSign, int exponentBits, int mantissaBits){
      long bits = Double.doubleToRawLongBits(d);
      long s = -(bits >>> 63);
      long e = ((bits >>> 52) & DOUBLE_EXPONENT_MASK) - DOUBLE_EXPONENT_BIAS;
      long m = bits & DOUBLE_MANTISSA_MASK;

      int exponentBias = (1 << (exponentBits-1)) - 1;

      if(!hasSign && d < 0){
         //Handle negative NaN and clamp negative numbers when we don't have an output sign
         if(e == 1024 && m != 0){
            return (((1 << exponentBits) - 1) << mantissaBits) | 1; //Negative NaN
         }
         return 0; //negative value, clamp to 0.
      }

      long sign = s;
      long exponent = 0;
      long mantissa = 0;

      if(e <= -exponentBias){
         //Value is too small, calculate an optimal denormal value.
         double abs = Double.longBitsToDouble(bits & 0x7FFFFFFFFFFFFFFFL);
         exponent = 0;
         int denormalExponent = exponentBias + mantissaBits - 1;
         double multiplier = Double.longBitsToDouble((denormalExponent + DOUBLE_EXPONENT_BIAS) << 52);
         //Odd-even rounding
         mantissa = (long)Math.rint(abs * multiplier);
      }else if(e <= exponentBias){
         //A value in the normal range of this format. We can convert the exponent and mantissa
         //directly by changing the exponent bias and dropping the extra mantissa bits (with correct
         //rounding to minimize the error).
         exponent = e + exponentBias;
         int shift = 52 - mantissaBits;
         long mantissaBase = m >> shift;
         long rounding = (m >> (shift-1)) & 1;
         mantissa = mantissaBase + rounding;

         //Again, if we overflow the mantissa due to rounding, we want to round the result
         //up to infinity (max exponent, mantissa 0). Through a stroke of luck, the code below
         //is not actually needed due to how the mantissa bits overflow into the exponent bits,
         //but it's here for clarity.
         //exponent += mantissa >> mantissaBits;
         //mantissa &= (1 << mantissaBits) - 1;
      }else{
         //We have 3 cases here:
         // 1. e == 1024 and mantissa != 0 ---> NaN
         // 2. e == 1024 and mantissa == 0 ---> Infinity
         // 3. value is too big for a small-float ---> Infinity
         //So, if the value isn't NaN we want infinity.
         exponent = (1 << exponentBits) - 1;
         if(e == 1024 && m != 0){
            mantissa = 1; //NaN
         }else{
            mantissa = 0; //infinity
         }
      }

      if(hasSign){
         return (sign << (mantissaBits + exponentBits)) + (exponent << mantissaBits) + mantissa;
      }
      return (exponent << mantissaBits) + mantissa;
   }

   public static double smallFloatToDouble(long f, boolean hasSign, int exponentBits, int mantissaBits){

      int exponentBias = (1 << (exponentBits-1)) - 1;

      long s = hasSign ? -(f >> (exponentBits + mantissaBits)) : 0;
      long e = ((f >>> mantissaBits) & ((1 << exponentBits) - 1)) - exponentBias;
      long m = f & ((1 << mantissaBits) - 1);

      long sign = s;
      long exponent = 0;
      long mantissa = 0;

      if(e <= -exponentBias){
         //We have a float denormal value. Cheat a bit with the calculation...
         int denormalExponent = exponentBias + mantissaBits - 1;
         double multiplier = Double.longBitsToDouble((DOUBLE_EXPONENT_BIAS - denormalExponent) << 52);
         return (1 - (sign << 1)) * (m * multiplier);

      }else if(e <= exponentBias){
         //We have a normal value that can be directly converted by just changing the exponent
         //bias and shifting the mantissa.
         exponent = e + DOUBLE_EXPONENT_BIAS;
         int shift = 52 - mantissaBits;
         mantissa = m << shift;
      }else{
         //We either have infinity or NaN, depending on if the mantissa is zero or non-zero.
         exponent = 2047;
         if(m == 0){
            mantissa = 0; //infinity
         }else{
            mantissa = 1; //NaN
         }
      }
      return Double.longBitsToDouble((sign << 63) | (exponent << 52) | mantissa);
   }

   //Half floats
   public static short floatToHalf(float f){
      return (short) doubleToSmallFloat(f, true, 5, 10);
   }
   public static float halfToFloat(short h){
      return (float) smallFloatToDouble(h, true, 5, 10);
   }
   public static short doubleToHalf(double d){
      return (short) doubleToSmallFloat(d, true, 5, 10);
   }
   public static double halfToDouble(short h){
      return smallFloatToDouble(h, true, 5, 10);
   }

   //OpenGL 11-bit floats
   public static short floatToF11(float f){
      return (short) doubleToSmallFloat(f, false, 5, 6);
   }
   public static float f11ToFloat(short f){
      return (float) smallFloatToDouble(f, false, 5, 6);
   }
   public static short doubleToF11(double f){
      return (short) doubleToSmallFloat(f, false, 5, 6);
   }
   public static double f11ToDouble(short f){
      return smallFloatToDouble(f, false, 5, 6);
   }

   //OpenGL 10-bit floats.
   public static short floatToF10(float f){
      return (short) doubleToSmallFloat(f, false, 5, 5);
   }
   public static float f10ToFloat(short f){
      return (float) smallFloatToDouble(f, false, 5, 5);
   }
   public static short doubleToF10(double f){
      return (short) doubleToSmallFloat(f, false, 5, 5);
   }
   public static double f10ToDouble(short f){
      return smallFloatToDouble(f, false, 5, 5);
   }
}
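As a quick sanity check on the half-float bit layout, here's a tiny standalone encoder for normal-range values only (no denormal/infinity/NaN handling and truncating rounding, so it's just an illustration, not a replacement for the code above); for such inputs it should agree with floatToHalf:

```java
public class HalfSanityCheck {
    // Minimal encoder for normal-range values only, following the same
    // rebias-and-shift idea as doubleToSmallFloat above (truncates the mantissa).
    static int floatToHalfBits(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = (bits >>> 31) << 15;
        int e = ((bits >>> 23) & 0xFF) - 127; // unbiased exponent
        int m = bits & 0x7FFFFF;              // 23-bit mantissa
        int halfExponent = e + 15;            // rebias for 5 exponent bits
        int halfMantissa = m >> 13;           // drop 13 mantissa bits (truncate)
        return sign | (halfExponent << 10) | halfMantissa;
    }

    public static void main(String[] args) {
        System.out.printf("%#06x%n", floatToHalfBits(1.0f));  // 0x3c00
        System.out.printf("%#06x%n", floatToHalfBits(-2.0f)); // 0xc000
        System.out.printf("%#06x%n", floatToHalfBits(0.5f));  // 0x3800
    }
}
```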
8  Discussions / General Discussions / Re: Programmer jokes on: 2016-09-20 17:38:43
Not sure if posted here before...
Wouldn't work with the img tag...
9  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-20 17:22:34
This semester I have a class on the philosophy of science, which I scoffed at in the beginning. I had to turn in my first homework today, on some simple paradoxes. We also had to read a short paper on Karl Popper's theory of science as falsification, and frankly I'm a bit blown away by it. It puts words and a logical definition on something I've felt in the back of my head about a lot of things when reading and hearing stuff online and IRL. It basically deals with what separates real science from pseudoscience and is a fantastic read IMO.

Here's a random link to the same paper I had to read, as my uni only lets people with an account download it from them.

Uhhhh, spoiler warning in case you want to read it yourself?
  • It is easy to obtain confirmations, or verifications, for nearly every theory — if we look for confirmations.
  • Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in
    question, we should have expected an event which was incompatible with the theory — an event which would have refuted the
    theory.
  • Every "good" scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it
    is.
  • A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as
    people often think) but a vice.
  • Every genuine test of a theory is an attempt to falsify it, or to refute it. Testability is falsifiability; but there are degrees of
    testability: some theories are more testable, more exposed to refutation, than others; they take, as it were, greater risks.
  • Confirming evidence should not count except when it is the result of a genuine test of the theory; and this means that it can
    be presented as a serious but unsuccessful attempt to falsify the theory. (I now speak in such cases of "corroborating
    evidence.")
  • Some genuinely testable theories, when found to be false, are still upheld by their admirers — for example by introducing
    ad hoc some auxiliary assumption, or by reinterpreting the theory ad hoc in such a way that it escapes refutation. Such a
    procedure is always possible, but it rescues the theory from refutation only at the price of destroying, or at least lowering, its
    scientific status. (I later described such a rescuing operation as a "conventionalist twist" or a "conventionalist stratagem.")

10  Java Game APIs & Engines / OpenGL Development / Re: Weird SSAO halo on: 2016-09-20 15:58:02
I may have misunderstood but here goes.

For the background, fragPos, normal and hence tangent and bitangent will (presumably) all be zero, since they were set when clearing the G-buffer. This causes all the samples to end up at (0, 0, 0), which is turned into the texture coordinates (0.5, 0.5, 0.5). In other words, every single background pixel will read the center of the screen and compare the depth of that pixel with 0.5. As you move the camera around, the background will change/flicker as the center pixel it's compared to changes depth.
11  Java Game APIs & Engines / OpenGL Development / Re: Weird SSAO halo on: 2016-09-20 11:11:01
Your algorithm places a couple of samples in 3D around the pixel in question, then compares the depth of each sample with the depth in the depth buffer. For the white background the depth is 1.0, so if anything is nearby it'll obviously fail the depth test. You can prevent this by not applying SSAO when the depth is exactly 1.0. However, this problem occurs whenever you have any large differences in depth, as an object 10km away will get occluded by an ant 10cm from the camera. This is an inherent flaw of your algorithm.
12  Java Game APIs & Engines / OpenGL Development / Re: Vertex cache shenanigans on: 2016-09-17 02:27:00
I've messaged and added Jono and ClaasJG on Skype, but I haven't gotten any responses yet. If anyone with an AMD card has time to do some vertex cache testing I'd really appreciate the help! It basically amounts to setting up the program I've posted here in an IDE, adding LWJGL3 as a dependency, then modifying it a bit to do some more fine-grained testing on when exactly the values change. My ultimate goal is to write a per-vendor mesh optimizer!
13  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-17 00:26:11
Wow, that's amazing!!! Did you talk with the guy? =P

No, sorry Sad
Was cramming study for a test, didn't really have time to.
If I see him again I'll talk to him Smiley

This just made my day.

Gotta say I was quite surprised, but thought it was pretty cool.
If you see him again, tell him the devs said hi. =P
14  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-16 12:58:29
I saw a guy at uni who had a We Shall Wake desktop background. @theagentd
Wow, that's amazing!!! Did you talk with the guy? =P
15  Java Game APIs & Engines / OpenGL Development / Re: Vertex cache shenanigans on: 2016-09-13 15:41:06
Radeon HD 290X for the sake of completeness:

Batch size test invocations: 131072 / 50331648
Calculated vertex cache batch size: 384

Cache size 1 invocation test: 131072 / 50331648
Cache size 2 invocation test: 262144 / 50331648
Cache size 3 invocation test: 393216 / 50331648
Cache size 4 invocation test: 524288 / 50331648
Cache size 5 invocation test: 655360 / 50331648
Cache size 6 invocation test: 786432 / 50331648
Cache size 7 invocation test: 917504 / 50331648
Cache size 8 invocation test: 1048576 / 50331648
Cache size 9 invocation test: 1179648 / 50331648
Cache size 10 invocation test: 1310720 / 50331648
Cache size 11 invocation test: 1441792 / 50331648
Cache size 12 invocation test: 1572864 / 50331648
Cache size 13 invocation test: 1703936 / 50331648
Cache size 14 invocation test: 1835008 / 50331648
Cache size 15 invocation test: 6422528 / 50331648
Cache size 16 invocation test: 11927552 / 50331648
Cache size 17 invocation test: 50331648 / 50331648

  Renderer: AMD Radeon R9 200 Series
  Calculated vertex cache batch size: 384
  Cache size: 16

Oooooof couuuurse. The newer 200 series turns out to be exactly 384. Wooh. The plot thickens. Well, I guess that partly confirms my 387 --> 384 hypothesis at least.
16  Java Game APIs & Engines / OpenGL Development / Re: Vertex cache shenanigans on: 2016-09-13 14:29:41
Indeed, despite some shader errors here and there, the data gathered is really good. Thanks everyone!

The test has two parts. The first part just uses a massive 0-filled index buffer and draws it, checking how many times the vertex shader is executed. The second part tries to figure out the cache size by trying a bigger and bigger repeated list of indices (0, 1, ..., n, 0, 1, ..., n, 0, 1, ..., n, ........), where n is increased by 1 between each test. At some point, this will start thrashing the cache, as when the number of unique indices is bigger than the cache, it'll have lost vertex 0 by the time the list of indices repeats, causing every single entry in the index buffer to require a new vertex shader execution.
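The idea behind the second part can be illustrated by simulating an idealized FIFO post-transform cache in software (a sketch of the concept, not the actual test program):

```java
import java.util.ArrayDeque;

public class FifoCacheSim {
    // Counts how often the vertex shader would run for an index stream,
    // assuming an idealized FIFO post-transform cache of the given size.
    static int invocations(int[] indices, int cacheSize) {
        ArrayDeque<Integer> fifo = new ArrayDeque<>();
        int runs = 0;
        for (int i : indices) {
            if (!fifo.contains(i)) { // cache miss -> shader runs, vertex enters cache
                runs++;
                fifo.addLast(i);
                if (fifo.size() > cacheSize) fifo.removeFirst();
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Repeated pattern 0..n-1; thrashing starts once n exceeds the cache size.
        int n = 4, repeats = 3;
        int[] idx = new int[n * repeats];
        for (int i = 0; i < idx.length; i++) idx[i] = i % n;
        System.out.println(invocations(idx, 8)); // n fits: each vertex runs once -> 4
        System.out.println(invocations(idx, 3)); // n > cache: every index misses -> 12
    }
}
```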

Let's go through the results:

Intel seems to be the most straightforward, and literally the only vertex cache that actually works as expected. The GPU loops through the index list and keeps a 128-entry FIFO vertex cache. When drawing an index buffer of length 3*1024*1024 filled completely with 0s, it only runs the vertex shader once, then never again. When the number of unique vertices exceeds the vertex cache, thrashing occurs and every single index needs a vertex shader execution, which is exactly what I had predicted based on "public knowledge" of the vertex cache. This is what people optimize meshes for.

Nvidia's solution is more complicated. Even if you render an index buffer filled with 0s, the vertex shader will be executed more than once. What is happening here is that the GPU is splitting up the index buffer into chunks of <num vertices in each primitive>*32, which in the case of triangles is 96. For lines it's 64 and for points it's 32. This is what I call the "batch size" in the test results. There seems to be a different vertex cache for each of the batches, so even if the index buffer contains only zeroes the vertex shader will be executed once per batch. This severely limits the usefulness of the cache, as it greatly increases the chance of having to run a vertex shader multiple times as reuse only works within the same 96-index block. In addition, there is a 32-entry FIFO cache within each block as well, so it's still possible to overflow the cache within each block if it contains more than 32 unique indices. Most likely, this choice was made by Nvidia to allow for more parallelism in hardware, as it allows each 96-index block to be processed completely independently. Intel needs to go through the entire index buffer linearly.
This has major implications on how a mesh should be optimized, as the mesh optimizer needs to be aware of the 96-index blocks to be able to make the best decisions. Otherwise it may assume that a vertex will be reused for two triangles, but the triangles may turn out to be in different 96-index blocks, so the vertex won't be in the cache there.
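A mesh optimizer could model the batched behavior described above roughly like this (a sketch of my findings, assuming an independent 32-entry FIFO per 96-index block for triangles):

```java
import java.util.ArrayDeque;

public class BatchedCacheSim {
    // Splits the index buffer into independent blocks, each with its own small
    // FIFO cache, and counts the resulting vertex shader invocations.
    static int invocations(int[] indices, int batchSize, int cacheSize) {
        int runs = 0;
        for (int start = 0; start < indices.length; start += batchSize) {
            int end = Math.min(start + batchSize, indices.length);
            ArrayDeque<Integer> fifo = new ArrayDeque<>();
            for (int i = start; i < end; i++) {
                if (!fifo.contains(indices[i])) {
                    runs++;
                    fifo.addLast(indices[i]);
                    if (fifo.size() > cacheSize) fifo.removeFirst();
                }
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // An all-zero index buffer still costs one invocation per 96-index block.
        int[] zeros = new int[288];
        System.out.println(invocations(zeros, 96, 32)); // 3 blocks -> 3
    }
}
```

Under this model, reuse across a block boundary never hits the cache, which is exactly why the optimizer has to be aware of the blocks.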

AMD's technique is...... very weird. It seems similar to Nvidia's solution, but the results don't perfectly match that. The calculated batch size is 387, which is 384+3, which is 32*3*4+3, so the batch size seems to be roughly 4x as big as Nvidia's. That's a pretty uneven number that I really wasn't expecting. Most likely, the actual batch size is 384, with some additional weird behavior in there. As for the actual cache size within each batch, it's most likely 16 both for the HD7800 and the KAVERI APU, but the results are again inconsistent. In addition, the results are off by one between the two (8130 vs 8129 invocations). =___= There's definitely something fishy and complicated going on here. To get anything conclusive that would actually be useful for a mesh optimizer, I'd need to run more tests. I don't really have a guess for why the batch size seems so random, but the discrepancy of the HD7800 not being completely cache-thrashed at 16-23 entries could be explained by the GPU updating the cache in small batches (most likely 8) instead of one by one. This would explain why the GPU kiiiinda manages to do at least some caching up to 24 entries. There could also be some ordering weirdness here as well.

We really need to do more testing on AMD hardware. If either ClaasJG, Jono or Abuse have time for it, I'd love it if we could continue testing a bit using IRC or Skype to be able to do some more rapid iterations of the test program. Feel free to either PM me or respond in this thread if any of you are interested!

Thanks a lot for all the help, guys!
17  Java Game APIs & Engines / OpenGL Development / Re: *AMD testing needed!* Vertex cache shenanigans on: 2016-09-12 17:16:48
ClaasJG, can you try this jar out? It has an increased test size to hopefully give more accurate results, but it may take some time to complete. Thanks a lot for testing!!! This is extremely interesting!
18  Java Game APIs & Engines / OpenGL Development / Re: *AMD testing needed!* Vertex cache shenanigans on: 2016-09-12 17:04:54
Hmm, the results look inconsistent. Are they identical on each run?
19  Java Game APIs & Engines / OpenGL Development / Re: *AMD testing needed!* Vertex cache shenanigans on: 2016-09-12 16:57:39
Here's a new version with a proper shader!


EDIT: There is no need to rerun this benchmark on Nvidia hardware, as it will give the exact same results. =P
20  Java Game APIs & Engines / OpenGL Development / Re: *AMD testing needed!* Vertex cache shenanigans on: 2016-09-12 16:43:40
Ohhh!! Thanks a lot for testing. Looks like the AMD driver is smart enough to just not run the vertex shader. I'll have to expand the test to include a proper shader. Give me a couple of minutes!
21  Java Game APIs & Engines / OpenGL Development / Re: *testing needed!* Vertex cache shenanigans on: 2016-09-12 16:22:45
I started the gfx driver utility, to figure out the version - then my W10 system BSODed. You're welcome. Emo
I am so sorry. Friends don't let friends buy Intel GPUs.
22  Game Development / Newbie & Debugging Questions / Re: Improving lighting on: 2016-09-12 16:19:54
The easiest way to make sure they are all correct is to first set all tiles that are known to be bright to their correct values. Then do multiple passes where you check all neighboring tiles to "expand" the light to neighbors.

Example: You have a single bright block with a light value of 8 surrounded by dark (light value 0) blocks. You do a pass to spread the light to nearby tiles, where each tile checks its neighboring tiles and sets its own light to the maximum of the neighbors minus one (or keeps its own value if it's already brighter than the neighbors). This pass will cause the blocks next to the bright block to gain a light value of 7. Then you run the exact same pass again, causing the next set of neighbors to gain a light value of 6. You repeat this until no tiles are updated, at which point you stop.

Obviously you wouldn't want to do this every game update, but rather update the lighting every time a block changes. If your world is big, you can split it up into chunks of, say, 16x16 tiles and only update the chunks that are affected by a given block change.
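The multi-pass spreading described above can be sketched like this (grid size and light values are just for illustration):

```java
public class TileLight {
    // One spreading pass: each tile becomes max(own, max(neighbors) - 1).
    // Returns true if anything changed, so the caller can repeat until stable.
    static boolean spreadPass(int[][] light) {
        boolean changed = false;
        int h = light.length, w = light[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int best = light[y][x];
                if (y > 0)     best = Math.max(best, light[y-1][x] - 1);
                if (y < h - 1) best = Math.max(best, light[y+1][x] - 1);
                if (x > 0)     best = Math.max(best, light[y][x-1] - 1);
                if (x < w - 1) best = Math.max(best, light[y][x+1] - 1);
                if (best != light[y][x]) { light[y][x] = best; changed = true; }
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        int[][] light = new int[5][5];
        light[2][2] = 8;             // single bright block
        while (spreadPass(light)) {} // repeat until no tile changes
        System.out.println(light[2][4]); // two tiles away -> 6
    }
}
```

At the fixed point, each tile ends up with the source brightness minus its distance to the source, which matches the 8/7/6 example above.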
23  Java Game APIs & Engines / OpenGL Development / Re: *testing needed!* Vertex cache shenanigans on: 2016-09-12 15:08:57
weird, I recall I read somewhere it was 24 on nvidia..
It's definitely 32, but there's more to it than that. That's why I made this test. I'll explain once I have a bit more data. I'm really curious if my findings are the same for Intel and AMD.
24  Java Game APIs & Engines / OpenGL Development / Re: *testing needed!* Vertex cache shenanigans on: 2016-09-12 14:57:12
Error: Pipeline statistics are not supported. Aborting.
  Renderer: Mesa DRI Intel(R) Broadwell


I should mention this is a chromebook

Hmm, that's weird. According to, it should be supported in certain drivers. =/ See if you can update to one of the supported drivers there. I'd be extremely interested in the results on Intel cards as well.
25  Java Game APIs & Engines / OpenGL Development / Re: *testing needed!* Vertex cache shenanigans on: 2016-09-12 14:41:04
Thanks everyone! I'd really love to have someone test this on AMD, since it seems like Nvidia's 700-series cards and up are all the same. =P
26  Java Game APIs & Engines / OpenGL Development / Vertex cache shenanigans on: 2016-09-12 14:22:33

I wrote a small test the other day which was supposed to calculate the size of the vertex cache of the GPU, but I got some very surprising results which indicate that the vertex cache isn't working as, well, everyone expects. I've thrown together a small test program which does some indexed draw calls and uses ARB_pipeline_statistics_query to check the number of resulting vertex shader invocations, and then outputs its findings to a log file. I am EXTREMELY interested in knowing what kind of results people get on other hardware than my GTX 770, especially on AMD cards.

Here's the entirety of the test source code (only requires LWJGL3):
Here's a precompiled jar (may not run on Mac):

Please run the jar (or compile the test yourself) and post the contents of the generated log file in this thread! Although the program prints the GL_RENDERER string the driver returns, it may not show the exact GPU you have, so if possible include that information as well.

Thanks for your attention! The results of this test could heavily impact how meshes should be optimized for vertex caches!
27  Game Development / Newbie & Debugging Questions / Re: glDrawElements producing INVALID_OPERATION or segfault on: 2016-09-07 23:40:37
You're most likely somehow causing the driver to read out-of-bounds data. Either the vertex attribute pointers or the index data pointers are telling the driver to read an invalid memory address, or the indices your index data holds are causing the driver to read vertex data out of bounds.
28  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-06 16:47:12
...I've switched my focus onto PBR and lighting again.

Do you do image based lighting along with PBR?
Not yet, since we don't have a real map editor yet. I was thinking of doing some simple prebaked stuff once we have that up and running.
29  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-09-06 00:59:38
Today I continued my efforts to create a fully accurate resource workflow for model creation. With my Blender plugin mostly feature complete (at least for models, no animations yet) and normal maps now being interpreted exactly as bakers produce them, I've switched my focus onto PBR and lighting again. My goal is to take the textures produced by Substance Designer, import them into my engine and get results as close as possible to Substance. At first it looked really wrong, but after figuring out how Substance uses its textures and a whole lot of Googling I managed to get something very similar to the preview in Substance. In this process, I ended up switching from Cook-Torrance to GGX for specular lighting, which seems to be by far the most dominant way of handling specular lighting in physically based rendering. It's a fair bit lighter in math after some optimizations, but it probably won't matter when using shadows, as they're by far the heaviest part of lighting. I can't really show any screenshots until a certain press release though. Tongue

Previously we've tried to create our own model converters and material editors, but in the end we simply don't have time to code, maintain and expand that kind of stuff with mostly just me working on them. Therefore, we've been working on getting compatibility with at least some mainstream tools and adhering to standards regarding PBR, normal mapping, etc. It's all really starting to take shape, which is really exciting! It's all so f**king hard to research though, because all technical details are drowned by artist buzzwords and layman explanations of how to use a certain checkbox in a certain program. Example: Substance was exporting "Metallic" maps, and I had no idea what they were for. I googled and just found ten variations of "metallic+roughness is a different workflow from specular+gloss" and then another ten variations of "X uses Y, but you can pretty much use whichever you prefer". Took me a long time before I found the following on
Metalness is a replacement for the "specular map", or the "F0" value -- it's an easier way of authoring specular maps. It also ties in with the main colour texture though.
Metalness of 0/black means that you should use an F0 value of 0.03 (8/255 in a specular map).
Metalness of 1/white means that you should read the main "color" texture and use that as F0 (and also replace the "diffuse color" with black).
Metalness of grey means you do something in between those two extremes (the diffuse color becomes partly black, and the specular color is a lerp of 0.03 and the color map).
So frigging simple, yet completely buried, just like the technical details of tangent space normal mapping. Anyway, I'm thinking of switching to storing metalness in my G-buffer instead, since it allows you to get colored specular reflections. Before, I stored a diffuse color and just a single specular intensity, which means that the specular reflection is always white. There are some metallic objects that have colored specular reflections though (gold and copper for example) which won't look good with this scheme. "Metalness" essentially switches between two "modes", one for metals and one for non-metals. Non-metals have a hardcoded specular intensity of around 0.03 and the diffuse texture is used for diffuse lighting. Metals instead interpret the diffuse texture as the specular intensity and have zero diffuse lighting. TL;DR: Metalness decides if the diffuse texture is used as diffuse or specular color.
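The mapping above can be written out in a few lines. A sketch of the metalness-to-diffuse/F0 conversion just described (the 0.03 dielectric F0 is from the quoted explanation; class and method names are illustrative):

```java
// Sketch of the metalness workflow described above: metalness selects between
// a hardcoded dielectric F0 (~0.03, i.e. 8/255) and using the base color as F0,
// while correspondingly fading the diffuse color to black for metals.
public class MetalnessWorkflow {
    public static final double DIELECTRIC_F0 = 0.03;

    // Returns {diffuseR, diffuseG, diffuseB, f0R, f0G, f0B}
    // for a given base color and metalness value in [0, 1].
    public static double[] resolve(double r, double g, double b, double metalness) {
        double[] out = new double[6];
        // Diffuse: the base color for non-metals, black for metals.
        out[0] = r * (1.0 - metalness);
        out[1] = g * (1.0 - metalness);
        out[2] = b * (1.0 - metalness);
        // F0: lerp from the dielectric constant to the base color.
        out[3] = DIELECTRIC_F0 + (r - DIELECTRIC_F0) * metalness;
        out[4] = DIELECTRIC_F0 + (g - DIELECTRIC_F0) * metalness;
        out[5] = DIELECTRIC_F0 + (b - DIELECTRIC_F0) * metalness;
        return out;
    }
}
```

At metalness 1 the diffuse color is black and F0 equals the base color (a tinted, e.g. gold-colored, reflection); at metalness 0 you get the base color as diffuse and a flat 0.03 specular.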

In addition, I have an idea for doing decal blending of normals and lighting parameters so that decals can modify the lighting of a surface and not just the color. The problem for me was that I use a tight packing in the G-buffer:
Texture1: Diffuse RGB, alpha unused.
Texture2: Packed normal XY, roughness, specular intensity.
Texture3: Emissive RGB, primitive ID (for SRAA).
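For a rough sense of why this packing is called tight: assuming three RGBA8 render targets (the formats aren't stated in the post, so this is an assumption) plus a 32-bit depth buffer, the per-pixel footprint works out like this:

```java
// Back-of-the-envelope G-buffer footprint for the layout above,
// assuming RGBA8 render targets (4 bytes each). Illustrative only.
public class GBufferBudget {
    public static int bytesPerPixel(int rgba8Targets, int depthBytes) {
        return rgba8Targets * 4 + depthBytes;
    }
}
```

Three RGBA8 targets plus 32-bit depth comes to 16 bytes per pixel, which is why every channel (including the alphas) ends up doing double duty.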

The problem here is that the normal isn't blendable, and in addition the specular isn't easily modifiable either since it's stored in the alpha channel. If I write an alpha value from the shader, it'll also get written as the specular intensity! We also don't want to modify the primitive ID stored for SRAA, as it is bitwise compared with the ID of a second pass, so any modification here will break SRAA. However, I think there's a (really complicated) solution to this problem:
 - Per-render-target blending settings aren't supported on OGL3 hardware, but per-target color masks are! We can use those to prevent the primitive ID from being modified by the alpha value written there!
 - Instead of packing the normal using spheremap projection, I can encode a normal with all of XYZ and store roughness as the length of the normal. This doesn't use any extra memory, but has the benefit of being somewhat linearly blendable! It will have some normal distortion due to the normals having different lengths, and the blended vector may have a shorter length which will affect roughness a bit, but it's all 100x better than getting pure garbage by blending encoded values.
 - Lastly, we need to fix the specular blending. Since I can just mask out the alpha channels of Texture1 and Texture3, I will try to use blend constants to at least allow me to apply a single specular value for the entire decal, which would be better than nothing.
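The second point (roughness as the length of the normal) can be sketched on the CPU side to see why it blends reasonably. Encoding is just scaling the unit normal by roughness; decoding splits the blended vector back into direction and length (names are mine):

```java
// Sketch of the "roughness as normal length" encoding from the list above:
// store n * roughness in XYZ; after per-channel lerp blending (what fixed-
// function alpha blending does), normalize() recovers the direction and
// length() recovers an approximate roughness.
public class BlendableNormal {
    public static double[] encode(double nx, double ny, double nz, double roughness) {
        return new double[] { nx * roughness, ny * roughness, nz * roughness };
    }

    // Returns {x, y, z, roughness}.
    public static double[] decode(double[] v) {
        double len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        return new double[] { v[0] / len, v[1] / len, v[2] / len, len };
    }

    // Per-channel lerp of two encoded normals, as hardware blending would do.
    public static double[] blend(double[] a, double[] b, double t) {
        return new double[] {
            a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t
        };
    }
}
```

Blending two perpendicular normals of equal roughness yields a sensible halfway direction, but a shorter vector, i.e. a somewhat lower roughness, which is exactly the distortion mentioned above and still far better than blending spheremap-encoded values.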

EDIT: When I move over to storing a metalness value instead, it will simply be a drop-in replacement for the specular value. It should work very well, since most decals will add either a metallic OR a non-metallic object, so the single-value-for-entire-decal limitation won't be a big deal at all in that case.

Errr, this turned into a bit of a rant, but ehhh....

Sure, screen-space ambient occlusion is a pretty old technique, but it's my very first time doing global illumination!
AO is actually the opposite of global illumination, although it is often grouped together with GI. With GI, you start with zero ambient lighting and add up sources of ambient light. With AO, you assume a certain amount of light and check if something nearby would be blocking that light.
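That distinction fits in one line each; a trivial sketch (names and the assumed ambient constant are illustrative):

```java
// The GI-vs-AO distinction above in code: GI starts at zero and sums
// incoming ambient contributions, while AO darkens an assumed fixed
// ambient term by an occlusion factor.
public class AmbientTerms {
    // GI-style: accumulate sampled/bounced ambient light from zero.
    public static double giAmbient(double[] bounceContributions) {
        double sum = 0.0;
        for (double c : bounceContributions) sum += c;
        return sum;
    }

    // AO-style: assume a constant ambient amount, attenuate by occlusion.
    public static double aoAmbient(double assumedAmbient, double occlusion) {
        return assumedAmbient * (1.0 - occlusion);
    }
}
```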
30  Discussions / Miscellaneous Topics / Re: What I did today on: 2016-08-30 15:12:44
I upgraded my Blender plugin a bit. It can now export the skeleton/bindpose of a model, and vertices with normals, tangents, UVs and bone bindings.