Improving normal map precision using texture compression?! Yep!
Offline theagentd
« Posted 2016-08-25 10:05:50 »

Hello, everyone!

As you may have seen in the "What I did today"-thread, I recently worked a bit with normal maps. I finally managed to exactly replicate the standard inputs and algorithm of 3D modeling programs, normal map baking software and all major engines out there, allowing me to use generated (baked) normal maps. My test object is a cube with smoothed normals, which uses a normal map to get back its blockiness with just some smoothed edges and corners. It all looks good...  at a distance. Zoom in and you get this:



Ouch. The 8-bit normal map simply doesn't have enough precision to exactly "unsmooth" the smoothed normal back to a flat surface, causing that blocking artifact. This is even with bilinear filtering enabled! The gradient stored in the normal map to counteract the smoothness doesn't have good enough precision, causing aliasing (adjacent pixels get rounded to the same value), and filtering can't help if the input is already aliased.

The only real solution here is to generate a 16-bit normal map and upload it to VRAM at 16-bit precision too. That's a massive cost though. I'm currently using RGTC/BC5, which allows you to store two channels (X and Y) of the normal map compressed to half the size, with Z reconstructed as sqrt(1 - x^2 - y^2). BC5 compresses a block of 16 pixels to just 16 bytes, meaning that each normal map texel only uses 1 byte! This is a massive saving, allowing me to have more normal maps and/or higher resolution normal maps in memory. Going to 16-bit normal map precision (still storing only X and Y) would force me to drop the compression as there are no 16-bit compression formats (at least not on OGL3 level hardware), meaning I'd need 4 times as much memory for a normal map (4 bytes per texel instead of 1). That instantly cuts off the first mipmap level if I want to stay at about the same memory usage as before. That's not really something I can afford. However, I read somewhere that it is possible to squeeze some extra precision out of BC5, so I decided to run some experiments.
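To make the numbers above concrete, here's a minimal Java sketch of the Z reconstruction and the memory math. The class and method names are made up for this example, and it assumes X and Y have already been remapped from [0, 1] to [-1, 1].

```java
class NormalReconstruct {
    /** Rebuilds Z of a unit-length tangent-space normal from X and Y in [-1, 1]. */
    static float reconstructZ(float x, float y) {
        // Clamp so rounding errors in x and y can't give a negative value under the root.
        float zSquared = Math.max(0.0f, 1.0f - x * x - y * y);
        return (float) Math.sqrt(zSquared);
    }

    /** Bytes for one mip level of a BC5 texture: 16 bytes per 4x4 block. */
    static long bytesBc5(int width, int height) {
        long blocksX = (width + 3) / 4;
        long blocksY = (height + 3) / 4;
        return blocksX * blocksY * 16L;
    }

    /** Bytes for one mip level of an uncompressed two-channel (X, Y) texture. */
    static long bytesUncompressed(int width, int height, int bitsPerChannel) {
        return (long) width * height * 2 * (bitsPerChannel / 8);
    }
}
```

For a 1024x1024 map this gives 1 MiB for BC5 versus 4 MiB for uncompressed 16-bit X and Y, which is the 4x factor mentioned above.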

Let's first go through some theory about how BC4 and BC5 work. BC5 is just the two-channel version of the single-channel BC4, so I will be using BC4 for this example. BC4 divides the texture into 4x4 texel blocks. It then stores two 8-bit reference values in each block and a 3-bit index for each texel. BC4 is then decoded using some special logic depending on the order of the reference values. If the first value is bigger than the second one, the index describes a linear blend between the two values. If the first value is smaller (or equal), the index is partly used as a blending factor, but can also represent the constants 0.0 and 1.0. Here's some pseudo code:
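// reference1 and reference2 are the block's two 8-bit reference values, normalized to [0, 1]
index = <value between 0 and 7>
if(reference1 > reference2){
    // 8 evenly spaced values from reference2 to reference1
    result = (index*reference1 + (7-index)*reference2) / 7.0f;
}else if(index == 6){
    result = 0.0f;
}else if(index == 7){
    result = 1.0f;
}else{
    // 6 evenly spaced values from reference2 to reference1
    result = (index*reference1 + (5-index)*reference2) / 5.0f;
}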

NOTE: This is not the exact algorithm used! The indices don't linearly map to values exactly like this! See https://msdn.microsoft.com/en-us/library/windows/desktop/bb694531%28v=vs.85%29.aspx#BC4 for exact info on the specification!
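For reference, here's a small Java sketch of the index mapping as I understand it from the linked specification for unsigned BC4: index 0 and 1 return the two reference values directly, and the remaining indices are the interpolated values (or the constants). The class and method names are only for illustration.

```java
class Bc4Decoder {
    /**
     * Decodes one texel of an unsigned BC4 block to a float in [0, 1].
     * red0 and red1 are the two 8-bit reference values (0..255),
     * index is the texel's 3-bit index (0..7).
     */
    static float decode(int red0, int red1, int index) {
        float r0 = red0 / 255.0f;
        float r1 = red1 / 255.0f;
        if (index == 0) return r0;
        if (index == 1) return r1;
        if (red0 > red1) {
            // Indices 2..7: six interpolated values, going from near red0 to near red1.
            return ((8 - index) * r0 + (index - 1) * r1) / 7.0f;
        }
        // red0 <= red1: four interpolated values plus the two constants.
        if (index == 6) return 0.0f;
        if (index == 7) return 1.0f;
        return ((6 - index) * r0 + (index - 1) * r1) / 5.0f;
    }
}
```

Note that the interpolation here is done in floating point, which is exactly where the extra precision discussed in this thread comes from.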

This is the basis for how we can gain more precision than 8 bits out of BC4 and BC5 in some special cases. If we look at the exact math done here, we see that in the first case we blend together the two reference values based on a 3-bit blending factor. This gives us 8 possible values: one of the two original values or one of six values evenly spaced between them. So in theory, even if the reference values themselves are only stored at 8-bit precision, since we can access 6 values in between the two reference values we can actually get a result that has a higher than 8-bit precision in some cases (with reference values one 8-bit step apart, the in-between values are spaced 1/7 of an 8-bit step apart), up to something in between 10 and 11 bits. This would be a pretty major gain for absolutely zero cost!
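As a sanity check on the "between 10 and 11 bits" figure, here's a quick Java sketch that enumerates every pair of reference values in the 8-value mode and counts how many distinct results can come out. It assumes the decoder blends at full precision; the class and method names are just for this example.

```java
class Bc4Precision {
    /** Counts the distinct values decodable in the 8-value mode (red0 > red1). */
    static int countDistinctValues() {
        // Every result is numerator / (7 * 255) with an integer numerator in 0..1785.
        boolean[] seen = new boolean[7 * 255 + 1];
        for (int red0 = 1; red0 <= 255; red0++) {
            for (int red1 = 0; red1 < red0; red1++) {
                // w sevenths of red0 blended with (7 - w) sevenths of red1.
                for (int w = 0; w <= 7; w++) {
                    seen[w * red0 + (7 - w) * red1] = true;
                }
            }
        }
        int count = 0;
        for (boolean s : seen) {
            if (s) count++;
        }
        return count;
    }

    /** Equivalent number of bits for a given count of distinct values. */
    static double bits(int distinctValues) {
        return Math.log(distinctValues) / Math.log(2.0);
    }
}
```

This counts 1786 distinct values, or roughly 10.8 bits. Keep in mind that a single block can still only use 8 of them, so the gain only shows up where the values within a block span a narrow range, like in a smooth gradient.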

However, this makes one grave assumption: that the GPU actually keeps that extra precision. When a compressed texture is read, the decompressed result is stored in the GPU's hardware texture cache. The specification of BC4 and BC5 does not require the decompressed result to be stored at float precision, meaning that the GPU is technically allowed to simply decompress to 8-bit values, throwing away any extra precision. However, when ATI came up with BC5 it was specifically tailored for normal map compression, and they explicitly stated that they stored the decompressed values at 16-bit precision, more than enough to accurately store the decompressed 11-bit-ish result!

I decided to check how Nvidia had implemented this using a simple trick. If decompressed values are only stored in 8-bit precision, the resulting values when sampling the BC5 compressed texture will be exactly n/255f for some integer n. Hence, I disabled all texture filtering and wrote a tiny shader that calculated textureColor*255.0, and checked how far that was from the nearest integer. If all values are exact integers the precision would only be 8 bits, but during testing I found that certain parts of my compressed normal map had values that were nowhere near integers! So, it seems Nvidia too stores the decompressed result at higher-than-8-bit precision, which is exactly what I was hoping for! However, when I tried to upload 16-bit texture data to a BC5 texture I was unable to gain any extra precision. It seems like the driver's texture compressor converts the 16-bit inputs to 8-bit inputs before compressing the texture, so it can't be relied on for this. Drats!
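The check itself boils down to something like the following. The real test ran in a shader on sampled texels; this is just the same arithmetic sketched in Java, with a made-up class and method name.

```java
class PrecisionProbe {
    /**
     * Distance of sample * 255 from the nearest integer.
     * A result near 0 means the sample fits in 8 bits; anything clearly
     * above 0 means the sample carries more than 8 bits of precision.
     */
    static float distanceFrom8Bit(float sample) {
        float scaled = sample * 255.0f;
        return Math.abs(scaled - Math.round(scaled));
    }
}
```

For example, a sample of 100/255 gives a distance of about 0, while a blended value like (6*100 + 1*101) / 7 / 255 gives about 0.14, which could not come from an 8-bit decode.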

I plan on writing a small offline texture compressor that brute forces the best reference values and indices for each block in the texture to create an optimally compressed normal map. I have a feeling that this will improve the quality a lot, especially for gradients like the ones my test cube has.
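The inner step of such a compressor could look roughly like this for one channel of one block: given a candidate pair of reference values, pick the closest decodable value for each texel and sum up the error. This is only a sketch under my own assumptions (squared error per channel, targets already converted from 16-bit to [0, 1], made-up names), not the actual converter.

```java
class Bc4BlockFit {
    /** The 8 values a block can decode to for the given reference values (0..255). */
    static double[] palette(int red0, int red1) {
        double r0 = red0 / 255.0;
        double r1 = red1 / 255.0;
        double[] p = new double[8];
        p[0] = r0;
        p[1] = r1;
        if (red0 > red1) {
            for (int i = 2; i < 8; i++) p[i] = ((8 - i) * r0 + (i - 1) * r1) / 7.0;
        } else {
            for (int i = 2; i < 6; i++) p[i] = ((6 - i) * r0 + (i - 1) * r1) / 5.0;
            p[6] = 0.0;
            p[7] = 1.0;
        }
        return p;
    }

    /**
     * Picks the closest palette entry for each target value (in [0, 1]) and
     * returns the summed squared error. indicesOut receives the chosen indices.
     */
    static double fit(double[] targets, int red0, int red1, int[] indicesOut) {
        double[] p = palette(red0, red1);
        double total = 0.0;
        for (int t = 0; t < targets.length; t++) {
            double best = Double.MAX_VALUE;
            for (int i = 0; i < 8; i++) {
                double d = targets[t] - p[i];
                if (d * d < best) {
                    best = d * d;
                    indicesOut[t] = i;
                }
            }
            total += best;
        }
        return total;
    }
}
```

A search would then call fit() for each candidate pair of reference values and keep the pair with the lowest error.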



Myomyomyo.
Offline theagentd
« Reply #1 - Posted 2016-08-25 19:25:31 »

Minor update: Turns out that brute-forcing 4 reference values and 8 different indices for each of the 16 pixels is 256^4 * 8*16 iterations, or around 550 billion combinations... for each 4x4 block of pixels... and my normal maps have around 1 million of those blocks each. I had this nice little "0.002% done, 8.8 hours left" part after optimizing it quite a bit. Let's just say a complete brute force is not realistic. >___>
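For anyone who wants to check the arithmetic, here it is as a tiny Java sketch. It also shows one possible way to shrink the search, which is my own observation and not necessarily what the converter ended up doing: since BC5 is two independent BC4 blocks, the two channels can be searched separately if the error is measured per channel. Names are made up for the example.

```java
class SearchSpace {
    /** All four 8-bit reference values searched together, 8 indices for 16 texels. */
    static long fullBruteForce() {
        long referenceCombos = 256L * 256 * 256 * 256;
        return referenceCombos * 8 * 16;
    }

    /** Each channel searched on its own: two reference values per channel. */
    static long perChannel() {
        long referenceCombosPerChannel = 256L * 256;
        return 2 * referenceCombosPerChannel * 8 * 16;
    }
}
```

That is 549,755,813,888 evaluations per block for the full search versus 16,777,216 for the per-channel one. The per-channel version ignores how X and Y errors combine in the reconstructed normal, so it's a trade-off rather than a free win.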

Myomyomyo.
Offline theagentd
« Reply #2 - Posted 2016-08-25 21:43:19 »

Damn. That worked far better than I had ever hoped.

Original 8-bit normal map:



16-bit normal map compressed with BC5:



With bloom enabled, the bottom picture looks 100% perfect. That is TRULY insane! It's pretty crazy that I can get that big a quality improvement using half as much memory. o___o Really cool that it worked out! Now I just need to optimize the converter a bit (it took around 5 minutes to convert that texture, and that was with a fairly low search space).

Myomyomyo.