Hi !
Featured games (91)
games approved by the League of Dukes
Games in Showcase (808)
Games in Android Showcase (239)
games submitted by our members
Games in WIP (872)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Vertex cache shenanigans 2 - WTF, Nvidia???  (Read 12765 times)
0 Members and 1 Guest are viewing this topic.
Offline theagentd
« Posted 2016-10-26 00:30:32 »

This is a follow-up of this thread.

Now that my exam is over I've taken a bit of time to look into this stuff again.

I wrote a simple vertex cache emulator that when given an index list can compute the number of cache misses that would occur if the list was used to render something. It has tweakable cache and batch sizes, so it should work decently, minus possibly the weird results of old-ish AMD cards. However, when comparing the emulator against the number of cache misses reported by using pipeline statistics I noticed discrepancies, sometimes massive ones. I quickly narrowed it down to which older entry to overwrite when the cache is full.

I tried numerous methods:
  • A FIFO cache: the element that was ADDED the longest time ago to the cache was overwritten by the new index.
  • Last-use cache: the element that was USED the longest time ago to the cache was overwritten by the new index.
  • CPU cache-ish: the cache element to overwrite was indexed with (indexValue%cachesize).
  • CPU cache-ish 2: the cache element to overwrite was indexed with (indexElement%cachesize).

All of these systems were incorrect in some case. Hence I developed a small "algorithm" for checking which values were in the hardware cache:
1. Draw N triangles of 3 indices each and check how many cache misses occurs using pipeline statistics.
2. For each possible index I, draw the original N triangles plus the triangle (I, I, I) (same index 3 times). If the number of cache misses is the same, I was in the cache.

My GPU has a cache size of 32 entries. Drawing 10 triangles (30 vertices) with the indices (0, 1, 2, ..., 29) makes the cache results look like this:
GPU:      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Emulated: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, -1, -1]

What happens when we draw 11 triangles with 33 different indices in total?
GPU:      [30, 31, 32]
Emulated: [32, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

Wow. So basically, if <number of cache misses in triangle (0 to 3)> is more than what would fit in the cache, the entire cache is cleared, then the new entries are inserted. It doesn't even f**king try to overwrite an old index. It just starts a new cache. Wow. Just wow.

EDIT: This just in: It DOES clear the vertex cache. If a vertex isn't touched for 16 triangles (48 indices), it is cleared from the cache. This is getting hard as hell to code.

Offline theagentd
« Reply #1 - Posted 2016-10-26 01:56:27 »

Ooooook... So there doesn't actually seem to be a batch size at all. In other words, Nvidia doesn't process 96 indices per block, then clears the cache between each block. Basically, forget everything I've said so far. Here is the pattern I've found:

 - Nvidia's cache size is 32.

 - Writing is done from the first element in the cache to the last element in order.

 - If the cache misses caused by a triangle (0 to 3) would cause it to write beyond the end of the cache, the entire cache is cleared. All unique vertices become cache misses and are then placed as elements 0 to (unique_verts - 1) in the cleared cache.
        - Example 1: If the cache is full and all 3 vertices of the next triangle are in the cache, it will not reset the cache.
        - Example 2: If only one free element is left in the cache and a triangle with 2 misses is drawn, then the cache is reset, all 3 vertices become cache misses and are placed in the cache.
        - Example 3: If the cache is full, contains vertex 0 and the degenerate triangle (0, 0, 1) is drawn, then the cache is cleared, 0 and 1 become cache misses and are placed in the cache.

 - Cache entries that are not used by any triangle become invalid after 16 triangles (48 indices) after their last use.

 - All cache entries have a maximum lifetime of 32 triangles (96 indices) after they were originally placed in the cache, no matter how many times they are used within that time.

 - Note that when a cache entry becomes invalid, it is not possible to write a new vertex to it until the entire cache is reset due to overflow. The entries simply are not usable for matching anymore, making the cache effectively smaller until reset.

 - The lifetimes of the vertices in the cache is based on the number of vertices of the geometry being drawn. For points, the entries become invalid after 16 points (16 indices) after their most recent use, or 32 points (32 indices) after they were first used. For lines, it's 16/32 lines (32/64 indices). For triangles, it's 16/32 triangles (48/96 indices).


If you draw 100 identical degenerate (5, 5, 5) triangles, 5 will be placed in the first slot (slot 0) of the cache. After 32 triangles, slot 0 becomes invalid. For the 33rd triangle, slot 0 will be unusable and 5 will again be placed in the cache, this time in slot 1. After 32*32=1024 triangles, the entire cache has slowly been filled with 5s that have gone invalid, and the last slot finally becomes invalid. When the 1025th triangle is drawn, the entire cache is reset and 5 is placed in slot 0 again.

If you draw the triangle (0, 1, 2), then repeatedly draw (0, 0, 0) triangles, the entries for 1 and 2 will become invalid after only 16 triangles as they are not used. This 16-triangle counter is reset each time the vertex is reused in a triangle, but regardless of if they're used or not they will become invalid after 32 triangles. In other words, 1 and 2 become invalid by the 17th triangle, and 0 becomes invalid by the 32th triangle.

Following all these rules I seem to be able to exactly simulate the results of every single index list I've tested so far.

Offline thedanisaur

JGO Knight

Medals: 59

« Reply #2 - Posted 2016-10-26 19:37:37 »

Do you have any insight as to why they implemented it this way?  Huh

Every village needs an idiot Cool
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Riven

« JGO Overlord »

Medals: 1371
Projects: 4
Exp: 16 years

Hand over your head.

« Reply #3 - Posted 2016-10-27 21:26:09 »

I'd say that vertex-caches are never overwritten, because the GPU is this async machine that will be reading that vertex data long after the indexed geometry is processed by the vertex-shader. Allocating a new vertex-cache upon filling one seems almost the only option...?

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Pages: [1]
  ignore  |  Print  

Riven (846 views)
2019-09-04 15:33:17

hadezbladez (5790 views)
2018-11-16 13:46:03

hadezbladez (2602 views)
2018-11-16 13:41:33

hadezbladez (6206 views)
2018-11-16 13:35:35

hadezbladez (1498 views)
2018-11-16 13:32:03

EgonOlsen (4733 views)
2018-06-10 19:43:48

EgonOlsen (5791 views)
2018-06-10 19:43:44

EgonOlsen (3275 views)
2018-06-10 19:43:20

DesertCoockie (4174 views)
2018-05-13 18:23:11

nelsongames (5500 views)
2018-04-24 18:15:36
A NON-ideal modular configuration for Eclipse with JavaFX
by philfrei
2019-12-19 19:35:12

Java Gaming Resources
by philfrei
2019-05-14 16:15:13

Deployment and Packaging
by philfrei
2019-05-08 15:15:36

Deployment and Packaging
by philfrei
2019-05-08 15:13:34

Deployment and Packaging
by philfrei
2019-02-17 20:25:53

Deployment and Packaging
by mudlee
2018-08-22 18:09:50

Java Gaming Resources
by gouessej
2018-08-22 08:19:41

Deployment and Packaging
by gouessej
2018-08-22 08:04:08 is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!