Vertex cache shenanigans
Offline theagentd
« Posted 2016-09-12 14:22:33 »

Hello.

I wrote a small test the other day which was supposed to calculate the size of the vertex cache of the GPU, but I got some very surprising results which indicate that the vertex cache isn't working as, well, everyone expects. I've thrown together a small test program which does some indexed draw calls and uses ARB_pipeline_statistics_query to check the number of resulting vertex shader invocations, and then outputs its findings to a log file. I am EXTREMELY interested in knowing what kind of results people get on other hardware than my GTX 770, especially on AMD cards.

Here's the entirety of the test source code (only requires LWJGL3): http://www.java-gaming.org/?action=pastebin&id=1475
Here's a precompiled jar (may not run on Mac): https://drive.google.com/open?id=0B0dJlB1tP0QZbTc5ZExJeENOMWM

Please run the jar (or compile the test yourself) and post the contents of the generated log file in this thread! Although the program prints the GL_RENDERER string the driver returns, it may not show the exact GPU you have, so if possible include that information as well.

Thanks for your attention! The results of this test could heavily impact how meshes should be optimized for vertex caches!

Myomyomyo.
Offline princec

« Reply #1 - Posted 2016-09-12 14:29:02 »

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 960/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Cas :)

Offline Phased
« Reply #2 - Posted 2016-09-12 14:39:58 »

Exact same results as cas.

Renderer: GeForce GTX 1080/PCIe/SSE2

Offline theagentd
« Reply #3 - Posted 2016-09-12 14:41:04 »

Thanks everyone! I'd really love to have someone test this on AMD since it seems like Nvidia's 700-series cards and up are all the same. =P

Offline Ecumene

« Reply #4 - Posted 2016-09-12 14:42:39 »

Error: Pipeline statistics are not supported. Aborting.
  Renderer: Mesa DRI Intel(R) Broadwell


;(

I should mention this is a chromebook

Offline theagentd
« Reply #5 - Posted 2016-09-12 14:57:12 »

Hmm, that's weird. According to http://feedback.wildfiregames.com/report/opengl/feature/GL_ARB_pipeline_statistics_query, it should be supported in certain drivers. =/ See if you can update to one of the supported drivers there. I'd be extremely interested in the results on Intel cards as well.

Offline SHC
« Reply #6 - Posted 2016-09-12 14:59:47 »

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 750 Ti/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Offline orange451

« Reply #7 - Posted 2016-09-12 15:00:20 »

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 970/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Offline elect

« Reply #8 - Posted 2016-09-12 15:08:03 »

weird, I recall I read somewhere it was 24 on nvidia..
Offline theagentd
« Reply #9 - Posted 2016-09-12 15:08:57 »

It's definitely 32, but there's more to it than that. That's why I made this test. I'll explain once I have a bit more data. I'm really curious if my findings are the same for Intel and AMD.

Offline Riven
« Reply #10 - Posted 2016-09-12 16:18:45 »


java -jar VertexCacheTest.jar
Error: Pipeline statistics are not supported. Aborting.
  Renderer: Intel(R) HD Graphics 530


I started the gfx driver utility, to figure out the version - then my W10 system BSODed. You're welcome.

Offline theagentd
« Reply #11 - Posted 2016-09-12 16:22:45 »

I am so sorry. Friends don't let friends buy Intel GPUs.

Offline Riven
« Reply #12 - Posted 2016-09-12 16:24:17 »

Nah, it's a work laptop. I blame my boss. Anyhoo - enough derailing.


Oh, these might help:

Intel HD 530 driver:
   Version: 10.18.15.4271
   Release date: 2015-08-11

Installing the latest drivers now - let's see if I can brick this thing.

Offline ClaasJG

« Reply #13 - Posted 2016-09-12 16:35:31 »

Using Crimson 16.30.2311-160718a-305077c :

Batch size test invocations: 0 / 3145728
Calculated vertex cache batch size: 2147483647

Cache size 1 invocation test: 0 / 3145728
Cache size 2 invocation test: 0 / 3145728
Cache size 3 invocation test: 0 / 3145728
[...]
Cache size 32765 invocation test: 0 / 3145728
Cache size 32766 invocation test: 0 / 3145728
Cache size 32767 invocation test: 0 / 3145728
Error, failed to detect cache size.

Results:
  Renderer: AMD Radeon HD 7800 Series
  Calculated vertex cache batch size: 2147483647
  Cache size: -1


-ClaasJG

My english has to be tweaked. Please show me my mistakes.
Offline Abuse

« Reply #14 - Posted 2016-09-12 16:38:05 »

<snip>

Results:
  Renderer: AMD Radeon HD 5800 Series
  Calculated vertex cache batch size: 2147483647
  Cache size: -1


Yep, same failure on my AMD HD 5870 1GB. (Crimson 16.2.1)
Offline theagentd
« Reply #15 - Posted 2016-09-12 16:43:40 »

Ohhh!! Thanks a lot for testing. Looks like the AMD driver is smart enough to just not run the vertex shader. I'll have to expand the test to include a proper shader. Give me a couple of minutes!

Offline theagentd
« Reply #16 - Posted 2016-09-12 16:57:39 »

Here's a new version with a proper shader!

Source: http://www.java-gaming.org/?action=pastebin&id=1476
Jar: https://drive.google.com/open?id=0B0dJlB1tP0QZN3gtcTVKalFqdU0

EDIT: There is no need to rerun this benchmark on Nvidia hardware, as it will give the exact same results. =P

Offline ClaasJG

« Reply #17 - Posted 2016-09-12 17:00:40 »

Batch size test invocations: 8130 / 3145728
Calculated vertex cache batch size: 387

Cache size 1 invocation test: 8130 / 3145728
Cache size 2 invocation test: 16260 / 3145728
Cache size 3 invocation test: 24390 / 3145728
Cache size 4 invocation test: 32520 / 3145728
Cache size 5 invocation test: 40650 / 3145728
Cache size 6 invocation test: 48780 / 3145728
Cache size 7 invocation test: 56910 / 3145728
Cache size 8 invocation test: 65040 / 3145728
Cache size 9 invocation test: 73170 / 3145728
Cache size 10 invocation test: 81300 / 3145728
Cache size 11 invocation test: 89430 / 3145728
Cache size 12 invocation test: 97560 / 3145728
Cache size 13 invocation test: 105690 / 3145728
Cache size 14 invocation test: 113820 / 3145728
Cache size 15 invocation test: 395752 / 3145728
Cache size 16 invocation test: 699074 / 3145728
Cache size 17 invocation test: 3088412 / 3145728
Cache size 18 invocation test: 3121164 / 3145728
Cache size 19 invocation test: 3100694 / 3145728
Cache size 20 invocation test: 3096600 / 3145728
Cache size 21 invocation test: 3108882 / 3145728
Cache size 22 invocation test: 3137540 / 3145728
Cache size 23 invocation test: 3088412 / 3145728
Cache size 24 invocation test: 3145728 / 3145728

Results:
  Renderer: AMD Radeon HD 7800 Series
  Calculated vertex cache batch size: 387
  Cache size: 23


-ClaasJG

Offline theagentd
« Reply #18 - Posted 2016-09-12 17:04:54 »

Hmm, the results look inconsistent. Are they identical on each run?

Offline SHC
« Reply #19 - Posted 2016-09-12 17:10:50 »

With the new JAR:

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 750 Ti/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Offline ClaasJG

« Reply #20 - Posted 2016-09-12 17:11:26 »

The results are identical on each run.

-ClaasJG

Offline theagentd
« Reply #21 - Posted 2016-09-12 17:16:48 »

ClaasJG, can you try this jar out: https://drive.google.com/open?id=0B0dJlB1tP0QZUDk0QW1xSHNsRXM. It has an increased test size to hopefully give more accurate results, but it may take some time to complete. Thanks a lot for testing!!! This is extremely interesting!

Offline ClaasJG

« Reply #22 - Posted 2016-09-12 18:30:36 »

Batch size test invocations: 130056 / 50331648
Calculated vertex cache batch size: 387

Cache size 1 invocation test: 130056 / 50331648
Cache size 2 invocation test: 260112 / 50331648
Cache size 3 invocation test: 390168 / 50331648
Cache size 4 invocation test: 520224 / 50331648
Cache size 5 invocation test: 650280 / 50331648
Cache size 6 invocation test: 780336 / 50331648
Cache size 7 invocation test: 910392 / 50331648
Cache size 8 invocation test: 1040448 / 50331648
Cache size 9 invocation test: 1170504 / 50331648
Cache size 10 invocation test: 1300560 / 50331648
Cache size 11 invocation test: 1430616 / 50331648
Cache size 12 invocation test: 1560672 / 50331648
Cache size 13 invocation test: 1690728 / 50331648
Cache size 14 invocation test: 1820784 / 50331648
Cache size 15 invocation test: 6331592 / 50331648
Cache size 16 invocation test: 11184834 / 50331648
Cache size 17 invocation test: 49414172 / 50331648
Cache size 18 invocation test: 49938444 / 50331648
Cache size 19 invocation test: 49610774 / 50331648
Cache size 20 invocation test: 49545240 / 50331648
Cache size 21 invocation test: 49741842 / 50331648
Cache size 22 invocation test: 50200580 / 50331648
Cache size 23 invocation test: 49414172 / 50331648
Cache size 24 invocation test: 50331648 / 50331648

Results:
  Renderer: AMD Radeon HD 7800 Series
  Calculated vertex cache batch size: 387
  Cache size: 23


here we go :)

-ClaasJG

Offline Kefwar
« Reply #23 - Posted 2016-09-12 19:21:06 »

Integrated Graphics of i7 4720HQ:
Error: Pipeline statistics are not supported. Aborting.
  Renderer: Intel(R) HD Graphics 4600

GTX 980M:
Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 980M/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

I don't have an AMD.

Offline Jono
« Reply #24 - Posted 2016-09-12 20:16:43 »

Some errors but seemed to run on an AMD APU

Vertex shader log:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, and 3.00 ES

Fragment shader log:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, and 3.00 ES

Link log:
error: linking with uncompiled shadererror: linking with uncompiled shader
Batch size test invocations: 8129 / 3145728
Calculated vertex cache batch size: 387

Cache size 1 invocation test: 8129 / 3145728
Cache size 2 invocation test: 16258 / 3145728
Cache size 3 invocation test: 24387 / 3145728
Cache size 4 invocation test: 32516 / 3145728
Cache size 5 invocation test: 40645 / 3145728
Cache size 6 invocation test: 48774 / 3145728
Cache size 7 invocation test: 56903 / 3145728
Cache size 8 invocation test: 65032 / 3145728
Cache size 9 invocation test: 73161 / 3145728
Cache size 10 invocation test: 81290 / 3145728
Cache size 11 invocation test: 89419 / 3145728
Cache size 12 invocation test: 97548 / 3145728
Cache size 13 invocation test: 105677 / 3145728
Cache size 14 invocation test: 113806 / 3145728
Cache size 15 invocation test: 398303 / 3145728
Cache size 16 invocation test: 699062 / 3145728
Cache size 17 invocation test: 3145728 / 3145728

Results:
  Renderer: Gallium 0.4 on AMD KAVERI (DRM 2.43.0, LLVM 3.8.0)
  Calculated vertex cache batch size: 387
  Cache size: 16

Offline Abuse

« Reply #25 - Posted 2016-09-13 01:57:25 »

Batch size test invocations: 130056 / 50331648
Calculated vertex cache batch size: 387

Cache size 1 invocation test: 130056 / 50331648
Cache size 2 invocation test: 260112 / 50331648
Cache size 3 invocation test: 390168 / 50331648
Cache size 4 invocation test: 520224 / 50331648
Cache size 5 invocation test: 650280 / 50331648
Cache size 6 invocation test: 780336 / 50331648
Cache size 7 invocation test: 910392 / 50331648
Cache size 8 invocation test: 1040448 / 50331648
Cache size 9 invocation test: 1170504 / 50331648
Cache size 10 invocation test: 1300560 / 50331648
Cache size 11 invocation test: 1430616 / 50331648
Cache size 12 invocation test: 1560672 / 50331648
Cache size 13 invocation test: 1690728 / 50331648
Cache size 14 invocation test: 1820784 / 50331648
Cache size 15 invocation test: 6372742 / 50331648
Cache size 16 invocation test: 11184822 / 50331648
Cache size 17 invocation test: 50331648 / 50331648

Results:
  Renderer: AMD Radeon HD 5800 Series
  Calculated vertex cache batch size: 387
  Cache size: 16


AMD HD 5870 1GB. (Crimson 16.2.1)
Offline ziozio
« Reply #26 - Posted 2016-09-13 05:21:52 »

Some results from Intel on Linux

Vertex shader log:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, 3.00 ES, and 3.10 ES

Fragment shader log:
0:1(10): error: GLSL 3.30 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, 3.00 ES, and 3.10 ES

Link log:
error: linking with uncompiled shadererror: linking with uncompiled shader
Batch size test invocations: 1 / 3145728
Calculated vertex cache batch size: 3145728

Cache size 1 invocation test: 1 / 3145728
Cache size 2 invocation test: 2 / 3145728
Cache size 3 invocation test: 3 / 3145728
Cache size 4 invocation test: 4 / 3145728
Cache size 5 invocation test: 5 / 3145728
Cache size 6 invocation test: 6 / 3145728
Cache size 7 invocation test: 7 / 3145728
Cache size 8 invocation test: 8 / 3145728
Cache size 9 invocation test: 9 / 3145728
Cache size 10 invocation test: 10 / 3145728
Cache size 11 invocation test: 11 / 3145728
Cache size 12 invocation test: 12 / 3145728
Cache size 13 invocation test: 13 / 3145728
Cache size 14 invocation test: 14 / 3145728
Cache size 15 invocation test: 15 / 3145728
Cache size 16 invocation test: 16 / 3145728
Cache size 17 invocation test: 17 / 3145728
Cache size 18 invocation test: 18 / 3145728
Cache size 19 invocation test: 19 / 3145728
Cache size 20 invocation test: 20 / 3145728
Cache size 21 invocation test: 21 / 3145728
Cache size 22 invocation test: 22 / 3145728
Cache size 23 invocation test: 23 / 3145728
Cache size 24 invocation test: 24 / 3145728
Cache size 25 invocation test: 25 / 3145728
Cache size 26 invocation test: 26 / 3145728
Cache size 27 invocation test: 27 / 3145728
Cache size 28 invocation test: 28 / 3145728
Cache size 29 invocation test: 29 / 3145728
Cache size 30 invocation test: 30 / 3145728
Cache size 31 invocation test: 31 / 3145728
Cache size 32 invocation test: 32 / 3145728
Cache size 33 invocation test: 33 / 3145728
Cache size 34 invocation test: 34 / 3145728
Cache size 35 invocation test: 35 / 3145728
Cache size 36 invocation test: 36 / 3145728
Cache size 37 invocation test: 37 / 3145728
Cache size 38 invocation test: 38 / 3145728
Cache size 39 invocation test: 39 / 3145728
Cache size 40 invocation test: 40 / 3145728
Cache size 41 invocation test: 41 / 3145728
Cache size 42 invocation test: 42 / 3145728
Cache size 43 invocation test: 43 / 3145728
Cache size 44 invocation test: 44 / 3145728
Cache size 45 invocation test: 45 / 3145728
Cache size 46 invocation test: 46 / 3145728
Cache size 47 invocation test: 47 / 3145728
Cache size 48 invocation test: 48 / 3145728
Cache size 49 invocation test: 49 / 3145728
Cache size 50 invocation test: 50 / 3145728
Cache size 51 invocation test: 51 / 3145728
Cache size 52 invocation test: 52 / 3145728
Cache size 53 invocation test: 53 / 3145728
Cache size 54 invocation test: 54 / 3145728
Cache size 55 invocation test: 55 / 3145728
Cache size 56 invocation test: 56 / 3145728
Cache size 57 invocation test: 57 / 3145728
Cache size 58 invocation test: 58 / 3145728
Cache size 59 invocation test: 59 / 3145728
Cache size 60 invocation test: 60 / 3145728
Cache size 61 invocation test: 61 / 3145728
Cache size 62 invocation test: 62 / 3145728
Cache size 63 invocation test: 63 / 3145728
Cache size 64 invocation test: 64 / 3145728
Cache size 65 invocation test: 65 / 3145728
Cache size 66 invocation test: 66 / 3145728
Cache size 67 invocation test: 67 / 3145728
Cache size 68 invocation test: 68 / 3145728
Cache size 69 invocation test: 69 / 3145728
Cache size 70 invocation test: 70 / 3145728
Cache size 71 invocation test: 71 / 3145728
Cache size 72 invocation test: 72 / 3145728
Cache size 73 invocation test: 73 / 3145728
Cache size 74 invocation test: 74 / 3145728
Cache size 75 invocation test: 75 / 3145728
Cache size 76 invocation test: 76 / 3145728
Cache size 77 invocation test: 77 / 3145728
Cache size 78 invocation test: 78 / 3145728
Cache size 79 invocation test: 79 / 3145728
Cache size 80 invocation test: 80 / 3145728
Cache size 81 invocation test: 81 / 3145728
Cache size 82 invocation test: 82 / 3145728
Cache size 83 invocation test: 83 / 3145728
Cache size 84 invocation test: 84 / 3145728
Cache size 85 invocation test: 85 / 3145728
Cache size 86 invocation test: 86 / 3145728
Cache size 87 invocation test: 87 / 3145728
Cache size 88 invocation test: 88 / 3145728
Cache size 89 invocation test: 89 / 3145728
Cache size 90 invocation test: 90 / 3145728
Cache size 91 invocation test: 91 / 3145728
Cache size 92 invocation test: 92 / 3145728
Cache size 93 invocation test: 93 / 3145728
Cache size 94 invocation test: 94 / 3145728
Cache size 95 invocation test: 95 / 3145728
Cache size 96 invocation test: 96 / 3145728
Cache size 97 invocation test: 97 / 3145728
Cache size 98 invocation test: 98 / 3145728
Cache size 99 invocation test: 99 / 3145728
Cache size 100 invocation test: 100 / 3145728
Cache size 101 invocation test: 101 / 3145728
Cache size 102 invocation test: 102 / 3145728
Cache size 103 invocation test: 103 / 3145728
Cache size 104 invocation test: 104 / 3145728
Cache size 105 invocation test: 105 / 3145728
Cache size 106 invocation test: 106 / 3145728
Cache size 107 invocation test: 107 / 3145728
Cache size 108 invocation test: 108 / 3145728
Cache size 109 invocation test: 109 / 3145728
Cache size 110 invocation test: 110 / 3145728
Cache size 111 invocation test: 111 / 3145728
Cache size 112 invocation test: 112 / 3145728
Cache size 113 invocation test: 113 / 3145728
Cache size 114 invocation test: 114 / 3145728
Cache size 115 invocation test: 115 / 3145728
Cache size 116 invocation test: 116 / 3145728
Cache size 117 invocation test: 117 / 3145728
Cache size 118 invocation test: 118 / 3145728
Cache size 119 invocation test: 119 / 3145728
Cache size 120 invocation test: 120 / 3145728
Cache size 121 invocation test: 121 / 3145728
Cache size 122 invocation test: 122 / 3145728
Cache size 123 invocation test: 123 / 3145728
Cache size 124 invocation test: 124 / 3145728
Cache size 125 invocation test: 125 / 3145728
Cache size 126 invocation test: 126 / 3145728
Cache size 127 invocation test: 127 / 3145728
Cache size 128 invocation test: 128 / 3145728
Cache size 129 invocation test: 3145728 / 3145728

Results:
  Renderer: Mesa DRI Intel(R) Haswell Mobile
  Calculated vertex cache batch size: 3145728
  Cache size: 128
Offline Roquen

« Reply #27 - Posted 2016-09-13 07:48:22 »

http://www.joshbarczak.com/blog/?p=1231
Offline theagentd
« Reply #28 - Posted 2016-09-13 14:29:41 »

Indeed, despite some shader errors here and there, the data gathered is really good. Thanks everyone!

The test has two parts. The first part draws a massive index buffer filled with 0s and checks how many times the vertex shader is executed. The second part tries to figure out the cache size by drawing a longer and longer repeated list of indices (0, 1, ..., n, 0, 1, ..., n, 0, 1, ..., n, ...), where n is increased by 1 between each test. At some point this starts thrashing the cache: once the number of unique indices exceeds the cache size, vertex 0 has already been evicted by the time the list repeats, so every single entry in the index buffer requires a new vertex shader execution.
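To make the two index patterns concrete, here is a minimal Java sketch of how such buffers could be built. It is not taken from the test source linked earlier in the thread; the class and method names (`IndexPatterns`, `zeroFilled`, `repeating`) are made up for illustration, and `uniqueCount` is assumed to mean the number of distinct indices in one repetition.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original test program.
public class IndexPatterns {

    /** Part one: every index refers to vertex 0. */
    public static int[] zeroFilled(int length) {
        return new int[length]; // Java zero-initialises int arrays
    }

    /** Part two: 0, 1, ..., uniqueCount-1 repeated until the buffer is full. */
    public static int[] repeating(int length, int uniqueCount) {
        int[] indices = new int[length];
        for (int i = 0; i < length; i++) {
            indices[i] = i % uniqueCount;
        }
        return indices;
    }

    public static void main(String[] args) {
        // prints [0, 1, 2, 0, 1, 2, 0, 1]
        System.out.println(Arrays.toString(repeating(8, 3)));
    }
}
```

In the real test these arrays would be uploaded as an element buffer and drawn while a pipeline statistics query counts vertex shader invocations; that part needs a GL context and is left out here.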


Let's go through the results:

Intel seems to be the most straightforward, and literally the only vertex cache that actually works as expected. The GPU loops through the index list and keeps a 128-entry FIFO vertex cache. When drawing an index buffer of length 3*1024*1024 filled completely with 0s, it only runs the vertex shader once, then never again. When the number of unique vertices exceeds the size of the vertex cache, thrashing occurs and every single index needs a vertex shader execution, which is exactly what I had predicted based on "public knowledge" of the vertex cache. This is what people optimize meshes for.
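
To make that model concrete, here is a minimal CPU-side sketch of a single FIFO cache walked linearly over the index buffer. It is not the test program itself, just a simulation of the behaviour described above; the class and method names (`FifoVertexCacheSim`, `countInvocations`, `repeatedPattern`) are made up for illustration, and the 128-entry size is the value reported in the Intel log.

```java
import java.util.ArrayDeque;
import java.util.HashSet;

// Hypothetical helper: simulates one FIFO vertex cache over a whole index buffer.
class FifoVertexCacheSim {

    // Counts how often the "vertex shader" would run, i.e. the number of cache misses.
    static long countInvocations(int[] indices, int cacheSize) {
        ArrayDeque<Integer> fifo = new ArrayDeque<>();   // insertion order, oldest first
        HashSet<Integer> resident = new HashSet<>();     // fast "is it cached?" lookup
        long invocations = 0;
        for (int index : indices) {
            if (resident.contains(index)) {
                continue;                                // cache hit, no shader run
            }
            invocations++;                               // cache miss
            if (fifo.size() == cacheSize) {
                resident.remove(fifo.pollFirst());       // evict the oldest entry
            }
            fifo.addLast(index);
            resident.add(index);
        }
        return invocations;
    }

    // Builds 0, 1, ..., uniqueCount-1 repeated until the buffer is full.
    static int[] repeatedPattern(int uniqueCount, int length) {
        int[] indices = new int[length];
        for (int i = 0; i < length; i++) {
            indices[i] = i % uniqueCount;
        }
        return indices;
    }

    public static void main(String[] args) {
        int length = 3 * 1024 * 1024;
        for (int n : new int[] {1, 128, 129}) {
            long count = countInvocations(repeatedPattern(n, length), 128);
            System.out.println("Cache size " + n + " invocation test: " + count + " / " + length);
        }
    }
}
```

With a 128-entry cache this model gives 1, 128 and 3145728 invocations for 1, 128 and 129 unique indices, which is the same cliff the Intel log shows between 128 and 129.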

Nvidia's solution is more complicated. Even if you render an index buffer filled with 0s, the vertex shader will be executed more than once. What is happening here is that the GPU is splitting up the index buffer into chunks of <num vertices in each primitive>*32, which in the case of triangles is 96. For lines it's 64 and for points it's 32. This is what I call the "batch size" in the test results. There seems to be a different vertex cache for each of the batches, so even if the index buffer contains only zeroes the vertex shader will be executed once per batch. This severely limits the usefulness of the cache, as it greatly increases the chance of having to run a vertex shader multiple times as reuse only works within the same 96-index block. In addition, there is a 32-entry FIFO cache within each block as well, so it's still possible to overflow the cache within each block if it contains more than 32 unique indices. Most likely, this choice was made by Nvidia to allow for more parallelism in hardware, as it allows each 96-index block to be processed completely independently. Intel needs to go through the entire index buffer linearly.
This has major implications for how a mesh should be optimized, as the mesh optimizer needs to be aware of the 96-index blocks to be able to make the best decisions. Otherwise it may assume that a vertex will be reused for two triangles, but the triangles may turn out to be in different 96-index blocks, so the vertex won't be in the cache there.
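
A rough sketch of that batched model, for anyone who wants to play with it on the CPU: the index buffer is cut into fixed-size blocks and each block gets its own fresh FIFO cache. This is only my reading of the results, not a description of what the hardware actually does internally; the names (`BatchedVertexCacheSim`, `countInvocations`) are hypothetical, and the 96-index block and 32-entry cache are the values inferred above for triangles.

```java
import java.util.ArrayDeque;
import java.util.HashSet;

// Hypothetical helper: one independent FIFO cache per fixed-size block of indices.
class BatchedVertexCacheSim {

    static long countInvocations(int[] indices, int batchSize, int cacheSize) {
        long invocations = 0;
        for (int start = 0; start < indices.length; start += batchSize) {
            int end = Math.min(start + batchSize, indices.length);
            // Each block starts with an empty cache, so nothing is reused across blocks.
            ArrayDeque<Integer> fifo = new ArrayDeque<>();
            HashSet<Integer> resident = new HashSet<>();
            for (int i = start; i < end; i++) {
                int index = indices[i];
                if (resident.contains(index)) {
                    continue;                            // hit within this block
                }
                invocations++;                           // miss: shader runs
                if (fifo.size() == cacheSize) {
                    resident.remove(fifo.pollFirst());   // evict the oldest entry
                }
                fifo.addLast(index);
                resident.add(index);
            }
        }
        return invocations;
    }

    public static void main(String[] args) {
        // Zero-filled buffer: one invocation per 96-index block.
        int[] zeros = new int[3 * 1024 * 1024];
        System.out.println("Batch size test invocations: "
                + countInvocations(zeros, 96, 32) + " / " + zeros.length);

        // The same vertex used in two adjacent blocks is shaded twice.
        int[] twoBlocks = new int[192];
        System.out.println("Vertex 0 across two blocks: "
                + countInvocations(twoBlocks, 96, 32) + " invocations");
    }
}
```

For the zero-filled buffer this gives 32768 invocations out of 3145728 indices, i.e. one per 96-index block, which is the figure princec's log reports. The second case shows the problem for mesh optimizers: a vertex that straddles a block boundary costs two invocations even though a single linear cache would only need one.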

AMD's technique is...... very weird. It seems similar to Nvidia's solution, but the results don't perfectly match that. The calculated batch size is 387, which is 384+3, which is 32*3*4+3, so the batch size seems to be roughly 4x as big as for Nvidia. That's a pretty uneven number that I really wasn't expecting. Most likely, the actual batch size is 384, with some additional weird behavior in there. As for the actual cache size within each batch, it's most likely 16 both for the HD7800 and the KAVERI APU, but the results are again inconsistent. In addition, the results are off by one between the two (8130 vs 8129 invocations). =___= There's definitely something fishy and complicated going on here. To get anything conclusive that would actually be useful information for a mesh optimizer, I'd need to run more tests. I don't really have a guess for why the batch size seems so random, but the discrepancy of the HD7800 not being completely cache thrashed at 16-23 entries could be explained by the GPU updating the cache in small batches (most likely 8) instead of one by one. This would explain why the GPU kiiiinda manages to do at least some caching up to 24 entries. There could also be some ordering weirdness here as well.

We really need to do more testing on AMD hardware. If any of ClaasJG, Jono or Abuse has time for it, I'd love it if we could continue testing a bit using IRC or Skype to be able to do some more rapid iterations of the test program. Feel free to either PM me or respond in this thread if any of you are interested!


Thanks a lot for all the help, guys!

Myomyomyo.
Offline EgonOlsen
« Reply #29 - Posted 2016-09-13 14:43:38 »

Radeon R9 290X for the sake of completeness:

Batch size test invocations: 131072 / 50331648
Calculated vertex cache batch size: 384

Cache size 1 invocation test: 131072 / 50331648
Cache size 2 invocation test: 262144 / 50331648
Cache size 3 invocation test: 393216 / 50331648
Cache size 4 invocation test: 524288 / 50331648
Cache size 5 invocation test: 655360 / 50331648
Cache size 6 invocation test: 786432 / 50331648
Cache size 7 invocation test: 917504 / 50331648
Cache size 8 invocation test: 1048576 / 50331648
Cache size 9 invocation test: 1179648 / 50331648
Cache size 10 invocation test: 1310720 / 50331648
Cache size 11 invocation test: 1441792 / 50331648
Cache size 12 invocation test: 1572864 / 50331648
Cache size 13 invocation test: 1703936 / 50331648
Cache size 14 invocation test: 1835008 / 50331648
Cache size 15 invocation test: 6422528 / 50331648
Cache size 16 invocation test: 11927552 / 50331648
Cache size 17 invocation test: 50331648 / 50331648

Results:
  Renderer: AMD Radeon R9 200 Series
  Calculated vertex cache batch size: 384
  Cache size: 16
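
A quick arithmetic check on the log above, as a sketch: with a zero-filled buffer the shader runs once per batch, so the batch size is simply total indices divided by invocations, and as long as the cache holds all n unique indices the count should be n times the number of batches. The class name `BatchSizeCheck` and its method names are made up for illustration; the numbers are copied from the log.

```java
// Hypothetical helper: checks the R9 200 Series log against the "n invocations per batch" model.
class BatchSizeCheck {

    static final long TOTAL_INDICES = 50331648L;

    // Invocation counts for "Cache size 1" through "Cache size 17", copied from the log.
    static final long[] INVOCATIONS = {
        131072, 262144, 393216, 524288, 655360, 786432, 917504, 1048576,
        1179648, 1310720, 1441792, 1572864, 1703936, 1835008,
        6422528, 11927552, 50331648
    };

    // One invocation per batch with a single unique index, so this is the batch size.
    static long batchSize() {
        return TOTAL_INDICES / INVOCATIONS[0];
    }

    // First n for which the count is no longer exactly n invocations per batch.
    static int firstNonLinear() {
        long batches = INVOCATIONS[0];
        for (int n = 1; n <= INVOCATIONS.length; n++) {
            if (INVOCATIONS[n - 1] != n * batches) {
                return n;
            }
        }
        return -1; // every row fits the linear model
    }

    public static void main(String[] args) {
        System.out.println("Batch size: " + batchSize());
        System.out.println("First non-linear row: " + firstNonLinear());
    }
}
```

This gives a batch size of exactly 384, and the counts stay perfectly linear up to 14 unique indices. Rows 15 and 16 are already well above the linear prediction without being fully thrashed, and only row 17 reaches one invocation per index.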
