Pages: 1 ... 7 8 [9] 10
 on: 2017-02-17 18:45:58 
Started by Catharsis - Last post by princec
Zulu is the real deal - full speed, Hotspotted JIT for ARMhf. It's fast-as, I've tried it :)

Cas :)

 on: 2017-02-17 16:40:02 
Started by BurntPizza - Last post by theagentd
Today, I wrote a mesh optimizer in Java to be used on the output of my Blender export plugin.

I'm not too proficient in Python and it's just so goddamn slow at everything, so instead of completely ruining the fast export times I've managed to get so far, I opted for writing a simple optimizer that takes in a model file, optimizes it, then writes it out again. This is done in 3 steps:

1. In the first step, I simply eliminate all duplicated vertices and leave just one of each, then reference the same one with an index buffer. The mesh I output from Blender is mostly unoptimized with lots of duplicated vertices, and this step reduces the vertex count by about 75%. This saves a lot of space on the hard drive and reduces load times a lot, but it actually does not improve performance very much. For example, in one of the submeshes of the giant robots of Robot Farm, the vertex count goes from 38544 --> 9248, a 76% reduction, but the number of vertex shader invocations only drops from 38544 --> 26902, a 30% drop. This leads to the next step...

2. In the next step, the triangles in the mesh are reordered to improve the effectiveness of the hardware vertex cache. As some people might remember, I've done a lot of experiments on the hardware vertex caches of Nvidia, AMD and Intel GPUs, and basically concluded that Nvidia's is f**king weird, AMD's is kinda weird and Intel's is the only sane one. In the end, I concluded that optimizing the meshes for each graphics vendor was too much work, so I simply went with a mainstream algorithm. I ended up implementing Tom Forsyth's simple algorithm, and it works really well. This algorithm simply reorders the triangles in the mesh (by reordering the indices in the index buffer) to increase the chance that vertices are reused as much as possible without specializing it too much to a specific cache size. It's pretty much a general-purpose optimization that should be better than nothing. Anyway, this step further reduced the vertex shader invocation count from 26902 --> 11847, which together with step 1 was a total reduction of about 69%! Now, optimally, the number of invocations should be equal to the number of vertices in the mesh, which would mean that there is perfect reuse of vertices and no vertex had to be transformed more than once, but it may not be possible to achieve this. In the end, for this particular mesh the number of invocations is 28% more than optimal, which is acceptable to me.

3. Now, the meshes are optimized for the vertex cache and need a minimal number of vertex shader invocations, but there is one last thing that can be optimized. Inspecting the actual index buffer generated after step 2 reordered the triangles, it's pretty damn random.
..., 6031, 5989, 5991, 5992, 5993, 5991, 6032, 5992, 5994, 5995, 5996, 5994, 6033, 5995, 8387, 8388, 8389, 8387, 8409, 8388, 8390, 8391, 8392, 8390...
There's a lot of good reuse happening here, but also huge jumps in which vertex each triangle uses. It essentially reads a completely random vertex from the vertex buffer each execution. This is bad for the GPU, as the vertex data from the VBO is read in with a certain cache line size depending on the GPU. In short, if you reference vertex 1000 and 1001, there's a big chance that the memory read for the data of vertex 1000 will cause the data for vertex 1001 to also be loaded into the cache. Therefore, it is a good idea to try to minimize the "distance" between each subsequent index in the index buffer to increase the chance of the input data of the vertex shader already being in the cache. I did this by simply reordering the vertices in the data so that they are in the order that the index buffer references them. This does not change the triangle order; it simply changes where the vertices of each triangle lie in memory. Here's an excerpt from the output index buffer:
..., 2571, 2565, 2571, 2568, 2566, 2567, 2572, 2571, 2566, 2572, 2572, 2567, 2573, 2571, 2574, 2568, 2569, 2568, 2574, 2569, 2575, 2570, 2569, 2574, 2575, 2575, 2576, 2570, 2571, 2572, ...
Much better! This works super well with the optimized triangle order from step 2, as the algorithm tries to place all uses of a given vertex in one place so that it can be reused, so there's a very low risk of a very old vertex being referenced again. To measure the effectiveness of this step, I measured the average "index distance", which is the average
abs(currentIndex - previousIndex)
over the entire index buffer. The average index distance went from 42.82251 --> 5.490958, a very significant improvement to cache coherency!
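
Steps 1 and 3 can be sketched roughly as below. This is only an illustration, not the code from the post: the class name MeshRemap, the method names and the float[][] vertex layout are all made up for the example. Vertices are compared by exact equality, and the average is taken over the n-1 adjacent index pairs, which is an assumption about how the number above was computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names and data layout are assumptions for illustration.
final class MeshRemap {

    /** Step 1: keep one copy of each distinct vertex and return an index buffer into uniqueOut. */
    public static int[] deduplicate(float[][] vertices, List<float[]> uniqueOut) {
        Map<List<Float>, Integer> seen = new HashMap<>();
        int[] indices = new int[vertices.length];
        for (int i = 0; i < vertices.length; i++) {
            // Box the attributes into a List so equals/hashCode compare by value.
            List<Float> key = new ArrayList<>(vertices[i].length);
            for (float f : vertices[i]) key.add(f);
            Integer slot = seen.get(key);
            if (slot == null) {
                slot = uniqueOut.size();
                seen.put(key, slot);
                uniqueOut.add(vertices[i]);
            }
            indices[i] = slot;
        }
        return indices;
    }

    /**
     * Step 3: renumber vertices in order of first use. Rewrites indices in place and
     * returns the old-to-new mapping (-1 for vertices that are never referenced).
     * The caller moves the vertex data with newVertices[oldToNew[i]] = oldVertices[i].
     */
    public static int[] reorderByFirstUse(int[] indices, int vertexCount) {
        int[] oldToNew = new int[vertexCount];
        Arrays.fill(oldToNew, -1);
        int next = 0;
        for (int i = 0; i < indices.length; i++) {
            int old = indices[i];
            if (oldToNew[old] == -1) oldToNew[old] = next++;
            indices[i] = oldToNew[old];
        }
        return oldToNew;
    }

    /** Average abs(currentIndex - previousIndex) over all adjacent pairs in the index buffer. */
    public static double averageIndexDistance(int[] indices) {
        if (indices.length < 2) return 0.0;
        long sum = 0;
        for (int i = 1; i < indices.length; i++) {
            sum += Math.abs(indices[i] - indices[i - 1]);
        }
        return (double) sum / (indices.length - 1);
    }

    public static void main(String[] args) {
        // A quad exported as two independent triangles: 6 vertices, 2 of them duplicates.
        float[][] quad = {
            {0, 0, 0}, {1, 0, 0}, {1, 1, 0},
            {0, 0, 0}, {1, 1, 0}, {0, 1, 0}
        };
        List<float[]> unique = new ArrayList<>();
        int[] indices = deduplicate(quad, unique);
        System.out.println(unique.size() + " unique vertices, indices " + Arrays.toString(indices));

        int[] scattered = {5, 3, 5, 1};
        System.out.println("distance before: " + averageIndexDistance(scattered));
        reorderByFirstUse(scattered, 6);
        System.out.println("distance after:  " + averageIndexDistance(scattered));
    }
}
```

Because the renumbering only follows first use, it leaves the triangle order from step 2 untouched, which is the property described above.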

All in all, here are the final results for the mesh:
Eliminating duplicate vertices...
Optimizing for post-transform vertex cache...
Optimizing for pre-transform vertex cache...
Vertex count: 38544 --> 9248 (76.00664% reduction)
Invocations: 38544 --> 26902 --> 11847 (69.2637% reduction, 28.103378% more than optimal)
Average index distance: 42.82251 --> 5.490958
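
The post does not say how the invocation counts were measured. One common way to estimate them offline is to replay the index buffer through a simulated post-transform cache and count the misses. The sketch below assumes a plain FIFO cache with a configurable size; as noted in step 2, real GPU caches behave differently, so treat the result as an estimate only. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashSet;

// Hypothetical estimator: a plain FIFO cache model, NOT how any specific GPU works.
final class VertexCacheSim {

    /** Returns the number of cache misses, i.e. the estimated vertex shader invocations. */
    public static int countInvocations(int[] indices, int cacheSize) {
        ArrayDeque<Integer> fifo = new ArrayDeque<>();
        HashSet<Integer> resident = new HashSet<>();
        int misses = 0;
        for (int index : indices) {
            if (resident.contains(index)) continue; // hit: vertex is reused, no invocation
            misses++;
            fifo.addLast(index);
            resident.add(index);
            if (fifo.size() > cacheSize) {
                resident.remove(fifo.removeFirst()); // evict the oldest entry
            }
        }
        return misses;
    }

    /** Percentage by which the invocation count exceeds the ideal of one per vertex. */
    public static double percentOverOptimal(int invocations, int vertexCount) {
        return 100.0 * (invocations - vertexCount) / vertexCount;
    }

    public static void main(String[] args) {
        int[] quad = {0, 1, 2, 0, 2, 3};
        System.out.println("cache of 4: " + countInvocations(quad, 4) + " invocations");
        System.out.println("cache of 2: " + countInvocations(quad, 2) + " invocations");
        // Numbers from the results above: 11847 invocations for 9248 vertices.
        System.out.println(percentOverOptimal(11847, 9248) + "% more than optimal");
    }
}
```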

Placing 500 of those high-poly robots with no LOD or anything in a test program, my FPS went from 12 to 32 when I switched to the optimized mesh, a ~62.5% reduction in frame time, which matches very well with the ~70% invocation reduction once you account for postprocessing and everything else taking up a bit of time in the first place.
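
For reference, the frame-time figure follows directly from the two FPS numbers, and the invocation figure from the counts in the results above:

```latex
\[
t_{\text{before}} = \frac{1000\ \text{ms}}{12} \approx 83.3\ \text{ms}, \qquad
t_{\text{after}} = \frac{1000\ \text{ms}}{32} = 31.25\ \text{ms}
\]
\[
1 - \frac{t_{\text{after}}}{t_{\text{before}}} = 1 - \frac{12}{32} = 0.625, \qquad
1 - \frac{11847}{38544} \approx 0.693
\]
```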

 on: 2017-02-17 16:25:46 
Started by Optimo - Last post by Optimo
Try the user's home folder.
Like try writing to there? I do presently (attempt to) write to a folder that is in with the game's installation in Program Files, so I could see that being an issue if another folder doesn't mind being written to.

 on: 2017-02-17 16:15:14 
Started by Optimo - Last post by 65K
Try the user's home folder.
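
A minimal sketch of that suggestion, assuming a made-up game name and folder layout (".mygame/saves" and "slot1.sav" are placeholders, not anything from the thread). Program Files is normally not writable for a standard user on Windows, while the user's home folder is, so saves written there should not need administrator rights.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the folder and file names are placeholders.
final class SaveDir {

    /** Creates (if needed) and returns base/.gameName/saves. */
    public static Path resolve(Path base, String gameName) throws IOException {
        Path dir = base.resolve("." + gameName).resolve("saves");
        return Files.createDirectories(dir);
    }

    /** Save folder under the current user's home directory. */
    public static Path userSaveDir(String gameName) throws IOException {
        return resolve(Paths.get(System.getProperty("user.home")), gameName);
    }

    public static void main(String[] args) throws IOException {
        Path dir = userSaveDir("mygame");
        Path file = dir.resolve("slot1.sav");
        Files.write(file, "level=3".getBytes(StandardCharsets.UTF_8));
        System.out.println("Saved to " + file);
    }
}
```

On Windows, the folder named by the APPDATA environment variable is a common alternative to the home folder itself.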

 on: 2017-02-17 16:08:23 
Started by Optimo - Last post by Optimo
I'm nearing completion of a distribution-worthy copy of my game. I can install it on my machine (Windows 7) and another machine (Windows 10) and run it successfully; however, I appear to need read/write privileges.

The only way my game will run properly is if I run as administrator. If not, I can never save anything from the game. I would think I should be able to run the game and save things without running as administrator. Is that wrong? Can anyone advise on the appropriate way to tackle this issue? I can give more information if needed.

 on: 2017-02-17 15:06:35 
Started by Catharsis - Last post by kappa
The main thing would be if it supports the HotSpot JIT on ARM.

There is the OpenJDK 9 Android Port, but it seems to support only the Zero interpreter on ARM. Not sure if the speed hit will make it unusable, but it might get round some of the ART-specific issues raised by Spasi. They also have a similar port to iOS.

 on: 2017-02-17 14:58:12 
Started by Catharsis - Last post by princec
Zulu basically runs atop Linux. It's armhf native code, which means with the tiniest bit of glue it should be invokable as a standard embedded JVM using JNI and the Android NDK. Presumably headless.

Cas :)

 on: 2017-02-17 14:47:23 
Started by Catharsis - Last post by kappa
Licensing wank :/
Ah right, looking at the pricing for the licence, it seems pretty expensive and not worth it for distributing games even if it did work on Android.

Zulu Embedded does look nice, but I can't find any information about it running on Android.

 on: 2017-02-17 12:31:40 
Started by Catharsis - Last post by princec
Licensing wank :/

Hence Zulu, which is nice GPL2 code and probably a lot less bother in general.

Cas :)

 on: 2017-02-17 12:20:45 
Started by Catharsis - Last post by kappa
What about OpenJDK (Zulu) on Android?

Cas :)
Or what about Java SE Embedded? They already have ARM builds; if it can be made to run on Android devices, performance should be much better than ART's.
