Time for benchmarks!
I have written six performance benchmarks and implemented them in Java 3D, Ardor3D and jMonkeyEngine3. These benchmarks test various aspects of the 3D scene graph APIs.
In short, there are benchmarks for dynamic geometry, frustum culling, rapid addition and removal of nodes in the scene graph, picking, state sorting, and transparency sorting.
I have tried, as far as possible, to implement the benchmarks identically in each of the APIs. The implementation uses a base class that contains all the core functionality for the benchmarks, such as collecting data, writing to file, and increasing the object counts (Ardor3DBenchmarkBase.java, and Jme3BenchmarkBase.java). For each benchmark, these base classes are extended, so that only benchmark-specific code is necessary.
The benchmarks all start at 32 objects, and the object count then doubles at each step (powers of two). I have normally run each benchmark up to 16384 or 8192 objects, but lower object counts have been chosen for some of the benchmarks due to problems with some of the APIs (more on this later).
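As a rough illustration of the object-count schedule described above, the following sketch lists the counts for one run. The class and method names are made up for this example and are not taken from the benchmark source.

```java
/** Hypothetical helper, not part of the benchmark source. */
final class ObjectCounts {
    /** Returns start, 2*start, 4*start, ... up to and including max. */
    static int[] schedule(int start, int max) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        // Count the steps first, using a long so the doubling cannot overflow.
        int steps = 0;
        for (long n = start; n <= max; n *= 2) {
            steps++;
        }
        int[] counts = new int[steps];
        long n = start;
        for (int i = 0; i < steps; i++, n *= 2) {
            counts[i] = (int) n;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints the object counts for a run from 32 up to 16384 objects.
        System.out.println(java.util.Arrays.toString(schedule(32, 16384)));
    }
}
```

A run from 32 to 16384 objects therefore consists of ten periods, one per object count.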
The execution of the benchmarks is separated into two periods. First there is a warmup period, in which no data is collected; this gives the API time to initialize. The duration can be specified both in seconds and in frames, and whichever takes longer dictates the duration of the warmup period. For my recordings I used a warmup time of 2 seconds, and 3200 frames (3600/60 = 60). The warmup period is followed by a data collection period. At the end of the data collection period, the data is written to output files.
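The warmup rule above (both the minimum time and the minimum frame count must have passed) could be sketched roughly as follows. WarmupGate and its method names are hypothetical and not taken from the benchmark base classes, and the per-frame time is assumed to be supplied in nanoseconds.

```java
/** Hypothetical sketch of a warmup gate; not the actual base-class code. */
final class WarmupGate {
    private final long minNanos;
    private final long minFrames;
    private long elapsedNanos;
    private long frames;

    WarmupGate(double minSeconds, long minFrames) {
        this.minNanos = (long) (minSeconds * 1_000_000_000L);
        this.minFrames = minFrames;
    }

    /**
     * Call once per rendered frame with that frame's duration.
     * Returns true once BOTH limits are reached, i.e. the longer of the
     * two requirements decides when data collection may start.
     */
    boolean frameDone(long frameNanos) {
        elapsedNanos += frameNanos;
        frames++;
        return elapsedNanos >= minNanos && frames >= minFrames;
    }

    public static void main(String[] args) {
        WarmupGate gate = new WarmupGate(2.0, 100);
        long simulatedFrameNanos = 16_000_000L; // assumed 16 ms per frame
        int frame = 0;
        while (!gate.frameDone(simulatedFrameNanos)) {
            frame++;
        }
        System.out.println("Warmup finished after frame " + (frame + 1));
    }
}
```

With very fast frames the frame limit tends to be reached first and the time limit decides; with slow frames it is the other way around.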
The benchmarks collect data per frame: the time per frame (tpf), spikes in the tpf (a threshold is used here, and a spike is defined as an increase in tpf >= 40%), and the current memory usage (taken from the heap, total minus free). In addition, summary statistics are calculated, such as the total time and average values (tpf, fps, memory). It is also registered how many of the frames lie within different tpf groupings (e.g. 0.1 ms <= x < 0.5 ms). Nanoseconds are used for precision when calculating the statistics, but the values are converted to milliseconds in the output files for readability.
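A minimal sketch of these per-frame measurements is given below. The class and method names are hypothetical, and the spike test assumes that the 40% increase is measured against the previous frame, which the description above does not state explicitly. The heap figure uses the standard Runtime.totalMemory() and Runtime.freeMemory() calls.

```java
/** Hypothetical sketch of the per-frame statistics; not the actual benchmark code. */
final class FrameStats {
    /** A spike is an increase in tpf of at least 40% (assumed: relative to the previous frame). */
    static final double SPIKE_THRESHOLD = 0.40;

    static boolean isSpike(long prevFrameNanos, long currFrameNanos) {
        if (prevFrameNanos <= 0) {
            return false; // no previous frame to compare with
        }
        return (currFrameNanos - prevFrameNanos) >= SPIKE_THRESHOLD * prevFrameNanos;
    }

    /** Current heap usage: total heap minus free heap, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Nanoseconds are kept internally; milliseconds are only used for output. */
    static double toMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    /** True if lowerMs <= tpfMs < upperMs, e.g. the 0.1 ms to 0.5 ms grouping. */
    static boolean inGrouping(double tpfMs, double lowerMs, double upperMs) {
        return tpfMs >= lowerMs && tpfMs < upperMs;
    }

    public static void main(String[] args) {
        long previous = System.nanoTime();
        long prevTpf = 0;
        for (int frame = 0; frame < 5; frame++) {
            // A real benchmark would render a frame here.
            long now = System.nanoTime();
            long tpf = now - previous;
            previous = now;
            System.out.println(toMillis(tpf) + " ms, spike=" + isSpike(prevTpf, tpf)
                    + ", heap=" + usedHeapBytes() + " bytes");
            prevTpf = tpf;
        }
    }
}
```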
Each benchmark produces five output files: one file meant to be read by humans, and four csv files.
- BenchmarkName_APIName_human_readable_results.txt contains summary information and is meant to be read by humans.
- BenchmarkName_APIName_general_results.csv contains the general results, which is basically the human-readable output in csv format.
- BenchmarkName_APIName_tpf_results.csv contains the tpf for each frame in milliseconds.
- BenchmarkName_APIName_mem_results.csv contains the memory usage for each frame.
- BenchmarkName_APIName_tpfsections_results.csv contains the results regarding which tpf sections the frames belonged to.
Entries in the csv files are separated by a comma (,), and each period (object count) by a newline.
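The csv layout described above (one line per period, comma-separated entries) might be produced along these lines. CsvRows is a hypothetical helper, and the number formatting shown (Java's default double-to-string conversion) is an assumption rather than what the benchmarks actually write.

```java
/** Hypothetical sketch of the csv layout; not the actual benchmark writer. */
final class CsvRows {
    /**
     * Each inner array holds the per-frame values of one period (object count).
     * Values are joined with commas and every period ends with a newline.
     */
    static String toCsv(double[][] periods) {
        StringBuilder sb = new StringBuilder();
        for (double[] period : periods) {
            for (int i = 0; i < period.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(period[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] tpfMillis = {
            {0.21, 0.19, 0.20}, // e.g. the period with 32 objects
            {0.40, 0.38, 0.41}  // e.g. the period with 64 objects
        };
        System.out.print(toCsv(tpfMillis));
    }
}
```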
Note that it is hard to measure memory usage in Java. However, by monitoring the heap it is possible to extract some data. When the recorded memory data is plotted as a graph, the lowest recorded values in most cases give an indication of the actual memory usage. As the graph rises, it shows the garbage created by the APIs. A sudden drop shows what is removed by the garbage collector, which in turn gives us the new base usage.
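The reading of the memory graphs described above can also be expressed in code. This is only a sketch under the assumption that the per-frame heap samples are available as an array; HeapTrace and its methods are hypothetical names.

```java
/** Hypothetical sketch for interpreting per-frame heap samples. */
final class HeapTrace {
    /** The lowest recorded value, used as a rough indication of the base memory usage. */
    static long baseline(long[] usedHeap) {
        if (usedHeap.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long min = usedHeap[0];
        for (long sample : usedHeap) {
            min = Math.min(min, sample);
        }
        return min;
    }

    /** Sum of all drops between consecutive samples, i.e. what the garbage collector removed. */
    static long reclaimed(long[] usedHeap) {
        long total = 0;
        for (int i = 1; i < usedHeap.length; i++) {
            long drop = usedHeap[i - 1] - usedHeap[i];
            if (drop > 0) {
                total += drop;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] samples = {10, 14, 18, 11, 15}; // made-up values, in MB
        System.out.println("baseline=" + baseline(samples) + " reclaimed=" + reclaimed(samples));
    }
}
```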
In order for the code to be identical, I use custom geometry in the benchmarks. The underlying geometry is a ported version of gluSphere, using triangle strips (Sphere.java), with an implementation on top in each of the APIs (Ardor3DSphere.java, and Jme3Sphere.java). In addition to the spheres, some of the benchmarks use cubes, where indexed triangles are used (Box.java, and Jme3Box.java).
The benchmarks were run on the following specifications:
- ASUS GeForce GTX 580 1536MB
* Using GeForce 295.73
* Settings are default, except force vsync off
- Intel Core i7 2600K Quad Processor 3.4GHz
- Kingston HyperX 8 GB 1600MHz DDR3
- ASUS P8Z68-V PRO, Socket-1155 ATX
- Windows 7 Professional 64-bit, Service Pack 1
- Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
I have imported each of the csv files generated by the benchmarks into Excel, and created the graphs there.
I have created runnable jars, with a starter GUI so that it is easy for you to replicate the benchmarks on your own. Just run them with java -jar benchmark.jar
- (OLD) ardor3d-runnable-benchmark.zip
The source code can be downloaded here. The necessary libraries can be obtained by downloading the runnable jars linked to above (they are not included with the source):
- jme3-src.zip
DynamicGeometry benchmark:
Consists of rotating spheres with triangle counts between 58 and 218 tris. The vertices are changed dynamically by shrinking/expanding the radius at runtime. Screenshot can be seen here
- DynamicGeometry.java (Ardor3D)
- DynamicGeometry.java (Java 3D)
- DynamicGeometry.java (jME3)
I ran this benchmark from 32 to 512 spheres. Higher object counts were especially slow in jME3. The tpf in all APIs was low, with the best results in jME3 and Ardor3D. There are, however, some major issues with these two APIs in this benchmark. At fixed intervals their tpf spikes, for an especially long time in jME3. The longest spike in jME3 lasted 21 seconds(!), and this is with 512 spheres. Ardor3D spikes similarly, although its spikes are roughly two orders of magnitude shorter than those of jME3. The longest spike for Ardor3D was 0.14 seconds. The reason for the spikes is unclear, especially for the really long ones. My initial guess would be time spent creating static lists, for example, but the duration is extreme. Looking at the memory overview for Ardor3D and jME3, there is a correspondence between the spikes and drops in the garbage created. I am not sure whether this is a mere coincidence or whether something is going on here. This can be investigated by looking at the memory overview per frame and the tpf for each API.
I ran this benchmark from 32 to 8192 spheres. The results show that jME3 is the fastest of the three. Java 3D has the second best tpf, followed by Ardor3D. All APIs have a relatively stable tpf, but there are large spikes at regular time intervals with Ardor3D. The spikes correspond with drops in the base memory usage, and might be connected to the deletion of buffers (something similar to what was the case with jME3?).
Graphs (images): TimePerFrame2048_full_all, Memory2048_all, Memory2048_partial, Spikes_all, Total Time, AvgFPS_partial_all, AvgFPS_all
The results can be downloaded here: DynamicGeometry.zip
Frustrum benchmark:
A cluster of rotating cubes (one cube = 12 triangles) is moved along an elliptic trajectory that takes it mostly outside of the far plane, and fully behind the near plane. This should effectively force frustum culling of objects outside of the frustum. Some screenshots along the path: 0
- Frustrum.java (Ardor3D)
- Frustrum.java (Java 3D)
- Frustrum.java (jME3)
The benchmark was run from 32 to 16384 objects. The results indicate effective culling in all APIs. jME3 is the fastest, followed by Ardor3D, and then Java 3D.
Graphs (images): TimePerFrame16384_full_all, Memory16384_all, Spikes_all, Total Time, AvgFPS_partial_all
The results can be downloaded here: Frustrum.zip
NodeStressAddAndRemoval benchmark:
Cubes (one cube = 12 triangles) are added to, and removed from, the scene graph every frame. This is to look at how the APIs handle add/remove operations on the scene graph, and how well they scale. Screenshot can be seen here
- NodeStressAddAndRemoval.java (Ardor3D)
- NodeStressAddAndRemoval.java (Java 3D)
- NodeStressAddAndRemoval.java (jME3)
The results show spikes in the tpf for all APIs, but this is expected. Ardor3D had the best tpf, with the smallest spikes. The tpf in jME3 is similar to that of Ardor3D, but it spikes a little higher (~3.5 ms). Java 3D is much slower than the other two APIs (diff in spikes: ~54 ms). The number of spikes seems to stabilize for Ardor3D and Java 3D at higher object counts, indicating that their algorithms scale well. The number of spikes in jME3 only increases.
Graphs (images): TimePerFrame2048_full_all, TimePerFrame2048_partial_all, Memory2048_all, Spikes_all, Total Time, AvgFPS_partial_all, AvgFPS_all
The results can be downloaded here: NodeStressAddAndRemoval.zip
Picking benchmark:
Consists of rotating spheres with triangle counts between 218 and 478 tris. Every frame, 81 pick rays are cast into the scene, with their origin at the camera. This is to test how well the APIs handle rapid picking operations. Picking is done at the primitive level. Screenshot can be seen here
- Picking.java (Ardor3D)
- Picking.java (Java 3D)
- Picking.java (jME3)
The results show that Java 3D was the fastest API in this benchmark. It had an avg tpf of 31 ms at 4096 objects, compared with jME3, which had an avg tpf of 91 ms. Ardor3D was the slowest of the APIs, with an avg tpf of 176 ms.
Originally, the results for Ardor3D were omitted from this test. There seemed to be serious problems when running picking at the primitive level: at 64 objects it took 20 minutes to complete the benchmark, with an average tpf of 156 ms, and no higher object counts were run. It was unclear why it was so slow; looking at the source code, it seems to do everything right, with a per-bounding-volume check first, followed by per-primitive checking for the affected objects.
Edit: As suggested by renanse, I set the maxElements for the CollisionTreeManager to appropriate values, and was then able to run the benchmark for Ardor3D as well. The slowness was due to a default max value of 25 being used prior to this.
Note: Due to incompatibilities with Java 7, I had to run this benchmark in jME3 with the option "-Djava.util.Arrays.useLegacyMergeSort=true". Look at RFE: 6804124 here
Note2: The number of picks is written to the output files, and the value varies between the APIs. I have verified that the actual number of hits is correct; however, the counting done in the benchmarks does not take into account additional results hidden in layers beneath.
Graphs (images): TimePerFrame4096_full_all, Memory4096_all, Spikes_all, Total Time, AvgFPS_partial_all, AvgFPS_all
The results can be downloaded here: Picking.zip
StateSort benchmark:
Consists of rotating cubes (one cube = 12 triangles). Different states are used for series of cubes, varying between the three main state types (lighting, texturing and shaders). For example, with 32 cubes: 32 cubes / 3 state types = 10.667. The square root of this (about 3.27) is then used as the number of different state permutations within each state type. As the number of objects increases, so does the number of different states. This tests which APIs have the best state sorting algorithms. The textures used are generated at runtime. The shader programs simply take a uniform color that is set in the fragment shader. Screenshot can be seen here
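The arithmetic in the example above can be written out as a small sketch. StatePermutations is a hypothetical name, and the value is left unrounded because the description does not say how the benchmarks round it.

```java
/** Hypothetical sketch of the state-permutation arithmetic. */
final class StatePermutations {
    static final int STATE_TYPES = 3; // lighting, texturing, shaders

    /** sqrt(objectCount / 3); how the benchmarks round this value is not stated. */
    static double perStateType(int objectCount) {
        return Math.sqrt(objectCount / (double) STATE_TYPES);
    }

    public static void main(String[] args) {
        for (int n = 32; n <= 16384; n *= 2) {
            System.out.printf("%d objects: %.3f permutations per state type%n",
                    n, perStateType(n));
        }
    }
}
```

Because of the square root, the number of distinct states grows much more slowly than the object count: doubling the number of cubes multiplies the permutations per state type by about 1.41.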
- StateSort.java (Ardor3D)
- StateSort.java (Java 3D)
- StateSort.java (jME3)
The results show that jME3 is much faster than the other two, with a tpf of almost half that of Ardor3D (~45 ms as opposed to ~80 ms). The difference in tpf between Ardor3D and Java 3D is ~20 ms.
Note: Some might claim this is an unfair comparison, since jME3 features a fully shader-based architecture, which eliminates much of the purpose of this benchmark (sorting of GL calls). Nevertheless, I don't think this should be held against jME3, but keep it in mind.
Graphs (images): TimePerFrame16384_full_all, Memory16384_all, Spikes_all, Total Time, AvgFPS_partial_all, AvgFPS_all
The results can be downloaded here: StateSort.zip
TransparencySort benchmark:
Consists of rotating transparent spheres with triangle counts between 58 and 110 tris. This is to test how well the APIs handle sorting of transparent objects. Screenshot can be seen here
- TransparencySort.java (Ardor3D)
- TransparencySort.java (Java 3D)
- TransparencySort.java (jME3)
The results show that jME3 was much faster than the other two, with a difference between jME3 and Ardor3D of ~30 ms. The tpf difference between Ardor3D and Java 3D is similar, but Java 3D has a much more unstable tpf, with many more spikes.
Graphs (images): TimePerFrame16384_full_all, Memory16384_all, Spikes_all, Total Time, AvgFPS_partial_all, AvgFPS_all
The results can be downloaded here: TransparencySort.zip
These performance benchmarks test various aspects of 3D scene graph APIs. The APIs tested are Java 3D, Ardor3D and jMonkeyEngine3. The results show that jME3 is the fastest API overall, followed by Ardor3D; the slowest is Java 3D. The time per frame is also much more stable and consistent in jME3 and Ardor3D than in Java 3D. Memory usage in jME3 and Ardor3D is much lower, and less garbage is created, than in Java 3D. That said, it is worth noting that the results differ in some of the benchmarks.
Java 3D was the fastest at picking. Picking results for Ardor3D were originally omitted because it was simply too slow (20 minutes at 64 objects). Stress testing add/remove operations on the scene graph showed that Ardor3D was the fastest, followed by jME3 and Java 3D. Dynamically changing the vertices of objects caused major headaches, especially for jME3, and some problems for Ardor3D: spikes occurred at fixed intervals in both APIs, lasting up to 20 seconds in jME3 and 0.14 seconds in Ardor3D. The spikes seem to be related to garbage collection, which takes place at the same frequency as the spikes.
I think the results are interesting. The performance difference between jME3 and the others was expected; however, personally I would have thought Ardor3D would perform better. Improvements can be made if the issues for Ardor3D in the picking benchmark are corrected, and for both Ardor3D and jME3 in the dynamic geometry benchmark.
Due to time constraints I cannot implement this in other APIs myself, but if people are interested it could be done with relative ease. Most of the work lies in the base class; after that it is trivial to implement each benchmark.
* Last updated 20.04.2012