Some of this data-oriented programming still feels like premature optimization to me. Packing things into floats? Perhaps, *after* profiling.
Also, you don't need the JVM to lay out objects in memory perfectly, just pretty well. It may not do that, but from some of my tests it must mostly be doing it: I get too close to optimal performance for it to be thrashing the cache that much.
Packing is mostly useful when you want easy interaction with GL/CL. You don't have a choice there: if you want to do something on the GPU, you need to pack your data. If you want your code to look nice and be easy to refactor, you either work on POJOs and do a copy (pack/unpack) step, paying the performance and memory penalty, or you use mapped objects.
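The copy or pack/unpack step might look something like this minimal Java sketch. It uses only java.nio; Vector3f and VectorPacker are hypothetical stand-ins (not LWJGL classes), and the direct, native-order buffer is an assumption about what a GL/CL binding typically expects. Both copies are the performance and memory penalty mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Minimal hypothetical vector class, standing in for a math library's Vector3f.
class Vector3f {
    float x, y, z;
    Vector3f(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical helper: copies POJOs into a packed direct buffer and copies results back.
class VectorPacker {
    static FloatBuffer pack(Vector3f[] vs) {
        FloatBuffer buf = ByteBuffer.allocateDirect(vs.length * 3 * Float.BYTES)
                                    .order(ByteOrder.nativeOrder())
                                    .asFloatBuffer();
        for (Vector3f v : vs) {
            buf.put(v.x).put(v.y).put(v.z); // x, y, z interleaved, no per-object header
        }
        buf.flip(); // rewind so the consumer reads from position 0
        return buf;
    }

    static void unpack(FloatBuffer buf, Vector3f[] vs) {
        // Absolute gets, so the buffer's position is left untouched.
        for (int i = 0; i < vs.length; i++) {
            vs[i].x = buf.get(3 * i);
            vs[i].y = buf.get(3 * i + 1);
            vs[i].z = buf.get(3 * i + 2);
        }
    }
}
```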
For normal stuff, DOP is still useful without packing data. A loop over a contiguously allocated Vector3f[]? Of course, that will be very fast in Java. Assuming the objects have matured together, with 4 bytes of overhead per object, 2 MB of cache per CPU core and 128-byte cache lines, you only "waste" 1 cache line every 32 objects (32 × 4 bytes = 128 bytes) compared to doing the same with packed data. Doing the same on an array of objects that look like this?
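    class Unit { Vector3f localPos; Matrix4f globalTransform; Vector3f globalPos; }

    // localPos objects are likely interleaved with each Unit's other fields, so this loop hops across cache lines
    for (int i = 0; i < units.length; i++) { Vector3f pos = units[i].localPos; }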
That's when you have problems.
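For contrast, here is a hedged Java sketch of the same data laid out the DOP way without any packing: each field gets its own array, so a loop over localPos becomes the contiguous Vector3f[] walk described earlier. Vector3f, Matrix4f, Units and Layouts are minimal hypothetical stand-ins, not classes from any library, and whether the objects really end up adjacent is up to the JVM's allocator and collector.

```java
// Minimal hypothetical stand-ins for the math classes named in the post.
class Vector3f {
    float x, y, z;
    Vector3f(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

class Matrix4f {
    final float[] m = new float[16]; // 16 floats; contents unused in this sketch
}

// The post's layout: each Unit points at three separately allocated objects,
// allocated one right after another, so they interleave on the heap.
class Unit {
    Vector3f localPos = new Vector3f(0f, 0f, 0f);
    Matrix4f globalTransform = new Matrix4f();
    Vector3f globalPos = new Vector3f(0f, 0f, 0f);
}

// Hypothetical "structure of arrays" alternative: one array per field.
class Units {
    final Vector3f[] localPos;
    final Matrix4f[] globalTransform;
    final Vector3f[] globalPos;

    Units(int n) {
        localPos = new Vector3f[n];
        globalTransform = new Matrix4f[n];
        globalPos = new Vector3f[n];
        // One allocation loop per field makes adjacent placement of, say, the
        // localPos objects likely (the JVM does not guarantee it).
        for (int i = 0; i < n; i++) localPos[i] = new Vector3f(i, 0f, 0f);
        for (int i = 0; i < n; i++) globalTransform[i] = new Matrix4f();
        for (int i = 0; i < n; i++) globalPos[i] = new Vector3f(0f, 0f, 0f);
    }

    // Reads only the localPos objects; no Matrix4f or globalPos data is touched.
    float sumLocalX() {
        float sum = 0f;
        for (Vector3f p : localPos) sum += p.x;
        return sum;
    }
}

class Layouts {
    // Array-of-objects walk over the post's Unit layout, for comparison.
    static float sumLocalX(Unit[] units) {
        float sum = 0f;
        for (Unit u : units) sum += u.localPos.x;
        return sum;
    }
}
```

The cost is readability: code dealing with a single unit now indexes three arrays instead of holding one Unit reference.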