I did it! I managed to get my GPU skinning working!!! =D My small test featuring Bob the dwarf is now running very nicely!
- Individually animated Bobs! They can be at different animation frames and even entirely different animations too, but I only have one animation to use at the moment...
- Instancing! Each part of Bob (head, helmet, lamp, etc.) is drawn with a single OpenGL command no matter how many instances I have.
- Bone interpolation is done on the CPU and uploaded per instance into a VBO. In my vertex shader this VBO is then accessed through a Texture Buffer Object (TBO).
- Instance positions / model matrices are uploaded to a VBO and stepped through once per instance using GL33.glVertexAttribDivisor(index, 1).
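For anyone wondering what the CPU-side bone interpolation boils down to, here is a minimal sketch in plain Java. It is not my actual code: the class name BonePoseLerp, the layout of 7 floats per bone (translation plus unit quaternion) and the choice of a normalised linear blend for rotations are all assumptions made for illustration. The resulting float array is the kind of per-instance data that would then be copied into the VBO read through the TBO.

```java
// Hypothetical helper, not from the original program.
// Assumed layout per bone: tx, ty, tz, qx, qy, qz, qw (7 floats).
final class BonePoseLerp {
    static final int FLOATS_PER_BONE = 7;

    /** Blends two keyframes into out; t = 0 gives frameA, t = 1 gives frameB. */
    static void interpolate(float[] frameA, float[] frameB, float t, float[] out) {
        int bones = frameA.length / FLOATS_PER_BONE;
        for (int b = 0; b < bones; b++) {
            int o = b * FLOATS_PER_BONE;

            // Translation: plain linear interpolation.
            for (int i = 0; i < 3; i++) {
                out[o + i] = frameA[o + i] + (frameB[o + i] - frameA[o + i]) * t;
            }

            // Rotation: normalised lerp. Flip frameB's quaternion if needed so
            // the blend takes the shorter arc.
            float dot = 0f;
            for (int i = 3; i < 7; i++) {
                dot += frameA[o + i] * frameB[o + i];
            }
            float sign = dot < 0f ? -1f : 1f;

            float lengthSquared = 0f;
            for (int i = 3; i < 7; i++) {
                float q = frameA[o + i] + (sign * frameB[o + i] - frameA[o + i]) * t;
                out[o + i] = q;
                lengthSquared += q * q;
            }

            // Renormalise so the result is a unit quaternion again.
            float inverseLength = (float) (1.0 / Math.sqrt(lengthSquared));
            for (int i = 3; i < 7; i++) {
                out[o + i] *= inverseLength;
            }
        }
    }
}
```

Every instance needs its own output array (or its own slice of one big array), which is why the cost grows linearly with the number of Bobs.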
Sadly this program is still CPU-bottlenecked: my GPU can process around 2.5x as many instances as my CPU can interpolate bones for. The above screenshot runs with 600 instances of Bob at a smooth 60-61 FPS, with 16xQ CSAA (= 8x MSAA + 8 coverage samples) enabled, since anti-aliasing does not affect performance while the CPU is the bottleneck. With threading (and less anti-aliasing) this could be improved to twice the FPS, which would enable me to have over 1000 instances of Bob at the same time! I believe the ultimate solution, though, is OpenCL. That way I can just upload all the animation frame data on startup and interpolate bones for each instance on the GPU. This would offload everything to the GPU, and I estimate that it would run at around 120-150 FPS with no CPU load at all.
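The threading idea could look roughly like the sketch below. It is only an illustration under assumptions: ParallelBoneUpdate, the flat keyframe layout and the per-float linear blend are hypothetical stand-ins for the real per-bone interpolation, and a parallel stream is just one of several ways to spread the work over cores. The point it shows is that each instance writes to its own slice of the output array, so the threads never need to synchronise with each other.

```java
import java.util.stream.IntStream;

// Hypothetical sketch, not from the original program.
final class ParallelBoneUpdate {
    /**
     * keyframes[f] holds all bone floats for animation frame f (at least 2 frames).
     * times[i] is the animation time of instance i, measured in frames.
     * out receives times.length consecutive slices of keyframes[0].length floats.
     */
    static void update(float[][] keyframes, float[] times, float[] out) {
        int stride = keyframes[0].length;
        int lastFrame = keyframes.length - 1;

        // One task per instance; slices of out are disjoint, so no locking.
        IntStream.range(0, times.length).parallel().forEach(instance -> {
            // Clamp the time into the valid frame range.
            float time = Math.max(0f, Math.min(times[instance], lastFrame));
            int frame = Math.min((int) time, lastFrame - 1);
            float t = time - frame;

            float[] a = keyframes[frame];
            float[] b = keyframes[frame + 1];
            int base = instance * stride;

            // Placeholder blend: a per-float lerp stands in for the real
            // per-bone interpolation.
            for (int i = 0; i < stride; i++) {
                out[base + i] = a[i] + (b[i] - a[i]) * t;
            }
        });
    }
}
```

After update returns, out can be uploaded to the bone VBO in one go. An OpenCL version would follow the same per-instance structure, just with the loop body running as a kernel on the GPU.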