Java-Gaming.org    
  Instancing with skeletal animation  (Read 4353 times)
Offline theagentd
« Posted 2012-02-09 16:17:43 »

I did it! I managed to get my GPU skinning working!!! =D My small test featuring Bob the dwarf is now running very nicely!



Features:
 - Individually animated Bobs! They can be at different animation frames and even entirely different animations too, but I only have one animation to use at the moment...
 - Instancing! Each part of Bob (head, helmet, lamp, etc.) is drawn with a single OpenGL command no matter how many instances I have.
 - Bone interpolation is done on the CPU and uploaded per instance into a VBO. In my vertex shader this VBO is then accessed through a Texture Buffer Object (TBO).
 - Instance positions / model matrices are uploaded to a VBO and are stepped through once per instance using GL33.glVertexAttribDivisor(index, 1).
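
A rough sketch of the CPU side of the per-instance upload described in the last bullet, assuming one translation-only 4x4 model matrix per instance. The names InstanceBufferSketch and packModelMatrices are made up for illustration and are not from theagentd's code; the actual GL calls (buffer upload, attribute pointers, divisor) are left out.

```java
import java.nio.FloatBuffer;

final class InstanceBufferSketch {
    static final int FLOATS_PER_MATRIX = 16;

    /**
     * Packs one column-major, translation-only model matrix per instance.
     * positions[i] holds {x, y, z} for instance i (hypothetical layout).
     * A heap buffer is used here to keep the sketch dependency-free; a GL
     * binding such as LWJGL would typically need a direct buffer instead.
     */
    static FloatBuffer packModelMatrices(float[][] positions) {
        FloatBuffer buf = FloatBuffer.allocate(positions.length * FLOATS_PER_MATRIX);
        for (float[] p : positions) {
            buf.put(new float[] {
                1, 0, 0, 0,          // column 0
                0, 1, 0, 0,          // column 1
                0, 0, 1, 0,          // column 2
                p[0], p[1], p[2], 1  // column 3: translation
            });
        }
        buf.flip(); // make the written floats readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer buf = packModelMatrices(new float[][] {{1, 2, 3}, {4, 5, 6}});
        System.out.println("floats to upload: " + buf.remaining());
    }
}
```

When such a buffer is bound as a mat4 vertex attribute, the matrix occupies four consecutive attribute locations, so the divisor of 1 would have to be set on each of the four.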

Sadly this program is still CPU-bottlenecked, with my GPU being able to process around 2.5x the instances my CPU can interpolate bones for. The above screenshot runs 600 instances of Bob smoothly at 60-61 FPS with 16xQ CSAA (= 8x MSAA + 8 coverage samples) enabled, since anti-aliasing does not affect performance while the CPU is the bottleneck. With threading (and less anti-aliasing  Wink) this could be improved to twice the FPS, which would enable me to have over 1000 instances of Bob at the same time! I believe the ultimate solution though is OpenCL. That way I can just upload all the animation frame data on startup and interpolate bones for each instance on the GPU. This would offload everything to the GPU and I estimate that it would run at around 120-150 FPS with no CPU load at all.  Cool

Myomyomyo.
Offline ra4king



« Reply #1 - Posted 2012-02-11 00:06:11 »

I wonder what the performance would be on my GTX 580 hmmmmmm...... Wink

Online Danny02
« Reply #2 - Posted 2012-02-11 00:32:27 »

couldn't you somehow update it with a vertex output stream, so you won't need opencl
Offline theagentd
« Reply #3 - Posted 2012-02-11 03:38:50 »

I wonder what the performance would be on my GTX 580 hmmmmmm...... Wink
Probably over 9000.

couldn't you somehow update it with a vertex output stream, so you won't need opencl
You make it sound like OpenCL is something bad. xD

Myomyomyo.
Offline kaffiene
« Reply #4 - Posted 2012-02-11 07:00:17 »

Brilliant work!  Well done!

Are we going to see this turn into a cool game at some point?
Offline theagentd
« Reply #5 - Posted 2012-02-11 10:24:42 »

Brilliant work!  Well done!

Are we going to see this turn into a cool game at some point?
I've been working on the same game for half a year now. My answer is "Yes, it will". It's an RTS, but I won't show any screenshots or anything since I'm not 100% sure. I won't announce any specific information about it since I don't like the pressure of having said "I'm gonna release this game in x months"...

Myomyomyo.
Offline Roquen
« Reply #6 - Posted 2012-02-11 11:18:14 »

I can't think of a good reason for using OpenCL vs. a shader in this instance.  I only see downsides.
Online Danny02
« Reply #7 - Posted 2012-02-11 12:34:46 »

nah opencl is probably great, but you are already very familiar with shaders and the opengl pipeline.
Why learn something new when you can do it with something you already have? You probably won't have the infrastructure for opencl ready in your codebase either.

ps: I don't want to say anything against learning new things of course^^. I was just thinking about it from a development-process point of view.
Offline theagentd
« Reply #8 - Posted 2012-02-11 13:54:36 »

I think OpenCL is better than OpenGL for this, if only because it makes a lot more sense to read frame data from a buffer to fill another buffer with the per instance data instead of emulating the whole process with shaders, texture objects and transform feedback. OpenCL is meant for general purpose computing (= bone interpolation in my book), OpenGL is meant for graphics (= skinning).

Myomyomyo.
Offline theagentd
« Reply #9 - Posted 2012-02-13 08:27:58 »

Hehe, I just switched to a better slerp function which uses a threshold to avoid expensive trigonometric functions when the angle between the two quaternions is small enough, and got a 3-4x speedup in CPU performance. xD Now the CPU and GPU are almost equally busy, but now it's almost impossible to not be fragment limited. Bone interpolation (CPU) and skinning (GPU) performance is at 2 000 instances at 60 FPS, but if they are going to actually cover more than a pixel or so per instance (or if I want MSAA) I'll have to reduce the number of instances to around 1 500. Anyway, the point is that I've pretty much maxed out the performance gain from instancing. I'm pushing 2 million triangles per frame with skinning and I haven't even done any heavy optimizations yet. Well, I guess I won't be needing OpenCL for a while then... Off to actually being able to load 3D models other than Bob! xD

Myomyomyo.
Offline Roquen
« Reply #10 - Posted 2012-02-13 12:47:29 »

SLERP that uses trig functions should only be used if the end points are changing each frame (if then).  What method are you using?  Bisection is very fast and has little error.
Offline theagentd
« Reply #11 - Posted 2012-02-13 13:16:27 »

SLERP that uses trig functions should only be used if the end points are changing each frame (if then).  What method are you using?  Bisection is very fast and has little error.
   private static final float DOT_THRESHOLD = 0.99975f;

   private static void slerp(Quaternion q0, Quaternion q1,
         Quaternion resultOrientation, float t) {
      float dot = Quaternion.dot(q0, q1);

      float scale0 = 1 - t;
      float scale1 = t;
      if (dot < DOT_THRESHOLD) {
         double theta = Math.acos(dot);
         double invSinTheta = 1f / Math.sin(theta);
         scale0 = (float) (Math.sin((1 - t) * theta) * invSinTheta);
         scale1 = (float) (Math.sin(t * theta) * invSinTheta);
      }

      float x = (scale0 * q0.x) + (scale1 * q1.x);
      float y = (scale0 * q0.y) + (scale1 * q1.y);
      float z = (scale0 * q0.z) + (scale1 * q1.z);
      float w = (scale0 * q0.w) + (scale1 * q1.w);
      resultOrientation.set(x, y, z, w);
      resultOrientation.normalise();
   }

I can't say I understand exactly how quaternions work, but I do understand the theory of slerp and how it interpolates along the surface of a sphere... Like I said, this is lightning fast, so I don't see any need to optimize this further at the moment... xd
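
To get a feel for what the threshold shortcut costs in accuracy, here is a small sketch that measures how far the plain lerp weight t is from the true slerp weight sin(t * theta) / sin(theta) when the dot product sits exactly at the 0.99975 threshold. The class and method names are hypothetical and only for illustration; this checks the weights alone, not the full quaternion code above.

```java
final class SlerpThresholdError {
    /**
     * Largest absolute difference between the slerp weight for q1,
     * sin(t * theta) / sin(theta), and the lerp weight t, sampled at
     * steps + 1 evenly spaced values of t in [0, 1].
     */
    static double maxWeightError(double dot, int steps) {
        double theta = Math.acos(dot);
        double invSinTheta = 1.0 / Math.sin(theta);
        double worst = 0.0;
        for (int i = 0; i <= steps; i++) {
            double t = (double) i / steps;
            double slerpWeight = Math.sin(t * theta) * invSinTheta;
            worst = Math.max(worst, Math.abs(slerpWeight - t));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max weight error at threshold: "
                + maxWeightError(0.99975, 1000));
    }
}
```

For small theta the difference grows roughly like t * (1 - t * t) * theta * theta / 6, so at this threshold it stays well below 1e-4. Above the threshold the angle is even smaller, so the error only shrinks.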

Myomyomyo.
Offline Roquen
« Reply #12 - Posted 2012-02-13 13:49:59 »

Ouch.  If that's the fast version, I'd hate to see what the slow version looks like.  Wink
Offline theagentd
« Reply #13 - Posted 2012-02-13 13:55:49 »

Let's just say that the calculation of theta and invSinTheta was outside the if-statement. >_> So what's so bad about this one then? Do you know an even faster one?

Myomyomyo.
Offline pitbuller
« Reply #14 - Posted 2012-02-13 14:52:24 »

Let's just say that the calculation of theta and invSinTheta was outside the if-statement. >_> So what's so bad about this one then? Do you know an even faster one?
If you want to make that faster you could try using SIN/COS lookup tables. Riven has done great work with those. Accuracy should be enough.
Offline theagentd
« Reply #15 - Posted 2012-02-13 15:51:14 »

Let's just say that the calculation of theta and invSinTheta was outside the if-statement. >_> So what's so bad about this one then? Do you know an even faster one?
If you want to make that faster you could try using SIN/COS lookup tables. Riven has done great work with those. Accuracy should be enough.
Well, that's a little bit too low level at the moment. I'll add it if I need it later.

Myomyomyo.
Offline jezek2
« Reply #16 - Posted 2012-02-13 16:38:51 »

About the SLERP, I once stumbled upon this article: Understanding Slerp, Then Not Using It

Also here is a post explaining its (non-)usage for skeleton animations.
Offline Roquen
« Reply #17 - Posted 2012-02-13 17:09:48 »

So what's so bad about this one then? Do you know an even faster one?

The trig and inverse trig functions aren't strictly needed. There are tons of possible implementations. The problem, if you will, with the fastest versions is that they require pre-computation (so multiple usages of starting & end points + auxiliary data) and/or some added constraints (like max angle between end points, only forward moving 't' and/or fixed step 't').  I'm assuming that you don't want to bother with any of that.  I do have a really old untested version without any constraints that I could pull out and test.

WRT: trig look-up table...the problem is the relative error is huge for small angles and we're mostly interested in small angles. (Well not really, but that's the way most animation data works out in practice.)
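
A quick way to see the small-angle problem is to compare a table lookup against Math.sin at a tiny angle and at a large one. The sketch below assumes a hypothetical 4096-entry table with a truncated index and no interpolation; it is not Riven's or libgdx's actual implementation, and a real table may use a different size or indexing scheme.

```java
final class SinTableError {
    static final int SIZE = 4096;                     // hypothetical table size
    static final double STEP = 2.0 * Math.PI / SIZE;  // radians per table entry
    static final float[] TABLE = new float[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = (float) Math.sin(i * STEP);
        }
    }

    /** Table lookup for non-negative angles: truncated index, no interpolation. */
    static float sinLut(double radians) {
        int index = (int) (radians / STEP) & (SIZE - 1);
        return TABLE[index];
    }

    /** Relative error of the table lookup compared with Math.sin. */
    static double relativeError(double radians) {
        double exact = Math.sin(radians);
        return Math.abs(sinLut(radians) - exact) / Math.abs(exact);
    }

    public static void main(String[] args) {
        System.out.println("relative error at 0.001 rad: " + relativeError(0.001));
        System.out.println("relative error at 1.0 rad:   " + relativeError(1.0));
    }
}
```

With this layout any angle smaller than one table step truncates to index 0, so the lookup returns 0 and the relative error is 100%, while the same absolute error near 1 radian is a small fraction of the result.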

About the SLERP, I once stumbled upon this article: Understanding Slerp, Then Not Using It

Man, why do people insist on making easy stuff hard.  SLERP (as a primitive) is freaking awesome.

One easy thing you could do is lose the normalization. The magnitude of the resultant quaternion will be very near one, so the full normalization can be replaced by a single step of some renormalization method (like Newton-Raphson).  So you'll trade a sqrt & divide for a couple of multiplies.
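
One possible reading of that suggestion, as a sketch: a single Newton-Raphson step for 1/sqrt(s), started from the guess 1, gives the factor 0.5 * (3 - s), where s is the squared length. The float[4] layout and the names below are assumptions for illustration, not Roquen's code or the Quaternion class used earlier in the thread.

```java
final class FastRenormalize {
    /**
     * Scales q = {x, y, z, w} in place by an approximation of 1 / |q|.
     * Only valid when |q| is already close to 1: the factor 0.5 * (3 - s)
     * is one Newton-Raphson iteration for 1 / sqrt(s) starting from 1.
     */
    static void renormalizeNearUnit(float[] q) {
        float s = q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3];
        float k = 0.5f * (3.0f - s);
        for (int i = 0; i < 4; i++) {
            q[i] *= k;
        }
    }

    static double length(float[] q) {
        return Math.sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    }

    public static void main(String[] args) {
        float[] q = {0.01f, 0.02f, 0.03f, 1.0f}; // slightly longer than unit length
        System.out.println("before: " + length(q));
        renormalizeNearUnit(q);
        System.out.println("after:  " + length(q));
    }
}
```

If the squared length is 1 + e, the leftover length error after this step is on the order of e squared, which is why it only makes sense when the input is nearly normalized already.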
Offline pitbuller
« Reply #18 - Posted 2012-02-13 21:22:09 »

http://code.google.com/p/libgdx/source/browse/trunk/gdx/src/com/badlogic/gdx/math/MathUtils.java

With this you can use a LUT just like you use normal sin and cos.

PS: Just in case the other methods fail.