  Java OpenGL Math Library (JOML)  (Read 219238 times)
Offline theagentd
« Reply #270 - Posted 2015-08-18 11:19:45 »

I did some accuracy tests of different slerp() implementations. All tests run at double precision.

q0 and q1 generated as
         q0.set(r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1).normalize();
         q1.set(r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1).normalize();


Reference implementation: recursive nlerp with fixed 128 iterations. Should be numerically correct. Way more precision than doubles can handle.

JOML: latest implementation from GitHub.
LibGDX: LibGDX (possibly outdated implementation) converted to use doubles.
nlerp: fixed(?) nlerp from JOML (note the "t1=" line).
Recursive nlerp: while(abs(dot) < 0.5) replace q1 or q2 with nlerp(q1, q2, 0.5) depending on alpha and update alpha. Finally nlerp(q1, q2, alpha).

Error is calculated as toDegrees(acos(abs(referenceSlerpResult.dot(algorithmSlerpResult)))), i.e. how many degrees the slerp result differs from the reference slerp implementation's result.
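For anyone who wants to reproduce the metric, here is a minimal sketch of it. The class name QuatError and the {x, y, z, w} array layout are my own assumptions for illustration, not anything from JOML or LibGDX. Note that this is the angle in quaternion space; the corresponding rotation angle in 3D is twice this value.

```java
/** Hypothetical helper for illustration; not part of JOML or LibGDX. */
final class QuatError {
    /** 4D dot product of two quaternions stored as {x, y, z, w}. */
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    }

    /** Angle in degrees between two unit quaternions, ignoring their sign. */
    static double errorDegrees(double[] reference, double[] result) {
        // abs() treats q and -q as the same rotation;
        // min() keeps acos() defined if rounding pushes the dot slightly above 1.
        double d = Math.min(1.0, Math.abs(dot(reference, result)));
        return Math.toDegrees(Math.acos(d));
    }

    public static void main(String[] args) {
        double half = Math.toRadians(10.0) / 2.0;
        double[] identity = {0.0, 0.0, 0.0, 1.0};
        double[] rotZ10 = {0.0, 0.0, Math.sin(half), Math.cos(half)}; // 10 degrees about Z
        System.out.println(errorDegrees(identity, rotZ10));
    }
}
```

The clamp with min() is an addition of mine; without it a dot product of 1.0000000000000002 would make acos() return NaN.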

Implementation  | Average error (degrees) | Maximum error (degrees) | Note
JOML            | 5.190123758213539E-7    | 3.1945284701301985E-6   | Best accuracy
GDX             | 0.3163254591936567      | 12.920452042561234      | High dot threshold, doesn't normalize when using nlerp, hence inflated error
nlerp           | 1.1061119458909845      | 4.074459794082495       | Surprisingly low error
Recursive nlerp | 0.2398828107909901      | 1.1169005730338715      | Average lerps: 1.6144019

Offline KaiHH

« Reply #271 - Posted 2015-08-18 11:47:17 »

Quote
Does that not mean that the nlerp of JOML is flawed...
Yes, I guess so. It did not swap the sign of the dot product. I fixed the nlerp implementation by taking the implementation from the article http://fabiensanglard.net/doom3_documentation/37725-293747_293747.pdf mentioned by @Spasi.
Offline Roquen

« Reply #272 - Posted 2015-08-18 11:50:26 »

As implemented (not the math):
slerp( a,b) = slerp( a, -b) = r
slerp(-a,b) = slerp(-a,-b) = -r

the results r and -r represent the same rotation.

dot(a,b) = dot(-a,-b) = d
dot(a,-b) = dot(-a,b) = -d

If the dot product is negative you're really asking for the larger angle about the implied axis between the two inputs, but the implementation swaps it to the smaller one automagically.
Offline Roquen

« Reply #273 - Posted 2015-08-18 12:45:31 »

Note the error in 3D is going to be 2*acos(abs(dot(ref,approx))).

Let me try to answer your question in a different way.  Most slerps do this (implicitly):

slerp(a,b,t) { d=dot(a, b); if (d<0) {d=-d; b=-b;} XXXX }

We have inputs a & b and, assuming b != a && b != -a, there is a plane that contains a, b and the origin.  We define 'a' to be our reference direction, so we turn the plane so that 'a' points straight to the right (becomes the X axis).  We now define the Y axis to be the direction orthogonal to X that contains b.  This means that 'b' is by definition above X (is positive in Y).  If 'b' were in the second quadrant (say 3pi/4), we'd get a negative dot product, which means it's the longer path... -b is closer to X (-pi/4).  So the entire domain is on [0,pi/2].  Negating 'b' will give us the opposite direction for Y in the example, and in that plane its angle (with respect to X) is pi/4.  We are flipping the plane about its X axis and looking at the opposite side.

Or another way from the example.  Negate a; now our new 'a' is pointing straight to the left.  We flip about the Y axis and it's pointing to the right again and 'b' is in the first quadrant.

In this example in 3D the angle is pi/2 or 90 degrees, so if the dot is negative we're asking the math for a 270 degree rotation.
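The sign flip described above can be sketched as follows. This is a minimal illustration on {x, y, z, w} arrays and not JOML's actual implementation; the class name SlerpSketch and the 1E-6 cutoff for the near-parallel fallback are assumptions on my part.

```java
/** Illustrative slerp on {x, y, z, w} arrays; a sketch, not JOML's implementation. */
final class SlerpSketch {
    static double[] slerp(double[] a, double[] b, double t) {
        double d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
        double sign = 1.0;
        if (d < 0.0) {          // b is on the far side: interpolate towards -b instead
            d = -d;
            sign = -1.0;
        }
        double w0, w1;
        if (d > 1.0 - 1E-6) {   // nearly identical inputs: linear weights avoid dividing by ~0
            w0 = 1.0 - t;
            w1 = t;
        } else {
            double theta = Math.acos(d);              // angle in quaternion space, in [0, pi/2]
            double invSin = 1.0 / Math.sin(theta);
            w0 = Math.sin((1.0 - t) * theta) * invSin;
            w1 = Math.sin(t * theta) * invSin;
        }
        w1 *= sign;
        double[] r = new double[4];
        double lengthSquared = 0.0;
        for (int i = 0; i < 4; i++) {
            r[i] = w0 * a[i] + w1 * b[i];
            lengthSquared += r[i] * r[i];
        }
        double s = 1.0 / Math.sqrt(lengthSquared);    // renormalize to absorb rounding
        for (int i = 0; i < 4; i++) {
            r[i] *= s;
        }
        return r;
    }
}
```

With the identity as 'a' and a 240 degree rotation about Z as 'b', the dot product is negative, so the call behaves exactly like slerp(a, -b, t) and travels the 120 degree arc in the other direction.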
 





Offline theagentd
« Reply #274 - Posted 2015-08-18 13:08:48 »

Benchmark results! Measured in operations per second.


JOML: 10 140 633 (max error: 0.000002 degrees)
GDX:    5 010 566 (max error: 12.9 degrees)
nlerp:  77 098 823 (max error: 4.1 degrees)
rlrp50: 45 055 037 (max error: 1.1 degrees, threshold: 0.50)
rlrp80: 31 719 745 (max error: 0.25 degrees, threshold: 0.80)
rlrp90: 28 103 211 (max error: 0.085 degrees, threshold: 0.90)
rlrp95: 22 463 962 (max error: 0.029 degrees, threshold: 0.95)
rlrp99: 16 707 044 (max error: 0.0026 degrees, threshold: 0.99)

Offline princec

« Reply #275 - Posted 2015-08-18 13:17:21 »

Solution: provide all of them (except gdx), document accuracy vs speed, allow user to select according to requirements.

Cas :)

Offline theagentd
« Reply #276 - Posted 2015-08-18 13:18:50 »

Low quality: nlerp
Medium/variable quality: recursive nlerp
High quality: slerp

Offline princec

« Reply #277 - Posted 2015-08-18 13:24:34 »

btw I love Roquen's detailing of stuff, and always stare intently at all of his cryptic messages... though I barely understand one symbol in 10. I stopped "doing" maths at around age 16 - it's totally over my head. He is a treasure trove of computer science knowledge. We must pickle his brain before it is too late and extract what we can.

Cas :)

Offline KaiHH

« Reply #278 - Posted 2015-08-18 13:48:28 »

Quote
In this example in 3D the angle is pi/2 or 90 degrees, so if the dot is negative we're asking the math for a 270 degree rotation...
Awesome intuitive explanation, Roquen! :)
But there is one thing that I still do not quite understand, namely the fact that the maximum rotation angle is only PI/2. My current intuition tells me that it should be PI.
Because:...
Suppose a quaternion actually represents a rotation of, let's say, 110 degrees. First of all, this is possible, right? I mean, I can define a quaternion (possibly from a rotation matrix) that represents a rotation of 110 degrees about the Z axis.
Now, my second quaternion is the identity and thus does not represent any rotation. How can it be, then, that I only have 110 - 90 = 20 degrees of rotation in slerp? I mean, I would still have to interpolate a rotation over the whole 110 degree arc. Or don't I?

Quote
Solution: provide all of them (except gdx), document accuracy vs speed, allow user to select according to requirements.
Yes, that would be reasonable. I have the impression, however, that people just don't read JavaDocs. :)
So, I would give the method a suitably long name and, on the first call, print a warning on sysout and syserr asking whether the user really wanted to call this method. :)
But yes, people should first express the need for approximation functions, then we can add them.

Quote
Benchmark results! Measured in operations per second.
Since you provided a number of benchmarks with different threshold values of the dot product to trade off speed vs. accuracy, would it be good for you to have an additional overload (or differently named) slerp that takes that threshold as a parameter? I think that was initially proposed by @Roquen earlier.

EDIT: Okay, I just implemented the overload of slerp() taking the "threshold" parameter of the dot product between 'this' and 'target' below which the method performs non-spherical linear interpolation as per nlerp():
    Quaterniond slerp(Quaterniond target, double alpha, double nlerpDotThreshold, Quaterniond dest)

The "old" existing method without that parameter uses the previous 1E-6 threshold, and makes it clear in the JavaDocs.
This now feels a lot better, because it is made explicit that the method does this anyway and gives the user a lever to control accuracy vs. performance.
Offline theagentd
« Reply #279 - Posted 2015-08-18 15:50:30 »

Quote
Benchmark results! Measured in operations per second.
Since you provided a number of benchmarks with different threshold values of the dot product to trade off speed vs. accuracy, would it be good for you to have an additional overload (or differently named) slerp that takes that threshold as a parameter? I think that was initially proposed by @Roquen earlier.

EDIT: Okay, I just implemented the overload of slerp() taking the "threshold" parameter of the dot product between 'this' and 'target' below which the method performs non-spherical linear interpolation as per nlerp():
    Quaterniond slerp(Quaterniond target, double alpha, double nlerpDotThreshold, Quaterniond dest)

The "old" existing method without that parameter uses the previous 1E-6 threshold, and makes it clear in the JavaDocs.
This now feels a lot better, because it is made explicit that the method does this anyway and gives the user a lever to control accuracy vs. performance.

You are mistaken. I did not test the JOML slerp method with different thresholds. I tested my own recursive nlerp function, which nlerps the source quaternions until their dot product is above a certain threshold, at which point a simple nlerp is done between them. Each iteration halves the angle between the quaternions, so it gets there pretty quickly, averaging 1.5 to 4 nlerps per call for sub-degree precision.

Modifying the threshold of JOML's slerp is not a good idea, as the accuracy test of LibGDX's version shows. Not normalizing when falling back to lerp has a massive precision impact.

 - You should keep the original slerp with a fixed threshold, as it is a good reference implementation with excellent accuracy. Reducing the threshold without normalizing the result has a big precision impact, and normalizing the result costs some extra cycles.

 - For nlerp(), it's good to have an overloaded version that accepts a precomputed dot-product, as that value is often available when working with nlerp().

 - Add rnlerp(), recursive nlerp(), which takes in threshold value and refines the result until the final nlerp is done between two quaternions over the given threshold.
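As a rough illustration of the second point, an nlerp that accepts a precomputed dot product could look like the sketch below. The class name NlerpSketch and the array-based signature are hypothetical and chosen only to keep the example self-contained; they are not JOML's API.

```java
/** Illustrative nlerp with a caller-supplied dot product; names are assumptions. */
final class NlerpSketch {
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    }

    /** Normalized lerp between unit quaternions a and b ({x, y, z, w}); dot must equal dot(a, b). */
    static double[] nlerp(double[] a, double[] b, double t, double dot) {
        double w0 = 1.0 - t;
        double w1 = dot >= 0.0 ? t : -t;   // negative dot: blend towards -b (shorter arc)
        double x = w0 * a[0] + w1 * b[0];
        double y = w0 * a[1] + w1 * b[1];
        double z = w0 * a[2] + w1 * b[2];
        double w = w0 * a[3] + w1 * b[3];
        double s = 1.0 / Math.sqrt(x * x + y * y + z * z + w * w);
        return new double[] {x * s, y * s, z * s, w * s};
    }
}
```

The caller computes the dot once, for example to decide whether interpolation is needed at all, and passes it in so nlerp does not have to recompute it.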

Source code:
    public Quaternion rlerp(Quaternion q, double alpha, double dotThreshold){
        // Both temporaries are eliminated by escape analysis
        Quaternion q1 = new Quaternion().set(this); // This is faster than using <this> as temporary variable
        Quaternion q2 = new Quaternion().set(q);

        double dot = q1.dot(q2);

        // Halve the arc until the endpoints are close enough for a plain nlerp
        while(dot < dotThreshold && dot > -dotThreshold){
            if(alpha < 0.5){
                // Target lies in the first half: move q2 to the midpoint
                q2.nlerp(q1, 0.5, dot);
                alpha = alpha * 2;
            }else{
                // Target lies in the second half: move q1 to the midpoint
                q1.nlerp(q2, 0.5, dot);
                alpha = alpha * 2 - 1;
            }

            dot = q1.dot(q2);
        }

        return set(q1).nlerp(q2, alpha, dot);
    }

You could replace the temporary quaternions with local variables. Using `this` as a temporary variable is slower because its field writes go to RAM and can't be moved to registers, and we can't (and shouldn't) modify the input quaternion. This version generates no garbage, as escape analysis replaces the temporaries with stack variables.

EDIT:
This recursive nlerp function is faster than slerp() up to threshold 0.99992, at which point it has almost exactly the same precision as slerp():

slerp(): average = 2.057235254503539E-7, max = 2.4148365394514667E-6
rlerp(): average = 3.688761597702983E-7, max = 2.4148365394514667E-6


I vote for completely replacing slerp() with rlerp(), as it is just a more tweakable version of slerp().

EDIT 2:
In addition, those performance values are when using completely random quaternions. As you can see from the data from WSW, almost all interpolation is done between dot>0.99, in which case rlerp() is just nlerp(). With less random quaternions rlerp() is 2.5x faster than slerp() while providing the same max error.

Offline KaiHH

« Reply #280 - Posted 2015-08-18 16:58:04 »

Okay, then. That's fine by me.

One thing, though. I tried your nlerpRecursive with `this` = (x=0, y=0, z=0, w=1) and `q` = (0, 1, 0, -4.371E-8).
The -4.371E-8 is due to inaccuracies when building a rotation of 180 degrees around the Y axis, like so: `Quaternionf q = new Quaternionf().rotateY((float) Math.PI)`
This resulted in the normalization factor `s` (that `1.0/sqrt(...)` in nlerp) to become NaN in the first iteration of your while loop and then the whole result became NaN.


EDIT: Ah, it was an error on my side when translating and inlining that.
Now, the final version is:
    public Quaternionf nlerpIterative(Quaternionf q, float alpha, float dotThreshold, Quaternionf dest) {
        float q1x = x, q1y = y, q1z = z, q1w = w;
        float q2x = q.x, q2y = q.y, q2z = q.z, q2w = q.w;
        float dot = q1x * q2x + q1y * q2y + q1z * q2z + q1w * q2w;
        float alphaN = alpha;
        while (Math.abs(dot) < dotThreshold) {
            float scale0 = 0.5f;
            float scale1 = dot >= 0.0f ? alphaN : -alphaN; // bug, see Reply #283: should be 0.5f : -0.5f here
            if (alphaN < 0.5f) {
                q2x = scale0 * q2x + scale1 * q1x;
                q2y = scale0 * q2y + scale1 * q1y;
                q2z = scale0 * q2z + scale1 * q1z;
                q2w = scale0 * q2w + scale1 * q1w;
                float s = (float) (1.0 / Math.sqrt(q2x * q2x + q2y * q2y + q2z * q2z + q2w * q2w));
                q2x *= s;
                q2y *= s;
                q2z *= s;
                q2w *= s;
                alphaN = alphaN * 2.0f;
            } else {
                q1x = scale0 * q1x + scale1 * q2x;
                q1y = scale0 * q1y + scale1 * q2y;
                q1z = scale0 * q1z + scale1 * q2z;
                q1w = scale0 * q1w + scale1 * q2w;
                float s = (float) (1.0 / Math.sqrt(q1x * q1x + q1y * q1y + q1z * q1z + q1w * q1w));
                q1x *= s;
                q1y *= s;
                q1z *= s;
                q1w *= s;
                alphaN = alphaN * 2.0f - 1.0f;
            }
            dot = q1x * q2x + q1y * q2y + q1z * q2z + q1w * q2w;
        }
        float scale0 = 1.0f - alphaN;
        float scale1 = dot >= 0.0f ? alphaN : -alphaN;
        dest.x = scale0 * q1x + scale1 * q2x;
        dest.y = scale0 * q1y + scale1 * q2y;
        dest.z = scale0 * q1z + scale1 * q2z;
        dest.w = scale0 * q1w + scale1 * q2w;
        float s = (float) (1.0 / Math.sqrt(dest.x * dest.x + dest.y * dest.y + dest.z * dest.z + dest.w * dest.w));
        dest.x *= s;
        dest.y *= s;
        dest.z *= s;
        dest.w *= s;
        return dest;
    }

Seems to work.
Offline Riven
« Reply #281 - Posted 2015-08-18 17:48:58 »

What about this optimisation:
https://en.m.wikipedia.org/wiki/Fast_inverse_square_root

Would it be applicable, or would it affect accuracy too much?
Maybe it is not even faster in HotSpot and/or on modern CPUs.



As another note on performance: I wouldn't write into each field of dest twice. Only write the final 4 values into the object. You can use localvars to allow HotSpot to be more aggressive.

Offline Roquen

« Reply #282 - Posted 2015-08-18 20:15:08 »

Quote
But there is one thing that I still do not quite understand, namely the fact that the maximum rotation angle is only PI/2. My current intuition tells me that it should be PI.
Your intuition is correct.  The missing link is that the angle is halved going into quaternion space and doubled coming out:  an angle of Pi/2 in quaternion space translates into Pi in 3D, like you expected.
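Applied to the 110 degree example, a small sketch (helper names are hypothetical, quaternions stored as {x, y, z, w}): the quaternion for a 110 degree rotation about Z sits only 55 degrees away from the identity in quaternion space, which is well inside the [0, pi/2] domain, and doubling that angle recovers the full 110 degrees.

```java
/** Illustrates the half-angle relation between quaternion space and 3D; names are assumptions. */
final class HalfAngleSketch {
    /** Unit quaternion {x, y, z, w} for a rotation of the given angle about the Z axis. */
    static double[] rotationZ(double degrees) {
        double half = Math.toRadians(degrees) / 2.0;
        return new double[] {0.0, 0.0, Math.sin(half), Math.cos(half)};
    }

    /** Angle between two unit quaternions as 4D vectors, in degrees, ignoring sign. */
    static double quaternionSpaceAngle(double[] a, double[] b) {
        double d = Math.abs(a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]);
        return Math.toDegrees(Math.acos(Math.min(1.0, d)));
    }

    /** Angle of the 3D rotation that takes orientation a to orientation b, in degrees. */
    static double rotationAngle(double[] a, double[] b) {
        return 2.0 * quaternionSpaceAngle(a, b);
    }

    public static void main(String[] args) {
        double[] identity = {0.0, 0.0, 0.0, 1.0};
        double[] q = rotationZ(110.0);
        System.out.println(quaternionSpaceAngle(identity, q)); // half of the 3D angle
        System.out.println(rotationAngle(identity, q));        // the full 3D angle
    }
}
```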

On fast 1/sqrt:  hardware can do this faster, but Java doesn't give access to it.  The lerp result is always too short (magnitude below 1) except at the endpoints, with the worst case at t=.5, so Newton's method with a reasonable initial guess works.
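One way to read that last remark, as a sketch under my own assumptions: for two unit quaternions with dot product d >= 0, the squared length of the lerp at parameter t is 1 - 2t(1-t)(1-d), which stays within [0.5, 1]. On such a narrow range, Newton-Raphson for 1/sqrt(x) converges quickly even from the crude starting guess y = 1. The starting guess and the iteration count here are my choices, not something stated in the thread.

```java
/** Newton-Raphson reciprocal square root for inputs near 1; guess and names are assumptions. */
final class InvSqrtSketch {
    /** Approximates 1/sqrt(x), intended for x in roughly [0.5, 1]. */
    static double invSqrtNewton(double x, int iterations) {
        double y = 1.0;                        // initial guess, exact for x == 1
        for (int i = 0; i < iterations; i++) {
            y = y * (1.5 - 0.5 * x * y * y);   // Newton step for f(y) = 1/y^2 - x
        }
        return y;
    }

    /** Squared length of lerp(a, b, t) for unit quaternions a, b with dot product d. */
    static double lerpLengthSquared(double d, double t) {
        return 1.0 - 2.0 * t * (1.0 - t) * (1.0 - d);
    }
}
```

Each step roughly squares the relative error, so a handful of iterations is enough over this range; whether this beats Math.sqrt() plus a division on a given JVM would have to be benchmarked.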

Lerp and nlerp have the same angular error.  You just have to change how you measure it.  BUT whatever takes the output likewise needs to be aware of non-unit input.  Let's leave this for another day.
Offline theagentd
« Reply #283 - Posted 2015-08-19 06:02:10 »

@KaiHH
Your implementation has a bug. Line 8 should be:
            float scale1 = dot >= 0.0f ? 0.5f : -0.5f;

Offline theagentd
« Reply #284 - Posted 2015-08-19 07:04:52 »

Optimizations:

1.
In the float versions of classes during normalization, you often do this:
float inverseLength = (float)(1.0 / Math.sqrt(...));

As far as I know, the only operation that is actually slower on doubles than floats is division, so
float inverseLength = 1.0f / (float)Math.sqrt(...);

should be faster. HotSpot might even be able to compute sqrt() at float precision, but don't quote me on that.

2.
I noticed that for dots extremely close to 1.0 or -1.0, slerp() was actually 50% faster than both nlerp and rlerp due to not doing a normalization. Since this is a pretty common case in skeleton animation (a bone that simply isn't animated), it seems worth optimizing for. However, it is pretty pointless to even lerp at such small angles, so simply returning the quaternion itself is both faster and has almost no error in the first place:
    public Quaternion nlerpIterative(Quaternion q, double alpha, double dotThreshold, Quaternion dest) {
        double q1x = x, q1y = y, q1z = z, q1w = w;
        double q2x = q.x, q2y = q.y, q2z = q.z, q2w = q.w;
        double dot = q1x * q2x + q1y * q2y + q1z * q2z + q1w * q2w;
       
        if(1 - 1E-6 < Math.abs(dot)){
           return dest.set(this);
        }

        ...


When slerping between two (almost) equal quaternions, this results in 35% higher performance than slerp() and 100% higher performance than nlerp and iterative nlerp without this check.


3.
Field writes are more expensive than writes to local variables. At the end of nlerp() and nlerpIterative() this is very slightly faster:
        /*dest.x = scale0 * q1x + scale1 * q2x;
        dest.y = scale0 * q1y + scale1 * q2y;
        dest.z = scale0 * q1z + scale1 * q2z;
        dest.w = scale0 * q1w + scale1 * q2w;
        double s = 1.0 / Math.sqrt(dest.x * dest.x + dest.y * dest.y + dest.z * dest.z + dest.w * dest.w);
        dest.x *= s;
        dest.y *= s;
        dest.z *= s;
        dest.w *= s;
        return dest;*/


        double x = scale0 * q1x + scale1 * q2x;
        double y = scale0 * q1y + scale1 * q2y;
        double z = scale0 * q1z + scale1 * q2z;
        double w = scale0 * q1w + scale1 * q2w;
        double s = 1.0 / Math.sqrt(x*x + y*y + z*z + w*w);
        dest.x = x * s;
        dest.y = y * s;
        dest.z = z * s;
        dest.w = w * s;
        return dest;



Benchmarks:

Completely random quaternions
quaternion.set(r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1).normalize();


  nlerp:  84 664 566
  slerp:  10 610 483
  rlrp50: 51 548 933 (threshold: 0.50)
  rlrp80: 36 185 362 (threshold: 0.80)
  rlrp90: 31 419 572 (threshold: 0.90)
  rlrp95: 24 704 936 (threshold: 0.95)
  rlrp99: 10 869 289 (threshold: 0.99992)

Conclusion: iterative nlerp is more tweakable than slerp and equally fast for the same error.


Not so random quaternions
quaternion.set(r.nextDouble()+5, r.nextDouble()+5, r.nextDouble()+5, r.nextDouble()+5).normalize();


  nlerp:  110 896 134
  slerp:  13 177 334
  rlrp50: 105 594 541 (threshold: 0.50)
  rlrp80: 106 932 808 (threshold: 0.80)
  rlrp90: 106 247 803 (threshold: 0.90)
  rlrp95: 106 819 391 (threshold: 0.95)
  rlrp99: 23 367 739 (threshold: 0.99992)

Conclusion: iterative nlerp is almost as fast as nlerp for most thresholds, and still 75% faster than slerp for similar error.


Identical quaternions
quaternion.set(1, 1, 1, 1).normalize();


nlerp:  120 392 068
slerp:  162 841 905
rlrp50: 217 787 085 (threshold: 0.50)
rlrp80: 219 101 292 (threshold: 0.80)
rlrp90: 219 481 200 (threshold: 0.90)
rlrp95: 219 058 424 (threshold: 0.95)
rlrp99: 219 265 376 (threshold: 0.99992)

Conclusion: iterative nlerp is 30% faster than slerp() for a negligible increase in error.


Will do performance tests in WSW when I get home tomorrow.

Offline Roquen

« Reply #285 - Posted 2015-08-19 10:04:59 »

@Riven - Ya know?  My comment about 1/sqrt is BS.  I'm assuming that rawBits and reverse are still slow.  That may not be the case.  Haven't looked in a long time.  If they're both intrinsics and produce sane results..that'd be the way to go (at least if you need to deal with the larger end of the range).

HotSpot does produce (or has produced) sqrtss (single-precision sqrt) instructions... I don't recall what patterns it matches.

  /**
   * Given the dot product between two quaternions 'd' which
   * are going to be lerped, find the 't' of the maximum
   * abs angular error and the error.
   * <p>
   * The error is symmetric about t=.5, returns results from [0,.5]
   * <p>
   * r[0] = expected t of max error
   * r[1] = expected max abs error
   * <p>
   * r[0] is on 1/2 - [1/(2 sqrt(3)), 1/2 sqrt(4/pi-1)] ~= [.211325, .238638]
   * r[0] = ~.238638 for angle = Pi/2
   * r[0] = ~.217459 for angle = Pi/4
   * r[0] = ~.211325 for angle = 0 (in the limit)
   * <p>
   * r[1] = abs error in degrees
   */

  public static void maxLerpAngleAbsError(float[] r, float d)
  {
    final double bias = Double.MIN_NORMAL;
   
    // This could be reduced, but can't see a reason to bother.
    double ca = Math.abs(d)+bias;        // cos(a)  
    double sa = Math.sqrt(1.0-ca*ca);    // sin(a)
    double a  = Math.atan2(sa,ca);       // a
    double t0 = ca-1.0;                  // cos(a)-1
    double tn = a*sa*(sa*sa-a*sa+t0*t0);
    double td = 2.0*(a*ca-a);
    double t  = 0.5+Math.sqrt(tn)/td;
    double e  = Math.atan2(t*sa, 1.0+t*(ca-1.0)) - t*a;
   
    e = 2.0*e;                           // Quat measure -> 3D
    e = Math.toDegrees(e);               // sane measure -> insane ;)
   
    r[0] = (float)t;
    r[1] = (float)e;
  }
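A brute-force scan is a cheap way to sanity-check the closed form above. The sketch below is my own (the class name and step count are arbitrary): it sweeps t over [0, 0.5] for a given quaternion-space angle and reports where the lerp direction deviates most from the slerp direction. It reports the error in quaternion-space degrees, without the final doubling to 3D.

```java
/** Brute-force scan for the largest angular error of lerp versus slerp; a sketch for checking. */
final class LerpErrorScan {
    /**
     * Returns {t, errorDegrees}: the t in [0, 0.5] with the largest absolute
     * angular error, and that error in degrees (quaternion-space measure).
     */
    static double[] maxLerpError(double angle, int steps) {
        double sa = Math.sin(angle);
        double ca = Math.cos(angle);
        double bestT = 0.0;
        double bestErr = 0.0;
        for (int i = 0; i <= steps; i++) {
            double t = 0.5 * i / steps;
            // Direction of (1-t)*a + t*b measured from a, in the plane spanned by a and b.
            double lerpAngle = Math.atan2(t * sa, 1.0 + t * (ca - 1.0));
            double err = Math.abs(lerpAngle - t * angle);   // slerp would be at exactly t*angle
            if (err > bestErr) {
                bestErr = err;
                bestT = t;
            }
        }
        return new double[] {bestT, Math.toDegrees(bestErr)};
    }

    public static void main(String[] args) {
        double[] r = maxLerpError(Math.PI / 2.0, 100000);
        System.out.println("t = " + r[0] + ", error = " + r[1] + " degrees");
    }
}
```

For angle = Pi/2 the scan should land near the t of about .238638 given in the Javadoc, with an error close to the 4.07 degree maximum reported for nlerp in the accuracy table earlier in the thread.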
Offline KaiHH

« Reply #286 - Posted 2015-08-19 13:17:55 »

Quote
@KaiHH
Your implementation has a bug. Line 8 should be:
            float scale1 = dot >= 0.0f ? 0.5f : -0.5f;
Thanks! Fixed.

Quote
1.
In the float versions of classes during normalization, you often do this:
Yeah, I would really like to have empirical evidence on whether this is faster. I did it that way because of (maybe) better accuracy, but I don't know that either. :)

Quote
2.
I noticed that for dots extremely close to 1.0 or -1.0, slerp() was actually 50% faster than both nlerp and rlerp due to not doing a normalization.
Done.

Quote
3.
Field writes are more expensive than writes to local variables. At the end of nlerp() and nlerpIterative() this is very slightly faster:
Yeah, that was also noted by @Riven and I changed it right after he pointed it out. Or in other words, the better version was actually the first commit that added that method.
Offline theagentd
« Reply #287 - Posted 2015-08-19 14:02:42 »

I was unable to find a good proof or benchmark for why double division is slower than float division online, but I'll benchmark it when I can.

I had an interesting idea. LibGDX has a Matrix4.avg(Matrix4, float) function that extracts the translation, scale and rotation of two matrices and lerps/slerps between them and finally outputs the result as a new matrix. It was horribly coded, using static temp vectors and quaternions, and performance was horrible. However, if that could be implemented in a thread-safe and fast way, it could possibly massively improve performance of my skeleton animation.

Offline KaiHH

« Reply #288 - Posted 2015-08-19 15:03:32 »

Quote
I had an interesting idea. LibGDX has a Matrix4.avg(Matrix4, float) function that extracts the translation, scale and rotation of two matrices and lerps/slerps between them and finally outputs the result as a new matrix. It was horribly coded, using static temp vectors and quaternions, and performance was horrible. However, if that could be implemented in a thread-safe and fast way, it could possibly massively improve performance of my skeleton animation.
This sounds like a nice idea!
I created a new pastebin http://pastebin.java-gaming.org/df83f3934371d (maybe first test before I do a Git commit).
This implementation is basically just an inline of all the pieces, from extracting the translation, scaling factors and building the rotation quaternions and then nlerpIterative the rotation quaternions and lerp'ing the scale and translation factors.
This makes no assumptions on the matrix. We can however introduce some assumptions, such as you probably only doing uniform scaling (x, y, z being the same scaling factors).
Offline Riven
« Reply #289 - Posted 2015-08-19 15:08:19 »

@theagentd: isn't the conceptual issue that matrices are a rather poor fit to hold this information in the first place? It's just that they are convenient (relatively graspable, as opposed to quaternions) and GPUs are optimized for dot-products, which is what multiplying a vector by a matrix basically boils down to. To summarize: matrices are nice as a final result, but should not be used as *inputs* for operations like interpolation.

Basically, to allow for interpolation between matrices, you'd need to deconstruct them into a translation component (vec3), a scale component and an orientation component (quaternion), lerp these components, then construct a new matrix from these.

The original data (translation, scale, orientation) is likely to be available elsewhere, so the conversion to and from a matrix (to allow for interpolation) is potentially redundant.

In conclusion: it certainly won't hurt anybody, but... is it the right thing to do? (and should we care about doing the right thing?)

Offline KaiHH

« Reply #290 - Posted 2015-08-19 15:12:47 »

Quote
isn't the conceptual issue that matrices are a rather poor fit to hold this information in the first place, and it's just that they are convenient (relatively graspable, as opposed to quaternions) and GPUs are optimized for dot-products, which multiplying a vector by a matrix basically boils down to. To summarize: matrices are nice as a final result, but should not be used to directly perform operations like lerp on.
I totally agree. On various occasions I had the impression that JOML needed a TranslationRotateScale (or somehow abbreviated) class to hold the individual pieces, perform all sorts of operations on them, and then finally convert to a Matrix for OpenGL to consume.
Offline Riven
« Reply #291 - Posted 2015-08-19 15:18:28 »

@KaiHH: given your latest pastebin, it seems almost trivial to extract the code from that method to make it happen.

public static void interpolateTransform(
   Vec3 position1,
   Quaternion orientation1,
   Vec3 scale1,

   Vec3 position2,
   Quaternion orientation2,
   Vec3 scale2,

   Matrix4 dest
) {
   // this is where the magic happens...
}


public static void interpolateTransform( // assumes both scales are 1.0
   Vec3 position1,
   Quaternion orientation1,

   Vec3 position2,
   Quaternion orientation2,

   Matrix4 dest
) {
   // this is where the magic happens, once again...
}

Offline KaiHH

« Reply #292 - Posted 2015-08-19 15:22:00 »

Yes, that method was certainly a no-brainer :)
Being able to accept a decomposition of the components as parameters is also good, but it still converts to a matrix in the end.
If that is indeed the last step in the whole transformation chain of @theagentd, then this would be fine.
However, I still feel the need for a separate class that holds these three components separately.
Offline Riven
« Reply #293 - Posted 2015-08-19 15:36:14 »

So you wanna keep it pure, eh?

TranslationRotationScale.interpolate(trs1, trs2, alpha, trsDest);
trsDest.toMatrix(matDest);
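
To make the two calls above concrete, here is a minimal sketch of what such a class could look like. Everything in it is hypothetical: the class name TrsSketch, the float-array fields and the column-major matrix layout are assumptions for illustration and not an existing JOML class. Rotation is blended with a plain nlerp; an iterative nlerp or slerp could be swapped in.

```java
/** Hypothetical translation/rotation/scale holder; the class and its layout are assumptions. */
final class TrsSketch {
    final float[] translation = {0f, 0f, 0f};
    final float[] rotation = {0f, 0f, 0f, 1f};   // unit quaternion as x, y, z, w
    final float[] scale = {1f, 1f, 1f};

    /** Lerps translation and scale, nlerps rotation; dest may alias a or b. */
    static void interpolate(TrsSketch a, TrsSketch b, float alpha, TrsSketch dest) {
        float[] qa = a.rotation, qb = b.rotation;
        float dot = qa[0] * qb[0] + qa[1] * qb[1] + qa[2] * qb[2] + qa[3] * qb[3];
        float w0 = 1.0f - alpha;
        float w1 = dot >= 0.0f ? alpha : -alpha;  // take the shorter arc
        float x = w0 * qa[0] + w1 * qb[0];
        float y = w0 * qa[1] + w1 * qb[1];
        float z = w0 * qa[2] + w1 * qb[2];
        float w = w0 * qa[3] + w1 * qb[3];
        float s = 1.0f / (float) Math.sqrt(x * x + y * y + z * z + w * w);
        for (int i = 0; i < 3; i++) {
            dest.translation[i] = a.translation[i] + alpha * (b.translation[i] - a.translation[i]);
            dest.scale[i] = a.scale[i] + alpha * (b.scale[i] - a.scale[i]);
        }
        // Rotation is written last, from locals, so aliasing dest with a or b is safe.
        dest.rotation[0] = x * s;
        dest.rotation[1] = y * s;
        dest.rotation[2] = z * s;
        dest.rotation[3] = w * s;
    }

    /** Writes the column-major 4x4 matrix T * R * S into m (length 16). */
    void toMatrix(float[] m) {
        float x = rotation[0], y = rotation[1], z = rotation[2], w = rotation[3];
        float sx = scale[0], sy = scale[1], sz = scale[2];
        m[0] = (1f - 2f * (y * y + z * z)) * sx;
        m[1] = (2f * (x * y + w * z)) * sx;
        m[2] = (2f * (x * z - w * y)) * sx;
        m[3] = 0f;
        m[4] = (2f * (x * y - w * z)) * sy;
        m[5] = (1f - 2f * (x * x + z * z)) * sy;
        m[6] = (2f * (y * z + w * x)) * sy;
        m[7] = 0f;
        m[8] = (2f * (x * z + w * y)) * sz;
        m[9] = (2f * (y * z - w * x)) * sz;
        m[10] = (1f - 2f * (x * x + y * y)) * sz;
        m[11] = 0f;
        m[12] = translation[0];
        m[13] = translation[1];
        m[14] = translation[2];
        m[15] = 1f;
    }
}
```

Keeping the three components in one object means a chain of interpolations never touches a matrix until the very last step.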

Offline KaiHH

« Reply #294 - Posted 2015-08-19 15:40:39 »

Yes. I like pure things. :D
I like the symmetry between input and output of the matrix.interpolate(matrix) method.
But the only reason that would justify having an additional class (which in itself is also a complexity booster, and I would want to avoid adding new classes as much as possible) would be if someone needs to perform more than one such interpolation.
As I said, if one interpolation is the final step in a transformation which after that does not need to do any interpolations anymore, we can get away with your proposed method perfectly.
Offline pitbuller
« Reply #295 - Posted 2015-08-19 17:34:33 »

Matrix deconstruction without hefty assumptions is never going to be highly performant. I needed to code this without corner cases for a slicing game (Kinghunt) where I needed to dynamically reparent stuff; transformation chains were concatenated into matrices and I didn't have time to change that. It was a very long day. If you have GPU Pro 5 at hand, there is a good article about ways to manage object hierarchies. http://www.crcnetbase.com/doi/abs/10.1201/b16721-29
Offline Roquen

« Reply #296 - Posted 2015-08-19 19:08:43 »

Quote
for why double division is slower than float division online,
Most double operations have longer latency than single precision...division is one example, multiply is not.

EDIT: http://www.agner.org/optimize/#manuals (Instruction tables). as an example
Offline theagentd
« Reply #297 - Posted 2015-08-19 21:42:13 »

@Riven and @KaiHH

I know that interpolating matrices is a bad idea in this sense, but for Java it might actually be faster. When interpolating between two animations I read from 6 different objects and store the result in 3 others. I have a feeling that interpolating between two matrices and writing to a third could be faster due to better memory locality. In addition, if I store animation bones as matrices instead I would be able to precompute the bind pose multiplication as well. Now that I think about it, that might be possible even without matrix bones. Hmm.

Another solution would be a Transform class with inlined translation, scale and rotation. It would pretty much only need interpolation support and be convertible to a matrix.

Another suggestion: Matrix4*.getTranslationRotationScale(Vector3, Quaternion, Vector3), inverse of translationRotateScale().

Offline Roquen

« Reply #298 - Posted 2015-08-20 07:50:28 »

"One problem..."  (implies more than one) "...is decomposing..." (yes, yes) "...matrix back to..." (linear algebra is a really nice hammer) "...ill defined problem and no robust..." (and every problem is a nail) "...approximate solution..." (cringe) "...works reasonably well in the majority of cases." (is that like:  the letter's in the mail?).

In 3-space general rotations: 3 degrees of freedom, translation: 3 degrees of freedom, scale 1 degree: 7 total degrees of freedom. scale+quat+vect = 8.  4x3 LA=12, 4x4 LA=16.  Inverting: scale = multiplicative inverse, rotation=3 logical sign flips, translation=3 logical sign flips.  Reverse the order of composition and carry through the derivation.  The only place where matrices is a logical win is for bulk transforms since it is automatically composing common sub-expressions. However if you're bulk transforming then manually collecting the common sub-expressions is equivalent to converting the entire transform into a matrix (in local variables) at the point where you actually need them.
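
The inversion recipe above can be sketched for a uniform scale + quaternion + vector transform, assuming the composition p' = s * rotate(q, p) + t. The class name and the array layout are my own assumptions. The inverse is then s' = 1/s, q' = conjugate of q (three sign flips), and t' = -(1/s) * rotate(q', t), which is the negated translation carried through the reversed composition.

```java
/** Uniform scale + unit quaternion + translation; a sketch with assumed names and layout. */
final class SimilaritySketch {
    final double s;      // uniform scale
    final double[] q;    // unit quaternion {x, y, z, w}
    final double[] t;    // translation {x, y, z}

    SimilaritySketch(double s, double[] q, double[] t) {
        this.s = s;
        this.q = q;
        this.t = t;
    }

    /** Rotates v by the unit quaternion q: v + 2w(u x v) + 2u x (u x v), with u = (x, y, z). */
    static double[] rotate(double[] q, double[] v) {
        double cx = 2.0 * (q[1] * v[2] - q[2] * v[1]);   // 2 * (u x v)
        double cy = 2.0 * (q[2] * v[0] - q[0] * v[2]);
        double cz = 2.0 * (q[0] * v[1] - q[1] * v[0]);
        return new double[] {
            v[0] + q[3] * cx + (q[1] * cz - q[2] * cy),
            v[1] + q[3] * cy + (q[2] * cx - q[0] * cz),
            v[2] + q[3] * cz + (q[0] * cy - q[1] * cx)
        };
    }

    /** Applies p' = s * rotate(q, p) + t. */
    double[] apply(double[] p) {
        double[] r = rotate(q, p);
        return new double[] {s * r[0] + t[0], s * r[1] + t[1], s * r[2] + t[2]};
    }

    /** Inverse transform, without ever building a matrix. */
    SimilaritySketch inverse() {
        double inv = 1.0 / s;                                   // multiplicative inverse
        double[] qc = {-q[0], -q[1], -q[2], q[3]};              // conjugate: three sign flips
        double[] rt = rotate(qc, t);
        double[] ti = {-inv * rt[0], -inv * rt[1], -inv * rt[2]};
        return new SimilaritySketch(inv, qc, ti);
    }
}
```

Applying a transform and then its inverse() should return the original point up to rounding.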

And no I'm not saying anyone is an idiot if they use matrices...just to get that out of the way.
Offline ags1

« Reply #299 - Posted 2015-08-20 10:22:48 »

I'd like to use JOML for 2D transformations, to get away from the rather naive calculations I do with trig functions... My maths is rather spotty, so is there, or will there be, a JOML For Dummies tutorial?
