Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (538)
Games in Android Showcase (132)
games submitted by our members
Games in WIP (600)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2 3
  ignore  |  Print  
  Arithmictic Performance  (Read 17246 times)
0 Members and 1 Guest are viewing this topic.
Offline kevglass

« JGO Spiffy Duke »


Medals: 211
Projects: 24
Exp: 18 years


Coder, Trainee Pixel Artist, Game Reviewer


« Posted 2009-08-04 19:07:06 »

1  
2  
3  
4  
5  
6  
7  
8  
9  
                Ints    Longs    Floats    Doubles
Addition                1    2,12    4,9    6,22
Subtraction    0,98    2,12    5,04    6,31
Multiplication    1,08    2,8    4,14    4,98
Division             2,49    16,08    3,55    4,73
Comparison    7,18    7,33    9,73    11,92
Cast to int       0,69    2,22    2,1
random            41,86    78,59    52,88    91,04
Math.sin/cos             807,1


http://www.rbgrn.net/content/286-how-to-test-android-performance-using-fps

Kev

Offline Ranger
« Reply #1 - Posted 2009-08-04 23:11:53 »

Thanks for the info Kev.

Should also point out that these were taken running on the G1, which has no FPU.

Time to replace my sin/cos calls with table lookups!
Offline oNyx

JGO Coder


Medals: 2


pixels! :x


« Reply #2 - Posted 2009-08-05 23:28:33 »

>no FPU

The FP factors do look a lot better than expected though. Well, except for that sin/cos stuff, that is. Wink

弾幕 ☆ @mahonnaiseblog
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #3 - Posted 2009-08-06 08:27:59 »

Benchmarking Math.sin/cos isn't very useful. They should benchmark FloatMath.sin/cos, since that is what you would use on Android.

On a related note, I have experimented with fixed point and found it doesn't really make a difference. Just go ahead and use floats. If you are writing code that is going to be slow with float, it is going to be slow fixed, and your only recourse is to move it to native code (which is very easy, kudos to the Android team!).

Offline kevglass

« JGO Spiffy Duke »


Medals: 211
Projects: 24
Exp: 18 years


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #4 - Posted 2009-08-06 08:34:50 »

Lookup tables still the way to go using FastMath - just got a stable framerate after dropping them.

Kev

Offline Roquen
« Reply #5 - Posted 2009-08-06 15:56:01 »

In hardware and software a big part of the cost will be argument reduction.  So if your on a limited range you might want to try an approximation.
Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #6 - Posted 2009-08-06 18:15:39 »

Lookup tables still the way to go using FastMath - just got a stable framerate after dropping them.

Kev

I'm not clear on this, did you find lookup tables faster than FloatMath for sin/cos?

Offline kevglass

« JGO Spiffy Duke »


Medals: 211
Projects: 24
Exp: 18 years


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #7 - Posted 2009-08-06 18:21:23 »

Yes, considerbly.

Kev

Offline DzzD
« Reply #8 - Posted 2009-08-07 01:33:08 »

about cos and sin it should be doable for float in about 40/100 using the taylor serie ( http://www.java-gaming.org/topics/3dzzd-mathx-class-accurate-and-fast-cos-sin/16296/view.html ) depending on the precision (lenght of the serie) you will requiere.


high precision :
1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20)))))))));

10 multiplicaton and 10 addition => 49+41 = 90 (for floats)


low precision
1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8))));

4 multiplicaton and 4 addition => 19.6 + 16.5 = 36.5 (for float)

if ou know that your angle value are in the correct range you can inline it and get best efficiency

Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #9 - Posted 2009-08-07 03:52:11 »

Thanks DzzD. A lookup table is accurate enough for my purposes in this case though.

Yes, considerbly.

I see a sin lookup table being about 1.7x faster than FloatMath.sin. Does this correlate to your experience?

How about for sqrt? I find even a lookup table to be about the same speed as FloatMath.sqrt.

FWIW, I found this to be about 2x faster than Math.atan2 (on my G1):
1  
2  
3  
4  
5  
6  
7  
8  
9  
   static public float atan2 (float y, float x) {
      float abs_y = y < 0 ? -y : y;
      float angle;
      if (x >= 0)
         angle = 0.7853981633974483f - 0.7853981633974483f * (x - abs_y) / (x + abs_y);
      else
         angle = 2.356194490192345f - 0.7853981633974483f * (x + abs_y) / (abs_y - x);
      return y < 0 ? -angle : angle;
   }

Source: http://dspguru.com/comp.dsp/tricks/alg/fxdatan2.htm

Gah, my sin lookup stuff sucks. Can anyone share a good implementation?

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Ranger
« Reply #10 - Posted 2009-08-07 04:36:31 »

Gah, my sin lookup stuff sucks. Can anyone share a good implementation?

My sin/cos lookup tables are super basic.  But do the job for me:
1  
2  
3  
4  
5  
6  
sinVals = new float[360];
cosVals = new float[360];
for (i=0; i<360; i++) {
   sinVals[i] = (float)Math.sin(i*TO_RADIANS);
   cosVals[i] = (float)Math.cos(i*TO_RADIANS);
}
Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 840
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #11 - Posted 2009-08-07 04:41:24 »

What about this one:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
   private static final int     SIN_BITS              = 12; // adjust these .......
   private static final int     ATAN2_BITS            = 7; // adjust these .......

   private static final float   RAD                   = (float) Math.PI / 180.0f;
   private static final float   DEG                   = 180.0f / (float) Math.PI;

   private static final int     SIN_MASK              = ~(-1 << SIN_BITS);
   private static final int     SIN_COUNT             = SIN_MASK + 1;

   private static final float   radFull               = (float) (Math.PI * 2.0);
   private static final float   degFull               = (float) (360.0);
   private static final float   radToIndex            = SIN_COUNT / radFull;
   private static final float   degToIndex            = SIN_COUNT / degFull;

   private static final float[] sin                   = new float[SIN_COUNT];
   private static final float[] cos                   = new float[SIN_COUNT];

   private static final int     ATAN2_BITS2           = ATAN2_BITS << 1;
   private static final int     ATAN2_MASK            = ~(-1 << ATAN2_BITS2);
   private static final int     ATAN2_COUNT           = ATAN2_MASK + 1;
   private static final int     ATAN2_DIM             = (int) Math.sqrt(ATAN2_COUNT);

   private static final float   INV_ATAN2_DIM_MINUS_1 = 1.0f / (ATAN2_DIM - 1);

   private static final float[] atan2                 = new float[ATAN2_COUNT];

   static
   {
      for (int i = 0; i < SIN_COUNT; i++)
      {
         sin[i] = (float) Math.sin((i + 0.5f) / SIN_COUNT * radFull);
         cos[i] = (float) Math.cos((i + 0.5f) / SIN_COUNT * radFull);
      }

      for (int i = 0; i < ATAN2_DIM; i++)
      {
         for (int j = 0; j < ATAN2_DIM; j++)
         {
            // 0 .. 1
            float x0 = (float) i / ATAN2_DIM;
            float y0 = (float) j / ATAN2_DIM;

            atan2[j * ATAN2_DIM + i] = (float) Math.atan2(y0, x0);
         }
      }
   }

   public static final float sin(float rad)
   {
      return sin[(int) (rad * radToIndex) & SIN_MASK];
   }

   public static final float cos(float rad)
   {
      return cos[(int) (rad * radToIndex) & SIN_MASK];
   }

   public static final float atan2(float y, float x)
   {
      float add, mul;

      if (x < 0.0f)
      {
         if (y < 0.0f)
         {
            x = -x;
            y = -y;

            mul = 1.0f;
         }
         else
         {
            x = -x;
            mul = -1.0f;
         }

         add = -3.141592653f;
      }
      else
      {
         if (y < 0.0f)
         {
            y = -y;
            mul = -1.0f;
         }
         else
         {
            mul = 1.0f;
         }

         add = 0.0f;
      }

      float invDiv = 1.0f / (((x < y) ? y : x) * INV_ATAN2_DIM_MINUS_1);

      int xi = (int) (x * invDiv);
      int yi = (int) (y * invDiv);

      return (atan2[yi * ATAN2_DIM + xi] + add) * mul;
   }

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social
Offline pjt33
« Reply #12 - Posted 2009-08-07 08:27:42 »

Gah, my sin lookup stuff sucks. Can anyone share a good implementation?
What kind of error do you need? How much memory are you willing to use?

Approaches can be roughly classified as:
1. Single array lookup after quantisation - Ranger's and Riven's implementations
2. Multiple array lookups with interpolation (i.e. two lookups with linear interpolation, 3 with quadratic, ...)
3. Multiple array lookups with angle addition formulae - I'll come back to this
4. Direct approximation by polynomial / rational function - Dzzd's implementation


On #3: sin(A + B) = sin(A) cos(B) + cos(A) sin(B), cos(A + B) = cos(A) cos(B) - sin(A) sin(B)
One approach would be to multiply the input x by 2 * Pi * (1 << 24), mask off the bottom three bytes individually and do six lookups (total table space required: 256 * 6 entries, probably optimisable to 256*3 if you want to trade memory for some subtractions). 8 multiplications and 4 additions gets you sin x and cos x (or 6 multiplications and 3 additions if you only want one of them).
With a bit more memory (16k entries if you're using 24 bits of the input) you can reduce that to four lookups, two multiplications, and one addition for sin, same for cos (although obv. if doing both you don't double the lookups).
Offline Roquen
« Reply #13 - Posted 2009-08-07 08:35:32 »

I'd suggest timing random table reads (if possible) on the lowest end target.  Many modern processors (including those used in embedded systems) will have large timing hits for reading from big tables and an approximation will be faster. 

For approximations, truncated power series are pretty bad (bang for your buck) in float or fixed point.  Think minimax, generalized pade, et al.

(edit: fixed typo)
Offline DzzD
« Reply #14 - Posted 2009-08-07 13:39:03 »

Nb: I think that you would be surprised on how precise and fast (addition and multiplication are usually very fast on any cpu)  can be the taylor serie (the long version ) especially if you inline it (this avoid a method call), it can be nearly as fast as a lookup table (due to random array cache trouble and other performance problem with access to array), you should really give it a try.


EDIT : also another problem you may have with the lookup table is casting from floating point value to int (to index the lookup array) as casting from floating to fixed is damnly slow


EDIT2 :
For approximations, truncated power series are pretty bad (bang for your buck) in float or fixed point.  Think minimax, generalized pade, et al.

(edit: fixed typo)

about the min max you should have looked the original source code

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
public static final double cos(double x)
    {
       
       //return Math.cos(x);
       
       if(x<0.0) x=-x;
       
       
       if(x<PI2)
       {
          if(x<PI)
          {  
            double x2=x*x;
            return 1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16))))))));
            //MORE PRECISE BUT NOT COMPATIBLE JVM MS => return 1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20)))))))));      
         }
         else
         {
            x-=PI;
            double x2=x*x;
            return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16)))))))));
            //MORE PRECISE BUT NOT COMPATIBLE JVM MS => return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));      
         }
       }
       
       
       x%=PI2;
       x-=PI;
      double x2=x*x;
     
      return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*f16))))))));
      //MORE PRECISE BUT NOT COMPATIBLE JVM MS => return -(1.0+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16+x2*(f18+x2*f20))))))))));
     
   }  

Offline Roquen
« Reply #15 - Posted 2009-08-07 14:40:43 »

Humm.  I'm only seeing truncated series in the other thread.
Offline DzzD
« Reply #16 - Posted 2009-08-07 15:04:12 »

Humm.  I'm only seeing truncated series in the other thread.

yes sure series are truncated (they must be) but the good thing is that cosinus is symetrical and periodical, so you can always convert an input angle into the range 0 to 0.5*PI and found the result (using for example a modulo) and then it will just give you a smart result as you are always using something like the 'best range' for that serie.

Offline Roquen
« Reply #17 - Posted 2009-08-07 15:52:34 »

Sorry, I'm not being clear.  Truncated power series have exploding error away from the expansion point.  Here are a couple of quick references:

minimax
Remez

In an attempt to clairify, here's a quick example for sine.

All these numbers are all bad, I'm mucking around in a computer algebra program.

F1:  x*(1         + x2*(-0.166667f + 0.00833333f * x2))
F2:  x*(1         + x2*(-0.16605f  + 0.00761f    * x2))
F3:  x*(0.999892f + x2*(-0.165961f + 0.00760319f * x2))


"F1" is a truncated power series (again numbers are bad and only for visual comparison) and the other two are minimax approximations with the following design:

input range = +/- Pi/2
minimize absolute error

"F2" was simply given one less coefficent to play around with.

AbsErrorMax(F1) = 0.00452486   @ Pi/2
AbsErrorMax(F2) = 0.000121018  @ 0.833326
AbsErrorMax(F3) = 0.0000809254 @ 0.868416


Offline DzzD
« Reply #18 - Posted 2009-08-07 17:04:57 »

Sorry, I'm not being clear.  Truncated power series have exploding error away from the expansion point.  Here are a couple of quick references:

minimax
Remez

In an attempt to clairify, here's a quick example for sine.

All these numbers are all bad, I'm mucking around in a computer algebra program.

F1:  x*(1         + x2*(-0.166667f + 0.00833333f * x2))
F2:  x*(1         + x2*(-0.16605f  + 0.00761f    * x2))
F3:  x*(0.999892f + x2*(-0.165961f + 0.00760319f * x2))


"F1" is a truncated power series (again numbers are bad and only for visual comparison) and the other two are minimax approximations with the following design:

input range = +/- Pi/2
minimize absolute error

"F2" was simply given one less coefficent to play around with.

AbsErrorMax(F1) = 0.00452486   @ Pi/2
AbsErrorMax(F2) = 0.000121018  @ 0.833326
AbsErrorMax(F3) = 0.0000809254 @ 0.868416





ok you are right at first I missunderstood and read min & max and not minimax but .... this is finally the same in the particular case of cosinus as those minimax error will be more and more noticable if you get farther than 0, so clamping input into 0 to 0.5PI make them so small that the real Math.cos may introduce bigger error than the taylor serie, an example is given in the original source code where :

MathX.cos(-Math.PI*0.25)=0.7071067811865476
Math.cos(-Math.PI*0.25) =0.7071067811865476

MathX.cos(-Math.PI*0.5) =0.0
Math.cos(-Math.PI*0.5)  =6.123233995736766E-17

what I finally mean is just that Math.cos will also introduce relative error in comparaison of the true cosinus

and the taylor series of cosinus (and because cosinus is periodical) will have very very tiny error and anyway will give smarter result than lookup for the same time cost.

EDIT : as you can see in your sample even with only three polynomial coeficient it give quite smart result so imagine with 10






Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #19 - Posted 2009-08-07 19:50:14 »

Riven, thanks for the lookup code (I was using modulus instead of masking...).

Your atan2 lookup, run on my desktop, takes about 1.6 times longer than the atan2 I posted above. However, yours is quite a bit more accurate. It is good to have options! Both are waaay faster (5+ times) than Math.atan2. That was the desktop, on the G1 I see (10k iterations with 20k precalculated random numbers):
Math.atan2: 159.66797
Riven's atan2: 58.86841
Nate's atan2: 48.49243
So, a smaller difference all around.

I played with Riven's lookup table and DzzD's low precision forumula. Benchmark code:
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
for (int i = 0; i < 10; i++)
   test(false);
test(true);
test(true);
test(true);

public void test (boolean log) {
   int count = 50000;

   long s, e;
   Random random = new Random();

   float[] numbers = new float[count];
   for (int i = 0; i < count; i++)
      numbers[i] = random.nextFloat();

   s = System.nanoTime();
   for (int i = 0; i < count; i++)
      Math.cos(numbers[i]);
   e = System.nanoTime();
   if (log) System.out.println("Java: " + (e - s) / 1000000f);

   s = System.nanoTime();
   for (int i = 0; i < count; i++)
      FastMath2.cos(numbers[i]); // Riven
   e = System.nanoTime();
   if (log) System.out.println("Riven: " + (e - s) / 1000000f);

   s = System.nanoTime();
   for (int i = 0; i < count; i++)
      FastMath.cos(numbers[i]); // DzzD
   e = System.nanoTime();
   if (log) System.out.println("DzzD: " + (e - s) / 1000000f);

   if (log) {
      System.out.println("Input: " + numbers[3]);
      System.out.println("DzzD: " + FastMath.cos(numbers[3]));
      System.out.println("Riven: " + FastMath2.cos(numbers[3]));
      System.out.println("Java: " + Math.cos(numbers[3]));
      System.out.println();
   }
}


Results on my G1:
Quote
Speed:

Java: 327.54517
Riven: 80.47485
DzzD: 149.93286

Java: 330.07813
Riven: 79.10156
DzzD: 150.08545

Java: 331.29883
Riven: 80.62744
DzzD: 150.23804

Accuracy:

Input: 0.16849637
DzzD: 0.9169282
Riven: 0.98592603
Java: 0.9858380402606572

Input: 0.068103254
DzzD: 0.96614116
Riven: 0.99767107
Java: 0.9976818695836023

Input: 0.99370223
DzzD: 0.5429537
Riven: 0.5459677
Java: 0.5455909445089435
Accuracy probably varies more when the input is not between 0 and 1.

Edit: Anyone have a good solution for sqrt?

Offline DzzD
« Reply #20 - Posted 2009-08-07 20:13:14 »

thanks for the test, any chance to get a high resolution test plz ? Smiley (up to F16)

but wow, I am realy suprised by lookup fastness


EDIT : hum seems that the rules have changed on Android with lookup table, I run your test again (but on a laptop, sry but dont have any Android device yet) using Riven lookup/Java/ and original taylor (but converted from double to float and up to F20)

Here is the whole test project : http://demo.dzzd.net/testCos.zip

Quote
//Time
Java: 0.045886
Riven: 1.163765
DzzD: 0.166013

//Results
Input: 0.84525967
DzzD: 0.66299015
Riven: 0.663537
Java: 0.6635370371144133

//Time
Java: 0.046026
Riven: 1.100349
DzzD: 0.166362

//Results
Input: 0.37585977
DzzD: 0.92992324
Riven: 0.93019235
Java: 0.9301923689294419

//Time
Java: 0.045956
Riven: 1.101327
DzzD: 0.166292

//Results
Input: 0.35580623
DzzD: 0.937606
Riven: 0.93736595
Java: 0.9373659458091854

Offline pjt33
« Reply #21 - Posted 2009-08-07 20:43:10 »

Nate, Dzzd, what are you accuracy / results respectively measuring?
Offline DzzD
« Reply #22 - Posted 2009-08-07 20:47:04 »

Nate, Dzzd, what are you accuracy / results respectively measuring?

it show original Java Math.cos result and results of the two others methods, this is not a percentage

EDIT : modifing the random generator ( numbers = (float)(Math.random()*7919 ); )  to use a higher range of values give the following results:

Quote
//times
Java: 0.049936
Riven: 1.734927
DzzD: 0.164825

//results
Input: 4377.789
DzzD: -0.020707069
Riven: -0.020298883110866583
Java: -0.02029888311062082

//times
Java: 0.050845
Riven: 1.757276
DzzD: 0.16755

//results
Input: 2180.3757
DzzD: 0.9939912
Riven: 0.9939086387101796
Java: 0.9939087098649356

//times
Java: 0.049937
Riven: 1.734369
DzzD: 0.161892

//results
Input: 4083.068
DzzD: 0.53823364
Riven: 0.5383364782146423
Java: 0.538336478350398

two things to point out :

Math.cos have become a lot faster on recent JVM
lookup/array are a lot faster on Android than it is on a Desktop (wich is a shame....)


all test was done on an Intel Core Duo T7500 2.2Ghz with Java 6u14[/s]

EDIT :  really need to have some rest

Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 840
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #23 - Posted 2009-08-07 22:08:21 »

Quote from: DzzD
//times
Java: 0.049937
Riven: 1.734369
DzzD: 0.161892

Are you measuring showing my code to be 34.6x slower than pure Java? Shocked persecutioncomplex

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social
Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 840
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #24 - Posted 2009-08-07 22:14:24 »

Here is the whole test project : http://demo.dzzd.net/testCos.zip

You seem to have swapped the implementations Smiley

FastMath is mine. FastMath2 is yours.


Your benchmark was fundamentally flawed. It allowed the JVM to completely remove the Math.cos() call.

I renamed the classes (see next post)

Bench.java
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
   public static void main(String[] args)
   {
      for (int i = 0; i < 64; i++)
         testCos(false);
      for (int i = 0; i < 64; i++)
         testCos(true);
   }

   public static void testCos(boolean log)
   {
      int count = 50000;

      long s, e;

      float[] numbers = new float[count];
      for (int i = 0; i < count; i++)
         numbers[i] = (float) (Math.random() * Math.PI * 4.0 - Math.PI * 2.0);

      // ensure the JVM doesn't optimize those silly calls away!!
      double[] java_dst = new double[count];
      float[] dzzd_dst = new float[count];
      float[] riven_dst = new float[count];

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         java_dst[i] = Math.cos(numbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("Java.cos:  " + (e - s) / 1000 + "us");

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         dzzd_dst[i] = FastMathDzzD.cos(numbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("DzzD.cos:  " + (e - s) / 1000 + "us");

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         riven_dst[i] = FastMathRiven.cos(numbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("Riven.cos: " + (e - s) / 1000 + "us");

      double dzzdAvgErr = 0.0;
      double rivenAvgErr = 0.0;
      double dzzdMaxErr = 0.0;
      double rivenMaxErr = 0.0;

      for (int i = 0; i < count; i++)
      {
         double dzzdErr = Math.abs(Math.cos(numbers[i]) - FastMathDzzD.cos(numbers[i]));
         double rivenErr = Math.abs(Math.cos(numbers[i]) - FastMathRiven.cos(numbers[i]));

         dzzdAvgErr += dzzdErr;
         if (dzzdErr > dzzdMaxErr)
            dzzdMaxErr = dzzdErr;

         rivenAvgErr += rivenErr;
         if (rivenErr > rivenMaxErr)
            rivenMaxErr = rivenErr;
      }
      dzzdAvgErr /= count;
      rivenAvgErr /= count;

      if (log)
      {
         System.out.println("Input: " + numbers[3]);
         System.out.println("DzzD: " + FastMathDzzD.cos(numbers[3]) + ", avg.error=" + dzzdAvgErr + ", max.error=" + dzzdMaxErr);
         System.out.println("Riven: " + FastMathRiven.cos(numbers[3]) + ", avg.error=" + rivenAvgErr + ", max.error=" + rivenMaxErr);
         System.out.println("Java: " + Math.cos(numbers[3]));
         System.out.println("~~~prevent opt. ~~~" + dzzd_dst[13] + "~~~" + java_dst[13] + "~~~" + riven_dst[13]);
         System.out.println();
      }
   }

   public static void testAtan2(boolean log)
   {
      int count = 50000;

      long s, e;

      float[] xNumbers = new float[count];
      float[] yNumbers = new float[count];
      for (int i = 0; i < count; i++)
      {
         xNumbers[i] = (float) (Math.random() * Math.PI * 2.0 - Math.PI);
         yNumbers[i] = (float) (Math.random() * Math.PI * 2.0 - Math.PI);
      }

      // ensure the JVM doesn't optimize those silly calls away!!
      double[] java_dst = new double[count];
      float[] nate_dst = new float[count];
      float[] riven_dst = new float[count];

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         java_dst[i] = Math.atan2(yNumbers[i], xNumbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("Java.atan2:  " + (e - s) / 1000 + "us");

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         nate_dst[i] = FastMathNate.atan2(yNumbers[i], xNumbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("Nate.atan2:  " + (e - s) / 1000 + "us");

      s = System.nanoTime();
      for (int i = 0; i < count; i++)
         riven_dst[i] = FastMathRiven.atan2(yNumbers[i], xNumbers[i]);
      e = System.nanoTime();
      if (log)
         System.out.println("Riven.atan2: " + (e - s) / 1000 + "us");

      double nateAvgErr = 0.0;
      double rivenAvgErr = 0.0;
      double nateMaxErr = 0.0;
      double rivenMaxErr = 0.0;

      for (int i = 0; i < count; i++)
      {
         double nateErr = Math.abs(Math.atan2(yNumbers[i], xNumbers[i]) - FastMathNate.atan2(yNumbers[i], xNumbers[i]));
         double rivenErr = Math.abs(Math.atan2(yNumbers[i], xNumbers[i]) - FastMathRiven.atan2(yNumbers[i], xNumbers[i]));

         nateAvgErr += nateErr;
         if (nateErr > nateMaxErr)
            nateMaxErr = nateErr;

         rivenAvgErr += rivenErr;
         if (rivenErr > rivenMaxErr)
            rivenMaxErr = rivenErr;
      }
      nateAvgErr /= count;
      rivenAvgErr /= count;

      if (log)
      {
         System.out.println("Input: " + xNumbers[3]);
         System.out.println("Nate: " + FastMathDzzD.cos(xNumbers[3]) + ", avg.error=" + nateAvgErr + ", max.error=" + nateMaxErr);
         System.out.println("Riven: " + FastMathRiven.cos(xNumbers[3]) + ", avg.error=" + rivenAvgErr + ", max.error=" + rivenMaxErr);
         System.out.println("Java: " + Math.cos(xNumbers[3]));
         System.out.println("~~~prevent opt. ~~~" + nate_dst[13] + "~~~" + java_dst[13] + "~~~" + riven_dst[13]);
         System.out.println();
      }
   }

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social
Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 840
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #25 - Posted 2009-08-07 22:21:11 »

FastMathDzzD.java
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
public class FastMathDzzD
{
   public static float  PI   = (float) Math.PI;

   private static float f2   = (float) (-0.5);
   private static float f4   = (float) (-f2 / (3.0 * 4.0));
   private static float f6   = (float) (-f4 / (5.0 * 6.0));
   private static float f8   = (float) (-f6 / (7.0 * 8.0));
   private static float f10  = (float) (-f8 / (9.0 * 10.0));
   private static float f12  = (float) (-f10 / (11.0 * 12.0));
   private static float f14  = (float) (-f12 / (13.0 * 14.0));
   private static float f16  = (float) (-f14 / (15.0 * 16.0));
   //MORE PRECISE BUT NOT COMPATIBLE JVM MS =>//
   private static float f18  = (float) (-f16 / (17.0 * 18.0));
   //MORE PRECISE BUT NOT COMPATIBLE JVM MS =>//
   private static float f20  = (float) (-f18 / (19.0 * 20.0));
   private static float PI2  = (float) (2.0 * PI);
   private static float PI05 = (float) (0.5 * PI);

   /**
   * Compute and return sinus of its parameter using taylor serie
   * @param x angle in radian to
   * @return sinus value for the given parameter
   */

   public static final float sin(float x)
   {
      //return Math.sin(x);
      return cos(x - PI05);
   }

   /**
   * Compute and return cosinus of its parameter using taylor serie
   * @param x angle in radian to
   * @return cosinus value for the given parameter
   */

   public static final float cos(float x)
   {

      //return Math.cos(x);

      if (x < 0.0)
         x = -x;

      if (x < PI2)
      {
         if (x < PI)
         {
            float x2 = x * x;
            //return 1.0f+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16))))))));
            //MORE PRECISE BUT NOT COMPATIBLE JVM MS =>
            return 1.0f + x2 * (f2 + x2 * (f4 + x2 * (f6 + x2 * (f8 + x2 * (f10 + x2 * (f12 + x2 * (f14 + x2 * (f16 + x2 * (f18 + x2 * f20)))))))));
         }
         else
         {
            x -= PI;
            float x2 = x * x;
            //return -(1.0f+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*(f16)))))))));
            //MORE PRECISE BUT NOT COMPATIBLE JVM MS =>
            return -(1.0f + x2 * (f2 + x2 * (f4 + x2 * (f6 + x2 * (f8 + x2 * (f10 + x2 * (f12 + x2 * (f14 + x2 * (f16 + x2 * (f18 + x2 * f20))))))))));
         }
      }

      x %= PI2;
      x -= PI;
      float x2 = x * x;

      //return -(1.0f+x2*(f2+x2*(f4+x2*(f6+x2*(f8+x2*(f10+x2*(f12+x2*(f14+x2*f16))))))));
      //MORE PRECISE BUT NOT COMPATIBLE JVM MS =>
      return -(1.0f + x2 * (f2 + x2 * (f4 + x2 * (f6 + x2 * (f8 + x2 * (f10 + x2 * (f12 + x2 * (f14 + x2 * (f16 + x2 * (f18 + x2 * f20))))))))));

   }

}


FastMathRiven.java
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
public class FastMathRiven
{

   private static final int     SIN_BITS              = 12;                          // adjust these .......
   private static final int     ATAN2_BITS            = 7;                           // adjust these .......

   private static final float   RAD                   = (float) Math.PI / 180.0f;
   private static final float   DEG                   = 180.0f / (float) Math.PI;

   private static final int     SIN_MASK              = ~(-1 << SIN_BITS);
   private static final int     SIN_COUNT             = SIN_MASK + 1;

   private static final float   radFull               = (float) (Math.PI * 2.0);
   private static final float   degFull               = (float) (360.0);
   private static final float   radToIndex            = SIN_COUNT / radFull;
   private static final float   degToIndex            = SIN_COUNT / degFull;

   private static final float[] sin                   = new float[SIN_COUNT];
   private static final float[] cos                   = new float[SIN_COUNT];

   private static final int     ATAN2_BITS2           = ATAN2_BITS << 1;
   private static final int     ATAN2_MASK            = ~(-1 << ATAN2_BITS2);
   private static final int     ATAN2_COUNT           = ATAN2_MASK + 1;
   private static final int     ATAN2_DIM             = (int) Math.sqrt(ATAN2_COUNT);

   private static final float   INV_ATAN2_DIM_MINUS_1 = 1.0f / (ATAN2_DIM - 1);

   private static final float[] atan2                 = new float[ATAN2_COUNT];

   static
   {
      for (int i = 0; i < SIN_COUNT; i++)
      {
         sin[i] = (float) Math.sin((i + 0.5f) / SIN_COUNT * radFull);
         cos[i] = (float) Math.cos((i + 0.5f) / SIN_COUNT * radFull);
      }

      for (int i = 0; i < ATAN2_DIM; i++)
      {
         for (int j = 0; j < ATAN2_DIM; j++)
         {
            // 0 .. 1
            float x0 = (float) i / ATAN2_DIM;
            float y0 = (float) j / ATAN2_DIM;

            atan2[j * ATAN2_DIM + i] = (float) Math.atan2(y0, x0);
         }
      }
   }

   public static final float sin(float rad)
   {
      return sin[(int) (rad * radToIndex) & SIN_MASK];
   }

   public static final float cos(float rad)
   {
      return cos[(int) (rad * radToIndex) & SIN_MASK];
   }

   public static final float atan2(float y, float x)
   {
      float add, mul;

      if (x < 0.0f)
      {
         if (y < 0.0f)
         {
            x = -x;
            y = -y;

            mul = 1.0f;
         }
         else
         {
            x = -x;
            mul = -1.0f;
         }

         add = -3.141592653f;
      }
      else
      {
         if (y < 0.0f)
         {
            y = -y;
            mul = -1.0f;
         }
         else
         {
            mul = 1.0f;
         }

         add = 0.0f;
      }

      float invDiv = 1.0f / (((x < y) ? y : x) * INV_ATAN2_DIM_MINUS_1);

      int xi = (int) (x * invDiv);
      int yi = (int) (y * invDiv);

      return (atan2[yi * ATAN2_DIM + xi] + add) * mul;
   }
}


FastMathNate.java
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
public class FastMathNate
{
   static public float atan2(float y, float x)
   {
      float abs_y = y < 0 ? -y : y;
      float angle;
      if (x >= 0)
         angle = 0.7853981633974483f - 0.7853981633974483f * (x - abs_y) / (x + abs_y);
      else
         angle = 2.356194490192345f - 0.7853981633974483f * (x + abs_y) / (abs_y - x);
      return y < 0 ? -angle : angle;
   }
}

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social
Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 840
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #26 - Posted 2009-08-07 22:37:02 »

I just knew my code couldn't be 35x slower than a call to Math.cos(), it turns out it is nearly 50x faster.




Result on cos(-2PI..+2PI): JRE 1.6.0u13 32bit
Java.cos:  5937us
DzzD.cos:  1755us
Riven.cos: 119us
Input: 0.13322899
DzzD: 0.99113816, avg.error=6.511363215909248E-8, max.error=4.2913641251640655E-7
Riven: 0.9912097, avg.error=6.112402706577345E-4, max.error=0.0022978588251646535
Java: 0.9911381382357358
~~~prevent opt. ~~~0.9547174~~~0.9547176190751046~~~0.95491374

Java.cos:  5182us
DzzD.cos:  1756us
Riven.cos: 106us
Input: -4.931952
DzzD: 0.217803, avg.error=6.461738699229257E-8, max.error=4.4446321678659473E-7
Riven: 0.21685559, avg.error=6.134784039957628E-4, max.error=0.0022999601748480807
Java: 0.21780315388830504
~~~prevent opt. ~~~0.59424895~~~0.5942489528581584~~~0.5950832

Java.cos:  5215us
DzzD.cos:  1767us
Riven.cos: 114us
Input: 5.194904
DzzD: 0.46400833, avg.error=6.49894411417267E-8, max.error=4.217999866051869E-7
Riven: 0.46393955, avg.error=6.083748864382175E-4, max.error=0.002297156823630276
Java: 0.4640083899859773
~~~prevent opt. ~~~-0.99691343~~~-0.9969134135444965~~~-0.9968811


Result on atan2(-PI..+PI, -PI..+PI): JRE 1.6.0u13 32bit

Java.atan2:  9594us
Nate.atan2:  966us
Riven.atan2: 1659us
Input: -0.12577985
Nate: 0.9921001, avg.error=0.04307788871778716, max.error=0.07111472846664224
Riven: 0.9923854, avg.error=0.0029046661205663024, max.error=0.00787278381587564
Java: 0.9921001376530111
~~~prevent opt. ~~~-0.5958918~~~-0.5486365662017681~~~-0.5450384

Java.atan2:  9093us
Nate.atan2:  925us
Riven.atan2: 1650us
Input: -2.7883778
Nate: -0.9382652, avg.error=0.04323209591878118, max.error=0.07111474342576019
Riven: -0.9376059, avg.error=0.00290633592197212, max.error=0.007867688886052049
Java: -0.9382654809616374
~~~prevent opt. ~~~1.0878927~~~1.1530357950577979~~~1.1554981

Java.atan2:  9194us
Nate.atan2:  935us
Riven.atan2: 1650us
Input: 1.9041518
Nate: -0.32721555, avg.error=0.043134051519980066, max.error=0.07111471440464712
Riven: -0.3274853, avg.error=0.0029083043816270954, max.error=0.007872691136467047
Java: -0.32721561538512783
~~~prevent opt. ~~~-1.2705337~~~-1.3387262269635727~~~-1.3370532

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social
Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #27 - Posted 2009-08-08 03:54:48 »

This is great stuff guys! Let's do sqrt next! Cheesy

Nate, Dzzd, what are you accuracy / results respectively measuring?
Accuracy was just showing the accurate value from java.lang.Math and the inaccurate value from other algorithms. Riven greatly improved (and clarified) the benchmark! The time results are to give some (vague) idea of the speed improvement of using the less accurate algorithms.

On a desktop, this stuff is mostly moot, but it does end up mattering for the current Android phones, which I think is kind of fun. Smiley

Offline pjt33
« Reply #28 - Posted 2009-08-08 07:46:02 »

On a desktop, this stuff is mostly moot
You'd be surprised. Back in the 1.4 days when JNI was slooooooow doing trig was the major bottleneck in a game I was working on and it ended up being worthwhile porting the native implementations of sin and cos to Java.
Offline pjt33
« Reply #29 - Posted 2009-08-08 08:37:21 »

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
/**
 * Total table size: 64kB.
 */

public class FastMathPjt
{
   private static final float[] s1, s2, c1, c2;
   private static final float SCALE = (float)((1 << 23) / Math.PI);

   static
   {
      s1 = new float[1 << 12];
      s2 = new float[1 << 12];
      c1 = new float[1 << 12];
      c2 = new float[1 << 12];

      for (int i = 0; i < s1.length; i++)
      {
         float a = (i << 12) / SCALE;
         float b = i / SCALE;
         s1[i] = (float)Math.sin(a);
         s2[i] = (float)Math.sin(b);
         c1[i] = (float)Math.cos(a);
         c2[i] = (float)Math.cos(b);
      }
   }

   public static final float sin(float x)
   {
      if (x >= 0)
      {
         int i = (int)(x * SCALE);
         int a = (i >> 12) & 0xfff;
         int b = i & 0xfff;
         return s1[a] * c2[b] + c1[a] * s2[b];
      }
      else
      {
         int i = (int)(-x * SCALE);
         int a = (i >> 12) & 0xfff;
         int b = i & 0xfff;
         return -(s1[a] * c2[b] + c1[a] * s2[b]);
      }
   }

   public static final float cos(float x)
   {
      if (x < 0) x = -x;
      int i = (int)(x * SCALE);
      int a = (i >> 12) & 0xfff;
      int b = i & 0xfff;
      return c1[a] * c2[b] - s1[a] * s2[b];
   }
}

This is the #3 in my list above. Twice as much memory usage as FastMathRiven, twice as fast as FastMathDzzD, similar error magnitude to FastMathDzzD. Memory usage could be reduced at cost of speed.
Pages: [1] 2 3
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

rwatson462 (28 views)
2014-12-15 09:26:44

Mr.CodeIt (19 views)
2014-12-14 19:50:38

BurntPizza (37 views)
2014-12-09 22:41:13

BurntPizza (72 views)
2014-12-08 04:46:31

JscottyBieshaar (34 views)
2014-12-05 12:39:02

SHC (46 views)
2014-12-03 16:27:13

CopyableCougar4 (42 views)
2014-11-29 21:32:03

toopeicgaming1999 (110 views)
2014-11-26 15:22:04

toopeicgaming1999 (96 views)
2014-11-26 15:20:36

toopeicgaming1999 (29 views)
2014-11-26 15:20:08
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!