Java-Gaming.org Hi !
Featured games (90)
games approved by the League of Dukes
Games in Showcase (741)
Games in Android Showcase (225)
games submitted by our members
Games in WIP (823)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Fast Atan2 [Again]  (Read 2124 times)
0 Members and 1 Guest are viewing this topic.
Offline mooman219
« Posted 2017-08-30 10:58:29 »

Background:
Back at it with some Atan2 benchmarks. I found the source for my old benchmarks and wanted to run this again with some different settings. The purpose of this post is to find a faster way to do Math.atan2(y,x). To compare various methods, I setup a simple JMH benchmark and accuracy test. Criticisms and new atan2 submissions are appreciated.


Outcome:
Fastest implementation with lookup table: Icecore
Fastest implementation without lookup table: Diamond

Note: Fastest doesn't mean most accurate. Please take into account the average and largest error for each algorithm.


Relevant Data:
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
A lower average means higher accuracy.
Apache       : Average Error 0.00000 / Largest Error 0.00000 : 36,012,001 samples : 1488.165 ±   7.285  ms/op
Default      : Average Error 0.00000 / Largest Error 0.00000 : 36,012,001 samples : 1773.411 ± 193.228  ms/op
Diamond      : Average Error 0.04318 / Largest Error 0.07112 : 36,012,001 samples :   47.299 ±   1.467  ms/op
DSPAccurate  : Average Error 0.00344 / Largest Error 0.01015 : 36,012,001 samples :   72.539 ±   3.702  ms/op
DSPFast      : Average Error 0.04318 / Largest Error 0.07111 : 36,012,001 samples :   56.547 ±   0.662  ms/op
Icecore      : Average Error 0.00038 / Largest Error 0.00098 : 36,012,001 samples :   58.265 ±   0.686  ms/op
Kappa        : Average Error 0.00224 / Largest Error 0.00468 : 36,012,001 samples :   67.622 ±  21.004  ms/op
Riven        : Average Error 0.00291 / Largest Error 0.00787 : 36,012,001 samples :   99.701 ±   2.846  ms/op
Baseline     :                                               : 36,012,001 samples :   36.155 ±   4.251  ms/op


Note: Keep in mind these are micro benchmarks and not indicative of real world performance; the JMH can only do so much to make sure the JIT doesn't interfere. There may be a fair amount of branch prediction and caching that's going on, and lookup tables might just be sitting in a cpu cache.


Source:
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
148  
149  
150  
151  
152  
153  
154  
155  
156  
157  
158  
159  
160  
161  
162  
163  
164  
165  
166  
167  
168  
169  
170  
171  
172  
173  
174  
175  
176  
177  
178  
179  
180  
181  
182  
183  
184  
185  
186  
187  
188  
189  
190  
191  
192  
193  
194  
195  
196  
197  
198  
199  
200  
201  
202  
203  
204  
205  
206  
207  
208  
209  
210  
211  
212  
213  
214  
215  
216  
217  
218  
219  
220  
221  
222  
223  
224  
225  
226  
227  
228  
229  
230  
231  
232  
233  
234  
235  
236  
237  
238  
239  
240  
241  
242  
243  
244  
245  
246  
247  
248  
249  
250  
251  
252  
253  
254  
255  
256  
257  
258  
259  
260  
261  
262  
263  
264  
265  
266  
267  
268  
269  
270  
271  
272  
273  
274  
275  
276  
277  
278  
279  
280  
281  
282  
283  
284  
285  
286  
287  
288  
289  
290  
291  
292  
293  
294  
295  
296  
297  
298  
299  
300  
301  
302  
303  
304  
305  
306  
307  
308  
309  
310  
311  
312  
313  
314  
315  
316  
317  
318  
319  
320  
321  
322  
323  
324  
325  
326  
327  
328  
329  
330  
331  
332  
333  
334  
335  
336  
337  
338  
339  
340  
341  
342  
343  
344  
345  
346  
347  
348  
349  
350  
351  
352  
353  
354  
355  
356  
357  
358  
359  
360  
361  
362  
363  
364  
365  
366  
367  
368  
369  
370  
371  
372  
373  
374  
375  
376  
377  
378  
379  
380  
381  
382  
383  
384  
385  
386  
387  
388  
389  
390  
391  
392  
393  
394  
395  
396  
397  
398  
399  
400  
401  
402  
403  
404  
405  
406  
407  
408  
409  
410  
411  
412  
413  
414  
415  
416  
417  
418  
419  
420  
421  
422  
423  
424  
425  
426  
427  
428  
429  
430  
431  
432  
433  
434  
435  
436  
437  
438  
439  
440  
441  
442  
443  
444  
445  
446  
447  
448  
449  
450  
451  
452  
453  
454  
455  
456  
457  
458  
459  
460  
461  
462  
463  
464  
465  
466  
467  
468  
469  
470  
471  
472  
473  
474  
475  
476  
477  
478  
479  
480  
481  
482  
483  
484  
485  
486  
487  
public class Atan2 {

    ///////////////////////////////////////
    // Benchmark Settings
    ///////////////////////////////////////

    private static final double LOW_D = -3;
    private static final double HIGH_D = 3;
    private static final double INC_D = 0.001D;
    private static final float LOW_F = (float) LOW_D;
    private static final float HIGH_F = (float) HIGH_D;
    private static final float INC_F = (float) INC_D;

    public static Options buildOptions() {
        return new OptionsBuilder()
                .include(Atan2.class.getSimpleName())
                .warmupTime(new TimeValue(2, TimeUnit.SECONDS))
                .warmupIterations(4)
                .measurementTime(new TimeValue(2, TimeUnit.SECONDS))
                .measurementIterations(4)
                .forks(1)
                .mode(Mode.AverageTime)
                .timeUnit(TimeUnit.MILLISECONDS)
                .threads(1)
                .build();
    }

    ///////////////////////////////////////
    // Accuracy
    ///////////////////////////////////////

    public static void printError() {
        Atan2Benchmark[] benchmarks = new Atan2Benchmark[]{
            new Apache(),
            new Default(),
            new Diamond(),
            new DSPAccurate(),
            new DSPFast(),
            new Icecore(),
            new Kappa(),
            new Riven(),};

        System.out.println(String.format("A lower average means higher accuracy."));
        for (Atan2Benchmark benchmark : benchmarks) {
            benchmark.printError(LOW_D, HIGH_D, INC_D);
        }
    }

    ///////////////////////////////////////
    // Baseline
    ///////////////////////////////////////

    @Benchmark
    public double baseline() {
        double sum = 0;
        for (double x = LOW_D; x < HIGH_D; x += INC_D) {
            for (double y = LOW_D; y < HIGH_D; y += INC_D) {
                sum += x + y;
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // Default atan2
    ///////////////////////////////////////

    public static final class Default extends Atan2Benchmark {

        public Default() {
            super("Default");
        }

        @Override
        public float test(float x, float y) {
            return (float) Math.atan2(x, y);
        }
    }

    @Benchmark
    public double atan2_default() {
        double sum = 0;
        for (double x = LOW_D; x < HIGH_D; x += INC_D) {
            for (double y = LOW_D; y < HIGH_D; y += INC_D) {
                sum += Math.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // Apache's FastMath.atan2 ( http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/util/FastMath.html )
    ///////////////////////////////////////

    public static final class Apache extends Atan2Benchmark {

        public Apache() {
            super("Apache");
        }

        @Override
        public float test(float x, float y) {
            return (float) FastMath.atan2(x, y);
        }
    }

    @Benchmark
    public double atan2_apache() {
        double sum = 0;
        for (double x = LOW_D; x < HIGH_D; x += INC_D) {
            for (double y = LOW_D; y < HIGH_D; y += INC_D) {
                sum += FastMath.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // Riven's atan2 ( http://www.java-gaming.org/index.php?topic=14647.0 )
    ///////////////////////////////////////

    public static final class Riven extends Atan2Benchmark {

        public Riven() {
            super("Riven");
        }

        @Override
        public float test(float x, float y) {
            return Riven.atan2(x, y);
        }

        private static final int ATAN2_BITS = 7;

        private static final int ATAN2_BITS2 = ATAN2_BITS << 1;
        private static final int ATAN2_MASK = ~(-1 << ATAN2_BITS2);
        private static final int ATAN2_COUNT = ATAN2_MASK + 1;
        private static final int ATAN2_DIM = (int) Math.sqrt(ATAN2_COUNT);

        private static final float INV_ATAN2_DIM_MINUS_1 = 1.0f / (ATAN2_DIM - 1);

        private static final float[] atan2 = new float[ATAN2_COUNT];

        static {
            for (int i = 0; i < ATAN2_DIM; i++) {
                for (int j = 0; j < ATAN2_DIM; j++) {
                    float x0 = (float) i / ATAN2_DIM;
                    float y0 = (float) j / ATAN2_DIM;
                    atan2[j * ATAN2_DIM + i] = (float) Math.atan2(y0, x0);
                }
            }
        }

        public static final float atan2(float y, float x) {
            float add, mul;

            if (x < 0.0f) {
                if (y < 0.0f) {
                    x = -x;
                    y = -y;

                    mul = 1.0f;
                } else {
                    x = -x;
                    mul = -1.0f;
                }
                add = -3.141592653f;
            } else {
                if (y < 0.0f) {
                    y = -y;
                    mul = -1.0f;
                } else {
                    mul = 1.0f;
                }
                add = 0.0f;
            }

            float invDiv = 1.0f / (((x < y) ? y : x) * INV_ATAN2_DIM_MINUS_1);

            int xi = (int) (x * invDiv);
            int yi = (int) (y * invDiv);

            return (atan2[yi * ATAN2_DIM + xi] + add) * mul;
        }
    }

    @Benchmark
    public float atan2_riven() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += Riven.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // DSP's atan2 ( http://dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normalization )
    ///////////////////////////////////////

    public static final class DSPFast extends Atan2Benchmark {

        public DSPFast() {
            super("DSPFast");
        }

        @Override
        public float test(float x, float y) {
            return DSPFast.atan2(x, y);
        }

        private static final float PI_2 = 1.5707963F; // Math.PI / 2
        private static final float PI_4 = 0.7853982F; // Math.PI / 4
        private static final float PI_3_4 = 2.3561945F; // (Math.PI / 4) * 3
        private static final float MINUS_PI_2 = -1.5707963F; // Math.PI / -2

        public static final float atan2(float y, float x) {
            float r;
            float abs_y = Float.intBitsToFloat(Float.floatToRawIntBits(y) << 1 >>> 1);
            if (x == 0.0F) {
                if (y > 0.0F) {
                    return PI_2;
                }
                if (y == 0.0F) {
                    return 0.0f;
                }
                return MINUS_PI_2;
            } else if (x > 0) {
                r = (x - abs_y) / (x + abs_y);
                r = PI_4 - PI_4 * r;
            } else {
                r = (x + abs_y) / (abs_y - x);
                r = PI_3_4 - PI_4 * r;
            }
            return y < 0 ? -r : r;
        }
    }

    public static final class DSPAccurate extends Atan2Benchmark {

        public DSPAccurate() {
            super("DSPAccurate");
        }

        @Override
        public float test(float x, float y) {
            return DSPAccurate.atan2(x, y);
        }

        private static final float PI_2 = 1.5707963F; // Math.PI / 2
        private static final float PI_4 = 0.7853982F; // Math.PI / 4
        private static final float PI_3_4 = 2.3561945F; // (Math.PI / 4) * 3
        private static final float MINUS_PI_2 = -1.5707963F; // Math.PI / -2

        public static final float atan2(float y, float x) {
            float r;
            float c;
            float abs_y = Float.intBitsToFloat(Float.floatToRawIntBits(y) << 1 >>> 1);
            if (x == 0.0F) {
                if (y > 0.0F) {
                    return PI_2;
                }
                if (y == 0.0F) {
                    return 0.0f;
                }
                return MINUS_PI_2;
            } else if (x > 0) {
                r = (x - abs_y) / (x + abs_y);
                c = PI_4;
            } else {
                r = (x + abs_y) / (abs_y - x);
                c = PI_3_4;
            }
            r = 0.1963F * r * r * r - 0.9817F * r + c;
            return y < 0 ? -r : r;
        }
    }

    @Benchmark
    public float atan2_dsp_fast() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += DSPFast.atan2(x, y);
            }
        }
        return sum;
    }

    @Benchmark
    public float atan2_dsp_accurate() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += DSPAccurate.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // kappa's atan2 ( http://www.java-gaming.org/topics/extremely-fast-atan2/36467/msg/346112/view.html#msg346112 )
    ///////////////////////////////////////

    public static final class Kappa extends Atan2Benchmark {

        public Kappa() {
            super("Kappa");
        }

        @Override
        public float test(float x, float y) {
            return Kappa.atan2(x, y);
        }

        private static final float PI = 3.1415927f;
        private static final float PI_2 = PI / 2f;
        private static final float MINUS_PI_2 = -PI_2;
        private static final float C_M = 0.2808722f;
        private static final float C_A = 0.2808722f;

        public static final float atan2(float y, float x) {
            if (x == 0.0f) {
                if (y > 0.0f) {
                    return PI_2;
                }
                if (y == 0.0f) {
                    return 0.0f;
                }
                return MINUS_PI_2;
            }

            final float atan;
            final float z = y / x;
            if (Math.abs(z) < 1.0f) {
                atan = z / (z * z * C_M + 1.0f);
                if (x < 0.0f) {
                    return (y < 0.0f) ? atan - PI : atan + PI;
                }
                return atan;
            } else {
                atan = PI_2 - z / (z * z + C_A);
                return (y < 0.0f) ? atan - PI : atan;
            }
        }
    }

    @Benchmark
    public float atan2_kappa() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += Kappa.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // Icecore's atan2 ( http://www.java-gaming.org/topics/extremely-fast-atan2/36467/msg/346145/view.html#msg346145 )
    ///////////////////////////////////////

    public static final class Icecore extends Atan2Benchmark {

        public Icecore() {
            super("Icecore");
        }

        @Override
        public float test(float x, float y) {
            return Icecore.atan2(x, y);
        }

        private static final float PI = (float) Math.PI;
        private static final float PI_2 = PI / 2;
        private static final float PI_NEG_2 = -PI_2;
        private static final int SIZE = 1024;
        private static final float ATAN2[];

        static {
            final int Size_Ar = SIZE + 1;
            ATAN2 = new float[Size_Ar];
            for (int i = 0; i <= SIZE; i++) {
                double d = (double) i / SIZE;
                double x = 1;
                double y = x * d;
                ATAN2[i] = (float) Math.atan2(y, x);
            }
        }

        public static final float atan2(float y, float x) {
            if (y < 0) {
                if (x < 0) {
                    // (y < x) because == (-y > -x)
                    if (y < x) {
                        return PI_NEG_2 - ATAN2[(int) (x / y * SIZE)];
                    } else {
                        return ATAN2[(int) (y / x * SIZE)] - PI;
                    }
                } else {
                    y = -y;
                    if (y > x) {
                        return ATAN2[(int) (x / y * SIZE)] - PI_2;
                    } else {
                        return -ATAN2[(int) (y / x * SIZE)];
                    }
                }
            } else {
                if (x < 0) {
                    x = -x;
                    if (y > x) {
                        return PI_2 + ATAN2[(int) (x / y * SIZE)];
                    } else {
                        return PI - ATAN2[(int) (y / x * SIZE)];
                    }
                } else {
                    if (y > x) {
                        return PI_2 - ATAN2[(int) (x / y * SIZE)];
                    } else {
                        return ATAN2[(int) (y / x * SIZE)];
                    }
                }
            }
        }
    }

    @Benchmark
    public float atan2_icecore() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += Icecore.atan2(x, y);
            }
        }
        return sum;
    }

    ///////////////////////////////////////
    // Diamond's atan2 ( https://stackoverflow.com/questions/1427422/cheap-algorithm-to-find-measure-of-angle-between-vectors/14675998#14675998 )
    ///////////////////////////////////////

    public static final class Diamond extends Atan2Benchmark {

        public Diamond() {
            super("Diamond");
        }

        @Override
        public float test(float x, float y) {
            return Diamond.atan2(x, y);
        }

        private static final float PI_2 = 1.5707963F; // Math.PI / 2

        public static final float atan2(float y, float x) {
            float angle;
            if (y == 0f && x >= 0f) {
                return 0;
            } else if (y >= 0f) {
                if (x >= 0f) {
                    angle = y / (x + y);
                } else {
                    angle = 1f - x / (-x + y);
                }
            } else {
                if (x < 0f) {
                    angle = -2f + y / (x + y);
                } else {
                    angle = -1f + x / (x - y);
                }
            }
            return angle * PI_2;
        }
    }

    @Benchmark
    public float atan2_diamond() {
        float sum = 0;
        for (float x = LOW_F; x < HIGH_F; x += INC_F) {
            for (float y = LOW_F; y < HIGH_F; y += INC_F) {
                sum += Diamond.atan2(x, y);
            }
        }
        return sum;
    }
}



Updates:
1: Added baseline benchmark. Cleaned up DSPFast/DSPAccurate implementation to look more sane with no performance loss.
2: Improved Kappa's accuracy with no performance loss. Updated Icecore's submission. Retested all algorithms.
3: Added the Diamond implementation.
Offline Icecore
« Reply #1 - Posted 2017-08-30 18:07:29 »

I also have this - it's more Cache friendly, but have more OPs ^^
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
private static final float Pi = (float) Math.PI;
private static final float Pi_H = Pi / 2;
//4 x 4000 = 16 kb L1 Cache
private static final int Size_Ac = 4000;
private static final float Atan2[];

static{
   final int Size_Ar = Size_Ac + 1;
   Atan2 = new float[Size_Ar];
   for(int i = 0; i <= Size_Ac; i++){
      double d = (double) i / Size_Ac;
      double x = 1;
      double y = x * d;
      Atan2[i] = (float) Math.atan2(y, x);
   }
}

static public final float atan2_1(float y, float x){
   if(y < 0){
      if(x < 0){
         // (y < x) because == (-y > -x)
         if(y < x) return -(Pi_H + Atan2[(int) (x / y * Size_Ac)]);
         else return Atan2[(int) (y / x * Size_Ac)] - Pi;
      }
      else{
         y = -y;
         if(y > x) return Atan2[(int) (x / y * Size_Ac)] - Pi_H;
         else return -Atan2[(int) (y / x * Size_Ac)];
      }
   }
   else{
      if(x < 0){
         x = -x;
         if(y > x) return Pi_H + Atan2[(int) (x / y * Size_Ac)];
         else return Pi - Atan2[(int) (y / x * Size_Ac)];
      }
      else{
         if(y > x) return Pi_H - Atan2[(int) (x / y * Size_Ac)];
         else return Atan2[(int) (y / x * Size_Ac)];
      }
   }
}

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Offline mooman219
« Reply #2 - Posted 2017-08-30 19:11:49 »

@Icecore
It appears to be slower.

1  
2  
3  
4  
Benchmark                  Mode  Cnt   Score   Error  Units
ATAN2.atan2_icecore        avgt    4  53.079 ± 0.157  ms/op
ATAN2.atan2_icecore_cache  avgt    4  58.945 ± 2.317  ms/op
ATAN2.baseline             avgt    4  34.576 ± 0.274  ms/op


Source: https://gist.github.com/mooman219/7a881b2d043354d44036e0df35d1e0b5
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Icecore
« Reply #3 - Posted 2017-09-01 09:28:17 »

It have more accuracy because use bigger array and use less memory because its 1 array – not 8 )
(1000 x 8 x 4(f) = 32kb   4000 x 4(f) = 16 kb)
More array size more accuracy ^^

also may try this - less on one OP =)
1  
2  
3  
++      private static final float Pi_H_M = -Pi_H;
--      if(y < x) return -(Pi_H + Atan2[(int) (x / y * Size_Ac)]);
++      if(y < x) return Pi_H_M - Atan2[(int) (x / y * Size_Ac)];

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Offline mooman219
« Reply #4 - Posted 2017-09-09 00:49:33 »

I added your changes and updated the benchmark @icecore. I also improved the accuracy of Kappa's atan2. The 0.28 constant seemed suspicious, so I played around with it, getting a lower average error and lower largest error with the constant 0.2808722 instead.
Offline mooman219
« Reply #5 - Posted 2017-09-09 07:54:20 »

Had to fix some issues, but if you're willing to deal with an average error of 2.4 degrees, and a max error of 4.07 degrees, then boy do I have the algorithm for you. Adapted from "encoding 2d angles without trigonometry", I bring you the diamond method. It's the fastest by far.

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
private static final float PI_2 = 1.5707963F; // Math.PI / 2

public static final float atan2(float y, float x) {
   float angle;
   if (x == 0f && y == 0f) {
      return 0;
   }
   if (y >= 0f) {
      if (x >= 0f) {
         angle = y / (x + y);
      } else {
         angle = 1f - x / (-x + y);
      }
   } else {
      if (x < 0f) {
         angle = -y / (-x - y) - 2f;
      } else {
         angle = x / (x - y) - 1f;
      }
   }
   return angle * PI_2;
}
Offline Roquen

JGO Kernel


Medals: 516



« Reply #6 - Posted 2017-09-09 11:38:04 »

Broken record time.  Any benchmark that doesn't reflect an actual usage pattern is wishful thinking.  If you're linearly walking through angles you can compute with tiny errors in a couple of products and adds (I'm tired and was thinking forward here, opps)..no lookups no branches.
Offline mooman219
« Reply #7 - Posted 2017-09-09 20:49:23 »

Broken record time.  Any benchmark that doesn't reflect an actual usage pattern is wishful thinking.  If you're linearly walking through angles you can compute with tiny errors in a couple of products and adds (I'm tired and was thinking forward here, opps)..no lookups no branches.

Completely fair and expected. I first made the post with the disclaimer below the results that this is a microbenchmark and has caveats. A LUT could improve the performance of your algorithm, but create some cache pollution that slows down other aspects of your program. LUTs aside, this post isn't a comprehensive search for the best Atan2, it takes a couple and tests some metrics on them. If you're writing a game, this will let you swap in various algorithms and you can see how they perform. It'll hopefully give you some reassurance that you're using a reasonable algorithm when compared to others.
Offline Roquen

JGO Kernel


Medals: 516



« Reply #8 - Posted 2017-09-11 09:00:58 »

My expectation is that the branches will be the much bigger problem in most cases.  If you're making enough atan/atan2 calls to be statistically important then it's pretty likely they will bunch up in time and the cost of loading the LUT to cache won't be a killer.  For the branches it's very unlikely that the probability of each won't be in the neighborhood of .5  so ~n/2 branch mispredictions per call where 'n' is the number of branches along a path.

EDIT: The way the benchmark is structured means the branch will statistically always be correctly predicted.
Offline Icecore
« Reply #9 - Posted 2017-09-12 10:04:53 »

Let make next Contest "Fast Sqrt" Smiley
one from the Baselines https://en.wikipedia.org/wiki/Fast_inverse_square_root
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
float Q_rsqrt( float number )
{
   long i;
   float x2, y;
   const float threehalfs = 1.5F;

   x2 = number * 0.5F;
   y  = number;
   i  = * ( long * ) &y;                       // evil floating point bit level hacking
   i  = 0x5f3759df - ( i >> 1 );               // what the f**k?
   y  = * ( float * ) &i;
   y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//   y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

   return y;
}

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline kappa
« League of Dukes »

JGO Kernel


Medals: 119
Projects: 15


★★★★★


« Reply #10 - Posted 2017-09-12 10:52:41 »

Let make next Contest "Fast Sqrt" Smiley
From my tests in the past, Java's Math.sqrt() is pretty damn fast these days, only worth it if you can beat that method substantially.
Offline Abuse

JGO Ninja


Medals: 60


falling into the abyss of reality


« Reply #11 - Posted 2017-09-12 13:03:08 »

Let make next Contest "Fast Sqrt" Smiley
From my tests in the past, Java's Math.sqrt() is pretty damn fast these days, only worth it if you can beat that method substantially.

My understanding is that the above approximation is no-longer relevant 'cos we have x86 instructional support (in various forms).
Offline Icecore
« Reply #12 - Posted 2017-09-12 16:44:04 »

Quote
From my tests in the past, Java's Math.sqrt() is pretty damn fast these days
...
we have x86 instructional support
Yes, ok, my mistake - why i even trying...
– no one cares)

Last known State: Reassembled in Cyberspace
End Transmission....
..
.
Pages: [1]
  ignore  |  Print  
 
 

 
Ecumene (110 views)
2017-09-30 02:57:34

theagentd (136 views)
2017-09-26 18:23:31

cybrmynd (245 views)
2017-08-02 12:28:51

cybrmynd (241 views)
2017-08-02 12:19:43

cybrmynd (240 views)
2017-08-02 12:18:09

Sralse (254 views)
2017-07-25 17:13:48

Archive (864 views)
2017-04-27 17:45:51

buddyBro (1008 views)
2017-04-05 03:38:00

CopyableCougar4 (1569 views)
2017-03-24 15:39:42

theagentd (1373 views)
2017-03-24 15:32:08
List of Learning Resources
by elect
2017-03-13 14:05:44

List of Learning Resources
by elect
2017-03-13 14:04:45

SF/X Libraries
by philfrei
2017-03-02 08:45:19

SF/X Libraries
by philfrei
2017-03-02 08:44:05

SF/X Libraries
by SkyAphid
2017-03-02 06:38:56

SF/X Libraries
by SkyAphid
2017-03-02 06:38:32

SF/X Libraries
by SkyAphid
2017-03-02 06:38:05

SF/X Libraries
by SkyAphid
2017-03-02 06:37:51
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!