Java-Gaming.org    
Featured games (78)
games approved by the League of Dukes
Games in Showcase (426)
Games in Android Showcase (89)
games submitted by our members
Games in WIP (466)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  FastMath: hypot(int,int)  (Read 1451 times)
0 Members and 1 Guest are viewing this topic.
Online Riven
Showcase Moderator

JGO Overlord


Medals: 611
Projects: 4
Exp: 16 years


Hand over your head.


« Posted 2013-03-12 08:33:37 »

Using a lookup table it is possible to vastly outperform Math.hypot(x,y), as long as the input values are integers and they are in a limited range (in this case: -512..+511). Within this range, we observe a factor 40 speedup and an accuracy of 100%, while outside of this range, it's still a respectable factor 4, with a worst case accuracy of 98%.

This code is used to speedup A* cost heuristic.

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
public class FastMath
{
   private static final int HYPOT_LUT_SHIFT = 9; // 2^9=512
  private static final int HYPOT_LUT_MAX_RANGE = 1 << HYPOT_LUT_SHIFT;
   private static final float[] HYPOT_LUT;
   private static final int NARROW_DOWN_MUL = 9;
   private static final int NARROW_DOWN_DIV = NARROW_DOWN_MUL + 1;
   private static final float NARROW_DOWN_FACTOR = (float) NARROW_DOWN_DIV / NARROW_DOWN_MUL;

   static {
      class Helper {
         public float[] compute() {
            float[] arr = new float[HYPOT_LUT_MAX_RANGE * HYPOT_LUT_MAX_RANGE];
            EasyMath.init(arr);
            return arr;
         }
      }

      long t0 = System.nanoTime();
      HYPOT_LUT = new Helper().compute();
      long t1 = System.nanoTime();
      System.out.println("EasyMath.LUT calculation took " + ((t1 - t0) / 1000000) + "ms");

      //verify();

      //bench();
  }

   public static float hypot(int dx, int dy) {
      int ix = Math.abs(dx);
      int iy = Math.abs(dy);

      float mul = 1.0f;
      while (((ix | iy) >> HYPOT_LUT_SHIFT) > 0) {
         // narrow down, reasonably accurately
        ix = ix * NARROW_DOWN_MUL / NARROW_DOWN_DIV;
         iy = iy * NARROW_DOWN_MUL / NARROW_DOWN_DIV;
         mul *= NARROW_DOWN_FACTOR;
      }
      return HYPOT_LUT[(iy << HYPOT_LUT_SHIFT) | ix] * mul;
   }

   private static void init(final float[] lut) {
      if (false) {
         WorkerPool.distibute((Object[]) null, HYPOT_LUT_MAX_RANGE, WorkerPool.THREAD_COUNT, new WorkerPool.TaskArrayOperator<Object>() {
            @Override
            public void operate(int taskid, Object[] data, int off, int end) {
               for (int dx = off; dx < end; dx++) {
                  for (int dy = 0; dy < HYPOT_LUT_MAX_RANGE; dy++) {
                     lut[(dy << HYPOT_LUT_SHIFT) | dx] = (float) Math.hypot(dx, dy);
                  }
               }
            }
         });
      } else {
         for (int dx = 0; dx < HYPOT_LUT_MAX_RANGE; dx++) {
            for (int dy = 0; dy < HYPOT_LUT_MAX_RANGE; dy++) {
               lut[(dy << HYPOT_LUT_SHIFT) | dx] = (float) Math.hypot(dx, dy);
            }
         }
      }
   }

   private static void verify() {
      final int ext = 3;

      for (int dx = 0; dx < HYPOT_LUT_MAX_RANGE; dx++) {
         for (int dy = 0; dy < HYPOT_LUT_MAX_RANGE; dy++) {
            float d1 = hypot(dx, dy);
            float d2 = (float) Math.hypot(dx, dy);
            if (d1 != d2) {
               throw new IllegalStateException(dx + "," + dy + " --> " + d1 + ", " + d2);
            }
         }
      }

      for (int dx = -HYPOT_LUT_MAX_RANGE + 1; dx < HYPOT_LUT_MAX_RANGE; dx++) {
         for (int dy = -HYPOT_LUT_MAX_RANGE + 1; dy < HYPOT_LUT_MAX_RANGE; dy++) {
            float d1 = hypot(dx, dy);
            float d2 = (float) Math.hypot(dx, dy);
            if (d1 != d2) {
               throw new IllegalStateException(dx + "," + dy + " --> " + d1 + ", " + d2);
            }
         }
      }

      float maxError = 0.98f;
      for (int dx = -HYPOT_LUT_MAX_RANGE * ext; dx < HYPOT_LUT_MAX_RANGE * ext; dx++) {
         for (int dy = -HYPOT_LUT_MAX_RANGE * ext; dy < HYPOT_LUT_MAX_RANGE * ext; dy++) {
            float d1 = hypot(dx, dy);
            float d2 = (float) Math.hypot(dx, dy);
            if (d1 / d2 < maxError || d2 / d1 < maxError) {
               System.err.println(dx + "," + dy + " --> " + d1 + ", " + d2 + " (" + d1 / d2 + ")");
            }
         }
         System.out.println("dx=" + dx);
      }
   }

   private static void bench() {

      final int ext = 3;

      float sum0 = 0.0f;
      float sum1 = 0.0f;
      float sum2 = 0.0f;

      int ops0 = 0;
      int ops1 = 0;
      int ops2 = 0;

      System.out.println("-");
      long t0 = System.nanoTime();
      for (int dx = -HYPOT_LUT_MAX_RANGE * ext; dx < HYPOT_LUT_MAX_RANGE * ext; dx++) {
         for (int dy = -HYPOT_LUT_MAX_RANGE * ext; dy < HYPOT_LUT_MAX_RANGE * ext; dy++) {
            sum0 += hypot(dx, dy);
            ops0 += 1;
         }
      }
      System.out.println("a");
      long t1 = System.nanoTime();
      for (int dx = -HYPOT_LUT_MAX_RANGE + 1; dx < HYPOT_LUT_MAX_RANGE; dx++) {
         for (int dy = -HYPOT_LUT_MAX_RANGE + 1; dy < HYPOT_LUT_MAX_RANGE; dy++) {
            sum1 += hypot(dx, dy);
            ops1 += 1;
         }
      }
      System.out.println("b");
      long t2 = System.nanoTime();
      for (int dx = -HYPOT_LUT_MAX_RANGE + 1; dx < HYPOT_LUT_MAX_RANGE; dx++) {
         for (int dy = -HYPOT_LUT_MAX_RANGE + 1; dy < HYPOT_LUT_MAX_RANGE; dy++) {
            sum2 += (float) Math.hypot(dx, dy);
            ops2 += 1;
         }
      }
      System.out.println("c");
      long t3 = System.nanoTime();

      System.out.println("EasyMath.LUT out of range: " + (int) (ops0 / ((t1 - t0) / 1000000.0)) + "K ops/sec [" + sum0 + "]");
      System.out.println("EasyMath.LUT in range: " + (int) (ops1 / ((t2 - t1) / 1000000.0)) + "K ops/sec [" + sum1 + "]");
      System.out.println("EasyMath.LUT native: " + (int) (ops2 / ((t3 - t2) / 1000000.0)) + "K ops/sec [" + sum2 + "]");
   }
}


1  
2  
3  
4  
EasyMath.LUT calculation took 65ms
EasyMath.LUT out of range:   4861K ops/sec [1.10062418E10]
EasyMath.LUT in range:      57928K ops/sec [4.09633024E8]
EasyMath.LUT native:         1432K ops/sec [4.09633024E8]

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline princec

JGO Kernel


Medals: 284
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #1 - Posted 2013-03-12 09:44:34 »

By way of a little thinking about this optimisation -

We were running our Battledroid sim nonvisually, flat out. In the sim there are 1,000 G101 battledroids looking for each other on a 256x256 tile map, and blasting at each other with Standard Issue Puppygames Ordinary Blasters. The sim would run in approximately 12 seconds or so before a victorious side was declared. When put through the profiler we discovered an astonishing 75% of CPU time was taken up with Math.hypot(); about a third of which was just some foolish range checking code that was using hypot() when a squared distance would have done and easily optimised away, but the other two thirds was used by the getDistance() method of our A* pathfinder which the droids use to find each other.

Math.hypot() has several built-in disadvantages compared to a more specialised version. Firstly, it has to operate with double-precision accuracy, and it's running using strictfp here on my Java7 Server VM, due to its specifications (this is good; we need absolutely deterministic behaviour between VMs). Secondly, it has to operate on a totally arbitrary range of double-precision floating point inputs, which means it can't take any shortcuts at all and needs to work everything out properly every time it's called.

Our requirements are much more simple. We only use integer coordinates, and we've got a pretty restricted range of possible input values as well. It is this that allows us to use a (pretty massive) look up table (in our case we're using a 16mb table!) Even waiting for RAM to page in and out we're still vastly quicker than waiting for the extremely accurate Math.hypot() to calculate.

Finally I've got a suspicion that from tick to tick the sequence of values passed in to the hypot() function change very little and tend to be clustered. My suspicions lead me to think that we're actually not thrashing our L2 cache that much at all to keep all the relevant pages in RAM. I wonder if there is a tool we can use that will show us exactly how we're hammering the caches. Anyway, that accounts for the 40x speedup in most cases, I think.

The same sim now runs in approximately 3 seconds.

Cas Smiley

Offline Mike

JGO Ninja


Medals: 69
Projects: 1
Exp: 5 years


Java guru wanabee


« Reply #2 - Posted 2013-03-12 09:50:04 »

Did you try it with the Apache FastMath library as well? I kept running into hypot/pow performance issues as well and that one pretty much solved all my issues. It might not be 40 times as quick but maybe it'd beat the 4 times out of the range and wouldn't require the 16mb.

Nevertheless, another piece of nice code Riv Smiley

Mike

My current game, Minecraft meets Farmville and goes online Smiley
State of Fortune | Discussion thread @ JGO
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline delt0r

JGO Coder


Medals: 22


Computers can do that?


« Reply #3 - Posted 2013-03-12 09:51:02 »

Good part of java Math performance is also the requirement of being accurate to 2ulp. This costs quite a bit and equivalent C would require a few compiler flags to be compliant.

Thou for 2d stuff I wrote a quick and dirty game that used complex numbers for angles. This meant i never ever needed to call a any trig function at all and almost never a sqrt. I also had a all integer version.

I have no special talents. I am only passionately curious.--Albert Einstein
Online Riven
Showcase Moderator

JGO Overlord


Medals: 611
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #4 - Posted 2013-03-12 09:53:51 »

Apache FastMath.hypot(x,y) uses Math.sqrt(n)

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline Mike

JGO Ninja


Medals: 69
Projects: 1
Exp: 5 years


Java guru wanabee


« Reply #5 - Posted 2013-03-12 09:55:27 »

Indeed, apparently Math.sqrt is quicker than Math.hypot Smiley

Mike

My current game, Minecraft meets Farmville and goes online Smiley
State of Fortune | Discussion thread @ JGO
Online Riven
Showcase Moderator

JGO Overlord


Medals: 611
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #6 - Posted 2013-03-12 09:55:47 »

Thou for 2d stuff I wrote a quick and dirty game that used complex numbers for angles.
We don't use hypot for angles.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Online Riven
Showcase Moderator

JGO Overlord


Medals: 611
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #7 - Posted 2013-03-12 10:15:16 »

1  
2  
3  
EasyMath.hypot:  57928K ops/sec [4.09633024E8]
Math.hypot:       1432K ops/sec [4.09633024E8]
Math.sqrt:       65652K ops/sec [4.09633024E8]


 Emo

1  
2  
3  
4  
5  
6  
public class FastMath
{
   public static float hypot(float dx, float dy) {
      return (float) Math.sqrt(dx * dx + dy * dy);
   }
}


*sigh*



Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline princec

JGO Kernel


Medals: 284
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #8 - Posted 2013-03-12 10:38:09 »

Worth pointing out we used hypot() in the first place because of our need for strict accuracy between VMs Smiley Turns out it's a bit too accurate for its own good.

Cas Smiley

Offline Mike

JGO Ninja


Medals: 69
Projects: 1
Exp: 5 years


Java guru wanabee


« Reply #9 - Posted 2013-03-12 10:38:43 »

Sorry Riv, I still think you're awesome though!  Pointing Smiley

Mike

My current game, Minecraft meets Farmville and goes online Smiley
State of Fortune | Discussion thread @ JGO
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline delt0r

JGO Coder


Medals: 22


Computers can do that?


« Reply #10 - Posted 2013-03-12 10:47:04 »

The whole point of hypot is to be more accurate than Math.sqrt(x*x+y*y) and to avoid overflow. IIRC by internally using something like a=y/x;Math.abs(x)*Math.sqrt(1+a*a). It is not suppose to be fast.

My point with the complex number thing was that avoid normalization or hypot functions a lot as well. Thou its never been a bottle neck anyway. Integers was for lock step multiplayer. Not convinced just using doubles wouldn't be easier tbh.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline Mike

JGO Ninja


Medals: 69
Projects: 1
Exp: 5 years


Java guru wanabee


« Reply #11 - Posted 2013-03-12 10:49:15 »

The FastMath Library by Apache has some code in their hypot to avoid overflow. On the other hand it isn't as accurate indeed, but I can recommend it to everyone here as it speeds things up a lot.

Mike

My current game, Minecraft meets Farmville and goes online Smiley
State of Fortune | Discussion thread @ JGO
Offline Roquen
« Reply #12 - Posted 2013-03-12 11:24:17 »

Overflow AND underflow...neither of which are a concern here.  hypot is one of those functions with a high stink factor.
Pages: [1]
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

xsi3rr4x (75 views)
2014-04-15 18:08:23

BurntPizza (68 views)
2014-04-15 03:46:01

UprightPath (80 views)
2014-04-14 17:39:50

UprightPath (65 views)
2014-04-14 17:35:47

Porlus (81 views)
2014-04-14 15:48:38

tom_mai78101 (105 views)
2014-04-10 04:04:31

BurntPizza (165 views)
2014-04-08 23:06:04

tom_mai78101 (261 views)
2014-04-05 13:34:39

trollwarrior1 (210 views)
2014-04-04 12:06:45

CJLetsGame (220 views)
2014-04-01 02:16:10
List of Learning Resources
by SHC
2014-04-18 03:17:39

List of Learning Resources
by Longarmx
2014-04-08 03:14:44

Good Examples
by matheus23
2014-04-05 13:51:37

Good Examples
by Grunnt
2014-04-03 15:48:46

Good Examples
by Grunnt
2014-04-03 15:48:37

Good Examples
by matheus23
2014-04-01 18:40:51

Good Examples
by matheus23
2014-04-01 18:40:34

Anonymous/Local/Inner class gotchas
by Roquen
2014-03-11 15:22:30
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!