CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Posted
20030226 16:04:57 » 

Could it be possible to add this into the standard lwgl package? This is the c++ way: 1 2 3 4 5 6 7 8 9
 float InvSqrt (float x) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f3759df  (i >> 1); x = *(float*)&i; x = x*(1.5f  xhalf*x*x); return x; } 
I was recently asking this




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #1  Posted
20030226 16:13:47 » 

And the Lomont way, which is supposebly better, I'm just picking these up from [/code] float InvSqrt_Lomont(float x) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f375a86  (i>>1); x = *(float*)&i; x = x*(1.5fxhalf*x*x); // x = x*(1.5fxhalf*x*x); // added precision return x; } // InvSqrt_Lomont
[code]
I'm not sure if this can really boost anything because of native calls, or even if this can be converted to java, but I think its worth a try.
EDIT: This is obviously in wrong place, if admin sees this please move to lwgl forum.




princec


«
Reply #2  Posted
20030226 17:47:16 » 

Seems like a fine addition to me. If you've got the time some performance comparisons would be nice in real world situations (eg. normalise an array of 1,000,000 vector3fs). Cas




Games published by our own members! Check 'em out!


CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #3  Posted
20030227 08:15:11 » 

My problem is that I don't have decent c++ compiler available on my home computer. So it mean it will take at least 2 weeks to get back to home and then get c++ compiler.
Anyone here who has the possibility of doing this??




princec


«
Reply #4  Posted
20030227 09:08:56 » 

I didn't see any reason for it not to be implemented in Java.... Cas




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #5  Posted
20030227 09:43:26 » 

Umm what? I thought you can't do "binary logic" in java? Now what exactly have I missed? For example I have avsolutely no idea how to express if this is achievable then of course I will do the tests!




alexz
Senior Newbie
Java games rock!


«
Reply #6  Posted
20030227 09:53:19 » 

IMO you can do this via NIO (direct ByteBuffer with IntBuffer and FloatBuffer attached)...




NielsJ
Senior Newbie
Java games rock!


«
Reply #7  Posted
20030227 10:53:20 » 

hmm.. I'm not sure I see the point in involving buffers  it's a simple floatin, floatout method call ... Depending on the implementation of FloatToIntBits and IntBitsToFloat, a pure java implementation might also be viable.  modified: sorry, didn't read all posts , the code in java would be: float f = 2; int x = Float.floatToIntBits(f); f = Float.intBitsToFloat(x);




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #8  Posted
20030227 11:23:14 » 

hmm.. I'm not sure I see the point in involving buffers  it's a simple floatin, floatout method call ... Depending on the implementation of FloatToIntBits and IntBitsToFloat, a pure java implementation might also be viable.  modified: sorry, didn't read all posts , the code in java would be: float f = 2; int x = Float.floatToIntBits(f); f = Float.intBitsToFloat(x); I don't think this works... when used in the method I get .33295277 from 9... This is well beyond my knowledge, my initial thought was to call it through JNI, but if someone can do pure or "somewhat" pure java implementation it would be better, wouldn't it?




alexz
Senior Newbie
Java games rock!


«
Reply #9  Posted
20030227 12:42:11 » 

Ok! I've done this quick C++ to Java conversion... Here are two test functions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
 private static void testInvSqrt() { Vector3f v = new Vector3f(1, 2, 3); float sum = 0;
long start = System.currentTimeMillis(); for (int i = 0; i < 1000000; ++i) { v.x += i * 0.001f; v.y += i * 0.001f; v.z += i * 0.001f; sum += (float)(1.0 / Math.sqrt(v.x * v.x + v.y * v.y + v.z * v.z)); } long end = System.currentTimeMillis(); System.out.println("1.0 / Math.sqrt(): " + (end  start) + "ms, sum = " + sum); }
private static void testInvSqrtNEW() { Vector3f v = new Vector3f(1, 2, 3); float sum = 0;
long start = System.currentTimeMillis(); for (int i = 0; i < 1000000; ++i) { v.x += i * 0.001f; v.y += i * 0.001f; v.z += i * 0.001f; sum += newInvSqrt(v.x * v.x + v.y * v.y + v.z * v.z); } long end = System.currentTimeMillis(); System.out.println("newInvSqrt(): " + (end  start) + "ms, sum = " + sum); } 
Here is the newInvSqrt() implementation: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
 static ByteBuffer byteBuf = ByteBuffer.allocateDirect(4) .order(ByteOrder.nativeOrder()); static IntBuffer intBuf = byteBuf.asIntBuffer(); static FloatBuffer floatBuf = byteBuf.asFloatBuffer();
private static int f2i(float f) { floatBuf.put(0, f); return intBuf.get(0); }
private static float i2f(int i) { intBuf.put(0, i); return floatBuf.get(0); }
private static float newInvSqrt(float x) { float xhalf = 0.5f*x; int i = f2i(x); i = 0x5f3759df  (i >> 1); x = i2f(i); x = x*(1.5f  xhalf*x*x); return x; } 
Here is the test invocation code: 1 2 3 4 5
 for (int i = 0; i < 10; ++i) { testInvSqrt(); testInvSqrtNEW(); } 
Here is the result: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 1.0 / Math.sqrt(): 271ms, sum = 27.828238 newInvSqrt(): 280ms, sum = 27.803202 1.0 / Math.sqrt(): 261ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 270ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 261ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 260ms, sum = 27.828238 newInvSqrt(): 161ms, sum = 27.803202 1.0 / Math.sqrt(): 270ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 261ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 280ms, sum = 27.828238 newInvSqrt(): 171ms, sum = 27.803202 1.0 / Math.sqrt(): 260ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 1.0 / Math.sqrt(): 271ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202 
I think the code is clear and doesn't need my explanation...




Games published by our own members! Check 'em out!


alexz
Senior Newbie
Java games rock!


«
Reply #10  Posted
20030227 12:46:52 » 

The code tag skewed my code line indents...




alexz
Senior Newbie
Java games rock!


«
Reply #11  Posted
20030227 12:54:29 » 

The result of using native Float.floatToIntBits() and Float.intBitsToFloat() instead of NIO: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 1.0 / Math.sqrt(): 270ms, sum = 27.828238 newInvSqrt(): 441ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 391ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 381ms, sum = 27.803202 1.0 / Math.sqrt(): 260ms, sum = 27.828238 newInvSqrt(): 381ms, sum = 27.803202 1.0 / Math.sqrt(): 260ms, sum = 27.828238 newInvSqrt(): 381ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 391ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 391ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 391ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 391ms, sum = 27.803202 1.0 / Math.sqrt(): 250ms, sum = 27.828238 newInvSqrt(): 380ms, sum = 27.803202 




cfmdobbie
Senior Duke Medals: 1
Who, me?


«
Reply #12  Posted
20030227 13:08:32 » 

Hrm, I thought NewtonRaphson required guessing a possible result, then looping refining your guess until the required precision is reached? The results I get with the following conversion of the above code is pretty good, accurate to two dp initially, but slipping thereafter: 1 2 3 4 5 6 7 8 9
 public static float sqrt(float x) { float xhalf = 0.5f * x ; int i = Float.floatToIntBits(x) ; i = 0x5f375a86  (i >> 1) ; x = Float.intBitsToFloat(i) ; x *= (1.5f  xhalf * x * x) ; return 1/x ; } 
What's the name of this algorithm, and where did you find it?

Hellomynameis Charlie Dobbie.



NielsJ
Senior Newbie
Java games rock!


«
Reply #13  Posted
20030227 13:34:32 » 

Amazing performance from the Float.xxxBits methods, eh?
How can it possibly be slower then NIO? It should at the very least be the same.




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #14  Posted
20030227 13:35:48 » 

Good job alexz, I'm getting vastly different performance, in a loop I get java.lang.Math.sqrt(): 1862 fos3d.util.Math.invSqrt(): 7331 With code like this 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 long starttime = 0, entime = 0; starttime = System.currentTimeMillis(); for(int i = 0; i<100000000; i++){ float f = 1/(float)java.lang.Math.sqrt(9d); } entime = System.currentTimeMillis(); System.out.println("JAVA FUNC: " + (entime  starttime) ); float f = 0; long starttime2 = 0, entime2 = 0; starttime2 = System.currentTimeMillis(); for(int i = 0; i<100000000; i++){ f = Math.sqrt(9f); } entime2 = System.currentTimeMillis(); System.out.println("My SHIT FUNC: " + (entime2  starttime2) ); 
And the results are not what they are supposed to resemple. From 9f i get 2.996579, which is pretty good.




princec


«
Reply #15  Posted
20030227 14:02:46 » 

I like this new sqrt method so much I've added it to the LWJGL Math class. It's accurate to at least 3dp and the accuracy can be tuned by simply repeating the last line of the equation over and over as many times as you like. What's the theory behind it? Cas




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #16  Posted
20030227 14:39:03 » 

I like this new sqrt method so much I've added it to the LWJGL Math class. It's accurate to at least 3dp and the accuracy can be tuned by simply repeating the last line of the equation over and over as many times as you like. What's the theory behind it? Cas Awesome, I wish I could explain.




qonjure
Junior Newbie
When nature calls we all shall drown.


«
Reply #17  Posted
20030227 19:03:01 » 





alexz
Senior Newbie
Java games rock!


«
Reply #18  Posted
20030227 20:20:14 » 

Ooops... Do you remember my previous result? 1.0 / Math.sqrt(): 261ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202
This was done on P3 600 MHz (JDK 1.4.1)... And here is the result on P4 1.6 GHz (JDK 1.4.1): 1.0 / Math.sqrt(): 70ms, sum = 27.828238 newInvSqrt(): 110ms, sum = 27.803202
Magic! Isn't it?




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #19  Posted
20030228 08:07:10 » 

I get same kind of result, my Athlon 1.33ghz gets ~1100ms from math square and about ~7400ms from inv sqrt.




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #20  Posted
20030228 08:08:02 » 

Are you using JRE 1.4.1 with JIT???




ryanm


«
Reply #21  Posted
20030228 08:20:51 » 

1.0 / Math.sqrt(): 261ms, sum = 27.828238 newInvSqrt(): 160ms, sum = 27.803202
This was done on P3 600 MHz (JDK 1.4.1)... And here is the result on P4 1.6 GHz (JDK 1.4.1):
1.0 / Math.sqrt(): 70ms, sum = 27.828238 newInvSqrt(): 110ms, sum = 27.803202
looks like the Math.sqrt() is being compiled to some fancydan hardware sqrt operation on newer processors. It's nice to know that the JVM does this where it can, but also a pain in the arse if you want top performance across a wide range of target hardware. At last! My methodology of coding to the api, and avoiding nasty hacks, in the slim hope that Sun would someday optimise it for me, is vindicated:D. Sort of :/.




alexz
Senior Newbie
Java games rock!


«
Reply #22  Posted
20030228 09:08:50 » 

Yes, I used JRE with JIT... Actually there are lot of C++ optimization tricks, which don't work in Java (didn't at all or did work prior to some "point")... I believe that JVM has special path for some of java.lang.Math methods as well as for other java.* ones, so it effectively inlines them in the final native code. In this case making your own implementation of a standard method (sqrt(), for instance) doesn't make sence especially in a long run. So you can think about java.* classes as they have superior performance already! So my opinion is that the only optimization at algorithm level is fully under your control and lowlevel tricks are... for JVM! OR You can keep Java code at highlevel (objectlevel) part and C++ code at lowlevel (polygonlevel, drawing) part of your game (engine) with all of the tricks, etc. I think that this is the optimal way for "hardcore" Java gaming at the moment!




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #23  Posted
20030228 09:38:47 » 

The new computers (I think it was included with MMX) have pipeline slot for straight repeated arithmetic calculations(exponent, sqrt, pi, and whatnot), but it doesn't make sense the jump is involved with 600mhz pentium to 1.4ghz, since technically they both be MMX. Does anyone know whether the new media processors have sqrt slot built into the cycle?
I have over 514% difference here in the sqrt calculations, so it must mean that the JIT really is doing its work, which is always good to know. However, the speed could be increased.
But still, integer 9^(1/2) is faster than (int)Math.sqrt(9)
Has anyone tested this with older versions of Java?




elias


«
Reply #24  Posted
20030228 17:11:38 » 

But still, integer 9^(1/2) is faster than (int)Math.sqrt(9)
I don't hope you litterally mean "9^(1/2)", because ^ is xor in java, not a power function.  elias




rgeimer
Senior Newbie


«
Reply #25  Posted
20030228 19:05:35 » 

The new computers (I think it was included with MMX) have pipeline slot for straight repeated arithmetic calculations(exponent, sqrt, pi, and whatnot), but it doesn't make sense the jump is involved with 600mhz pentium to 1.4ghz, since technically they both be MMX. Does anyone know whether the new media processors have sqrt slot built into the cycle? Perhaps there is something in SSE2, which is included in the P4 but not the P3? http://www.tommesani.com/InstructionSetCPU.html




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #26  Posted
20030301 03:35:16 » 

I don't hope you litterally mean "9^(1/2)", because ^ is xor in java, not a power function.
 elias
Well I do mean that, because I tested it and it was about 1000 ms faster. I bet you can test that your self easily as well. However, whatever your result might be I'm 100% sure that it is faster.




elias


«
Reply #27  Posted
20030301 07:39:04 » 

I could and would, if only ^ were the power operator, but it isn't. ^ in java is the boolean xor operator. Take this example: 1 2 3 4 5 6 7
 public class Test { public static void main(String[] args) { float wrong_sqrt = 9^(2); float real_sqrt = 9*9; System.out.println(wrong_sqrt + " != " + real_sqrt); } } 
gives me: 1 2
 [elias@ip172 snot]$ javac Test.java && java Test 11.0 != 81.0 
see http://java.sun.com/docs/books/jls/second_edition/html/expressions.doc.html#5233 for more information  elias




CaptainGoatse
Junior Duke
I suck at teh 2D. XBOX IS BIG LOL!111


«
Reply #28  Posted
20030301 10:14:46 » 

OK, thank god it came clear now, I have always been under impression that ^is truly ^




Viro
Senior Newbie
Java games rock!


«
Reply #29  Posted
20040530 10:30:47 » 

The difference between the speed of the Math.sqrt on a P4 and P3 is probably due to the difference in the way the fsqrt opcode is implemented on the processors. On the P4, the fsqrt instruction takes 38 clock ticks to complete while on the P3, it takes 57(!!) clock ticks. That would explain why you see much better results with Math.sqrt on a P4. My guess is that the JVM emits an fsqrt instruction for each call to Math.sqrt. More information about the performance of these floating point instructions on various processors can be found at http://www.aceshardware.com/Spades/read.php?article_id=25000194




