Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (516)
Games in Android Showcase (123)
games submitted by our members
Games in WIP (577)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  micro optimizations  (Read 8682 times)
0 Members and 1 Guest are viewing this topic.
Offline leknor

Junior Duke




ROCK!!!


« Posted 2002-10-28 01:54:18 »

Below is what I've found to be the most consistant performance boost with miminal optimization effort on a micro level. Other people are encouraged to post their pet optimizations.

(Re)Structure boolean experssions to evaluate against zero.

Java, and many other languages/cpus, has special opt codes to test if something is equal to zero as opposed to the generic opt codes to test if a == b. I find this most useful with for loops.

For example:
1  
2  
3  
4  
int a[] = new int[1000000];
for (int i = 0; i < a.length; i++) {
// Do stuff
}

is slower than:
1  
2  
3  
4  
5  
int a[] = new int[1000000];
for (int i = a.length-1; i >= 0; i--) {
// EDITED: Thanks to kenrod for noticing a bug, see his post below for details
// Do stuff
}

There is actually another optimization in the above example. The lookup of the length of a is only done once in the second example which also improves the speed, probably more than the test against zero optimization does.

For the record, I consider the follower to be even more optimized but I've found it makes the code harder to read so I never use it.
1  
2  
3  
4  
int a[] = new int[1000000];
for (int i = a.length; --i >= 0;) {
// Do stuff
}

This saves one fetch from memory because the decrement is done with the same fetch of i that is used in the boolean expression instead of being different steps for the decrement and then the test.

Hope this is useful and if I'm wrong please correct me.
Offline pepe

Junior Duke




Nothing unreal exists


« Reply #1 - Posted 2002-10-28 11:31:13 »

Good idea to post these kinds of things. Who knows, it might help.
But, to make the tests more comparable and help to relativise, we should know the jvm you tested this against, and if that performance was tested in a full program, or just a bench.

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline Herkules

Senior Duke




Friendly fire isn't friendly!


« Reply #2 - Posted 2002-10-28 11:31:29 »

Ever measured the effect?

HARDCODE    --     DRTS/FlyingGuns/JPilot/JXInput  --    skype me: joerg.plewe
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline leknor

Junior Duke




ROCK!!!


« Reply #3 - Posted 2002-10-28 12:30:31 »

I don't have any numbers. I've yet to find a way to reliably reproduce performance numbers on a micro benchmark. I will say I mentioned this to the author of idx3d back in 2000 and he thought it was an improvement enough to mention me in the source code and he replied back with an email basicly saying "that was the single most effective optimization I've done." At the time his target JVM was the one with Internet Exploder and Nutscrape 4.

I have yet to see a place where making your booleans test againt zero is slower, so I'd say it can't hurt the performance level. Like I said, it's my pet optimization. You're free to take it or leave it.
Offline kenrod

Senior Newbie




Penfold?


« Reply #4 - Posted 2002-10-28 23:06:16 »

Are your two examples equivalent?

Code:
int a[] = new int[1000000];
for (int i = 0; i < a.length; i++) {
// Do stuff
}  

Code:
int a[] = new int[1000000];
for (int i = a.length; i >= 0; i--) {
// Do stuff
}  

Given that the first loop does an iteration where 'i' equals '0', won't the second loop do an iteration where 'i' equals 'a.length'?

And won't, therefore, this break?
Offline leknor

Junior Duke




ROCK!!!


« Reply #5 - Posted 2002-10-29 02:28:27 »

Doh! You're right!
1  
for (int i = a.length; i >= 0; i--)

Will result in an ArrayIndexOutOfBoundsException at the first iteration. The correct code is:
1  
2  
for (int i = a.length-1; i >= 0; i--)
// Note the "-1" after the a.length

This is why people say optimization is dangerous. You get back in town, think you're being clever but you're really forgetting one important detail and break your code.

Thanks for pointing this out kenrod.
Offline kenrod

Senior Newbie




Penfold?


« Reply #6 - Posted 2002-10-29 02:49:33 »

No problem. Interestingly, your other example (which you don't like to use because it's hard to read) looks like it would solve this...

1  
2  
3  
4  
int a[] = new int[1000000]; 
for (int i = a.length; --i >= 0;) {
// Do stuff
}  


...because the evaluation (and hence the decrement) gets done before the first time through the loop.

Anyway, thank you for the code - I'm busy changing all my loops now to see what impact it has Smiley
Offline leknor

Junior Duke




ROCK!!!


« Reply #7 - Posted 2002-10-29 16:23:34 »

Quote
Anyway, thank you for the code - I'm busy changing all my loops now to see what impact it has Smiley

If you think you can generate reasonable performance numbers for before and after then please do post the results.
Offline GKW

Senior Duke




Revenge is mine!


« Reply #8 - Posted 2002-10-29 19:16:51 »

I was under the impression that the counting backwards trick was only useful on older 386/486 chips.  Due to an unfortunate me v. powertool accident last night I had some extra time to run this test today.  In my little tests method A was faster than the other two methods.  I didn't do anything in the loop.

Timers are in nanoseconds as reported by J3DTimer with a resolution of 3ns.  All methods were run 20 times with only the last ten runs being recorded.

Method A
Average: 19662318
High: 20229735
Low: 19544787

Method B
Average: 23496394
High: 25086930
Low: 21526614

Method C
Average: 24127960
High: 26512731
Low: 19953093
Offline leknor

Junior Duke




ROCK!!!


« Reply #9 - Posted 2002-10-29 21:32:10 »

I took the code from IDX3D can converted the for loops to Method A, they were alread in method B, and ran a few tests. You can download the code I used at: http://flick.nerdc.ufl.edu/IDXBench/ I gotta run so I can't do the math but here is enough output for you to play with. The output is the sample # and then FPS taken every 25 seconds. The test was run on a linux box with no other activity.

$ java -version
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)

$ time java -cp . idx3d_TestApp > /dev/null
idx3dA.idx3d_Scene
0 29.99
1 32.25
2 30.68
3 30.94
4 32.42
5 32.0
6 31.77
7 32.25
8 32.32
9 31.18

real    4m12.390s
user    0m1.820s
sys     0m0.160s

$ time java -cp . idx3d_TestApp > /dev/null
idx3dB.idx3d_Scene
0 30.94
1 32.25
2 31.65
3 32.42
4 31.9
5 32.38
6 31.87
7 31.77
8 32.35
9 31.21

real    4m12.146s
user    0m1.960s
sys     0m0.120s

It looks like Method B is equally as fast or slightly faster.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline EgonOlsen
« Reply #10 - Posted 2002-10-30 04:49:55 »

I played around with such optimizations some time ago for my own software renderer (similar to IDX3D, but better of course... Tongue) and i was unable to meassure a postive effect of this optimization. A problem is, that you usually access the array in the "do stuff" section and it is far more cache friendly if you do this from start to end than backwards. So i ended up being slower sometimes when doing this "optimization". Another idea i tried was to put "do stuff" in an endless loop  and let Java's range checking on arrays (takes place anyway) terminate it by generating an exception which i caught with a try-catch around that loop. Look ugly, feels ugly, should be faster but wasn't.

Offline kenrod

Senior Newbie




Penfold?


« Reply #11 - Posted 2002-10-30 19:19:04 »

First, I'm not going to pretend I've done any timings on this BUT, in the general case, this code would seem to be optimal (addressing your concerns EgonOlsen)...

1  
2  
3  
4  
5  
6  
int a[] = new int[1000000];
int j = -1;
for (int i = a.length; --i >= 0;) {
j++;
// Do stuff
}


...where you use 'j' (not 'i') as your regular loop counter (then you don't have to reverse your arrays or anything).

This seems like such a general case that I'd be tempted to think compilers could optimize this like this anyway, but then again they're probably not sure whether 'a.length' will remain constant for every iteration of the loop (unless it was declared final or something).

I agree with leknor that most machine languages have special instructions for testing zero, as opposed to loading a value into a register (which itself may require a function call or two) and then checking it.

Mind you, these sorts of micro optimizations are pretty academic - they're harder to read and not going to gain you anything over simply writing a better algorithm Smiley
Offline leknor

Junior Duke




ROCK!!!


« Reply #12 - Posted 2002-10-31 03:22:03 »

Kenrod: I've read that newer JVMs can detect loops that won't iterate past the array size and remove bounds checking which is a respectable speed gain.

I wouldn't be suprised if the method you suggest above with the j++; obscures the array access pattern enough that the JIT chooses not to remove bounds checking.

If someone has a link to a reasonably techinical article on how JITs determine when they can skip the bounds check on an array access I'd be intersted.
Offline pepe

Junior Duke




Nothing unreal exists


« Reply #13 - Posted 2002-10-31 06:12:37 »

i've been testing those loop methods on my image filter benchmark, and here is what came out:

test 1, using:
1  
2  
3  
4  
5  
int nbpixels=1024*1024;
for(int loop=0; loop< nbpixels; loop++)
{
  // filtering pixel at offset loop
}
55 million pixels filtered per second ( on 10s, 30s and 60s tests consistently )

test 2, using:
1  
2  
3  
4  
5  
int loop=1024*1024;
for (; loop >=0; loop--)
{
//filtering pixel at offset loop
}
34 million pixels filtered per second. ( on 10s, 30s and 60s tests consistently )

test 3, using
1  
2  
3  
4  
5  
6  
int loop=1024*1024;
int pos=0;
for (; loop >=0; loop--)
{
//filtering pixel at offset pos++
}
50 million pixels filtered per second. ( on 10s, 30s and 60s tests consistently )


The conclusions i can take out of these tests:
- read arrays from the beginning to the end
- these optimisations don't seem to be interesting.

The questions i have are: is the jvm testing at 0 using a special instruction, or does it use the standard, and loads 0 into a register prior testing? that would explain why there are no differences, if it can. We can also argue that register loading could take no time, if done in an other pipeline of the processor, bla bla bla... [paste the code for endless discussion here]

By the way, that was tested on a sun 1.4 client jvm.

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline princec

JGO Kernel


Medals: 409
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #14 - Posted 2002-10-31 07:36:42 »

The JVM does not "use" a special instruction to test against 0. Although it has one, it's turned quite readily into whatever the fastest way to test it is on current hardware. As far as I am aware, the current Pentium 2/3/4 chips execute nearly all of their basic instructions in a single cycle somewhere deep in a pipe. There is, effectively, no performance difference, excepting that Hotspot probably tries to do its bounds checking on the cheap by looking for the common case of counting from 0..n.

Not too long ago this was a valid performance optimisation, but not any more. Not too long ago it was faster to do maths using fixed point integer representations, but not any more. Unless you're working on ARM chips of course Smiley

Cas Smiley

Offline pepe

Junior Duke




Nothing unreal exists


« Reply #15 - Posted 2002-10-31 08:03:16 »

On the pipeline story, even if the register is loaded using an other pipeline, you still have some overhead ( (eventual) save register to stack, load register, compare, (eventual) reload register from stack ) compared to normal one (compare)
Instructions of one cycle should not be pipelined, so the pipes are free for long operations. Thus, if they don't compare with special instructions, it will still take more time.
Would be nice to get the info from the jjit team.

By the way, what is the op for the 'compare to 0' of x86?

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline EgonOlsen
« Reply #16 - Posted 2002-10-31 08:43:30 »

You don't have to do an explicit compare to 0. Just decrement the register (loop counter) and do a jnz (jump not zero) or a jns (jump not signed) to the start of the loop. If the register/counter reaches 0 then (or -1), the loop won't be taken because the zero-flag/signed-flag is set and you are done.

Offline pepe

Junior Duke




Nothing unreal exists


« Reply #17 - Posted 2002-10-31 09:03:40 »

Looks like we've got our x86 asm source. Grin

On x86, what is the register that is used for jumps?
As far as i remember, opposed to 68xxx, x86 have specialized registers.   Are you sure this register would be useable to do all operations needed? If it needs to be pushed, popped from an other one in the loop, of course, the gain would be null.

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline EgonOlsen
« Reply #18 - Posted 2002-10-31 09:10:26 »

You can use the standard registers (eax, ebx, ecx, edx) without problems for this. The main problem is the very limited number of registers on x86. You have to use your registers clever to optimize such things if the code inside the loop gets more complex.

Offline pepe

Junior Duke




Nothing unreal exists


« Reply #19 - Posted 2002-10-31 09:47:24 »

cool, thanks a lot for the infos.

Would be nice to have the native assembly versions of  our  routines as created by the JIT, by the way.

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline rgeimer

Senior Newbie





« Reply #20 - Posted 2002-10-31 19:24:39 »

>By the way, what is the op for the 'compare to 0'
>of x86?

cmp OPERAND,0
jz LABEL

cmp is the compare instruction

jz is "Jump if zero". There is also jnz for "Jump if not zero", etc. Both of these check the zero flag. Other jump instructions check different flags like carry, etc.

OPERAND can be a register or memory.

LABEL is just where you want to jump to.
Offline EgonOlsen
« Reply #21 - Posted 2002-11-01 03:25:07 »

Quote

cmp OPERAND,0
jz LABEL

Correct, but (as i mentioned above) this is not an optimized way of doing it in a loop, because a compare to 0 isn't different from a compare to X in this example. The actual optimization is to get rid of the "cmp" in the loop.

Offline chrisbliss18

Junior Newbie




Perchance to dream is better than to wake


« Reply #22 - Posted 2002-12-09 12:19:36 »

Quote
1  
2  
3  
4  
5  
int a[] = new int[1000000];
for (int i = a.length-1; i >= 0; i--) {
// EDITED: Thanks to kenrod for noticing a bug, see his post below for details
// Do stuff
}



I may be looking at this the wrong way, but are you actually checking for zero in this loop? It would seem that you are looking for i to be less than 0 which is a different check. "i >= 0" is much different than "i == 0" or "i != 0".
Offline pepe

Junior Duke




Nothing unreal exists


« Reply #23 - Posted 2002-12-09 12:56:48 »

The loop has to stop when i gets lower than 0. If we were testing for equality, the loop with 0 value will never happen.

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline chrisbliss18

Junior Newbie




Perchance to dream is better than to wake


« Reply #24 - Posted 2002-12-09 14:53:05 »

That's my point. You were talking about optimizing for "i == 0" when 0 needs to be included in the indexing. If the whole optimization is about "i == 0" then the whole way the code is written would need to be changed. Unless you were just being brief when talking about chips being able to test for 0 quickly. "i >= 0" doesn't test for that equality, rather it's a different test entirely. If I'm remembering my Assembly correctly (I only worked with it for a few months), then the test "i >= 0" results in a call to "GE" and a subsequent conditional jump rather than the clean "JZ". To halt execution of the loop when the counter equals 0 would make something odd looking like:
[Code]
int a[] = new int[1000000];
for (int i = a.length; i != 0; i--)
{
// Do stuff (While remembering that you
// have to subtract one to index the array)
}
[/Code]
Now I'm either off of my nut or it would be very hard to implement this optimization even in a language where you'd be guarenteed that the resulting compile would use the specific chip optimization.
Offline pepe

Junior Duke




Nothing unreal exists


« Reply #25 - Posted 2002-12-09 15:06:31 »

Oh, i see.
Look at answer number 16 in this thread. Both operations can be done the same at assembly level.
In fact, the topic seemed to have quickly slippped away from "i==0" tests to 'i and special cases (0 or -1)" tests.


Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Offline leknor

Junior Duke




ROCK!!!


« Reply #26 - Posted 2002-12-09 15:06:46 »

chrisbliss18: The optimization is based on "jump if greater than or equal to zero" described at:
http://mrl.nyu.edu/~meyer/jvmref/ref-_ifge.html

On http://mrl.nyu.edu/~meyer/jvmref/ check the following for java opt codes that are somehow related to a boolean test where one part  of the test is zero.

  • ifeq
  • ifge
  • ifgt
  • ifle
  • iflt
  • ifne
Offline chrisbliss18

Junior Newbie




Perchance to dream is better than to wake


« Reply #27 - Posted 2002-12-09 15:09:26 »

leknor: Okay... I've got you now. I got hung up on the whole "JZ" and "JNZ" thing. I'll look at those references. I'm new here and you seem to know your stuff. I'll have to watch what you put up.
Offline leknor

Junior Duke




ROCK!!!


« Reply #28 - Posted 2002-12-09 15:16:50 »

chrisbliss18: no problem dude. I almost couldn't find those urls and I was about to get unsure that my facts were straight. It's good for me to go back to my sources every once and a while to make sure I'm not dreaming up things that aren't there.
Offline Absolution

Senior Newbie




Java games rock!


« Reply #29 - Posted 2002-12-09 22:11:37 »

Off topic, but isn't prefix (++i) faster than postfix (i++)?
Pages: [1] 2
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

TehJavaDev (32 views)
2014-10-27 03:28:38

TehJavaDev (26 views)
2014-10-27 03:27:51

DarkCart (41 views)
2014-10-26 19:37:11

Luminem (22 views)
2014-10-26 10:17:50

Luminem (27 views)
2014-10-26 10:14:04

theagentd (33 views)
2014-10-25 15:46:29

Longarmx (61 views)
2014-10-17 03:59:02

Norakomi (58 views)
2014-10-16 15:22:06

Norakomi (47 views)
2014-10-16 15:20:20

lcass (43 views)
2014-10-15 16:18:58
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!