NativeOne
Senior Newbie 
Java games rock!
|
 |
«
Posted
2003-12-09 15:14:56 » |
|
Hi everybody. I'm working on a chess program which makes frequent calls to a method, that finds the position of the first (and only) set bit in a long value. So for 1->0, 2->1, 4->2, 8->3, 16->4, ... The straight-foward code is this: public static int bitIndex1(long x) { int r = 0; while ((x & (1L << r)) == 0) r++; return r; } The enhanced version takes less time in the average case: public static int bitIndex2(long x) { int r = 0; if ((x & 0xFFFFFFFF00000000L) != 0) r += 32; if ((x & 0xFFFF0000FFFF0000L) != 0) r += 16; if ((x & 0xFF00FF00FF00FF00L) != 0) r += 8; if ((x & 0xF0F0F0F0F0F0F0F0L) != 0) r += 4; if ((x & 0xCCCCCCCCCCCCCCCCL) != 0) r += 2; if ((x & 0xAAAAAAAAAAAAAAAAL) != 0) r++; return r; } Is it possible to do this even faster? On x86 there is a single assembly instruction which does this kind of stuff 
|
|
|
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #1 - Posted
2003-12-09 19:29:09 » |
|
Arrh yes the good old BSF opcode.
What am I saying!?! It used to be a one of the slowest instructions on the 486 taking upto 43 clock cycles, people used to do everything to avoid using it including using the FPU stack!!
[cold shirver runs down spine...]
Luckily the 486 days are gone for good (unless you work for nasa).
Native One you seem to be using 64-bit Longs, it would be better to break it into 2 32-bit integers.
But do you need to be using longs in the first place?
The -server version *may* handle these longs better as it is SSE enabled, although I have no idea if it supports 64 bit longs (a job for MMX not SSE).
You also have do not have an immediate escape clause in that if x is zero you still run through all 6 of those 64-bit checks before returning zero.
On the server version you also have the ability to set the size of code you want to inline, this could seriously help you if bitIndex is being called reguarly.
Hope that helps a bit. ------- Woz.
|
|
|
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #2 - Posted
2003-12-09 21:44:00 » |
|
Native One, After some testing I reckon this version is faster: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
| public static int bitIndex4(long x) { final int a = (int)x; final int b = (int)(x >> 32); int r = 0;
if( a == 0 ) { if ((b & 0xFFFF0000) != 0) r += 16; if ((b & 0xFF00FF00) != 0) r += 8; if ((b & 0xF0F0F0F0) != 0) r += 4; if ((b & 0xCCCCCCCC) != 0) r += 2; if ((b & 0xAAAAAAAA) != 0) r++; r += 32; } else { if ((a & 0xFFFF0000) != 0) r += 16; if ((a & 0xFF00FF00) != 0) r += 8; if ((a & 0xF0F0F0F0) != 0) r += 4; if ((a & 0xCCCCCCCC) != 0) r += 2; if ((a & 0xAAAAAAAA) != 0) r++; } return r; } |
Although I'm not happy about the casting at the top but I assume you have to use longs, also there is a bit of a hole in that your style of code allows a value of 0 to be returned as if a bit 0 has been set. This has fallen through into the int version thus: bitIndex2( 0L ) returns 0 bitIndex2( 1L ) returns 0 bitIndex4( 0L ) returns 32 bitIndex4( 1L ) returns 0 Which does not affect the way you want it to work but is something to at least be aware of. Java 1.3.1 On a 1Ghz PIII coppermine with approx CPU Clock Speed = 997453224 Cycles per second. I was getting a consistant average (on 10000 x 64 tests) of 1154 clock cycles per call on bitIndex2() 1050 clock cycles per call on bitIndex4() These value includes the loop overhead per iteration but profiling down to this level if farely hard and should be considered a micro benchmark as on first run its possible to see how much cpu the hotspot takes to optimize the execution path when in bitIndex4() bit 32 is set causing to take the other branch. Profiling the actual methods results in bitIndex4() being ~20cc faster than bitIndex2() when done in a smaller bits 1 to 64 set test. Which is a different value from the 10000 * 64 tests. Microsoft JVM also mirrored this speed increase, although it is much faster, values of 515cc for bitIndex4() not unheard of (again jumps around a lot). Although I'm not happy with the profiling environment as I just wacked the code into an existing applet, so there are plenty of other threads running. So perhaps tomorrow I'll try it in a clean testbed and see if the values are any more consistant. ------- Woz.
|
|
|
|
|
Games published by our own members! Check 'em out!
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #3 - Posted
2003-12-10 06:58:25 » |
|
Guess who managed to post the wrong piece of code last night! Ooops, sorry about that, this is the one you want: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
| public static int bitIndex5(long x) { int r = 0; int a = (int)x;
if( a == 0 ) { a = (int)(x >> 32); r += 32; }
if ((a & 0xFFFF0000) != 0) r += 16; if ((a & 0xFF00FF00) != 0) r += 8; if ((a & 0xF0F0F0F0) != 0) r += 4; if ((a & 0xCCCCCCCC) != 0) r += 2; if ((a & 0xAAAAAAAA) != 0) r++;
return r; } |
Caused by looking at the wrong spread sheet!! Anyway this one is faster and smaller than bitIndex4() as it avoids doing 2 casts in 50% of cases and the execution path is a little cleaner. Sorry about that. ------- Woz.
|
|
|
|
|
erikd
|
 |
«
Reply #4 - Posted
2003-12-10 08:10:44 » |
|
I'm wondering how 1.4.2 would perform compared to the (old) VMs you tested on....
Erik
|
|
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #5 - Posted
2003-12-11 19:53:18 » |
|
For those interested in dots. Appologies for the large post, and you probably want to go full screen to get the table data lined up (sorry). The clock cycles given include the loop over head, the top value is the average cycles, the value below it is the median. CPU = 1Ghz PIII coppermine with approx CPU Clock Speed = 997453224 Cycles per second Clock Cycles for bitIndex2().
335.163 335 ibm13index2e.csv Thread priority = 5 Classic VM IBM Corporation 1.3.0 1.3.0 IBM Corporation
|
462.998 463 ms114index2e.csv Thread priority = 5 null null null 1.1.4 Microsoft Corp.
|
334.378 334 sun12index2e.csv Thread priority = 5 Classic VM Sun Microsystems Inc. 1.2 1.2 Sun Microsystems Inc.
|
612.59 612 sun131index2e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.3.1-b24 1.3.1 Sun Microsystems Inc.
|
626.142 626 sun141index2e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.1-b21 1.4.1 Sun Microsystems Inc.
|
635.11 635 sun142index2e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.2_01-b06 1.4.2_01 Sun Microsystems Inc.
|
Clock cycles for bitIndex4()
216.247 216 ibm13index4e.csv Thread priority = 5 Classic VM IBM Corporation 1.3.0 1.3.0 IBM Corporation
|
354.651 354 ms114index4e.csv Thread priority = 5 null null null 1.1.4 Microsoft Corp.
|
308.959 309 sun12index4e.csv Thread priority = 5 Classic VM Sun Microsystems Inc. 1.2 1.2 Sun Microsystems Inc.
|
501.409 501 sun131index4e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.3.1-b24 1.3.1 Sun Microsystems Inc.
|
487.529 487 sun141index4e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.1-b21 1.4.1 Sun Microsystems Inc.
|
470.736 471 sun142index4e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.2_01-b06 1.4.2_01 Sun Microsystems Inc.
|
Clock cycles for bitIndex5()
208.485 208 ibm13index5e.csv Thread priority = 5 Classic VM IBM Corporation 1.3.0 1.3.0 IBM Corporation
|
342.374 342 ms114index5e.csv Thread priority = 5 null null null 1.1.4 Microsoft Corp.
|
278.477 278 sun12index5e.csv Thread priority = 5 Classic VM Sun Microsystems Inc. 1.2 1.2 Sun Microsystems Inc.
|
466.484 466 sun131index5e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.3.1-b24 1.3.1 Sun Microsystems Inc.
|
460.837 461 sun141index5e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.1-b21 1.4.1 Sun Microsystems Inc.
|
450.049 450 sun142index5e.csv Thread priority = 5 Java HotSpot(TM) Client VM Sun Microsystems Inc. 1.4.2_01-b06 1.4.2_01 Sun Microsystems Inc.
|
| ----- |
Speed gain when using bitIndex5() instead of using bitIndex2() |
161% |
135% |
120% |
131% |
135% |
141% |
As you can clearly see it is worth using bitIndex5() over bit Index2() giving at worst a 20% speed increase (at best a 61% increase). If you want to know which JVM was faster on my machine, well that goes (unsurprisingly) to IBM's 1.3. Unfortunately I didn't get to include Kaffe due to the version I have being broken on Windows. And I didn't succeed in compiling a .exe via GNU's GCJ, due to a jni link error which is most likely my fault (but haven't finished with this yet). --------- Woz.
|
|
|
|
|
NativeOne
Senior Newbie 
Java games rock!
|
 |
«
Reply #6 - Posted
2003-12-11 23:31:18 » |
|
Thanks a lot for your help. So basically bitIndex2() seems to be the right thing to do - just not in the fastest variant. Originally I hoped that maybe someone would come up with a better algorithm, maybe exploiting some obscure bit fumbling. Because my work is in the early stages I'll consider switching to C/C++ and later optimize with assembly. Could be quite fun and challenging - maybe I'll toast my CPU  I wonder how a C++ and then an asm version would compare in terms of cycles. Can it be true, that Sun 1.2 is faster than the successor VMs? Strange... Merry Christmas & Happy new Year NativeOne
|
|
|
|
|
crystalsquid
Junior Member  
... Boing ...
|
 |
«
Reply #7 - Posted
2003-12-12 09:34:09 » |
|
One area for improvement is the use of branches (ifs). You can do the same with some cunning bit-fumbling  This is also the best way to approach an asm version as you can pipeline the logical ops much better than using branches. Note also that the bit pattern was swapped for the test as this trick wouldn't work if the top bit was set. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
| public static int bitIndex6(long x) { int r = 0; int a = (int)x; int b = (int)(x>>32) int c = (a>>32) | ((-a)>>32); a = (a & c) | (b & (~c)); r = 32 & (~c);
b = a&0x0000ffff; b = ~((-b)>>31); r |= 16 & b;
b = a&0x00ff00ff; b = ~((-b)>>31); r |= 8 & b;
b = a&0x0f0f0f0f; b = ~((-b)>>31); r |= 4 & b;
b = a&0x33333333; b = ~((-b)>>31); r |= 2 & b;
b = a&0x55555555; b = ~((-b)>>31); r |= 1 & b;
return r; } |
Don't know if this would be faster in java, but writing the asm or C++ like this should be faster overall (no branch prediction errors can occur). Also, this is straight out of my head so apologies for any typos / errors  - Dom
|
|
|
|
|
erikd
|
 |
«
Reply #8 - Posted
2003-12-12 10:43:55 » |
|
If you want to know which JVM was faster on my machine, well that goes (unsurprisingly) to IBM's 1.3.
Yeah, it's a shame it seems they don't develop VM's for windows anymore  Erik
|
|
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #9 - Posted
2003-12-12 19:00:41 » |
|
Erik, IBM still do make a JVM, its just part of Web Sphere now. http://www-106.ibm.com/developerworks/java/jdk/IBM typically runs a bit behind in the version numbers though, they were at version 1.1.8 for ages. But what I don't understand is why they have a version of 1.4.1 for Linux but not windows, the Linux crowd have had it for ages too. I completely agree with Cystal Squid, conditional branches kill modern CPUs (especially the P4 with its super long pipeline). It would be interesting to try and make as much of that code run in parallel (super scalar architecture). But do you know what! The BSF opcode is supposed to run in 1-2 clocks on the PPro, PII and above!!! Native One I would not necessarily say that SUN 1.2 is faster than 1.4 series as this is only one test, the rest of the JVM may very well suck! It may not scale at all, have bad array access and a poor I/O sub system. One thing you can note is that there is a difference between JITs and Hotspots, the JIT compilers like your code, Hotspots - for what ever reason do not. IBM's JVM is a Hotspot compiler but they seem to have managed to keep all the benefits of JIT too. Unfortunately all these JVM's are closed source so it really difficult to work out the actual reason for it. Also this is only on MY machine, it is entirely possible to be the other way around on your PC. Although typically all PII & PIII systems would behave the same in this test because the memory sub system is hardly used. But then of course this is not a "live" environment, the code does not actually do anything and it is not used the way you intend it to be used. The code is so small that the entire program should fit in the L1 cache of the CPU, something that probably wont happen when your chess program is finished. As to which programming language you should use, I see no reason why you should not use Java, its (relatively) easy to use, and changes can be made quickly. OK speed of execution "could" be a problem, but if this is your first go at a chess program its not really going to be an issue, your hardly going to be running the thing against Deep Thought are you!!?? Anyway hope you found this all useful, if not at least interesting! All the best, ------- Woz.
|
|
|
|
|
Games published by our own members! Check 'em out!
|
|
cfmdobbie
|
 |
«
Reply #10 - Posted
2003-12-12 20:14:11 » |
|
No one's mentioned the usual optimization mantra:
"Premature optimization is the root of all evil" - Donald Knuth
It's usually better to not worry about these kind of things right now, and detect any slowdowns once you start profiling. It may be that in the real environment Java will perform a lot better than your micro-benchmark predicts, or that there are much bigger bottlenecks elsewhere.
Unless you're sure this is going to be a big deal - exactly how frequently is this method going to be called?
|
Hellomynameis Charlie Dobbie.
|
|
|
erikd
|
 |
«
Reply #11 - Posted
2003-12-12 21:24:12 » |
|
Erik, IBM still do make a JVM, its just part of Web Sphere now. http://www-106.ibm.com/developerworks/java/jdk/IBM typically runs a bit behind in the version numbers though, they were at version 1.1.8 for ages. But what I don't understand is why they have a version of 1.4.1 for Linux but not windows, the Linux crowd have had it for ages too. Yes, you're right. It's just that I've given up hope after waiting so long for 1.4 for windows. I'm very impatient and I can't use 1.3.1 for my games  OTOH, I have no real problems with SUN's offering as it's stable and usually fast enough for what I do. I actually think Sun is doing a great job in general. But I do sometimes question Sun's statements that their JVM can reach C++ speeds with well written code if in all cases I tried, IBM's 1.3.1 JVM outperformed Sun's JVM's by a factor of 2 and up and I'm not talking about microbenchmarks or just my own crappy code... Hmmm... :-/ Erik
|
|
|
|
Jeff
|
 |
«
Reply #12 - Posted
2003-12-13 04:56:01 » |
|
But I do sometimes question Sun's statements that their JVM can reach C++ speeds with well written code if in all cases I tried,
The magic words. I've said it before and I'll say it again. Microbenchmarking is hard and interpreting the results is even harder. I have not seen such a difference in recent benchmarks. In fact I believe in SpecJVM, which was the standard CPU benchmark for Java for quite some time, we and IBM basically payed ping pong in who was on top for a long time. In general the differences between our VM and IBMs have had a lot to do with what you were measuring. A few versions back, the big differences you saw were that if you measured array access they won, because we hadn't implemented bounds check hoisting yet, and if you measured memory allocation/deallocation we won because they didn't have generational allocation and collection. A good example of how these trade-offs play out can be seen in Chris Rijk's excellent (if now old) article "Binaries v. Bytecodes." Since then (starting with 1,.4) we've got the bounds check hoisting. I don't know what they've done or not done about their allocator/collector.
|
|
|
|
Jeff
|
 |
«
Reply #13 - Posted
2003-12-13 05:04:03 » |
|
For those interested in dots. (1) Are you comparing apples vs apples? (IBM server VM v. Sun Server VM)? (2) Are you running the Sun VM with -Xcompile and properly warming the VM up?? If not the results could simply be due to IBm deciding to kick into compiled code sooner. In general, AFAIK a properly run SpecJVM does *not* show such a difference between IBM and our VM, which makes me suspect your methodology.
|
|
|
|
Jeff
|
 |
«
Reply #14 - Posted
2003-12-13 05:06:49 » |
|
Okay Woz, you are DEFINITELY running the VM wrong. I see classic (which doesn't even exist anymore) and Client, but NO server VM numbers. IBM OTOH i believe ONLY has a server VM. Try apples v. apples to start 
|
|
|
|
Woz
Senior Newbie 
A Troll who lives in a hole
|
 |
«
Reply #15 - Posted
2003-12-13 11:29:43 » |
|
Wow Jeff calm down! The comparison was which routine was quicker, for me its bitIndex5(). This was the IBM Run Time JVM I downloaded and use, IBM has either moved or removed the page since, but google still caches it. Google CacheThis has a RunTime installable package, ibm-cpt-rn20.exe. Planet mirror still has it: http:// http://public.planetmirror.com/pub/java-ibm/1.3.0/win32/I did not use the -Xcompile switch with Suns 1.4.x (or any of the other JVMs), although I allowed time for the JVMs to warm up. [move]Shall have a go and see if it makes any difference, when I've got the some time free.[/move] ------- Woz.
|
|
|
|
|
blahblahblahh
|
 |
«
Reply #16 - Posted
2003-12-13 13:12:14 » |
|
for quite some time, we and IBM basically payed ping pong in who was on top for a long time.
To what extent is this because Sun creates the new versions of Java, and IBM has to wait and see what version X + 1 is and does, and what has changed, before they can write their version? (I assume these days major Java outfits like IBM get v. early access to planned architectural and library change information?) My very limited understanding is that this used to be a major reason why IBM had a long time lag with keeping up with the Sun releases, along with the fact that they devoted a heck of a lot more resources to Java dev than Sun did (this is going back some years, but was certainly true for a long time; IBM touted the fact that they had 5 java devs for every 1 Sun java dev (or whatever the ratio was) probably for longer than was strictly fair, but... ). But then IBM has 300k employees  , so it perhaps wasn't hard. I've kind of assumed that IBM has been the driving force behind making Sun make Java faster and better - you don't want to be eclipsed (pun unintended) in your own language by a competitor - and this is a good thing; IBM spends longer on their JVM in an attempt (amongst other things) to get one over on Sun, which pushes Sun to devote more resources and push a bit harder at making the next iteration even better. Both companies are extremely good at VM development - IBM historically is significantly better, but many of their experts moved to Sun (and vice versa) over the years, so nowadays it's presumably fairly equal; with the continued demise of Compaq (and all they swallowed) is there anyone left in the world with VM knowledge and resources that even come close to Sun/IBM? Compare this to the MS monopolies, where MS rarely has any reason to improve e.g. Word, and the software just stagnates. IMHO. I just want to see IBM and Sun keep pushing each other, and not resting on laurels, and hopefully never running out of major new features etc to beat each other on (or the same stagnation may set in).
|
malloc will be first against the wall when the revolution comes...
|
|
|
abies
|
 |
«
Reply #17 - Posted
2003-12-13 16:02:53 » |
|
[...] with the continued demise of Compaq (and all they swallowed) is there anyone left in the world with VM knowledge and resources that even come close to Sun/IBM?
I don't know if Intel still develops it in house, but they had quite reasonable team working on orp jvm. Effort was mainly driven by need to have state of art jvm for IA64 architecture, which is non-trivial to say the least as far as compilers are concerned, which directly translates to jit implementation. At the same time they have open sourced x86 implementation. It is dead from what can be seen outside - no way of guessing if they have just focused on IA64 and stopped pushing patches to open source version or gave up alltogether (IA64 is not exactly a best selling processor after all...). Anyway, team is somewhere there, inside Intel research lab, waiting to take over the world when the hour comes.
|
Artur Biesiadowski
|
|
|
Jeff
|
 |
«
Reply #18 - Posted
2003-12-14 04:33:43 » |
|
To what extent is this because Sun creates the new versions of Java, and IBM has to wait and see what version X + 1 is and does, and what has changed, before they can write their version?
None. IBM has its own version of Java, its own source base. they don't have to wait for ours. They are equal members of the JCP and know when the JSR is final as soon as we do. By the way, I believe they are still full licensees which means that as soon as we finish OUR VM, they get to see the source. The reverse is not true. (I assume these days major Java outfits like IBM get v. early access to planned architectural and library change information?)
Acutally EVERYONE gets the info at more or less the same time, when the JSR is approved by the JCP J2SE Executive Comittee. IBM and anyone else who wants to join the expert group for the JSR has a slight lead in that they are invovled in the debates that lead to the version sent to the EC. My very limited understanding is that this used to be a major reason why IBM had a long time lag with keeping up with the Sun releases, along with the fact that they devoted a heck of a lot more resources to Java dev than Sun did
How could devoting more resources be a reason for delays? Maybe I'm not following you as that dpesnt seem very sensible to me? I don't know much of the internals of IBM's Java development so I can't comment on it beyond what I know. I do know that they have their development split in half (or did last time I talked to an IBm person which was a few yearsd ago.) They have a core VM group that develops the base of their technology and then porting groups that port it to all the platforms they support (which is a lot, they have a lot of old legacy systems like 360s and VM (thats an IBM operating system) machines. Likely they are stil lsupporting AIX users, etc. (this is going back some years, but was certainly true for a long time; IBM touted the fact that they had 5 java devs for every 1 Sun java dev
Its a deceiving number as they are suypporting many more platforms then we are. and structure their development totally differently. You really cant just count heads. (I've kind of assumed that IBM has been the driving force behind making Sun make Java faster and better -
You are half right. We have continually driven each others development foward. Its a bilateral sort of thing. and yes, competition is a very good thing  Imagine how much better (or at least more stable) Windows woudl be if MSFT had competitors...
|
|
|
|
Jeff
|
 |
«
Reply #19 - Posted
2003-12-14 04:40:46 » |
|
Wow Jeff calm down!
Sorry but "IBM is always twice is fast" I know isnt true and is fighting words :) Remember I was ON the JDK1.3 tunign team and we did heavy measurements of oruselves AND competitors. IBM's VM (which is a server style VM) is always twice as fast as Sun's CLIENT VM I believe. Which is what it looks liek yo umeaured. Make sure to run with -Xcompile and -server and you might get a fairer comaprsions. Ofcourse microbenchmarks are horribly prone to inaccuracies anyway in that they don't reflect the way real code executes which is what we at Sun tune for.
|
|
|
|
erikd
|
 |
«
Reply #20 - Posted
2003-12-14 19:13:27 » |
|
Sorry but "IBM is always twice is fast" I know isnt true and is fighting words  Maybe you're referring to my remark, and in that case I like to reemphasize that I didn't say that IBM is always twice as fast. I said that in *the cases I tried*, IBM was always faster. I know those cases do not give an general conclusion about IBM vs SUN, but cases of interest to me at most. And I was testing IBM 1.3 against SUN's 1.3 (apples to apples and all  ). I haven't really tried IBM 1.3 against recent Sun VM's mucg, since I do different things with Sun's VM that are simply not possible with 1.3. Just to make myself a bit more clear... Erik
|
|
|
|
Jeff
|
 |
«
Reply #21 - Posted
2003-12-15 00:50:54 » |
|
I said that in *the cases I tried*, IBM was always faster. I know those cases do not give an general conclusion about IBM vs SUN, but cases of interest to me at most. And I was testing IBM 1.3 against SUN's 1.3 (apples to apples and all  ). Ah. IBM had bounds check hoisting in their 1.3 . We didn't have it until our 1.4. Thats likely the difference you were seeing in most of your tests. (Not actually seeing the tests I can't tell for sure.) OTOH Since our 1.4 was out not too much later then their 1.3 its arguable that 1.3 to 1.3 is the correct comparison or whether 1.3 to 1.4 is. As a general rule I always try to compare whatever are the most current versions of whatever VMs Im comparing, regardless of their version numbers.
|
|
|
|
erikd
|
 |
«
Reply #22 - Posted
2003-12-15 09:14:23 » |
|
Ah. IBM had bounds check hoisting in their 1.3 . We didn't have it until our 1.4.
Well, in the server VM that is (or is it implemented in the client too lately?). For games, people usually use the client VM. IBM's VM might be a server but 'feels' like a client VM too (warming up times). Thats likely the difference you were seeing in most of your tests. (Not actually seeing the tests I can't tell for sure.)
Likely. At the time I was much into emulation having array access everywhere. Erik (sorry for hijacking this thread, btw)
|
|
|
|
Jeff
|
 |
«
Reply #23 - Posted
2003-12-16 02:57:40 » |
|
Well, in the server VM that is (or is it implemented in the client too lately?). For games, people usually use the client VM.
Not really true at all. Go ask Shawn Kendall  IBM's VM might be a server but 'feels' like a client VM too (warming up times).
Can't really comment as thats a subjective measure. You can make Hotspot "warm up" right away easily enough by saying -Xcompile.
|
|
|
|
erikd
|
 |
«
Reply #24 - Posted
2003-12-16 10:27:43 » |
|
Not really true at all. Go ask Shawn Kendall Hmmm... Well it's troublesome enough to get people download the JRE, let alone the whole SDK to get the server VM... You surely can't be serious that most people have the SDK already? Can't really comment as thats a subjective measure.
Yes it is. You can make Hotspot "warm up" right away easily enough by saying -Xcompile.
Yes I know, but it would be some hassle to get for example web start to use the server VM, and use -Xcompile. I vote for merging the client and server VMs 
|
|
|
|
Jeff
|
 |
«
Reply #25 - Posted
2003-12-16 19:28:52 » |
|
or example web start to use the server VM, and use -Xcompile. I vote for merging the client and server VMs  Well IMO you will always want different configuration parameters for client and server operation even if at the end they are running on the same codebase. The real problem here is that you say you cannot pass VM flags to Webstart. Thats pretty intolerable if true and needs to be fixed. Speaking of which it would be good to start a thread thats JUSt a list of open Webstart issues. I think I'll ask Chris for a Webstart topic of its own.
|
|
|
|
|
|
blahblahblahh
|
 |
«
Reply #27 - Posted
2003-12-16 21:21:39 » |
|
Well IMO you will always want different configuration parameters for client and server operation even if at the end they are running on the same codebase.
The real problem here is that you say you cannot pass VM flags to Webstart. Thats pretty intolerable if true and needs to be fixed.
Speaking of which it would be good to start a thread thats JUSt a list of open Webstart issues. I think I'll ask Chris for a Webstart topic of its own.
Yes, please do! In other discussion groups / forums / mailing lists / sites I come across quiite a few java games devs who are struggling with or debating using JWS. It would be really helpful if I could point them at such a forum whenever trouble came up, and so help route more people to the same particular point. I tell them to try JGO anyway, but currently you have to put in quite a lot of effort to e.g. find out about a single small thing like JWS from within JGO.
|
malloc will be first against the wall when the revolution comes...
|
|
|
Jeff
|
 |
«
Reply #28 - Posted
2003-12-16 21:55:33 » |
|
Well I've started a JWS RFE's thread in "On-line Games", but i agree JWS really needs its own topic. Im bugging chris about this now...
|
|
|
|
swpalmer
|
 |
«
Reply #29 - Posted
2003-12-17 01:53:23 » |
|
I think you can do this without any branching.. theoretically it could be faster. If you can rewrite the IFs using a mask to effectively add zero when the condition is false it might keep the processor pipeline full and be marginally faster. You may be able to exploit properties of the number.. E.g. recognize that binary numbers with a single bit set minus 1 give an interesting value that could make a good mask for ANDing, e.g. 10000 - 1 = 1111. I came up with this... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
| public static int getIndex(int v) { int m1 = v-1; int m2 = ((v & 0xAAAAAAAA)-1+v); int m3 = (m1-m2)>>31; int i = m3 & 1;
m2 = ((v & 0xCCCCCCCC)-1+v); m3 = (m1-m2)>>31; i += m3 & 2;
m2 = ((v & 0xF0F0F0F0)-1+v); m3 = (m1-m2)>>31; i += m3 & 4;
m2 = ((v & 0xFF00FF00)-1+v); m3 = (m1-m2)>>31; i += m3 & 8;
m2 = ((v & 0xFFFF0000)-1+v); m3 = (m1-m2)>>31; i += m3 & 16; return i; } |
I haven't investigated if this can be further simplified.
|
|
|
|
|