Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Posted
2004-10-25 21:59:44 » |
|
Hi guys, sorry for being away for so long, but I've been busy fixing bugs. Anyway, I added another new intrinsic for the next version of Java. Absolute value (Math.abs) is now intrinsified to use hardware when available. For non-SSE machines that means using FABS, while SSE machines use andps or andpd (for single or double precision, respectively). Client picks up this improvement along with Server. SPARC, AMD64 and IA64 all have hardware implementations, so those benefit as well. I'm open to suggestions for other improvements that might help.
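For readers unfamiliar with the trick behind andps/andpd: an IEEE 754 absolute value is just the original bit pattern with the sign bit cleared, which a single bitwise AND against a constant mask achieves. The sketch below mimics that in plain Java. It is only an illustration of the idea, not the HotSpot intrinsic itself; the class and method names (AbsMaskSketch, absViaMask) are made up for this example.

```java
final class AbsMaskSketch {
    // Every bit set except the IEEE 754 sign bit (the top bit).
    private static final long DOUBLE_ABS_MASK = 0x7FFFFFFFFFFFFFFFL;
    private static final int FLOAT_ABS_MASK = 0x7FFFFFFF;

    // Double precision: the same AND-with-mask idea that andpd applies in hardware.
    public static double absViaMask(double x) {
        long bits = Double.doubleToRawLongBits(x);
        return Double.longBitsToDouble(bits & DOUBLE_ABS_MASK);
    }

    // Single precision: the counterpart of andps.
    public static float absViaMask(float x) {
        int bits = Float.floatToRawIntBits(x);
        return Float.intBitsToFloat(bits & FLOAT_ABS_MASK);
    }

    public static void main(String[] args) {
        System.out.println(absViaMask(-2.5));   // double overload
        System.out.println(absViaMask(-1.5f));  // float overload
        System.out.println(absViaMask(-0.0));   // negative zero loses its sign bit
    }
}
```

Because the operation only clears one bit, it needs no branch and no comparison, which is presumably why it maps so well onto a single instruction.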
|
|
|
|
|
blahblahblahh
|
 |
«
Reply #1 - Posted
2004-10-26 07:48:12 » |
|
No suggestions, but a quick thank-you for keeping us informed of this. Sooner or later we tend to notice these things in the bug-fix lists for later VMs, but it's great to get a heads-up on the stuff that matters, rather than have to manually trawl through hundreds of bugs to see what's happening.
|
malloc will be first against the wall when the revolution comes...
|
|
|
princec
|
 |
«
Reply #2 - Posted
2004-10-26 09:45:58 » |
|
Could you explain what the mysterious two-tier compilation & threshold -XX flags are in the server VM...? Cas 
|
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #3 - Posted
2004-10-26 16:41:19 » |
|
Sure, those flags sorta kinda work. They were the first attempt at tiered compilation. Don't use them; well, you can if you want, it just won't help. Wait for the next version of Java: it'll have working tiered compilation and all my math goodies.
|
|
|
|
|
selendic
Junior Member  
Java games rock!
|
 |
«
Reply #4 - Posted
2004-10-26 16:52:23 » |
|
Sure, those flags sorta kinda work. They were the first attempt at tiered compilation. Don't use them; well, you can if you want, it just won't help. Wait for the next version of Java: it'll have working tiered compilation and all my math goodies.
Is there any possible hint on when/if regular snapshots of 6.0 will be available, like for beta3 of Tiger?
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #5 - Posted
2004-10-26 16:58:51 » |
|
Dunno about the previews, we may or may not have them. If I find out, I'll let you all know.
|
|
|
|
|
princec
|
 |
«
Reply #6 - Posted
2004-10-26 17:18:41 » |
|
Oh JOY! I think I moaned about the need for tiered compilation most vociferously in here, what, maybe four years ago or something! And finally we know it's being started on. At this rate we'll see Structs by the end of the decade  Cas 
|
|
|
|
phazer
Junior Member  
Come get some
|
 |
«
Reply #7 - Posted
2004-10-27 15:57:10 » |
|
Excuse my ignorance, but what is "tiered compilation"? 
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #8 - Posted
2004-10-27 16:20:59 » |
|
Tiered compilation is an extension of what HotSpot already does. So let's step back and talk about how HotSpot and most JIT compilers work in general. There are two phases in HotSpot: interpreted and compiled. When you start up a Java program, HotSpot first interprets the code until a certain threshold is reached (1000 for Client, 10,000 for Server); then the method is compiled and the compiled version is used (faster than interpreted, usually a lot faster). Tiered compilation basically adds a second layer, so that you have interpreted -> fast JIT but low-quality code -> slow JIT but awesome code. Tiered compilation gives you the best of both worlds in that you get fast startup and good long-running performance. Hope this helps...
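To make the interpreted -> fast JIT -> slow JIT progression concrete, here is a toy model in Java. It is a sketch of the counting idea only and says nothing about how HotSpot actually implements it: the class name TieredSketch, the Tier enum, and both threshold constants are hypothetical, with the numbers simply borrowed from the figures quoted in this thread.

```java
final class TieredSketch {
    enum Tier { INTERPRETED, FAST_JIT, OPTIMIZING_JIT }

    // Hypothetical thresholds for illustration; not taken from any real VM.
    static final int FAST_JIT_THRESHOLD = 1_500;
    static final int OPTIMIZING_JIT_THRESHOLD = 10_000;

    // Number of calls this (imaginary) method has completed so far.
    private int invocations;

    // Records one call and returns the tier that call would have run in,
    // decided by how many calls were completed before it.
    public Tier invoke() {
        Tier tier = currentTier();
        invocations++;
        return tier;
    }

    private Tier currentTier() {
        if (invocations >= OPTIMIZING_JIT_THRESHOLD) return Tier.OPTIMIZING_JIT;
        if (invocations >= FAST_JIT_THRESHOLD) return Tier.FAST_JIT;
        return Tier.INTERPRETED;
    }

    public static void main(String[] args) {
        TieredSketch method = new TieredSketch();
        Tier previous = null;
        for (int call = 1; call <= 12_000; call++) {
            Tier tier = method.invoke();
            if (tier != previous) {
                // Print only when the method moves to a new tier.
                System.out.println("call " + call + " runs as " + tier);
                previous = tier;
            }
        }
    }
}
```

In this model a method that is only called a few hundred times never leaves the interpreter, a moderately hot method gets cheap compiled code quickly, and only the hottest methods pay for the expensive compiler.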
|
|
|
|
|
selendic
Junior Member  
Java games rock!
|
 |
«
Reply #9 - Posted
2004-10-27 18:23:45 » |
|
Hmmm. That includes merging of compilers? Will it use heuristics or command line switches to kick in? And memory consumption will be higher, I suppose?
|
|
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #10 - Posted
2004-10-27 18:27:17 » |
|
All of that is unknown at this time, although memory consumption shouldn't be higher, since the JIT's memory usage (C heap) is significantly lower than the Java heap. So I don't expect a memory increase, but more info will be available once the work is nearing completion.
|
|
|
|
|
selendic
Junior Member  
Java games rock!
|
 |
«
Reply #11 - Posted
2004-10-27 18:44:48 » |
|
Thanks, really great news. Now that tiered compilation is covered, I hope escape analysis is next.
|
|
|
|
|
GKW
|
 |
«
Reply #12 - Posted
2004-10-27 18:56:56 » |
|
Is this going to be added to 5.1 or 6.0?
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #13 - Posted
2004-10-27 20:27:38 » |
|
Definitely 6.0 or later...
|
|
|
|
|
princec
|
 |
«
Reply #14 - Posted
2004-10-28 10:16:54 » |
|
I don't suppose you'd care to champion Structs with me? Cas 
|
|
|
|
ChrisRijk
Senior Newbie 
Optimise or Die
|
 |
«
Reply #15 - Posted
2004-10-28 14:32:57 » |
|
Tiered compilation is an extension of what HotSpot already does. So let's step back and talk about how HotSpot and most JIT compilers work in general. There are two phases in HotSpot: interpreted and compiled. When you start up a Java program, HotSpot first interprets the code until a certain threshold is reached (1000 for Client, 10,000 for Server)
Isn't it 1,500 for client? Or is the -XX:CompileThreshold default for client on this page out of date: http://java.sun.com/docs/hotspot/VMOptions.html
then the method is compiled and the compiled version is used (faster than interpreted, usually a lot faster). Tiered compilation basically adds a second layer, so that you have interpreted -> fast JIT but low-quality code -> slow JIT but awesome code. Tiered compilation gives you the best of both worlds in that you get fast startup and good long-running performance. Hope this helps...
I'll understand if it is too early to say, but would this be server VM only...? Or would client and server effectively merge with the new model...?
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #16 - Posted
2004-10-28 16:13:18 » |
|
Whoops, it's 1500 for x86 and 1000 for SPARC. I'm just so used to viewing SPARC files that I didn't notice the discrepancy.
|
|
|
|
|
pepe
Junior Member  
Nothing unreal exists
|
 |
«
Reply #17 - Posted
2004-10-30 08:30:46 » |
|
Thanks a lot for the info, Azeem ! Any insight about the previous 'new jvm improvements' topic and the bit shift/masking bench?
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #18 - Posted
2004-10-30 16:23:32 » |
|
Thanks a lot for the info, Azeem ! Any insight about the previous 'new jvm improvements' topic and the bit shift/masking bench?
Well, I'm working on JumpTables and I might be able to use SSE3 for some things. My ultimate goal, though, is to be able to use SSE to do SIMD. It would only help a limited set of code, and you'd have to write the code to a very narrow range, but I think I can figure out something. Stay tuned. PS: Anyone have any good SSE documents?
|
|
|
|
|
|
|
Spasi
|
 |
«
Reply #20 - Posted
2004-11-03 13:17:47 » |
|
It would only help a limited set of code, and you'd have to write the code to a very narrow range
We'd appreciate a document describing what those narrow ranges are/will be. PS: Thanks for your efforts and for keeping us informed.
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #21 - Posted
2004-11-03 14:58:25 » |
|
We'd appreciate a document describing what those narrow ranges are/will be.
PS: Thanks for your efforts and for keeping us informed.
Well, from what I understand SIMD only works on sets of similar data. I'm not an expert at all, but as far as I know SIMD lets you do the same operation on different data (Single Instruction Multiple Data); that much I remember from my computer science classes a long time ago. But I'm unsure how Intel implemented this in SSE/SSE2. I'm still researching this. Again, if anyone has good documentation on SSE or a recommendation for a book, I'm listening!
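As a rough picture of "same operation on different data": a 128-bit SSE register holds four 32-bit floats, so one packed add does four additions at once. The Java sketch below only imitates that grouping in ordinary scalar code to show the shape of the work. It does not issue any SIMD instructions itself, and the names (SimdIdeaSketch, addScalar, addInBlocks, LANES) are invented for the example.

```java
final class SimdIdeaSketch {
    // Four 32-bit floats fit in one 128-bit SSE register.
    static final int LANES = 4;

    // Plain scalar loop: one addition per iteration.
    public static void addScalar(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Same result, but written as blocks of LANES elements. Each block body
    // is the work a single packed add would cover in hardware.
    public static void addInBlocks(float[] a, float[] b, float[] out) {
        int limit = out.length - (out.length % LANES);
        int i = 0;
        for (; i < limit; i += LANES) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        // Leftover elements that do not fill a whole block are handled one by one.
        for (; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f, 6f};
        float[] b = {10f, 20f, 30f, 40f, 50f, 60f};
        float[] out = new float[a.length];
        addInBlocks(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

The "similar data" restriction shows up directly: every lane in a block must have the same type and undergo the same operation, and anything that does not fit a full block falls back to scalar code.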
|
|
|
|
|
|
|
crystalsquid
Junior Member  
... Boing ...
|
 |
«
Reply #23 - Posted
2004-11-03 18:10:39 » |
|
I'm sure if you ask Intel they will be forthcoming. They do some very nice docs as well as training courses several times a year. From my previous experience with it, the data types must be the same for all elements, and unless you pre-format the data into something suitable, you end up using as many instructions to shuffle/pack the data into the form SSE needs as you would save. For example, a matrix multiply operation is not much faster in SSE because you have to transpose a matrix, which takes quite a few cycles to do.
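The transpose cost mentioned above can be seen even in scalar code. In a row-major 4x4 multiply, each output element is a row of A dotted with a column of B, and the column's elements are not adjacent in memory. Transposing B first makes both operands contiguous, which is the layout a packed multiply would want, but the transpose is extra work. The Java below is a minimal sketch of that trade-off under those assumptions; TransposeSketch and its method names are hypothetical.

```java
final class TransposeSketch {
    static final int N = 4;

    // Returns the transpose of a row-major N x N matrix stored in a flat array.
    public static float[] transpose(float[] m) {
        float[] t = new float[N * N];
        for (int r = 0; r < N; r++) {
            for (int c = 0; c < N; c++) {
                t[c * N + r] = m[r * N + c];
            }
        }
        return t;
    }

    // Computes a * b. B is transposed up front so that the inner loop reads
    // two contiguous runs of N floats (a row of A and a row of B transposed).
    public static float[] multiply(float[] a, float[] b) {
        float[] bt = transpose(b); // the extra shuffling step
        float[] out = new float[N * N];
        for (int r = 0; r < N; r++) {
            for (int c = 0; c < N; c++) {
                float sum = 0f;
                for (int k = 0; k < N; k++) {
                    sum += a[r * N + k] * bt[c * N + k];
                }
                out[r * N + c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] identity = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
        float[] m = new float[N * N];
        for (int i = 0; i < m.length; i++) m[i] = i;
        System.out.println(java.util.Arrays.toString(multiply(identity, m)));
    }
}
```

For a single 4x4 product the transpose touches as many elements as there are outputs, so it is easy to see how the rearranging can eat most of the gain unless the data is stored in a suitable layout to begin with.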
|
|
|
|
|
princec
|
 |
«
Reply #24 - Posted
2004-11-03 19:53:38 » |
|
Specifically, it's matrix4 and vector4 operations that need SIMD acceleration as the highest priority, so it might be rather useful if these made it into the Java language as primitives and then got intrinsified. After that, sound and video decoding are the main uses, and signal processing. Cas 
|
|
|
|
Mithrandir
|
 |
«
Reply #25 - Posted
2004-11-04 13:28:59 » |
|
You don't even need to go that far. If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
|
|
|
|
Spasi
|
 |
«
Reply #26 - Posted
2004-11-04 13:49:22 » |
|
If we could get a version of the javax.vecmath package that used native code for everything, that would make a huge performance difference to a lot of applications.
That would indeed be a quick'n'dirty solution (and would help lots of apps), but no, I'd hate that. It's a bad API and a bad implementation. The effort should be better spent on a more generic solution. I'm not sure about primitive types either. I'd prefer it if the VM could analyze the code and optimize on any possible opportunity (even non-vecmath code). I do realize though how difficult that is...
|
|
|
|
|
blahblahblahh
|
 |
«
Reply #27 - Posted
2004-11-04 17:33:24 » |
|
That would indeed be a quick'n'dirty solution (and would help lots of apps), but no, I'd hate that. It's a bad API and a bad implementation.
It's a heck of a lot less bad than generics, and for an awful lot of applications more valuable (*especially* if they were to do a little bit of updating and unify vecmath a little with J2D's Point classes etc). No disagreement on the implementation - but there are better impls available already for free. /me ducks and runs for cover
|
malloc will be first against the wall when the revolution comes...
|
|
|
swpalmer
|
 |
«
Reply #28 - Posted
2004-11-06 18:05:03 » |
|
After that, sound and video decoding are the main uses, and signal processing.
I think that level is beyond what Azeem is doing at the compiler level, but just for the record I will pipe up with my usual complaint re the lack of optimization in existing native code of the JRE. SSE must be used for JPEG decoding and encoding; to not do so is to throw performance out the window. It's like purposely using a bubble sort when every other sort algorithm would result in a massive performance improvement. SSE can also be used to get massive performance gains in software loops that do image scaling and blitting.
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #29 - Posted
2004-11-07 15:50:42 » |
|
I think that level is beyond what Azeem is doing at the compiler level, but just for the record I will pipe up with my usual complaint re the lack of optimization in existing native code of the JRE. SSE must be used for JPEG decoding and encoding; to not do so is to throw performance out the window. It's like purposely using a bubble sort when every other sort algorithm would result in a massive performance improvement. SSE can also be used to get massive performance gains in software loops that do image scaling and blitting.
Yeah, it's definitely beyond just one person to get those kinds of changes into a JDK. I really can only make changes in the VM. Plus I was thinking more along the lines of a loop optimization that recognizes certain types of loops and emits instructions appropriately (in this case the SIMD instructions).
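To give a feel for which "certain types of loops" such an optimization could plausibly target, the sketch below contrasts two loop shapes in Java. It is a guess at the general idea, not a description of anything the VM does: in the first loop every iteration is independent, so several could in principle run side by side in SIMD lanes, while in the second each iteration needs the previous result. The class and method names are made up for this example.

```java
final class LoopShapeSketch {
    // Independent iterations: y[i] depends only on x[i] and y[i].
    // Assumes x is at least as long as y.
    public static void scaleAndAdd(float k, float[] x, float[] y) {
        for (int i = 0; i < y.length; i++) {
            y[i] = k * x[i] + y[i];
        }
    }

    // Loop-carried dependency: x[i] needs the freshly written x[i - 1],
    // so neighbouring iterations cannot simply be computed together.
    public static void prefixSum(float[] x) {
        for (int i = 1; i < x.length; i++) {
            x[i] = x[i] + x[i - 1];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        scaleAndAdd(2f, x, y);
        System.out.println(java.util.Arrays.toString(y));

        float[] p = {1f, 2f, 3f, 4f};
        prefixSum(p);
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

Writing hot loops in the first shape (simple counted loop, plain array indexing, no dependence between iterations) is presumably what "writing the code to a very narrow range" would mean in practice.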
|
|
|
|
|
|