Mark Thornton
|
 |
«
Reply #30 - Posted
2004-06-25 09:57:49 » |
|
The server VM does check for SSE as well as SSE2, but yes, support for more processor-dependent features (or performance differences) would be welcome.
|
|
|
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #32 - Posted
2004-07-14 20:17:23 » |
|
Hi, the reason Server is slower than Client is the way it handles FP. It's just inefficiently coded. That's why P4s run the Mandelbrot faster: the extra available registers (the 8 SSE registers) reduce register pressure. Basically it was decided that SSE is the future, and Server was optimized for that. There really isn't much to be gained from improving the FP code.
|
|
|
|
|
|
|
erikd
|
 |
«
Reply #33 - Posted
2004-07-15 00:39:18 » |
|
Basically it was decided that SSE is the future, and server was optimized for that. There really isn't much to be gained from improving the FP code.
I'm thinking about the here and now. This isn't about improving FP code, it's about fixing a rather serious performance bug in the server. Remember that the majority of people don't have SSE2 and are affected by this bug. They could at least copy and paste the FP code from the client version, methinks.
|
|
|
|
swpalmer
|
 |
«
Reply #34 - Posted
2004-07-15 02:03:58 » |
|
ajiva, based on your comments I'm guessing you are a Sun engineer? Am I right? If so, what do you do at Sun?
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #35 - Posted
2004-07-15 15:59:13 » |
|
First of all, yes, I work for Sun, working on Server (4 years here so far). Secondly, it's not easy to just copy and paste the code. Very different methodologies were used to get each of the compilers to where they are now. Client is faster for many reasons, and when it came time to speed up FP code for Server it was decided that we should optimize for SSE, because that really is the future. SSE provides several benefits, and yes, even Athlons will get them too.
- 8 FP registers, this is the biggie. X86 has way too few registers, and if you're doing FP work, you'll need these registers.
- Even if you have an Athlon (or P3 for that matter), just stick to using floats and you'll use the single-precision SSE instructions (vs. the double-precision SSE2 ones).
Now if your code requires double precision, well, go buy a P4 (or an Opteron). Really, we were thinking about the future here. The fact of the matter is that all new processors would be SSE/SSE2 enabled, and we felt it was better to work towards that than to try to minimally improve FP for older processors.
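To make the float-versus-double advice concrete, here is a minimal sketch of a Mandelbrot escape-time kernel written twice, once with float and once with double. The class and method names are hypothetical and not taken from the benchmark discussed in this thread. Whether the float version actually ends up on single-precision SSE instructions depends on the VM and CPU, so the timing loop is only a rough way to compare the two on your own machine, not a rigorous benchmark.

```java
// Hypothetical illustration; names are not from the thread's benchmark.
final class MandelbrotPrecision {

    // Escape-time iteration using only float arithmetic.
    static int iterationsFloat(float cr, float ci, int maxIter) {
        float zr = 0f, zi = 0f;
        int n = 0;
        while (n < maxIter && zr * zr + zi * zi <= 4f) {
            float t = zr * zr - zi * zi + cr;
            zi = 2f * zr * zi + ci;
            zr = t;
            n++;
        }
        return n;
    }

    // The same kernel using double arithmetic.
    static int iterationsDouble(double cr, double ci, int maxIter) {
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < maxIter && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int size = 400, maxIter = 500;
        long checkF = 0, checkD = 0;

        long t0 = System.nanoTime();
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                checkF += iterationsFloat(-2f + 3f * x / size, -1.5f + 3f * y / size, maxIter);
            }
        }
        long t1 = System.nanoTime();
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                checkD += iterationsDouble(-2.0 + 3.0 * x / size, -1.5 + 3.0 * y / size, maxIter);
            }
        }
        long t2 = System.nanoTime();

        // The checksums keep the JIT from discarding the loops as dead code.
        System.out.println("float:  " + (t1 - t0) / 1000000 + " ms (checksum " + checkF + ")");
        System.out.println("double: " + (t2 - t1) / 1000000 + " ms (checksum " + checkD + ")");
    }
}
```

Running it under both `-client` and `-server` (where the server VM is installed) gives a rough feel for the gap being discussed here.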
|
|
|
|
|
princec
|
 |
«
Reply #36 - Posted
2004-07-15 18:47:38 » |
|
Now you've revealed yourself you will never escape us. What's the scoop on
- escape analysis
- two-phase compilation (i.e. the merging of client and server VMs)
- Structs (see RFE...)
Cas
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #37 - Posted
2004-07-15 18:51:37 » |
|
I can't talk about future work...
|
|
|
|
|
swpalmer
|
 |
«
Reply #38 - Posted
2004-07-16 05:33:45 » |
|
Fair enough... Can you talk about what IS in the 1.5 betas? In other words, can you confirm that escape analysis, for instance, isn't in the current 1.5 betas? That way at least you aren't talking about "future" work and we can get an idea of what (not) to expect. Basically it would be great to get an idea of the performance enhancements that are going into the 1.5 VMs. Hopefully there are some.
|
|
|
|
swpalmer
|
 |
«
Reply #39 - Posted
2004-07-16 05:51:24 » |
|
- 8 FP registers, this is the biggie. X86 has way too few registers, and if your doing FP work, you'll need these registers.
[offtopic-rant] The Intel processor architecture sucks big time - there is no debating that... it hasn't come very far from the ancient calculator it was based on as far as that goes. That's one of the many reasons I went with a Mac. Sparc, PowerPC, MIPS, Alpha... all superior. All but Sparc are suffering because MS wouldn't continue to keep NT working on them - I suspect the MS developers had trouble with machines that had more registers than they had fingers. Not having an operating system kinda makes your processor less popular. [/offtopic-rant]
The Opteron has lots of registers (that aren't SSE), doesn't it? I remember reading something that indicated it was a decent processor - 32 or so general purpose registers, instead of 5 or 6 special purpose ones or whatever Intel has these days.
SSE is mostly about vector instructions anyway, isn't it? Exploiting that really gets you some speed if you can do it. The extra registers are just good in general. I keep suggesting that the bits of native code in the JRE use optimized assembly with SSE/SSE2 for things like JPEG loaders and image blitting loops; the improvements would be huge. All of Java2D's AlphaComposite rules could be implemented with SSE2 and they would scream.
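For anyone wondering what kind of loop is meant here, below is a rough scalar sketch of a SrcOver blend over packed, premultiplied ARGB pixels. It is not Java2D's actual code; the class and method names are made up for illustration. The point is only that every channel of every pixel goes through the same multiply-and-add, which is the shape of work that vector instructions handle several channels at a time.

```java
// Hypothetical illustration of a per-pixel compositing loop; not Java2D's implementation.
final class SrcOverSketch {

    // Blend one premultiplied ARGB source pixel over a premultiplied ARGB destination pixel.
    // Per channel (alpha included): out = src + dst * (255 - srcAlpha) / 255, rounded.
    static int srcOver(int src, int dst) {
        int invAlpha = 255 - (src >>> 24);
        int result = 0;
        for (int shift = 0; shift <= 24; shift += 8) {
            int s = (src >>> shift) & 0xFF;
            int d = (dst >>> shift) & 0xFF;
            int c = s + (d * invAlpha + 127) / 255;
            if (c > 255) {
                c = 255; // clamp in case the input was not correctly premultiplied
            }
            result |= c << shift;
        }
        return result;
    }

    // Blend a whole row in place. Assumes src has at least as many pixels as dst.
    static void blendRow(int[] src, int[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = srcOver(src[i], dst[i]);
        }
    }
}
```

A SIMD version would process the four channels of a pixel (or several pixels) in one register instead of looping over shifts, which is where the speed-up would come from.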
|
|
|
|
|
|
erikd
|
 |
«
Reply #40 - Posted
2004-07-16 11:47:11 » |
|
Secondly, it's not easy to just copy and paste the code.
I didn't seriously expect it to be that easy. Well, at least we know what the situation is regarding this, although it's a little bit disappointing:
* The server's non-SSE2 FP code is (and I quote) 'inefficiently coded'.
* It won't be improved because Sun thinks SSE2-supporting CPUs are the future, regardless of what the majority of people use right now.
I am not really affected by this bug because I rarely use doubles and my games are usually run on the client anyway. And of course this benchmark puts heavy emphasis on the bad double performance, so in real life it's not as bad as it looks here.
|
|
|
|
Bombadil
|
 |
«
Reply #41 - Posted
2004-07-16 12:33:55 » |
|
Welcome Ajiva. Nice to see Sun engineers taking part in this forum. :-) Even nicer that you like games. PS to the fellow readers: Please have a look at Ajiva's weblog, which is informative. (Also clickable via his profile)
|
|
|
|
|
princec
|
 |
«
Reply #42 - Posted
2004-07-16 13:18:34 » |
|
He'll be sick to death of us in no time. Eventually he'll have to change his name and go live under a rock.
Cas
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #43 - Posted
2004-07-16 13:42:04 » |
|
Fair enough... Can you talk about what IS in the 1.5 betas? In other words, can you confirm that escape analysis, for instance, isn't in the current 1.5 betas? That way at least you aren't talking about "future" work and we can get an idea of what (not) to expect. Basically it would be great to get an idea of the performance enhancements that are going into the 1.5 VMs. Hopefully there are some.
Neither Escape Analysis nor Tiered Compilation is in J2SE 5.0. These are just off the top of my head:
Server
- Trig speed-up (Solaris SPARC and Solaris x86 only)
- Instructions similar to this (long = int & 0xFFFFL). This was done for the crypto folks, who do a lot of unsigned work
- Inlining/loop optimization improvements
- Startup improvements via Class Data Sharing (this is client and server)
Tons of bug fixes. J2SE 5.0 is the most stable VM yet...
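For readers who haven't run into the masking idiom mentioned in the list: Java has no unsigned int, so code that needs unsigned values widens to long and masks off the sign extension. A minimal sketch follows, with hypothetical class and method names. The post writes the mask as 0xFFFFL, which keeps the low 16 bits; the usual mask for a full unsigned 32-bit value is 0xFFFFFFFFL, so both are shown. Which exact patterns the server compiler recognizes is not stated in the post.

```java
// Hypothetical illustration of the "long = int & mask" idiom.
final class UnsignedMask {

    // Treat all 32 bits of an int as an unsigned value.
    static long toUnsigned32(int value) {
        return value & 0xFFFFFFFFL;
    }

    // The mask exactly as written in the post: keeps only the low 16 bits.
    static long low16(int value) {
        return value & 0xFFFFL;
    }

    public static void main(String[] args) {
        int allBitsSet = -1;
        System.out.println(toUnsigned32(allBitsSet));
        System.out.println(low16(allBitsSet));
    }
}
```

In both methods the int is first sign-extended to long and the mask then clears the unwanted high bits, which is why the mask literal needs the L suffix.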
|
|
|
|
|
Abuse
|
 |
«
Reply #44 - Posted
2004-07-16 20:50:13 » |
|
Tons of bug fixes J2SE 5.0 is the most stable VM yet...
Progressing 3.5 versions in 1 release, I would hope so ^_^
|
|
|
|
swpalmer
|
 |
«
Reply #45 - Posted
2004-07-17 01:10:12 » |
|
Hmm... most enhancements are for the Server VM... the VM that is basically missing from the client side... where the games run. Or did they fix the fact that the server VM is not installed as part of the JRE (only in the JDK, last I checked)? Nice to see some of those in any case. How come the trig optimizations are specific to Solaris? That is, if it affects Solaris on x86, does the same optimization not apply to x86 elsewhere?
|
|
|
|
Azeem Jiva
Junior Member  
Java VM Engineer, Sun Microsystems
|
 |
«
Reply #46 - Posted
2004-07-18 19:02:40 » |
|
Hmm... most enhancements are for the Server VM... the VM that is basically missing from the client side... where the games run. Or did they fix the fact that the server VM is not installed as part of the JRE (only in the JDK, last I checked)? Nice to see some of those in any case. How come the trig optimizations are specific to Solaris? That is, if it affects Solaris on x86, does the same optimization not apply to x86 elsewhere?
The way the trig functions work is that when you call out to sin (for example), the VM used to execute a JNI call to a C++ library and return the resulting value. This is slow, so to speed it up, I now short-cut the whole thing and have the VM recognize that a direct C++ call can be made instead. So this is great, except that the compilers we use for Windows (VC6++) and Linux (GCC 3.2) have an aliasing problem with the way the C++ code is structured. So we had to turn down the optimizations for those platforms. The Sun compilers do not have this problem, and therefore we can crank up the optimizations. And no, server does not come with the JRE...
|
|
|
|
|
crystalsquid
Junior Member  
... Boing ...
|
 |
«
Reply #47 - Posted
2004-07-18 21:21:10 » |
|
AFAIK, the C++ library trig functions just do the Taylor series expansion - which could be done in straight Java code & execute almost as fast as the C++ one - probably faster than the JNI call on all platforms.
- Dom
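As a rough sketch of what a straight-Java version could look like, here is a sine built from a Taylor series with simple range reduction. The class name and the number of terms are my own choices for illustration. It makes no claim about what the C++ library actually does, and it does not try to match the accuracy guarantees of Math.sin for large arguments, where the naive range reduction loses precision.

```java
// Hypothetical illustration: a Taylor-series sine in plain Java.
final class TaylorSine {

    static double sin(double x) {
        // Reduce to [-pi, pi] (naive; loses accuracy for very large |x|).
        x = Math.IEEEremainder(x, 2.0 * Math.PI);

        // Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x).
        if (x > Math.PI / 2.0) {
            x = Math.PI - x;
        } else if (x < -Math.PI / 2.0) {
            x = -Math.PI - x;
        }

        // sin(x) = x - x^3/3! + x^5/5! - ...
        // Each term is the previous one times -x^2 / ((2k)(2k + 1)).
        double x2 = x * x;
        double term = x;
        double sum = x;
        for (int k = 1; k <= 10; k++) {
            term *= -x2 / ((2 * k) * (2 * k + 1));
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] samples = { 0.0, 0.5, 1.0, 3.0, -2.5, 10.0 };
        for (double v : samples) {
            System.out.println(v + ": taylor=" + sin(v) + " Math.sin=" + Math.sin(v));
        }
    }
}
```

Timing this against Math.sin on a given VM would show whether the saved call overhead outweighs the extra arithmetic, which is the question raised above.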
|
|
|
|
|
swpalmer
|
 |
«
Reply #48 - Posted
2004-07-18 22:16:05 » |
|
...the compilers we use for Windows (VC6++) and Linux (GCC 3.2) have an aliasing problem...
You aren't using VC7? Is there a reason for that? Better yet, you should use the Intel compiler - it performs much better than VC6, particularly with floating point operations, though it compiles much slower. I know of at least one commercial project that compiles its release builds with the Intel compiler because they see a significant performance boost. They are making a visual effects (processing movie frames) number-crunching application that needs every bit of speed it can get.
But crystalsquid has an interesting point. Why not a pure Java implementation? Should it not get compiled to the same basic instructions as the C-library code, at least close enough that the JNI overhead saved would be more than the difference?
|
|
|
|
|