Java-Gaming.org    
Vectorization-Optimization RFE accepted
Linuxhippy « Posted 2005-10-25 14:53:33 »

Hi there,

I recently opened an RFE for adding vectorization optimizations to the HotSpot server compiler, which should help especially with games and other throughput-oriented computing on modern processors.
In fact, it's currently one of the two reasons why we still use C for our heaviest number crunching: the whole framework is written in Java, but the numbers are crunched in C about 45% faster than with Java :-(

So if you're interested, give it a vote at: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6340864
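As a rough illustration of what is being requested: a vectorizing JIT compiles a simple counted loop over arrays into SIMD instructions that handle several elements per instruction. The sketch below (class and method names are made up) shows the loop shape such optimizations typically target. Whether a given HotSpot version actually emits SIMD code for it depends on the JVM version and the CPU, so treat that as an assumption, not a guarantee.

```java
import java.util.Arrays;

// Hypothetical example: an element-wise loop with independent iterations,
// unit stride and no calls or branches in the body, the shape a vectorizing
// JIT can turn into SIMD instructions (whether it does is JVM/CPU-dependent).
final class VecLoop {

    // out[i] = a[i] * b[i] for every index covered by all three arrays.
    static void mul(float[] a, float[] b, float[] out) {
        int n = Math.min(out.length, Math.min(a.length, b.length));
        for (int i = 0; i < n; i++) {
            out[i] = a[i] * b[i];
        }
    }

    public static void main(String[] args) {
        float[] out = new float[4];
        mul(new float[] {1, 2, 3, 4}, new float[] {5, 6, 7, 8}, out);
        System.out.println(Arrays.toString(out)); // [5.0, 12.0, 21.0, 32.0]
    }
}
```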

lg Clemens
Mark Thornton « Reply #1 - Posted 2005-10-25 15:32:14 »

A problem with floating point vectorization may be maintaining Java's relatively strict rules on FP results. This prevents the use of the multiply-accumulate instruction available on some CPUs (it uses extra precision for the intermediate result which Java forbids).
There was a JSR aimed at relaxing some of these rules, but it was withdrawn. :-[

http://jcp.org/en/jsr/detail?id=84
 This JSR proposes extensions to the JavaTM Programming Language and Java Virtual Machine that support more efficient execution of floating point code.

Withdrawn 2002.03.01. Due to the general absence of interest in the community, the Specification lead withdrew the JSR.
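To make the multiply-accumulate point concrete, here is a small sketch using Math.fma, which was only added in Java 9, long after this thread. The input values are chosen for demonstration so that rounding the product on its own hits an exact tie: the separately rounded a*b + c that Java requires loses a low-order term that a fused operation (one rounding) keeps. The class name is illustrative.

```java
// Compares a*b + c (product rounded to float, then the sum rounded again)
// with Math.fma (exact product and sum, rounded once). Requires Java 9+.
final class FmaDemo {

    // 1 + 2^-12 and -(1 + 2^-11) are both exactly representable as floats.
    static final float A = 1.0f + 0x1.0p-12f;
    static final float C = -(1.0f + 0x1.0p-11f);

    // Exact A*A is 1 + 2^-11 + 2^-24, halfway between two floats; it rounds
    // to the even neighbour 1 + 2^-11, so adding C cancels to zero.
    static float separate() {
        return A * A + C;
    }

    // With a single rounding the 2^-24 term survives.
    static float fused() {
        return Math.fma(A, A, C);
    }

    public static void main(String[] args) {
        System.out.println(separate()); // 0.0
        System.out.println(fused());    // 5.9604645E-8 (that is, 2^-24)
    }
}
```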
Mark Thornton « Reply #2 - Posted 2005-10-25 15:41:11 »

Quote
numbers are crunched in C about 45% faster than with Java :-(
Only 45% faster! I don't get out of bed for gains that small. ;)

Seriously though, for me interesting performance gains start at a minimum factor of 2.
Riven « Reply #3 - Posted 2005-10-25 16:14:06 »

I fear such an optimisation will only be done in a few ill-defined cases where the JIT sees the opportunity, so you'd have to use trial and error to find out when the JIT 'accepts' your code and applies this optimisation.

Wouldn't it be a heck of a lot easier to use the direct (native) buffers we have now, write a little API in C/C++ and put a wrapper around it? You're normally processing lots of vectors at once, so you could do it in a single JNI call, reducing your 45% performance improvement to 44.9%. The advantage is that maximum performance is guaranteed.
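A minimal Java-side sketch of that approach, assuming a hypothetical native library called `vecops` that exports a batch `addNative` method (neither exists; both are illustrative). The data lives in direct buffers in native byte order, so C code could reach it through the JNI function GetDirectBufferAddress without copying, and the whole batch crosses the JNI boundary in one call. A pure-Java fallback keeps the sketch runnable when the library is missing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// "One JNI call per batch" pattern. The library name "vecops" and the
// method addNative are hypothetical; without the library, a pure-Java
// fallback with the same contract is used.
final class BatchVecOps {

    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("vecops"); // hypothetical native library
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Would be implemented in C/C++ (e.g. with SIMD intrinsics); declared
    // here for illustration and only called if the library was loaded.
    private static native void addNative(FloatBuffer a, FloatBuffer b,
                                         FloatBuffer out, int count);

    // Direct buffer in platform byte order: native code can use it in place.
    static FloatBuffer allocate(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    // out[i] = a[i] + b[i] for i in [0, count), one call for the whole batch.
    static void add(FloatBuffer a, FloatBuffer b, FloatBuffer out, int count) {
        if (NATIVE_AVAILABLE) {
            addNative(a, b, out, count);
        } else {
            for (int i = 0; i < count; i++) {
                out.put(i, a.get(i) + b.get(i)); // absolute get/put
            }
        }
    }

    public static void main(String[] args) {
        int n = 4;
        FloatBuffer a = allocate(n), b = allocate(n), out = allocate(n);
        for (int i = 0; i < n; i++) {
            a.put(i, i);
            b.put(i, 10 * i);
        }
        add(a, b, out, n);
        System.out.println(out.get(3)); // 33.0
    }
}
```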

NVaidya « Reply #4 - Posted 2005-10-25 16:25:34 »

Some RFEs for FMA (fused multiply-add) still seem to be "in progress"... but I'm not sure about the level of interest...

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4851642

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4919337



anarchotron « Reply #5 - Posted 2005-10-25 16:58:12 »

Quote
numbers are crunched in C about 45% faster than with Java :-(
Only 45% faster! I don't get out of bed for gains that small. ;)

Seriously though, for me interesting performance gains start at a minimum factor of 2.

I for one see a 45% performance gain as pretty significant, especially in realtime or game development.
Linuxhippy « Reply #6 - Posted 2005-10-25 17:51:13 »

I agree, especially considering that even Intel's C compiler was unable to vectorize some parts.
So it's basically the same with C compilers: of course you have to write code the compiler can digest; it has never been any different.
You can't use primitive wrappers all over your algorithmic code and then blame the JVM for its speed; sure, it works, but it's slow by design. It's the same with bounds-check elimination: statements that are too complex can't be optimized, of course.
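A small sketch of the primitive-wrapper point (class and method names are illustrative): both methods compute the same sum, but the boxed version goes through a separate Float object per element and unboxes it on every iteration, indirection the JIT may not be able to remove, whereas the primitive version reads a contiguous float[].

```java
import java.util.ArrayList;
import java.util.List;

// Same computation, two data representations.
final class BoxedVsPrimitive {

    // Each element is a separate heap object; "+=" unboxes it first.
    static float sumBoxed(List<Float> values) {
        float sum = 0f;
        for (Float v : values) {
            sum += v; // implicit v.floatValue()
        }
        return sum;
    }

    // Plain loads from one contiguous primitive array.
    static float sumPrimitive(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] raw = {1f, 2f, 3f, 4f};
        List<Float> boxed = new ArrayList<>();
        for (float v : raw) {
            boxed.add(v); // autoboxing wraps each value in a Float
        }
        System.out.println(sumBoxed(boxed) + " " + sumPrimitive(raw)); // 10.0 10.0
    }
}
```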

Quote
Wouldn't it be a heck of a lot easier to use the direct (native) buffers we have now, write a little API in C/C++ and put a wrapper around it? You're normally processing lots of vectors at once, so you could do it in a single JNI call, reducing your 45% performance improvement to 44.9%. The advantage is that maximum performance is guaranteed.
So why have optimizing runtimes at all? We could simply use an interpreter VM and write all the time-critical stuff in C/C++ using JNI.
Hmm, sorry... I don't want to lose platform independence, nor do I want to waste my time with JNI.


Quote
Seriously though, for me interesting performance gains start at a minimum factor of 2.
This may help as little as 10% in your code. Adding stack allocation based on escape analysis might add another 5-10%. You can't expect a mature, heavily optimized runtime like Java's to make 2x jumps in performance anymore ;-)
lg Clemens
Riven « Reply #7 - Posted 2005-10-25 18:27:11 »

Quote
You can't use primitive wrappers all over your algorithmic code and then blame the JVM for its speed; sure, it works, but it's slow by design.

I said native wrappers around a library implemented in C++. That's something else entirely. Think JOGL or LWJGL style.

Quote
Wouldn't it be a heck of a lot easier to use the direct (native) buffers we have now, write a little API in C/C++ and put a wrapper around it? You're normally processing lots of vectors at once, so you could do it in a single JNI call, reducing your 45% performance improvement to 44.9%. The advantage is that maximum performance is guaranteed.
So why have optimizing runtimes at all? We could simply use an interpreter VM and write all the time-critical stuff in C/C++ using JNI.

Of course it can be a general speed increase for all applications, but for complex games you need guaranteed optimisation. The only way to force SIMD is to use wrapped native code. I'm not saying this RFE is bad; it's good.

Linuxhippy « Reply #8 - Posted 2005-10-26 16:38:02 »

Quote
I said native wrappers around a library implemented in C++. That's something else entirely. Think JOGL or LWJGL style.
I understood your post; I was referring to the point that someone would have to write JIT-specific code. I just wanted to show that it's no different from today: even now you need to take care that your code performs well on your JVM.

Quote
Of course it can be a general speed increase for all applications, but for complex games you need guaranteed optimisation. The only way to force SIMD is to use wrapped native code.
native == assembler?
In that case it also depends on which compiler the user has installed, especially on platforms where users commonly compile their programs themselves.

Anyway, peace ;)

lg Clemens
DaveLloyd « Reply #9 - Posted 2005-10-26 23:04:37 »

How about a JNI wrapper for BLAS (http://www.netlib.org/blas/)? Most Fortran compilers for supercomputers work by re-expressing loops as a sequence of linear algebra operations that are then passed to hand-optimised libraries, which take care of tiling, prefetching, SIMD ops, etc.

Regarding the thread about a pure Java physics engine: physics engine kernels are easily vectorisable like this (ODE's kernel is an iterative matrix decomposition).
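For context, BLAS routines are plain functions over strided vectors. The following is a pure-Java sketch of two level-1 routines, SAXPY (y := alpha*x + y) and SDOT (dot product), following the reference BLAS argument order (n, alpha, x, incx, y, incy). It only shows the interface a JNI wrapper would forward to an optimised native BLAS; it assumes positive increments and has no tuning of its own.

```java
// Pure-Java stand-ins for two BLAS level-1 routines (illustration only).
final class Blas1 {

    // SAXPY: y := alpha * x + y, over n elements with strides incx/incy > 0.
    static void saxpy(int n, float alpha, float[] x, int incx, float[] y, int incy) {
        for (int i = 0, ix = 0, iy = 0; i < n; i++, ix += incx, iy += incy) {
            y[iy] += alpha * x[ix];
        }
    }

    // SDOT: returns the sum of x[ix] * y[iy] over n strided elements.
    static float sdot(int n, float[] x, int incx, float[] y, int incy) {
        float sum = 0f;
        for (int i = 0, ix = 0, iy = 0; i < n; i++, ix += incx, iy += incy) {
            sum += x[ix] * y[iy];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {4f, 5f, 6f};
        System.out.println(sdot(3, x, 1, y, 1)); // 32.0
        saxpy(3, 2f, x, 1, y, 1);                // y becomes {6, 9, 12}
        System.out.println(y[2]);                // 12.0
    }
}
```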

Mark Thornton « Reply #10 - Posted 2005-10-27 07:47:12 »

Quote
How about a JNI wrapper for BLAS (http://www.netlib.org/blas/)?
Unfortunately, for efficiency you would probably end up with the matrices in (Byte/Int/Float/Double)Buffers, which are not so convenient to use from the Java side. Without the JNI overhead even a 3x3 matrix multiply could probably benefit from vectorisation, but the JNI overhead would make the minimum worthwhile matrix size rather larger.
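A sketch of what the Buffer-based style looks like for the 3x3 case (the class name is made up). Every element access goes through absolute get(index)/put(index, value) calls instead of array indexing, and the whole kernel is only 27 multiply-adds, which suggests why a per-call JNI overhead would be hard to amortise at this size. Heap buffers are used so the demo runs anywhere; a real JNI routine would need direct buffers.

```java
import java.nio.FloatBuffer;

// 3x3 matrix multiply on row-major FloatBuffers (illustration only).
final class Mat3Buffer {

    // c = a * b; all matrices are row-major, starting at index 0.
    static void mul(FloatBuffer a, FloatBuffer b, FloatBuffer c) {
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                float sum = 0f;
                for (int k = 0; k < 3; k++) {
                    sum += a.get(row * 3 + k) * b.get(k * 3 + col);
                }
                c.put(row * 3 + col, sum);
            }
        }
    }

    public static void main(String[] args) {
        FloatBuffer identity = FloatBuffer.wrap(new float[] {1, 0, 0, 0, 1, 0, 0, 0, 1});
        FloatBuffer m = FloatBuffer.wrap(new float[] {1, 2, 3, 4, 5, 6, 7, 8, 9});
        FloatBuffer out = FloatBuffer.allocate(9);
        mul(m, identity, out);
        System.out.println(out.get(5)); // 6.0
    }
}
```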

I seem to recall that some JVMs have some sort of fast-path alternative to JNI, used for special operations such as implementing System.currentTimeMillis. I don't know what conditions are required for this to work.
abies « Reply #11 - Posted 2005-10-27 15:50:57 »

Quote
I seem to recall that some JVMs have some sort of fast-path alternative to JNI, used for special operations such as implementing System.currentTimeMillis. I don't know what conditions are required for this to work.

For natives that are known to the JVM there are plenty of inlining possibilities. The function can be expanded directly into the calling method, or the call can be converted into a plain C call to a special non-JNI implementation instead of a Java call going through JNI. I don't think there is any way to get such a fast path for user libraries.

I think requesting some SIMD operations in java.lang.Math, or somewhere nearby, is a sane proposal. Implementing them yourself through JNI is probably a performance killer.

Artur Biesiadowski
Mark Thornton « Reply #12 - Posted 2005-10-27 16:37:17 »

Quote
I think requesting some SIMD operations in java.lang.Math, or somewhere nearby, is a sane proposal. Implementing them yourself through JNI is probably a performance killer.
The java.math package might be an appropriate location. Doing it as functions avoids the problem of Java's rather strict rules for floating point math and doesn't require any cleverness from the compiler.

For example, if you want the compiler to recognise an opportunity for SIMD in
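```java
// Java must evaluate this exactly in order: one rounded multiply and
// one rounded add per element.
static float dot(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++)
        sum += a[i] * b[i];
    return sum;
}
```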


Then the PowerPC can't use its multiply-accumulate instruction. On the other hand, if you have a method
```java
native float dotProduct(float[] a, float[] b);
```

Then we can happily implement it using the MAC instruction, provided the method's contract allows that sort of variation (i.e. it doesn't require bit-identical results on all platforms).
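As a sketch of the kind of freedom such a contract would grant (a pure-Java illustration, not an implementation proposed in the thread; the class name is made up): this dotProduct keeps four independent partial sums, the way a 4-wide SIMD unit would, and accumulates with Math.fma (Java 9+). Both choices round differently from the strict left-to-right loop, which is why they are only acceptable when bit-identical results are not required. Math.fma can also be considerably slower when the JVM cannot map it to a hardware instruction.

```java
// Relaxed-order dot product: results may differ in the last bits from the
// strict sequential loop, which is exactly what the contract would permit.
final class RelaxedDot {

    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        // Main loop: four independent accumulators, one per "lane".
        for (; i + 3 < a.length; i += 4) {
            s0 = Math.fma(a[i],     b[i],     s0);
            s1 = Math.fma(a[i + 1], b[i + 1], s1);
            s2 = Math.fma(a[i + 2], b[i + 2], s2);
            s3 = Math.fma(a[i + 3], b[i + 3], s3);
        }
        // Tail: the remaining 0..3 elements go into the first lane.
        for (; i < a.length; i++) {
            s0 = Math.fma(a[i], b[i], s0);
        }
        // Horizontal reduction of the lanes (another reordering).
        return (s0 + s1) + (s2 + s3);
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5};
        float[] b = {1, 1, 1, 1, 1};
        System.out.println(dotProduct(a, b)); // 15.0
    }
}
```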
Mithrandir « Reply #13 - Posted 2005-10-31 22:29:14 »

Better than java.math would be to promote javax.vecmath to the core JDK and add some methods to its classes. For example, something like a Matrix4f.transform() method that takes a FloatBuffer, so that the JVM/libraries could push the data through the CPU's native SIMD instruction set. For things like skinning and some large-scale sci-viz applications, this would be a tremendous performance boost.
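To make the idea concrete, here is a hypothetical batch transform in that spirit. It is not part of the javax.vecmath API (which, as far as I know, transforms individual tuple objects such as Point3f); it simply transforms many packed x, y, z points from a FloatBuffer with one affine 4x4 matrix in a single call, the shape of work a JVM or library could map onto SIMD instructions. The class and method names are made up.

```java
import java.nio.FloatBuffer;

// Hypothetical batch point transform (not a real vecmath method).
final class BatchTransform {

    // m: row-major 4x4 affine matrix; points packed as x, y, z, x, y, z, ...
    // w is assumed to be 1 for every point, so the bottom row is ignored.
    static void transformPoints(float[] m, FloatBuffer in, FloatBuffer out, int count) {
        for (int p = 0; p < count; p++) {
            int base = p * 3;
            float x = in.get(base), y = in.get(base + 1), z = in.get(base + 2);
            out.put(base,     m[0] * x + m[1] * y + m[2]  * z + m[3]);
            out.put(base + 1, m[4] * x + m[5] * y + m[6]  * z + m[7]);
            out.put(base + 2, m[8] * x + m[9] * y + m[10] * z + m[11]);
        }
    }

    public static void main(String[] args) {
        // Translation by (10, 20, 30).
        float[] translate = {
            1, 0, 0, 10,
            0, 1, 0, 20,
            0, 0, 1, 30,
            0, 0, 0, 1
        };
        FloatBuffer in = FloatBuffer.wrap(new float[] {1, 2, 3, 4, 5, 6});
        FloatBuffer out = FloatBuffer.allocate(6);
        transformPoints(translate, in, out, 2);
        System.out.println(out.get(3) + " " + out.get(4) + " " + out.get(5)); // 14.0 25.0 36.0
    }
}
```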

darkprophet « Reply #14 - Posted 2005-11-01 09:59:01 »

Quote
Better than java.math would be to promote javax.vecmath to the core JDK

I am in favour of this... a unified vecmath library would do wonders. E.g. all scenegraphs sharing the same math library as JOODE (or something else) would be really nice...

However, I don't like vecmath a lot... so maybe we should file an RFE to improve the API?

DP

Mark Thornton « Reply #15 - Posted 2005-11-01 11:45:19 »

Quote
Better than java.math would be to promote javax.vecmath to the core JDK and add some methods to its classes.
Vecmath is much too specialised towards 3D graphics. If you wanted to do general vector/matrix operations, you wouldn't start from there.
DaveLloyd « Reply #16 - Posted 2005-11-07 10:08:34 »

I still think we need to look at a BLAS approach. OK, maybe JNI isn't the best way to go (though you probably don't need to work on many vectors at once to outweigh the interface overhead), but following the discussion of moving vecmath into the core, BLAS would be a far better choice, at which point you only have a normal subroutine call overhead, which is insignificant.

A good example might be character skinning, where for each vertex you accumulate the product of each bone influence with a scalar weight. If you set your data up correctly (I have classes derived from the native buffer that handle Vector3fBuffer etc.), you can do your character skinning in one BLAS call.

As I mentioned previously, there are highly optimised BLAS routines for every processor architecture out there, including versions that handle long/short vectors and support tiling, prefetching and other super-optimisations.

And the really nice thing is that you don't need to work too hard in the compiler to decompose many loops into a sequence of equivalent BLAS calls (particularly if you are allowed to relax the IEEE rules on operation order). This is what the Cray Fortran compilers did back in the late 80s.
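The post doesn't spell out the math; assuming the standard linear blend skinning formulation, each skinned vertex $\mathbf{v}'_j$ is a weighted sum over the $B$ bones that influence it, where $M_i$ is bone $i$'s skinning matrix (current pose times inverse bind pose) and $w_{ji}$ are the per-vertex weights:

$$
\mathbf{v}'_j = \sum_{i=1}^{B} w_{ji}\, M_i\, \mathbf{v}_j, \qquad \sum_{i=1}^{B} w_{ji} = 1
$$

Taken one bone at a time, this is the accumulation $\mathbf{v}'_j \leftarrow \mathbf{v}'_j + w_{ji}\,(M_i \mathbf{v}_j)$, i.e. the scaled-vector-plus-vector (AXPY) pattern that BLAS is built around, which is why laying the vertex data out contiguously lets the pass be expressed as batched BLAS-style calls.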

Mark Thornton « Reply #17 - Posted 2005-11-07 11:16:00 »

Relaxing the operation order of expressions and statements in Java is probably a non-starter: there isn't enough interest in the potential benefits, and too many people find floating point baffling enough as it is.
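A small illustration of why operation order matters (names are illustrative): floating-point addition is not associative, so reordering the same three operands changes the result. Near 1e8, adjacent float values are 8 apart, so adding 1 is lost to rounding unless the large terms cancel first.

```java
// Same operands, different evaluation order, different float result.
final class Reassociation {

    static float leftToRight(float big, float small) {
        return (big + small) - big;   // 1e8f + 1f rounds back to 1e8f
    }

    static float regrouped(float big, float small) {
        return (big - big) + small;   // exact cancellation happens first
    }

    public static void main(String[] args) {
        float big = 1e8f, small = 1f;
        System.out.println(leftToRight(big, small)); // 0.0
        System.out.println(regrouped(big, small));   // 1.0
    }
}
```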
abies « Reply #18 - Posted 2005-11-07 21:07:34 »

I suppose such work could be done at the driver level, e.g. when emulating vertex shaders that aren't GPU-accelerated on the CPU. Vertex/pixel programs are a wonderful target for any kind of vector optimization. Additionally, you don't have to care about exact floating-point semantics; they are very loose on GPUs anyway.

Anyway, the current movement is rather the opposite: moving some non-graphics computations from the CPU to the GPU. You can get one or two orders of magnitude improvement that way, as opposed to 2-3x at most from using vector CPU instructions.

Artur Biesiadowski
Spasi « Reply #19 - Posted 2012-07-21 22:47:14 »

Sorry for the necro, but it looks like a HotSpot engineer has started working on this RFE. Better late than never? :o