Java-Gaming.org    
New VM performance improvements
Azeem Jiva
Junior Member
Java VM Engineer, Sun Microsystems
« Posted 2004-09-01 15:58:15 »

Just wanted to let you guys know that sin, cos, tan, ln, and log10 are all set up in the VM to use the x86 hardware instructions (when possible).  This is in addition to square root and pow.  This applies to both x86 and AMD64, and there are small speedups on other platforms (from speeding up the calls to these trig and transcendental functions).  This pretty much does it for this sort of work by me.  Anyone have other suggestions for things that need to be sped up?

Oh, and this is of course post-Tiger, so don't expect it anytime soon :)
pepe
Junior Member
Nothing unreal exists
« Reply #1 - Posted 2004-09-01 16:15:46 »

Hello. Thanks for those, they are very much appreciated!
I don't know if this is your field, or if it is a proper answer to your question, but shifts and masking on ints (for per-channel operations on pixels) turned out to be very, very slow. In fact, storing and handling pixel values as an RGBA int was slower than using four floats, because of those operations. (I'm doing image filtering, and of course I need it to be fast.)
If you could accelerate that as well, I'd be your slave for life.
:D

Home page: http://frederic.barachant.com
------------------------------------------------------
GoSub: java2D gamechmark http://frederic.barachant.com/GoSub/GoSub.jnlp
Azeem Jiva
Junior Member
Java VM Engineer, Sun Microsystems
« Reply #2 - Posted 2004-09-01 16:34:28 »

Can you write up a small test case showing the problem?  Something that I can look at and try to optimize?  Thanks
pepe
Junior Member
Nothing unreal exists
« Reply #3 - Posted 2004-09-01 17:51:05 »

Of course.
Here it is. I get a constant 4.5x speed increase factor going float. :o
public class FilteringTest
{

      public static void main(String args[])
      {
            FilteringTest ft=new FilteringTest();
            ft.startInt();
            ft.startFloat();
      }
     
      static final int nbPixels = 1024*1024; // change to set a new image size
     
      static final int loopCount = 500; // change to set a different image filtering count. Verification of the pixel will not match text printed. (i'm lazy)
     
      static int t;
      static float tf;


      void startInt()
      {
           
            long debut;
            long fin;
            int loop=0, imgloop;
            int r=0,g=0,b=0,a=0,pixel=0;
            debut=System.currentTimeMillis();
            int array[]=new int[nbPixels];
            fin=System.currentTimeMillis();
            System.out.println("init time:"+(fin-debut)+" ms");
           

           
            imgloop=0;
            for (;imgloop<nbPixels; imgloop++)
            {
                  array[imgloop]=0xffffffff;
            }

           

            System.out.println("test1: int pixels filtering");
            debut=System.currentTimeMillis();
            for (;loop<loopCount; loop++)
            {
                  imgloop=0;
                  for (;imgloop<nbPixels; imgloop++)
                  {
                        pixel=array[imgloop];
                        a=pixel>>>24;
                        r=(pixel>>>16)&0x000000ff;
                        g=(pixel>>>8)&0x000000ff;
                        b=pixel&0x000000ff;

                        t=((r+g+b)/3)-1; // performing a basic non weighted b&w. (-1 is to decrease the result, so rendering can be verified.)
                       
                        array[imgloop]=(t<<24)+(t<<16)+(t<<8)+t;
                  }
            }
            fin=System.currentTimeMillis();
            long test1=(fin-debut);
            System.out.println("Elapsed time: "+test1+" ms");
            int nbpix1=(int) ((nbPixels*loopCount)/((double)test1/1000.f));
            System.out.println(nbpix1+" pixels/second, that is "+(((double)nbpix1/(720*576*25))*100)+"% of real time video filtering.");
            System.out.println("random pixel result for validity of rendering: 0x"+ Integer.toHexString( array[ (int)(Math.random() * nbPixels) ] ));
            System.out.println("result should be:0x0a0a0a0a" +"\n\n");
            array=null;
      }
     

     
      void startFloat()
      {
           
            long debut;
            long fin;
            int loop=0, imgloop;
            float r=0.f,g=0.f,b=0.f,a=0.f;
            debut=System.currentTimeMillis();
            float array[]=new float[nbPixels*4];
            fin=System.currentTimeMillis();
            System.out.println("init time:"+(fin-debut)+" ms");


            imgloop=0;
            for (;imgloop<nbPixels; imgloop++)
            {
                  array[imgloop]=150000.f;
            }


            System.out.println("test2: float pixels filtering");
            debut=System.currentTimeMillis();
            for (;loop<loopCount; loop++)
            {
                  imgloop=0;
                  for (;imgloop < nbPixels ; imgloop+=4)
                  {
                        a=array[imgloop];
                        r=array[imgloop+1];
                        g=array[imgloop+2];
                        b=array[imgloop+3];

                        tf=((r+g+b)/3.f)-1; // performing a basic non weighted b&w. (-1 is to decrease the result, so validity of rendering can be verified.)

                        array[imgloop]=tf;
                        array[imgloop+1]=tf;
                        array[imgloop+2]=tf;
                        array[imgloop+3]=tf;
                  }
            }
            fin=System.currentTimeMillis();
            long test1=(fin-debut);
            System.out.println("Elapsed time: "+test1+" ms");
            int nbpix1=(int) ((nbPixels*loopCount)/((double)test1/1000.f));
            System.out.println(nbpix1+" pixels/second, that is "+(((double)nbpix1/(720*576*25))*100)+"% of real time video filtering.");
            System.out.println("random pixel result for validity of rendering:"+ array[ (int)(Math.random() * nbPixels) ] );
            System.out.println("result should be:149500" +"\n\n");
            array=null;
      }
}

Mark Thornton
Senior Member
« Reply #4 - Posted 2004-09-01 19:01:08 »

You should really put
t &= 0xFF
before creating the pixel.
In any case using

t = (((r+g+b)*5592406) >>> 24)-1;

is considerably faster than

t=((r+g+b)/3)-1;

although still not as good as the float based code (at least on my Athlon XP 2500+).
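
The constant 5592406 is 2^24/3 rounded up, so the multiply and unsigned shift stand in for the division by 3. A minimal sketch (not from the thread; the class and method names are made up for illustration) that checks the two expressions agree for every possible channel sum, 0 to 765. Note that 765 * 5592406 no longer fits in a signed int, which is why the unsigned shift `>>>` matters here.

```java
// Illustrative check only; class and method names are hypothetical.
class ReciprocalDivide {
    // 5592406 == ceil(2^24 / 3), the constant used in the post above.
    static final int RECIPROCAL_OF_3 = 5592406;

    static int divideBy3(int sum) {
        // For larger sums the product wraps to a negative int, but it still
        // fits in 32 bits, and >>> reads those bits as unsigned, so the top
        // 8 bits are the quotient.
        return (sum * RECIPROCAL_OF_3) >>> 24;
    }

    // Returns the first sum in [0, 765] where the trick disagrees with
    // plain integer division, or -1 if it never does.
    static int firstMismatch() {
        for (int sum = 0; sum <= 3 * 255; sum++) {
            if (divideBy3(sum) != sum / 3) {
                return sum;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first mismatch: " + firstMismatch());
    }
}
```

The check only covers sums of three 8-bit channels; for larger inputs the constant and shift would need to be re-derived.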
pepe
Junior Member
Nothing unreal exists
« Reply #5 - Posted 2004-09-01 19:55:26 »

Quote
You should really put
t &= 0xFF
before creating the pixel.

That's interesting, but i think it's unnecessary. As the input can't be over 255, the result can't be illegal, that is, over 255.

Quote

In any case using

t = (((r+g+b)*5592406) >>> 24)-1;

is considerably faster than

t=((r+g+b)/3)-1;

True, but that kind of optimisation should belong to the compiler, not the coder.

Quote

although still not as good as the float based code (at least on my Athlon XP 2500+).

What is your ratio between each?

dranonymous
Junior Member
Hoping to become a Java Titan someday!
« Reply #6 - Posted 2004-09-01 20:03:23 »

Mark - Why is the version you presented faster?  My guess is that you avoid casting the ints to floats, but that's speculation.

Dr. A>
Mark Thornton
Senior Member
« Reply #7 - Posted 2004-09-01 20:14:46 »

Quote

That's interesting, but i think it's unnecessary. As the input can't be over 255, the result can't be illegal, that is, over 255.

That -1 means the result can be -1!
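
To make the -1 case concrete, here is a small sketch (names are hypothetical, not from the test listing) that packs a grey level the way the int test does, with and without the `t &= 0xFF` mask. A black pixel gives `t = ((0+0+0)/3)-1 = -1`, and without the mask the sign bits of `t` spill into the other channels.

```java
// Illustrative only; class and method names are hypothetical.
class PackGrey {
    // Packs a grey level as the int test does, without masking.
    static int packUnmasked(int t) {
        return (t << 24) + (t << 16) + (t << 8) + t;
    }

    // Same packing, with t reduced to its low 8 bits first.
    static int packMasked(int t) {
        t &= 0xFF;
        return (t << 24) + (t << 16) + (t << 8) + t;
    }

    public static void main(String[] args) {
        int t = ((0 + 0 + 0) / 3) - 1; // black pixel: t == -1
        System.out.println(Integer.toHexString(packUnmasked(t))); // prints fefefeff
        System.out.println(Integer.toHexString(packMasked(t)));   // prints ffffffff
    }
}
```

With the mask, -1 wraps around to 255 in every channel; whether wrapping or clamping at 0 is the desired behaviour is a separate choice.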

Quote

True, but that kind of optimisation should belong to the compiler, not the coder.

While it may be practical for a compiler to replace division by a constant float with multiplication by the reciprocal, there are complications in doing the same thing for integers.
Quote

What is your ratio between each?

float about 5, vs int about 8 (seconds in both cases). The original int version takes 20.

dranonymous:
My revised int version is faster because multiplication is (usually) significantly faster than division. This is true for both integer and floating point; however, I suspect that in the floating point case the division has been automatically replaced by a multiplication by the reciprocal.
tom
« Reply #8 - Posted 2004-09-01 21:02:36 »

On my computer the integer version is a factor of 1.7 slower using Mark's modification, which sounds about right as the integer version does twice the amount of work.

Quote
here it is. i get a constant 4.5 speed increase factor going float.

What did you expect?
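
One possible reading of this, going only by the listing in Reply #3: the int loop visits `nbPixels` pixels per pass, while the float loop advances four floats at a time but stops at index `nbPixels` rather than `nbPixels*4`, so it visits a quarter as many pixels. A small sketch that just counts iterations with the same loop bounds (the class and method names are made up for illustration):

```java
// Illustrative only; mirrors the loop headers of the listing, not its work.
class PixelsPerPass {
    static final int nbPixels = 1024 * 1024; // same value as in the listing

    // Mirrors the inner loop of startInt(): one int per pixel.
    static int intLoopPixels() {
        int count = 0;
        for (int i = 0; i < nbPixels; i++) {
            count++;
        }
        return count;
    }

    // Mirrors the inner loop of startFloat(): four floats per pixel,
    // but bounded by nbPixels, not nbPixels * 4.
    static int floatLoopPixels() {
        int count = 0;
        for (int i = 0; i < nbPixels; i += 4) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("int loop:   " + intLoopPixels() + " pixels per pass");
        System.out.println("float loop: " + floatLoopPixels() + " pixels per pass");
    }
}
```

If that reading is right, both tests still divide the same `nbPixels*loopCount` by the elapsed time, so the float pixels/second figure would be overstated by about a factor of four; bounding the float loop by `nbPixels*4` would make the two directly comparable.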

Mark Thornton
Senior Member
« Reply #9 - Posted 2004-09-01 21:15:35 »

Quote
What did you expect?

Current CPUs have lots of hardware devoted to floating point, so they can do additions and multiplications simultaneously. On the other hand, there is usually only one shifter, so the integer version probably makes less effective use of the chip (less scope for operations to be performed in parallel).
pepe
Junior Member
Nothing unreal exists
« Reply #10 - Posted 2004-09-02 04:34:45 »

Quote

That -1 means the result can be -1!

If I were working with an 8-bit store, that would be true. Nevertheless, 0xFF in an int is 255, not -1... 0xFFFFFFFF would be.

Quote
While it may be practical for a compiler to replace division by a constant float with multiplication by the reciprocal, there are complications in doing the same thing for integers.

Oh, interesting. Why is that?

Quote

float about 5, vs int about 8 (seconds in both cases). The original int version takes 20.

That's a nice improvement, I agree. Too nice, in fact. There has to be something that can be done about that division...

swpalmer
JGO Coder
Where's the Kaboom?
« Reply #11 - Posted 2004-09-02 05:00:54 »

Quote
Anyone have other suggestions for things that need to be sped up?


Use of vector instructions (MMX/SSE/SSE2, etc.) for common patterns found in manipulating RGBA ints, as above.


NVaidya
Junior Member
Java games rock!
« Reply #12 - Posted 2004-09-02 15:02:56 »

With reference to this document from NIST (dated November 2002 !!! )
http://math.nist.gov/javanumerics/reports/jgfnwg-minutes-11-02.html

is FMA, in particular, currently in place in 1.5 (betas)?


Gravity Sucks !
Mark Thornton
Senior Member
« Reply #13 - Posted 2004-09-02 15:15:36 »

Quote
is FMA, in particular, currently in place in 1.5 (betas)?

Unfortunately not.

JSR 84 which also proposed supporting FMA was withdrawn in March 2002 apparently due to difficulties in setting up the expert group.
http://jcp.org/en/jsr/detail?id=84
dranonymous
Junior Member
Hoping to become a Java Titan someday!
« Reply #14 - Posted 2004-09-02 16:34:14 »

Pepe -  In the int version you shift the alpha value, but then you never did anything with it.  Did I miss where you manipulated the value again?

Mark/Pepe - Have you looked at the compiled byte code to see how it differs for those small shifting/masking areas?

Dr. A>
pepe
Junior Member
Nothing unreal exists
« Reply #15 - Posted 2004-09-02 17:10:55 »

Quote
Pepe -  In the int version you shift the alpha value, but then you never did anything with it.  Did I miss where you manipulated the value again?

No. In the first versions, the values were even all copied into temporary variables, then pushed back.  That class is a stripped-down version of another test set where I checked whether it was worthwhile to put the pixel treatment in a method of another class. In that old test, I had to extract all the components and pass them to the filtering method, along with the image array and the poke offset.  That was a pretty interesting test, because doing so was faster than simply putting all the code in a single loop (server JIT only).

Quote

Mark/Pepe - Have you looked at the compiled byte code to see how it differs for those small shifting/masking areas?

I would love to, but we can't have a look at how the JIT compiles bytecode, if that's what you meant.

dranonymous
Junior Member
Hoping to become a Java Titan someday!
« Reply #16 - Posted 2004-09-02 18:15:19 »

I realize you can't see how the JIT compiled it down to native assembly, but you could see the bytecode produced in the class files and compare them.  It would be interesting to see what was going on in each one.

Dr. A>
Mark Thornton
Senior Member
« Reply #17 - Posted 2004-09-02 19:40:32 »

Bytecode is a very direct representation of the Java source; little or no optimisation is done at that point. Essentially all the optimisation is done by the JIT at runtime.
pepe
Junior Member
Nothing unreal exists
« Reply #18 - Posted 2004-09-03 08:06:51 »

Bytecode (compiled Java source) is very basic. No optimisations are done there, so that the JIT can recognise patterns, which simplifies its work and makes it more efficient.
Assembly (compiled bytecode) is produced by the JIT, and we mortals don't have access to it.  That assembly can be very different from what is in the bytecode.

NVaidya
Junior Member
Java games rock!
« Reply #19 - Posted 2004-09-04 19:04:33 »

Would this make Pepe feel better... :)

http://www.javaspecialists.co.za/archive/Issue054b.html

What's the deal with the % operator these days anyway?

crystalsquid
Junior Member
... Boing ...
« Reply #20 - Posted 2004-09-09 09:32:21 »

Step 1: Run MS Visual Studio (Boo Hiss!)
Step 2: Set up a new (empty) project.
Step 3: Go to the debug settings, set the exe to be your IE, and set the program arguments to the path of a simple HTML page with an applet on it (I only deal with applets, but you could do this with Java itself just as easily).
Step 4: Run a debugger session, and stop the debugger somewhere.

If you are in an area called something like WIN32, or NT40.DLL or something, then you are in a system call.
If you are in <unknown> or <some hex string> then you are probably in the compiled code.
Sometimes you can tell more easily, as the compiled code will reside in memory at an address much greater than 0x40000000 (the default base address space for code loaded from an exe file).

It is then possible to track down specific parts of the code by inserting operations that add fixed constants to a static volatile variable, and then searching the disassembly for those constants. Not saying it's easy though, but it can work if you're desperate :)
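
A rough sketch of what such markers might look like on the Java side. Everything here is illustrative: the class name, the method, and the two constants are arbitrary choices, and the idea is only that the constants are distinctive enough to search for in a disassembly window.

```java
// Illustrative only; names and constants are arbitrary.
class MarkerDemo {
    // static volatile, as suggested above, so the updates are expected
    // to remain visible in the compiled code.
    static volatile int marker;

    static int greyOf(int pixel) {
        marker += 0x12345678;            // start marker to search for
        int r = (pixel >>> 16) & 0xFF;
        int g = (pixel >>> 8) & 0xFF;
        int b = pixel & 0xFF;
        int t = (r + g + b) / 3;         // the code of interest
        marker += 0x0BADCAFE;            // end marker to search for
        return t;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Call the method many times so the JIT has a reason to compile it.
        for (int i = 0; i < 2000000; i++) {
            sum += greyOf(0xFF000000 | i);
        }
        System.out.println("sum=" + sum);
    }
}
```

The machine code between the two constants should then correspond to the shift/mask/divide sequence, although the JIT is free to reorder the non-volatile work around the markers.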

- Dom