Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (512)
Games in Android Showcase (119)
games submitted by our members
Games in WIP (576)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  Array itteration speed up + other tips needed  (Read 5342 times)
0 Members and 1 Guest are viewing this topic.
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Posted 2004-02-18 21:25:50 »

I am wondering what programmers can do to speed up their programs.

according to a profiling tool, more than 45% of the CPU time is spent in one method:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
148  
149  
150  
151  
152  
            // encodes a given pixel at point (w,h)
            final public void encode(int w,int h,int k,int width,byte[][] frame, boolean verbose,boolean[][] ODCodes,boolean [][] IDCodes,boolean[][] huffCodes, BitOutputStream controlW,BitOutputStream IDW,BitOutputStream ODW,BitOutputStream CCW) throws IOException
            {
                  j=h*width+w;
                  currentPixelRed=frame[j][0];
                  currentPixelGreen=frame[j][1];
                  currentPixelBlue=frame[j][2];
                 

                  if (k>-1)
                  {
                        comparatorPixelRed=frame[k][0];
                        comparatorPixelGreen=frame[k][1];
                        comparatorPixelBlue=frame[k][2];
                        Rdiff=256+((currentPixelRed & 0xFF)-(comparatorPixelRed & 0xFF));
                        Gdiff=256+((currentPixelGreen & 0xFF)-(comparatorPixelGreen & 0xFF));
                        Bdiff=256+((currentPixelBlue & 0xFF)-(comparatorPixelBlue & 0xFF));
                  }
                 

                  innerBitCount=0;
                  // Determine the number of bits compressing via the InnerDifferences or saving Colour Component will give.
                       
                  innerBitCount+=huffCodes[((int)(currentPixelRed) & 0xff)].length+1;
                       
                  RGdiff= 256+((currentPixelGreen & 0xFF)-(currentPixelRed & 0xFF));
                  if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(currentPixelGreen & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(currentPixelGreen) & 0xff)].length+1;
                  }
                             
                  RBdiff= 256+((currentPixelBlue & 0xFF)-(currentPixelRed & 0xFF));
                  if (IDCodes[RBdiff].length>0 && IDCodes[RBdiff].length<huffCodes[(currentPixelBlue & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(currentPixelBlue) & 0xff)].length+1;
                  }
                 
                  // if the pixel we are comparing the current pixel to is in range and all the RGB components of the current pixel are in
                  // the Outer Differences and coding using the OuterDifferences yields better compression than via the InnerDifferences write.write the OD codes.
                  if (k>-1 && ODCodes[Rdiff].length>0 && ODCodes[Gdiff].length>0 && ODCodes[Bdiff].length>0 && (innerBitCount>ODCodes[Rdiff].length+ODCodes[Gdiff].length+ODCodes[Bdiff].length+1))
                  {
                             
                        controlW.value=controlW.value<<1 |  bitMask; controlW.count++;
                        controlW.addToBuffer();

                        for (int i=0;i<ODCodes[Rdiff].length;i++)
                        {
                              ODW.value=ODW.value<<1; ODW.count++;
                              if (ODCodes[Rdiff][i]) ODW.value=ODW.value| bitMask;
                              ODW.addToBuffer();
                        }
                       
                        for (int i=0;i<ODCodes[Gdiff].length;i++)
                        {
                              ODW.value=ODW.value<<1; ODW.count++;
                              if (ODCodes[Gdiff][i]) ODW.value=ODW.value| bitMask;
                              ODW.addToBuffer();
                        }
                       
                        for (int i=0;i<ODCodes[Bdiff].length;i++)
                        {
                              ODW.value=ODW.value<<1; ODW.count++;
                              if (ODCodes[Bdiff][i]) ODW.value=ODW.value| bitMask;
                              ODW.addToBuffer();
                        }
                  }
                  else
                  {
                 
                        // set Red component to its huffman code
                        red=((int)(currentPixelRed) & 0xff);

                        controlW.value=controlW.value<<1;
                        controlW.count++;
                        controlW.addToBuffer();

                             
                        for (int i=0;i<huffCodes[red].length;i++)
                        {
                              CCW.value=CCW.value<<1; CCW.count++;
                              if (huffCodes[red][i]) CCW.value=CCW.value| bitMask;
                              CCW.addToBuffer();
                        }
           
                             
                        // if the Red-Green Difference is in the Inner Differences encode the component using Inner Differences
                        // unless the Green component huffman code is less than the difference code in which case the Green component huffman code is written.
                             
                       
                        RGdiff= 256+((currentPixelGreen & 0xFF)-(currentPixelRed & 0xFF));
                        if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(currentPixelGreen & 0xFF)].length)
                        {
                              controlW.value=controlW.value<<1 |  bitMask; controlW.count++;
                              controlW.addToBuffer();
                             
                              for (int i=0;i<IDCodes[RGdiff].length;i++)
                              {
                                    IDW.value=IDW.value<<1; IDW.count++;
                                    if (IDCodes[RGdiff][i]) IDW.value=IDW.value| bitMask;
                                    IDW.addToBuffer();
                              }
                        }
                        else
                        {
                              controlW.value=controlW.value<<1; controlW.count++;
                              controlW.addToBuffer();
                              for (int i=0;i<huffCodes[(currentPixelGreen & 0xFF)].length;i++)
                              {
                                    CCW.value=CCW.value<<1; CCW.count++;
                                    if (huffCodes[(currentPixelGreen & 0xFF)][i]) CCW.value=CCW.value| bitMask;
                                    CCW.addToBuffer();
                              }
                                   
                        }
                             
                        // if the Red-Blue Difference is in the Inner Differences encode the component using Inner Differences
                        // unless the Blue component huffman code is less than the difference code in which case the Blue component huffman code is written.

                        RBdiff= 256+((currentPixelBlue & 0xFF)-(currentPixelRed & 0xFF));
                        if (IDCodes[RBdiff].length >0 && IDCodes[RBdiff].length<huffCodes[(currentPixelBlue & 0xFF)].length)
                        {
                              controlW.value=controlW.value<<1 |  bitMask; controlW.count++;
                              controlW.addToBuffer();
                             
                              for (int i=0;i<IDCodes[RBdiff].length;i++)
                              {
                                    IDW.value=IDW.value<<1; IDW.count++;
                                    if (IDCodes[RBdiff][i]) IDW.value=IDW.value| bitMask;
                                    IDW.addToBuffer();
                              }
                        }
                        else
                        {
                              controlW.value=controlW.value<<1; controlW.count++;
                              controlW.addToBuffer();
                              for (int i=0;i<huffCodes[(currentPixelBlue & 0xFF)].length;i++)
                              {
                                    CCW.value=CCW.value<<1; CCW.count++;
                                    if (huffCodes[(currentPixelBlue & 0xFF)][i]) CCW.value=CCW.value| bitMask;
                                    CCW.addToBuffer();
                              }
                        }
                       
                  }



This method is called over 300000 times, depending on the frame's size, and is called by the following block of code:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
                        ///// encode pixels /////
                       
                        // Generate Outer Differences
     
                        // encode Pixels using Scan Method 1
                        if (mode==1)
                        {
                       
                              prev=-1;
                              for (h=0;h<height;h++)
                              {
                                    for (w=0;w<width;w++)
                                    {
                                          encode.encode(w,h,h*width+w+prev,width,frame,verbose,ODCodes,IDCodes,huffCodes,controlW,IDW,ODW,CCW);
                                    }
                                   
                              }
                        }
                        // encode Pixels using Scan Method 2
                        else if (mode==2)
                        {
     
                              prev=-width;
                       
                              for (w=0;w<width;w++)
                              {
                                    for (h=0;h<height;h++)
                                    {
                                          encode.encode(w,h,h*width+w+prev,width,frame,verbose,ODCodes,IDCodes,huffCodes,controlW,IDW,ODW,CCW);
                                    }
                              }
                        }
                        // encode Pixels using Scan Method 3
                        else if (mode==3)
                        {
                       
                              prev=-width;
                              for (h=0;h<height;h++)
                              {
                                    for (w=0;w<width;w++)
                                    {
                                          encode.encode(w,h,h*width+w+prev,width,frame,verbose,ODCodes,IDCodes,huffCodes,controlW,IDW,ODW,CCW);
                                    }
                                   
                              }
                        }
                        // encode Pixels using Scan Method 4
                        else
                        {
                       
                              for (h=0;h<height;h++)
                              {
                                    encode.encode(0,h,-1,width,frame,verbose,ODCodes,IDCodes,huffCodes,controlW,IDW,ODW,CCW);
                              }
                       
                              prev=-1;
                              for (w=1;w<width;w++)
                              {
                                   
                                    for (h=0;h<height;h++)
                                    {
                                          encode.encode(w,h,h*width+w+prev,width,frame,verbose,ODCodes,IDCodes,huffCodes,controlW,IDW,ODW,CCW);
                                    }
                                   
                              }
                        }


As you can see there are four different 'scan modes' which dictate how the frame is scanned.
I have read that calling a method in java is much more expensive than in C/C++ I am wondering whether there is a pattern that allows me to make four classes? that basically have the same functionalitly (with out the encode method call, i.e. inline ) but differ on how the frame is scanned? And do do this without uncessary replication of the code?


Also, i would love to hear how c/c++ programs are able to quickly itterate over an array... for example in video codecs... how are they able to itterate over the pixels of a frames 30 times a second? i can bearly perform 1 time per second.

Any help is very much appreciated!
Offline Jeff

JGO Coder




Got any cats?


« Reply #1 - Posted 2004-02-19 01:58:34 »

Quote
I have read that calling a method in java is much more expensive than in C/C++


Which hopefully will teach you not to believe everything you read.

Thats errant nonsense.

Quote

I am wondering whether there is a pattern that allows me to make four classes? that basically have the same functionalitly (with out the encode method call, i.e. inline ) but differ on how the frame is scanned? And do do this without uncessary replication of the code?


Sounds like you want tor efactor into a method with an invocation of an obejct method at the cneter of the loop. Depending on what obejct is invoked that will rsult in different code at the center.  Ofcourse that will defeat inlining at the center of the  loop so if this is your bottleneck it may be better to have seperate, if redundant, code paths.


Quote

Also, i would love to hear how c/c++ programs are able to quickly itterate over an array... for example in video codecs... how are they able to itterate over the pixels of a frames 30 times a second? i can bearly perform 1 time per second.


Soudns more like your java code is messed up.   Thatsa lot of Java code to wade through though, coudl you pare your exampel down to a minimal case?

I used to code maximally fast C code. for video drivers  You dont deference arrays if you want that.  You set a pointer to the start of the data and auto-increment it.

You cannot walk a pointer in Java. Luckily though Java is smarter then C.  (Assuming this is Java2 and not J2ME) It will optimzize for you in ways C cannot.  The biggest key to looping over arrays in Java2 is that you make the loop clear to the compiler. If the end points of a loopa re predictable then it will elminate all the bounds checking except at the end points.  If they are not the you will get bounds checks on every array access and drop to about half speed.


Edit: Tooka   quick look. Jeez this is a mess.  It looks to me almost like you are  looping outside the call but fetching the indvidual pixels seperately inside of each call.  I suspect you are confusing the poor optimzier to the point where it is doing bounds cehcks for every pixel. Simplify your code. Al LEAST do your pixel access in the actual loop thats scanning the image and pass in the pixel valeus rather then doing what is effectively unpredictable array access in every call.

BTW this would *SUCK* in C too because C, beign worse then java abotu inlining, would likely be doing function calls for every damn pixel.

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #2 - Posted 2004-02-19 02:45:38 »

Quote


Thats errant nonsense.



ah, thats good to hear... tho in the benchmarks i have seen java does lag a bit... but you believe that it is probably not the cause?

Quote


Sounds like you want tor efactor into a method with an invocation of an obejct method at the cneter of the loop. Depending on what obejct is invoked that will rsult in different code at the center.  Ofcourse that will defeat inlining at the center of the  loop so if this is your bottleneck it may be better to have seperate, if redundant, code paths.



That is what i thought i might have to do.... it will be annoying to maintain the code but speed is an issue.


Quote


I used to code maximally fast C code. for video drivers  You dont deference arrays if you want that.  You set a pointer to the start of the data and auto-increment it.

You cannot walk a pointer in Java. Luckily though Java is smarter then C.  (Assuming this is Java2 and not J2ME) It will optimzize for you in ways C cannot.  The biggest key to looping over arrays in Java2 is that you make the loop clear to the compiler. If the end points of a loopa re predictable then it will elminate all the bounds checking except at the end points.  If they are not the you will get bounds checks on every array access and drop to about half speed.



How would one let the interpreter know that the end points of the loops are 'predictable'?


Quote


Edit: Tooka   quick look. Jeez this is a mess.  It looks to me almost like you are  looping outside the call but fetching the indvidual pixels seperately inside of each call.  I suspect you are confusing the poor optimzier to the point where it is doing bounds cehcks for every pixel. Simplify your code. Al LEAST do your pixel access in the actual loop thats scanning the image and pass in the pixel valeus rather then doing what is effectively unpredictable array access in every call.



Yes, that is what i am doing as it is the only way i know to achieve the functionality without replicated code in different paths. But i may just replicate the code so this issue  should hopefully dissappear.

I am not very knowledgable about how the optimizer works so i did not know that my pixel array access inside the method would cause confusion to the optimizer. I will put it on the outside.

Thanks for the tips!
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Jeff

JGO Coder




Got any cats?


« Reply #3 - Posted 2004-02-19 03:04:45 »

My pleasure. Keep me up to date  on your progress and I'll try to help as I can and have the time. (Im incredibly busy the next 4 weeks but ill TRY to keep an eye  on this topic.)

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #4 - Posted 2004-02-19 03:29:43 »

I will modify the code to have seperate paths of replicated code in the next few hourse and see how the performance chages.

I'll post the results here.

If you have not gathered by the code i have posted, i am/have developed a lossless video codec. At the moment it is extreamly slow, my aim is to optimize the code such that on play back it can peform the standard 30 frames per second at 720x480. At the moment i am lucky to achieve 2 frames per second.

I am at a stage that optimization is needed before i go and try to improve the compression of my codec by useing different colour spaces and motion estimatiton.

If you are interested i can send the full source, It does have a decent amount of commenting Wink
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #5 - Posted 2004-02-19 05:43:29 »

Here are the results:
1  
2  
3  
4  
5  
6  
7  
Old build (as listed above)   
Sun jdk 1.4.01:  409 msec per frame
JET executable:  265 msec per frame

New build (with seperate exec paths)
Sun jdk 1.4.01:  404 msec per frame
JET executable:  273 msec per frame


The time was the average time per frame for the compression of a sequence of 25 frames repeated 5 times.


From this i can think of three reasons why nothing has changed:

1. I somehow incorrectly implemented the seperate code execution paths
2. The sun interpreter is optimizing the two versions the same
3. I have misstaken the bottleneck.

I am wondering how i might go about determining which of these reasons is correct, or some, or all or none of them are.

ta!


Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #6 - Posted 2004-02-19 12:37:59 »

Quote
From this i can think of three reasons why nothing has changed:

1. I somehow incorrectly implemented the seperate code execution paths
2. The sun interpreter is optimizing the two versions the same
3. I have misstaken the bottleneck.

I am wondering how i might go about determining which of these reasons is correct, or some, or all or none of them are.

Well, at a glance, I can see a few bad things that you are doing.

1) You're using 2D arrays. Generally, you're better off with 1D arrays. Especially for cases like your "frame". That should not be a 2D byte array. It should just be a simple int[] with the RGB values packed into a single int.

2) Redundant bit-masking. It looks like you are constantly re-doing conversions from signed bytes to unsigned ints. You should just read the rgb values from your (new) packed int array as ints.

3) Jeff's point should hold too. To the optimizer, your "j" index is going to be a random value into the array. You're incurring an array bounds check every single time you access any of your arrays.

I imagine that there are other optimizations, low-level and algorithmic, that can be applied to your code, but in it's current state, it's too obfuscated for me to spend any more time looking at it.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline Mark Thornton

Senior Duke





« Reply #7 - Posted 2004-02-19 12:46:48 »

I would suggest that you use a standard compression format and the new imageio package. It is easy to use and the low level optimisation is someone elses problem.
Offline Jeff

JGO Coder




Got any cats?


« Reply #8 - Posted 2004-02-19 14:51:24 »

best ways to determine a bottleneck is with a profiler.  If you think you maybe be misinterpreting what its telling you show us the output and maybe we can help.

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #9 - Posted 2004-02-19 15:29:35 »

Quote
best ways to determine a bottleneck is with a profiler.  If you think you maybe be misinterpreting what its telling you show us the output and maybe we can help.

He's essentially already done that - he's pointing to a single function that's eating up all of his cpu and wondering how he can make that function faster. People can point out some micro-inefficiencies (ala array-bounds checks), but he should clean up his code some so people can point out any obvious algorithmic inefficiencies he has.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Jeff

JGO Coder




Got any cats?


« Reply #10 - Posted 2004-02-19 16:26:22 »

Quote

He's essentially already done that


Hey TR.

He actually brought up the issue of misinterpreting his profile (look above), I was suggesting how he could confirm/deny that.

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline b0b0b0b

Junior Newbie




Java games rock!


« Reply #11 - Posted 2004-02-19 16:27:25 »

One suspicious looking thing here to me is:
BitOutputStream.addToBuffer() since I can't see what it's doing.


Looks like you're doing a lot of this stuff too:
1  
2  
3  
4  
5  
6  
    for (int i=0;i<huffCodes[red].length;i++) 
    {
     CCW.value=CCW.value<<1; CCW.count++;
     if (huffCodes[red][i]) CCW.value=CCW.value| bitMask;
     CCW.addToBuffer();
    }


I think someone said this or something similar, but you don't want 2d boolean arrays.  

An optimization would be to use 2d int arrays like this, but less lame:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
assert huffCodes.length > 0 && (32 * (huffCodes.length-1) >= huffCodes[0]);
    for (int i=1;i<huffCodes[red].length-1;i++)
    {
     CCW.value = huffCodes[i];
     CCW.count += 32;
     CCW.addToBuffer();
    }
    if (huffCodes[red][0] > 0) {
     CCW.value = huffCodes[red][1];
     CCW.count += huffCodes[red][0];
     CCW.addToBuffer();
    }

Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #12 - Posted 2004-02-19 16:52:08 »

Quote
I would suggest that you use a standard compression format and the new imageio package. It is easy to use and the low level optimisation is someone elses problem.

I don't think the imageio package has video codecs (i.e. something that is capable of inter-frame compression). Aside from that, the only lossless image format that imageio supports (OOTB) is PNG. I'm not sure that is appropriate.

Just out of curiosity, though, what do you plan on using the codec for? There are few applications that wouldn't be better off with a lossy codec. (Medical imaging is a good example of one of them).

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline Mark Thornton

Senior Duke





« Reply #13 - Posted 2004-02-19 17:43:40 »

Quote

Just out of curiosity, though, what do you plan on using the codec for? There are few applications that wouldn't be better off with a lossy codec. (Medical imaging is a good example of one of them).

On the images I compress the size difference between JPEG and PNG is small.
Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #14 - Posted 2004-02-19 18:16:30 »

Quote

On the images I compress the size difference between JPEG and PNG is small.

I don't understand your point. Are you trying to imply that there is no reason to use lossy encoding?

You must be using a very high quality level for JPEG or be working with abstract images. The compression used for PNG is deflate (the same thing you see in gzip) which performs nowhere near the compression on rounded DCTs in lossy JPEG.

Don't even get me started about the differences you'd see with JPEG2000. The trade off you spend in CPU for the wavelet-based compression pays off in spades and can be fairly minimized due to its amenability to SIMD operations.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #15 - Posted 2004-02-19 18:37:01 »

Quote


1) You're using 2D arrays. Generally, you're better off with 1D arrays. Especially for cases like your "frame". That should not be a 2D byte array. It should just be a simple int[] with the RGB values packed into a single int.


hmm, i didnt know that 2D array access was any more expensive than a 1D array Access...

Quote

2) Redundant bit-masking. It looks like you are constantly re-doing conversions from signed bytes to unsigned ints. You should just read the rgb values from your (new) packed int array as ints.



Yeah, i noticed this almost imediately after i posted the code, i had changed it such that the conversion only occurs once at the beginning.

However if there was any speed increase doing this... it seems to be negligable.

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
                              j=h*width+w;
                              currentPixelRed=frame[j][0] & 0xFF;
                              currentPixelGreen=frame[j][1] & 0xFF;
                              currentPixelBlue=frame[j][2] & 0xFF;
                             
           
                              if (k>-1)
                              {
                                    comparatorPixelRed=frame[k][0]& 0xFF;
                                    comparatorPixelGreen=frame[k][1]& 0xFF;
                                    comparatorPixelBlue=frame[k][2]& 0xFF;
                                    Rdiff=256+currentPixelRed-comparatorPixelRed;
                                    Gdiff=256+currentPixelGreen-comparatorPixelGreen;
                                    Bdiff=256+currentPixelBlue -comparatorPixelBlue;
                              }


Quote


3) Jeff's point should hold too. To the optimizer, your "j" index is going to be a random value into the array. You're incurring an array bounds check every single time you access any of your arrays.



how do i let the optimizer know that j is never out of bounds?

Quote

I imagine that there are other optimizations, low-level and algorithmic, that can be applied to your code, but in it's current state, it's too obfuscated for me to spend any more time looking at it.


I have not put any effort into obsfuscating the code... or is it too hard to read due to bad programming practices? please tell me so i can learn.

Cheers for your input
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #16 - Posted 2004-02-19 18:41:44 »

Quote
I would suggest that you use a standard compression format and the new imageio package. It is easy to use and the low level optimisation is someone elses problem.


The whole point is that i am trying to make my own... for kicks Smiley

All the lossless compressors i have seen do not compress as much as they could, i.e. do not use motion estimation or other colourspaces.

I want to see how much lossless compression I can get using my own techniques, which are based on heavily extended work by Tadas Remencius.


Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #17 - Posted 2004-02-19 18:43:33 »

Quote
best ways to determine a bottleneck is with a profiler.  If you think you maybe be misinterpreting what its telling you show us the output and maybe we can help.


I had used Jprofiler.... I will workout how to save the output of the logger and then post the results in the next few hours.
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #18 - Posted 2004-02-19 18:45:02 »

Quote

He's essentially already done that - he's pointing to a single function that's eating up all of his cpu and wondering how he can make that function faster. People can point out some micro-inefficiencies (ala array-bounds checks), but he should clean up his code some so people can point out any obvious algorithmic inefficiencies he has.

God bless,
-Toby Reyelts


How can i do that? i believe i have cleaned it as much as i can... i will need pointers on how to futher clean the code.

I will post a link to the full source when in a few hours.


Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #19 - Posted 2004-02-19 18:55:34 »

Quote
One suspicious looking thing here to me is:
BitOutputStream.addToBuffer() since I can't see what it's doing.


Looks like you're doing a lot of this stuff too:
1  
2  
3  
4  
5  
6  
    for (int i=0;i<huffCodes[red].length;i++) 
    {
     CCW.value=CCW.value<<1; CCW.count++;
     if (huffCodes[red][i]) CCW.value=CCW.value| bitMask;
     CCW.addToBuffer();
    }


I think someone said this or something similar, but you don't want 2d boolean arrays.  

An optimization would be to use 2d int arrays like this, but less lame:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
assert huffCodes.length > 0 && (32 * (huffCodes.length-1) >= huffCodes[0]);
    for (int i=1;i<huffCodes[red].length-1;i++)
    {
     CCW.value = huffCodes[i];
     CCW.count += 32;
     CCW.addToBuffer();
    }
    if (huffCodes[red][0] > 0) {
     CCW.value = huffCodes[red][1];
     CCW.count += huffCodes[red][0];
     CCW.addToBuffer();
    }



The problem i can see from your implementation is that i will still need to get the individual bits out of the value as the code may only be 1 or two bits but we are adding the full 32 bits so we would need to substract the other bits. I am not sure whether it is gain anything. But it is owrth thinking about!

Here is the BitOutputStream utility class i have made.




1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
import java.io.*;

public class BitOutputStream
      {
            int value;                        // The byte in which the encoded bits are firstly stored.
            int count;                        // The number of bits written into value.
            byte[] buffer;                  // A byte buffer which is filled with 'value' each time value is full. Used for wirting to file.
            int buffCount;                  // The current number of 'values' written into the buffer.
            int masterCount;            // The overall count of bits that have been written
            OutputStream fos;
           
            public BitOutputStream(OutputStream fos1)
            {
                  fos=fos1;
                  value=0;
                  count=0;
                  buffer=new byte[4096];
                  buffCount=0;
                  masterCount=0;
            }
           
            // Writes the passed value (temp) to the file using the given number of bits
            public void write(int temp,int bits) throws IOException
            {

     

                        for (int j = 0, mask = 1; j < bits; j++, mask <<= 1)
                        {  
                              value=value<<1;count++;
                              if  ((temp & mask) > 0)
                              {
                                    value=value|0x01;
                              }
                              addToBuffer();
                        }
            }
     
            // adds bits stored in 'value' to a buffer which will be saved to a file
            // if the current bit count since last storing into the buffer is less than 8 then return without adding it to the buffer
            public void addToBuffer() throws IOException
            {
                  masterCount++;
           
                  if (count<8) return;

                  //byte temp=(byte) (value);
                  buffer[buffCount]=(byte) (value);;
                  buffCount++;
           
                  if (buffCount==buffer.length)
                  {
                        fos.write(buffer,0,buffCount);
                        buffCount=0;
                  }
                  value=0;
                  count=0;
            }
           
            public void write(byte[] b) throws IOException
            {
                  flush();
                  fos.write(b);
            }
           
            public void flush() throws IOException
            {
                  // pad out the last byte if necessary
                 
                  if (count>0)
                  {
                        value=value<<(8-count);
                        count=8;
                        addToBuffer();
                  }
                  if (buffCount>0)
                  {
                       
                        fos.write(buffer,0,buffCount);
                  }
                  buffCount=0;
            }
           
            public void close() throws IOException
            {
                  flush();
                  fos.close();
            }
      }

Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #20 - Posted 2004-02-19 19:01:52 »

Quote

I don't think the imageio package has video codecs (i.e. something that is capable of inter-frame compression). Aside from that, the only lossless image format that imageio supports (OOTB) is PNG. I'm not sure that is appropriate.

Just out of curiosity, though, what do you plan on using the codec for? There are few applications that wouldn't be better off with a lossy codec. (Medical imaging is a good example of one of them).

God bless,
-Toby Reyelts


There is no real goal in mind... it is just a personal programming project. I have semi-given up on programming games... i wanted to try making some thing useful  Tongue

The compressor i have made works best in the same domain as the PNG format, i.e. cartoons, CGI, etc.

In some areas, my key frame compressor beats PNG and some areas it is worse.

Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #21 - Posted 2004-02-19 19:29:33 »

Quote

Here is the BitOutputStream utility class i have made.

Umm.... what are you using for the OutputStream? If you're doing any real disk i/o here, it's going to dwarf the time that you're spending on the CPU - which would not be a good benchmark for how fast you can actually encode or decode.

Some more questions - What JVM are you using and are you using -server when running?

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #22 - Posted 2004-02-19 20:15:01 »

Quote

Umm.... what are you using for the OutputStream? If you're doing any real disk i/o here, it's going to dwarf the time that you're spending on the CPU - which would not be a good benchmark for how fast you can actually encode or decode.

Some more questions - What JVM are you using and are you using -server when running?

God bless,
-Toby Reyelts


I am running jdk 1.4.01 using the normal client setting.

The stream is ultimately a byte buffer as seen below:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
                        FileOutputStream fos2=new FileOutputStream(outFile);
                        GZIPOutputStream fos1=new GZIPOutputStream(fos2);
                        DataOutputStream fos=new DataOutputStream(fos1);
                       
                 
                        ByteArrayOutputStream controlS=new ByteArrayOutputStream();
                        ByteArrayOutputStream IDS=new ByteArrayOutputStream();
                        ByteArrayOutputStream ODS=new ByteArrayOutputStream();
                        ByteArrayOutputStream CCS=new ByteArrayOutputStream();
                       
                        BitOutputStream controlW=new BitOutputStream(controlS);
                        BitOutputStream IDW=new BitOutputStream(IDS);
                        BitOutputStream ODW=new BitOutputStream(ODS);
                        BitOutputStream CCW=new BitOutputStream(CCS);

Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #23 - Posted 2004-02-19 21:12:49 »

here are the results of a profiling session:










The source code and a sample compressed sequence can be found at: http://hdiff.mystarship.com

The sample is probably too biased, however i do not have the webspace to upload a more taxing sample.
Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #24 - Posted 2004-02-19 21:19:07 »

Quote

I am running jdk 1.4.01 using the normal client setting.

The stream is ultimately a byte buffer as seen below:

Comments:

1) About 2D arrays vs 1D arrays:

1D arrays are faster for several reasons.

a) They are guaranteed to be contiguous. This means that they are much more likely to work well with your processor's cache prefetch mechanism. In order to take the best advantage of this, you should process the pixel data in the order it is stored in memory. The importance of coding to the cache can't be overstated.

b) Accessing a single array only costs 1 bounds check. In your case, you're paying six (3*2) bounds checks each time you pull out a pixel when you should only be paying for one check - and if you write your code better, 0.

c) 2D arrays cost linear time to allocate in the 2nd dimension. 1D arrays cost O(1) to allocate.

2) How you write optimizer friendly code for array bounds:

Write your for loops so that they process array elements linearly, i.e:

for ( int i = 0; i < myArray.length; ++i ) {
// array indexing in here..
}

Don't play with the index in the loop. Do the actual array indexing in the same function as you're doing the looping. If you must put the actual array indexing in a different function, make the function small enough that Hotspot will inline it.

3) You're not setting an intial size for ByteArrayOutputStream, so it's growing needlessly. In fact, you should probably get rid of ByteArrayOutputStream altogether and just allocate an array that you can guarantee is large enough for a compressed frame. This will get rid of a lot of needless copying and processing going on.

4) In BitOutputStream.write() you're doing a for loop. Practically anywhere you're performing a loop like that, you should go ahead and perform loop unrolling. You'll have to play around with the amount of unrolling to see what works best.

5) In general, you want to be careful about making function calls. I know Jeff told you that function calls in Java are just as fast as than those in C++, but, even in C++, you avoid trying to make millions of function calls per second. (i.e. a function call per pixel in a 300K pixel image). HotSpot may or may not inline the function. but it's less likely to do so if the function is large (because, in general, it's bad policy - you end up overflowing the instruction cache much more often).

6) On comparing Java codec performance vs C++ codec performance:

A well written C++ codec is going to get all of its performance gains by managing the processor's cache very carefully and exploiting parallelism through loop unrolling and SIMD instructions. It may also be able to avoid some array bounds checks that you may not.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #25 - Posted 2004-02-19 21:34:37 »

!

that is very informative!

They dont teach you this sort of thing at uni... i had no idea!
Offline Jeff

JGO Coder




Got any cats?


« Reply #26 - Posted 2004-02-19 21:57:30 »

Quote


hmm, i didnt know that 2D array access was any more expensive than a 1D array Access...


Common sense here. Java arrays are bounds checked.  2 dimensions means twice as many checks.


Yeah, i noticed this almost imediately after i posted the code, i had changed it such that the conversion only occurs once at the beginning.

However if there was any speed increase doing this... it seems to be negligable.

Quote

1  
2  
3  
4  
5  
                              j=h*width+w;
                              currentPixelRed=frame[j][0] & 0xFF;
                              currentPixelGreen=frame[j][1] & 0xFF;
                              currentPixelBlue=frame[j][2] & 0xFF;
    



how do i let the optimizer know that j is never out of bounds?


At least that way you cant because there is no deterministic guarantee that it is.  Its dependent on other values.  You get a bad value in width and BOOM you are out of range.

The only way the compielr can know an index stays within a given range is if it has full control over its value and it has well known end points.  The most basic example is a for lopp where the index looped on is never assigned to inside of the loop body.

Quote

I have not put any effort into obsfuscating the code... or is it too hard to read due to bad programming practices? please tell me so i can learn.

Cheers for your input


yah thats what they are telling you in their shorthand.  The flow and organization of the code is really really hard to follow.  I suspect this has to do with either attempts to prematurely optimize or attempts to over-factor your code and avoid all redundancies.

Re write it so it is simple and easy to read and follow.  Clean and clear code is ALWAYS easier to tune.


Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline moogie

JGO Knight


Medals: 13
Projects: 6
Exp: 10 years


Java games rock!


« Reply #27 - Posted 2004-02-19 22:15:55 »

Quote

Common sense here. Java arrays are bounds checked.  2 dimensions means twice as many checks.


hmm, i guess i never really thought enough into it to ever think about the bound checking increases for each dimension... however that makes perfect sense!


Quote

The only way the compielr can know an index stays within a given range is if it has full control over its value and it has well known end points.  The most basic example is a for lopp where the index looped on is never assigned to inside of the loop body.


I am having difficulty visualizing such a loop. Is there an online resource that explains this and related topics?

would this loop be of a type where the optimizer knows that it will be always inrange?

1  
2  
3  
4  
5  
6  
7  
Final Static Max=20;
int[] array = new array[20];

for (int i=0;i<Max;i++)
{
    array[i]++;
}


Quote


yah thats what they are telling you in their shorthand.  The flow and organization of the code is really really hard to follow.  I suspect this has to do with either attempts to prematurely optimize or attempts to over-factor your code and avoid all redundancies.

Re write it so it is simple and easy to read and follow.  Clean and clear code is ALWAYS easier to tune.



heh, i thought i had. I am not sure how i would go about cleaning it even more.... again are there any good resources to help me there?
Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #28 - Posted 2004-02-19 22:18:57 »

Quote
I am running jdk 1.4.01 using the normal client setting.

Oops, forgot to mention that you should use -server. It's going to perform much stronger optimizations. You'll also need to run your code for a little while to let the optimizer get a feel for what functions it needs to inline, etc... before performing any timings. You should probably look for and read some postings in this forum about the benchmarking process.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline rreyelts

Junior Duke




There is nothing Nu under the sun


« Reply #29 - Posted 2004-02-19 22:33:35 »

Quote
!

that is very informative!

They dont teach you this sort of thing at uni... i had no idea!

Probably because they don't want their students going around writing a lot of terrible code. The techniques I talked about should only be used in very performance intensive situtations - i.e. developing a codec. You're more likely to understand these kind of things if you are deep into computer architecture.

God bless,
-Toby Reyelts

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Pages: [1] 2
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

Longarmx (49 views)
2014-10-17 03:59:02

Norakomi (39 views)
2014-10-16 15:22:06

Norakomi (31 views)
2014-10-16 15:20:20

lcass (35 views)
2014-10-15 16:18:58

TehJavaDev (66 views)
2014-10-14 00:39:48

TehJavaDev (65 views)
2014-10-14 00:35:47

TehJavaDev (55 views)
2014-10-14 00:32:37

BurntPizza (72 views)
2014-10-11 23:24:42

BurntPizza (43 views)
2014-10-11 23:10:45

BurntPizza (84 views)
2014-10-11 22:30:10
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!