Java-Gaming.org    
Slow array filling
Offline EgonOlsen
« Posted 2003-10-08 22:05:09 »

Hi all,

please have a look at this great piece of code:

public class ArrayTest {

   private static int length=196608;
   private static int[] pixelArray=new int[length];
   private static int[] zbufferArray=new int[length];
   private static int color=0;

   public static void main(String[] args) {
      do {
         long start=System.currentTimeMillis();
         for (int z=0; z<100; z++) {
           clearArray(color);
         }
         System.out.println(System.currentTimeMillis()-start);
      } while (true);
   }

   private static void clearArray(int color) {
      /*
      for (int i=0; i<length; i++) {
            pixelArray[i]=color;
            zbufferArray[i]=-2147483647;
      }
      */


      for (int i=0; i<length; i++) {
            pixelArray[i]=color;
      }

      for (int i=0; i<length; i++) {
            zbufferArray[i]=-2147483647;
      }
   }
}


When running this on a P4HT@3.2GHz using the 1.4.2 VM in client mode, each pass of the outer loop (100 calls to clearArray) takes around 180ms. When using the commented-out array filling instead (the one that fills both arrays in one loop), I'm at 450ms. But it gets even stranger: this test is not a real-world app (of course not...), but my software renderer is, and it's basically doing the same thing. In that application, I used to use the version with the single loop, and it starts fast (like the 180ms version) but drops to the 450ms performance after some seconds. It doesn't do this on my AthlonXP 2600+ machine (same OS (XP) and VM). And to complete the weirdness: on that machine, the version with the single loop is faster than the split one.

To summarize:

P4HT/1.4.2/single loop: 450ms
P4HT/1.4.2/two loops: 180ms
XP2600+/1.4.2/single loop: 180ms
XP2600+/1.4.2/two loops: 230ms

I have one question: WHY? And why does the P4 start fast (so obviously it can run it fast...) when I'm doing this in the actual renderer, but drop after some seconds?

BTW: -server mode doesn't help. It's a bit faster, but the behaviour is the same.
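
For anyone who wants to reproduce the comparison without commenting code in and out, here is a minimal sketch that times both variants in the same run. The class name FillComparison and its method names are made up for illustration and are not part of the original test. It assumes Java 5 or later for System.nanoTime; on a 1.4 VM, System.currentTimeMillis would have to be used as in the code above. The untimed warm-up passes are there so that both methods are likely to be JIT-compiled before measuring.

```java
// Hypothetical harness, not part of the original post: times both fill
// strategies in one JVM run so they can be compared side by side.
final class FillComparison {

    static final int LENGTH = 196608;               // same size as in ArrayTest
    static final int[] pixels = new int[LENGTH];
    static final int[] zbuffer = new int[LENGTH];

    // Variant A: one loop that writes to both arrays alternately.
    static void fillInterleaved(int color) {
        for (int i = 0; i < LENGTH; i++) {
            pixels[i] = color;
            zbuffer[i] = -2147483647;
        }
    }

    // Variant B: two loops, each writing to a single array.
    static void fillSeparate(int color) {
        for (int i = 0; i < LENGTH; i++) {
            pixels[i] = color;
        }
        for (int i = 0; i < LENGTH; i++) {
            zbuffer[i] = -2147483647;
        }
    }

    // Runs one variant 'rounds' times and returns the elapsed milliseconds.
    static long time(boolean interleaved, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            if (interleaved) {
                fillInterleaved(r);
            } else {
                fillSeparate(r);
            }
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // A few untimed passes first, to give the JIT a chance to compile both.
        time(true, 50);
        time(false, 50);
        for (int pass = 0; pass < 5; pass++) {
            System.out.println("one loop: " + time(true, 100)
                    + " ms, two loops: " + time(false, 100) + " ms");
        }
    }
}
```

The absolute numbers will depend on CPU, VM and heap layout, so only the ratio between the two columns on one machine is meaningful.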

Offline arm

« Reply #1 - Posted 2003-10-09 06:51:53 »


I tried your code with IBM JRE 1.4.0 for Windows:

                       Two loops    One loop
Pentium 4 1.5 GHz      160 ms       950 ms

Then I ported ArrayTest to C++, compiled with MS Visual C++ 6.0 (full optimization):

                       Two loops    One loop
Pentium 4 1.5 GHz      160 ms       1101 ms

It's probably due to memory access.

                        Ciao

Offline oNyx

« Reply #2 - Posted 2003-10-09 07:56:08 »

>but it drops to the 450ms performance after some seconds

Hm... so you have to refill it again and again with those numbers?

If so, System.arraycopy might be worth a try. Obviously it will need more RAM (since you have everything twice), but that shouldn't be a problem, right?

edit: heh, ok... it's slower :P

It's somewhere between two loops (fast) and one loop (slow).
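
To make the arraycopy idea concrete, here is a minimal sketch of what such a clear could look like. The class BulkFill and the template array zTemplate are hypothetical names, and the sketch assumes the z-buffer is no longer than the template. It also uses java.util.Arrays.fill for the pixel array as another library alternative to the hand-written loop. Whether either call beats the loops has to be measured on the target machine; the result reported above was that arraycopy landed between the two loop variants.

```java
import java.util.Arrays;

// Hypothetical helper, not from the original posts.
final class BulkFill {

    static final int LENGTH = 196608;
    static final int Z_CLEAR = -2147483647;

    // Template filled once with the z-buffer clear value; costs extra RAM.
    static final int[] zTemplate = new int[LENGTH];
    static {
        Arrays.fill(zTemplate, Z_CLEAR);
    }

    // Clears both buffers without a hand-written loop.
    // Assumes zbuffer.length <= LENGTH.
    static void clear(int[] pixels, int[] zbuffer, int color) {
        Arrays.fill(pixels, color);
        System.arraycopy(zTemplate, 0, zbuffer, 0, zbuffer.length);
    }

    public static void main(String[] args) {
        int[] pixels = new int[LENGTH];
        int[] zbuffer = new int[LENGTH];
        long start = System.currentTimeMillis();
        for (int z = 0; z < 100; z++) {
            clear(pixels, zbuffer, 0);
        }
        System.out.println(System.currentTimeMillis() - start);
    }
}
```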

弾幕 ☆ @mahonnaiseblog
Offline erikd

« Reply #3 - Posted 2003-10-09 11:45:20 »

Tried it on 1.4.1_b21 on 1 GHz P3 win2k

Two loops: 1041
One loop: 1072

Also tried with jview:

Two loops: 1042
One loop: 1232

The results are very much alike in my situation with one loop being slightly slower.

I don't see how this could be because of memory access since there's more memory access with 2 loops.

I have no clue why 2 loops is faster :-/ (even though on my system it doesn't make much of a difference)

Offline EgonOlsen
« Reply #4 - Posted 2003-10-09 14:39:26 »

Ran the test on a P4@2.4GHz with a 1.4.1 VM and it took around 380ms for both versions. The same for the 1.3.1 VM...
So basically, I tend to say that it's a problem with 1.4.2 on the P4, but that doesn't explain the C++ results. And it doesn't really explain why my actual application that's using these loops behaves slightly differently.
Right now, I offer a method that the user of the API can call to determine which way is fastest on the current machine, but I'm quite unhappy with this. On the other hand, I don't want to ignore the problem, because we are talking about the difference between 52 and 40fps here...
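
The actual API method is not shown in this thread, so the following is only a sketch of how such a probe might look. ClearProbe and singleLoopIsFaster are invented names, and System.nanoTime assumes Java 5 or later. Note that the probe overwrites both buffers, and that a single measurement at start-up would not catch the case described in the first post, where the renderer starts fast and slows down after some seconds.

```java
// Hypothetical probe, not the real API: measures both clear strategies on the
// caller's own buffers and reports which one was faster in this run.
final class ClearProbe {

    static final int Z_CLEAR = -2147483647;

    // Assumes both arrays have the same length, as in the original test.
    static void clearSingleLoop(int[] pixels, int[] zbuffer, int color) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = color;
            zbuffer[i] = Z_CLEAR;
        }
    }

    static void clearTwoLoops(int[] pixels, int[] zbuffer, int color) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = color;
        }
        for (int i = 0; i < zbuffer.length; i++) {
            zbuffer[i] = Z_CLEAR;
        }
    }

    // Elapsed nanoseconds for 'rounds' clears with the chosen strategy.
    static long time(boolean singleLoop, int[] pixels, int[] zbuffer, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            if (singleLoop) {
                clearSingleLoop(pixels, zbuffer, 0);
            } else {
                clearTwoLoops(pixels, zbuffer, 0);
            }
        }
        return System.nanoTime() - start;
    }

    static boolean singleLoopIsFaster(int[] pixels, int[] zbuffer, int rounds) {
        // Warm-up passes; their timings are discarded.
        time(true, pixels, zbuffer, rounds);
        time(false, pixels, zbuffer, rounds);
        long single = time(true, pixels, zbuffer, rounds);
        long split = time(false, pixels, zbuffer, rounds);
        return single < split;
    }

    public static void main(String[] args) {
        int[] pixels = new int[196608];
        int[] zbuffer = new int[196608];
        boolean single = singleLoopIsFaster(pixels, zbuffer, 100);
        System.out.println(single ? "use the single loop" : "use two loops");
    }
}
```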

Offline morbo

« Reply #5 - Posted 2003-10-09 15:29:57 »

This is probably due to 'cache thrashing'.

In the single-loop case, you're alternating between arrays, which causes a cache miss, forcing it to both write out the changes to the first array and then load up the second. In the two-loop case, a single array stays in the cache until it's done with, resulting in fewer hits to system memory.

So what you're basically seeing is the difference between direct memory access and cached memory access.
Offline EgonOlsen
« Reply #6 - Posted 2003-10-09 15:34:50 »

It really seems to be a problem with memory access...with alignment to be exact.
Adding this line

private static int[] dummy=new int[2];


between the pixel and the zbuffer array improves performance from 450ms to 200ms. That's fine for this test, but I can't do that for the application, because the pixel array is part of a BufferedImage while the zbuffer isn't. This sux somehow...
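
For reference, a minimal sketch of the test with the extra line in place, using the single-loop fill. PaddedArrayTest is a made-up name, and the outer loop is limited to five passes here instead of running forever. Static field initializers run in textual order, so the dummy array is allocated between the other two, but where the VM actually places the three arrays on the heap is an assumption and not something the language guarantees.

```java
// Hypothetical variant of ArrayTest with the dummy allocation added.
final class PaddedArrayTest {

    static final int LENGTH = 196608;

    static int[] pixelArray = new int[LENGTH];
    // Never read or written; only meant to shift where the next array starts.
    static int[] dummy = new int[2];
    static int[] zbufferArray = new int[LENGTH];

    // The single-loop fill that was slow without the dummy array.
    static void clearArray(int color) {
        for (int i = 0; i < LENGTH; i++) {
            pixelArray[i] = color;
            zbufferArray[i] = -2147483647;
        }
    }

    public static void main(String[] args) {
        for (int pass = 0; pass < 5; pass++) {
            long start = System.currentTimeMillis();
            for (int z = 0; z < 100; z++) {
                clearArray(0);
            }
            System.out.println(System.currentTimeMillis() - start);
        }
    }
}
```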

Offline Herkules

« Reply #7 - Posted 2003-10-09 16:11:20 »

Without having any insight into that topic.....

Might it be possible that some bounds-check elimination doesn't work in the one-loop construct?


HARDCODE    --     DRTS/FlyingGuns/JPilot/JXInput  --    skype me: joerg.plewe
Offline erikd

« Reply #8 - Posted 2003-10-10 08:15:48 »

Quote
Might it be possible that some bounds-check elimination doesn't work in the one-loop construct?


I go for the memory alignment theory myself :)
I thought of bounds-check elimination too, but that doesn't explain jview's results, since I don't think jview does any bounds-check elimination. And I don't think it could make such a large difference.

Offline NVaidya

« Reply #9 - Posted 2003-10-10 09:26:30 »


How would the length of the array affect the performance ?
- in terms of overflowing the cache ...?

I got ~450 and ~990 on a P4 1.6GHz. When I changed the
array size from 196608 to 496608, the times were nearly
identical - ~1100.

Food for thought...
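
As a rough aid for the cache question, the payload sizes of the two array lengths can be worked out directly. The sketch below (ArrayFootprint is a made-up name) ignores the per-array object header, whose size depends on the VM. It shows that 196608 ints are 786432 bytes, which is 768 KiB and an exact multiple of 64 KiB, while 496608 ints are 1986432 bytes, about 1.9 MiB, and not a multiple of 64 KiB. Whether that difference has anything to do with the timings is not established here.

```java
// Hypothetical helper for back-of-the-envelope size calculations.
final class ArrayFootprint {

    static final long KIB = 1024L;

    // Payload size in bytes of an int[] of the given length (header excluded).
    static long payloadBytes(int length) {
        return 4L * length;
    }

    // Bytes left over after dividing the payload into 64 KiB blocks.
    static long remainderOf64KiB(int length) {
        return payloadBytes(length) % (64 * KIB);
    }

    public static void main(String[] args) {
        int[] lengths = {196608, 496608};
        for (int len : lengths) {
            long bytes = payloadBytes(len);
            System.out.println(len + " ints = " + bytes + " bytes = "
                    + (bytes / (double) KIB) + " KiB, remainder mod 64 KiB = "
                    + remainderOf64KiB(len));
        }
    }
}
```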

Gravity Sucks !
Offline EgonOlsen
« Reply #10 - Posted 2003-10-10 15:10:59 »

Quote

How would the length of the array affect the performance ?
- in terms of overflowing the cache ...?
Unlikely, because the performance you are getting is quite good. However, changing the length of an array may change its alignment...who knows what the VM's memory management is doing there. I've found a bug report for a 1.4 beta VM saying that doubles were not aligned correctly on the stack. Maybe this is a similar problem that hurts P4s more than Athlons or P3s. The P4 IS quite sensitive to incorrect alignment...have a look here (for example): http://gcc.gnu.org/ml/gcc-bugs/2001-07/msg01255.html

Offline NVaidya

« Reply #11 - Posted 2003-10-10 16:24:08 »


@EgonOlsen:

Oh yes, I may well have tripped over the alignment problem
when I changed the array size.

The 1.4 double alignment problem, I thought, went away with
the release of Hopper, though I've also heard reports to the
contrary. (Bruce) Walter highlighted the problem well on his
website.

I've come across a report which talks about the P4's bad performance
with unaligned reals. Wouldn't the JVM have "compiler" instructions
for optimizing the alignment, though?

Gravity Sucks !
Offline HWBBH

« Reply #12 - Posted 2003-10-10 21:25:33 »

On my Win98 PIII 400 MHz the results are the same for both loops...