Java-Gaming.org    
Fastest way of using FloatBuffers
Eli Delventhal
« Posted 2010-12-14 03:00:53 »

So I'm not hijacking a different thread...

I've been asking Riven about the fastest ways of using FloatBuffers after running into some real performance issues when using them for a lot of values. I made a simple profiler and got really weird results.

Generating array of 524288 values to use...
    Done.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.011403 seconds.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.015468 seconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.006779 seconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.006532 seconds.

This looks like the total opposite of what I expected. Individual inserts are really fast and bulk inserts are quite slow. This is Java 1.6.

Okay wait. I just ran it again with more values and the timings are all over the map.
Generating array of 8388608 values to use...
    Done.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.119186 seconds.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.035243 seconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.074629 seconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.058328 seconds.


Here is the really quick and dirty code. Is it a bad idea to use nanoTime? Am I not warming it up correctly?

import java.nio.FloatBuffer;

public class FloatBufferProfiler
{
   public static void main(String[] args)
   {
      final int ARRAY_SIZE = 8388608;
      final double NANO = 1000000000.0;
     
      System.out.println("Generating array of " + ARRAY_SIZE + " values to use...");
     
      float[] valueArray = new float[ARRAY_SIZE];
      for (int i = 0; i < valueArray.length; i++)
      {
         valueArray[i] = (float) (Math.random() * 100000);
      }
     
      System.out.println("    Done.");
     
      //Warm up the clock.
      long timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
     
      System.out.println("Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...");
      float[] newValues = new float[ARRAY_SIZE];
      FloatBuffer buffer = FloatBuffer.allocate(ARRAY_SIZE);
      for (int i = 0; i < valueArray.length; i++)
      {
         newValues[i] = valueArray[i];
      }
      buffer.put(newValues);
      buffer.flip();
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
     
      System.out.println("Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...");
      timePoll = System.nanoTime();
      buffer.clear();
      for (int i = 0; i < valueArray.length; i++)
      {
         newValues[i] = valueArray[i];
      }
      buffer.put(newValues);
      buffer.flip();
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
     
      System.out.println("Using index-based puts (using nonexisting FloatBuffer)...");
      buffer.clear();
      buffer = null;
      timePoll = System.nanoTime();
      buffer = FloatBuffer.allocate(ARRAY_SIZE);
      for (int i = 0; i < valueArray.length; i++)
      {
         buffer.put(i, valueArray[i]);
      }
      buffer.flip();
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
     
      System.out.println("Using individual puts (using nonexisting FloatBuffer)...");
      buffer.clear();
      buffer = null;
      timePoll = System.nanoTime();
      buffer = FloatBuffer.allocate(ARRAY_SIZE);
      for (int i = 0; i < valueArray.length; i++)
      {
         buffer.put(valueArray[i]);
      }
      buffer.flip();
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
   }
}


The four situations I was testing are:
- A new array and buffer are made every draw pass. All the values are copied into the array one by one (simulating draw calls made from all over your code), then bulk copied to the buffer.
- An existing array and buffer have already been made, so they are in memory. The buffer is cleared, then the same copy and bulk put as above happens (the only difference is a clear() instead of making a new buffer and array).
- No array is used; a new buffer is made every draw pass. A bunch of individual puts happen, each one using the index value.
- No array is used; a new buffer is made every draw pass. A bunch of individual puts happen, with no index value.

This seems to vary wildly based on the number of items I'm putting in. Wha?

Nate
« Reply #1 - Posted 2010-12-14 04:38:39 »

Are you using the server VM?

Eli Delventhal
« Reply #2 - Posted 2010-12-14 06:53:33 »

Quote from Nate: "Are you using the server VM?"
$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)


Yes. :) I believe this is now the default on Mac OS X. There is no longer a client VM.
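If you want the benchmark itself to report which VM it ran on, a minimal sketch like the following could be pasted next to it. The class name `VmInfo` is made up for illustration. `java.vm.name` and `java.vm.version` are standard system properties; `java.vm.info` is HotSpot-specific as far as I know, so treat it as an assumption that may print null elsewhere.

```java
final class VmInfo {
    /** Returns the JVM name; on HotSpot it contains "Server VM" or "Client VM". */
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("VM name:    " + vmName());
        System.out.println("VM version: " + System.getProperty("java.vm.version"));
        // HotSpot-specific property (assumption): typically reports e.g. "mixed mode".
        System.out.println("VM info:    " + System.getProperty("java.vm.info"));
    }
}
```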

princec
« Reply #3 - Posted 2010-12-14 10:01:40 »

JDK 7 changes the game completely again as well. :) I will do a bit of benchmarking on my machine later and see what the differences are.

Hurrah for microbenchmarks!

Cas :)

OttoMeier
« Reply #4 - Posted 2010-12-14 12:56:25 »

Interesting, but please try to get average numbers from several runs (e.g. 100).

princec
« Reply #5 - Posted 2010-12-14 13:30:02 »

I'm more likely to do about 1,000,000 runs...

Cas :)

lhkbob
« Reply #6 - Posted 2010-12-14 15:46:44 »

Here's an updated test case that averages across multiple runs, and also adds a case where you do index puts into an existing buffer. Additionally, it uses direct float buffers since that's what OpenGL libraries require.

import java.nio.FloatBuffer;
import java.nio.ByteBuffer;

public class FloatBufferProfiler
{
   public static void main(String[] args)
   {
      final int ARRAY_SIZE = 8388608;
      final int numTests = 100;
      final double NANO = 1000000000.0 * numTests;

     
      System.out.println("Generating array of " + ARRAY_SIZE + " values to use...");
     
      float[] valueArray = new float[ARRAY_SIZE];
      for (int i = 0; i < valueArray.length; i++)
      {
         valueArray[i] = (float) (Math.random() * 100000);
      }
     
      System.out.println("    Done.");
     
      //Warm up the clock.
      long timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
      timePoll = System.nanoTime();
     
      {
         System.out.println("Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...");
         timePoll = System.nanoTime(); // restart the timer so the println above is not measured
         for (int u = 0; u < numTests; u++) {
            float[] newValues = new float[ARRAY_SIZE];
            FloatBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
            for (int i = 0; i < valueArray.length; i++)
            {
               newValues[i] = valueArray[i];
            }
            buffer.put(newValues);
            buffer.flip();
         }
         System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
      }

      {
         System.out.println("Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...");
         float[] newValues = new float[ARRAY_SIZE];
         FloatBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
         timePoll = System.nanoTime();
         for (int u = 0; u < numTests; u++) {
            buffer.clear();
            for (int i = 0; i < valueArray.length; i++)
            {
               newValues[i] = valueArray[i];
            }  
            buffer.put(newValues);
            buffer.flip();
         }
         System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
      }
     
      {
         System.out.println("Using index-based puts (using nonexisting FloatBuffer)...");
         timePoll = System.nanoTime();
         for (int u = 0 ; u < numTests; u++) {
            FloatBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
            for (int i = 0; i < valueArray.length; i++)
            {
               buffer.put(i, valueArray[i]);
            }
            buffer.flip();
         }
         System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
      }

      {
         System.out.println("Using individual puts (using nonexisting FloatBuffer)...");
         timePoll = System.nanoTime();
         for (int u = 0; u < numTests; u++) {
            FloatBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
            for (int i = 0; i < valueArray.length; i++)
            {
               buffer.put(valueArray[i]);
            }
            buffer.flip();
         }
         System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
      }

      {
         System.out.println("Using index-based puts (using existing FloatBuffer)...");
         // Allocate the "existing" buffer before starting the timer, as in the bulk-put case above.
         FloatBuffer buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
         timePoll = System.nanoTime();
         for (int u = 0 ; u < numTests; u++) {
            buffer.clear();
            for (int i = 0; i < valueArray.length; i++)
            {
               buffer.put(i, valueArray[i]);
            }
            buffer.flip();
         }
         System.out.println("    Took " + ((System.nanoTime() - timePoll) / NANO) + " seconds.");
      }
   }
}


My results (also on a Mac with Java 6) are:
Generating array of 8388608 values to use...
    Done.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.06889852 seconds.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.03462399 seconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.10300789 seconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.11679034 seconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.05540613 seconds.

Spasi
« Reply #7 - Posted 2010-12-14 16:23:43 »

Some ideas for better micro-benchmarking:

1) Try to use Caliper or at least extract each test in its own method and add some warm-up runs.
2) Use a much smaller ARRAY_SIZE, something that's representative of floating-point data in a game. Increase the number of test runs accordingly.
3) For the buffer put loops, use buffer.limit() (or .remaining()) instead of valueArray.length. In the array loops, you're making it easy for the VM to remove array bounds checks, but you aren't doing the same in the buffer loops.
4) Add tests that use random indices (instead of going from 0 to array/buffer length). Would be interesting to see the differences then.
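
The four suggestions above can be sketched roughly as follows. This is a hand-rolled harness, not Caliper, and every name and constant in it (`PutBenchSketch`, `SIZE`, `LOOPS_PER_RUN`, the `sink` field) is made up for illustration. Each test lives in its own method, untimed warm-up runs come first, the sequential loop is bounded by `buffer.limit()`, and a second test writes at precomputed random indices.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

final class PutBenchSketch {
    static final int SIZE = 40000;        // illustrative data size, not taken from a real game
    static final int WARMUP_RUNS = 5;
    static final int TIMED_RUNS = 10;
    static final int LOOPS_PER_RUN = 200;

    static volatile float sink;           // each test result is stored here so the work stays observable

    /** Tip 3: sequential absolute puts, bounded by the buffer's own limit (values must be at least that long). */
    static float sequentialPuts(FloatBuffer buffer, float[] values) {
        for (int i = 0, n = buffer.limit(); i < n; i++) {
            buffer.put(i, values[i]);
        }
        return buffer.get(0);
    }

    /** Tip 4: absolute puts at precomputed random indices. */
    static float randomPuts(FloatBuffer buffer, float[] values, int[] indices) {
        for (int i = 0; i < indices.length; i++) {
            buffer.put(indices[i], values[i]);
        }
        return buffer.get(indices[0]);
    }

    /** Runs one test 'runs' times and returns the average milliseconds per inner loop. */
    static double averageMillis(boolean random, FloatBuffer buffer, float[] values, int[] indices, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int u = 0; u < LOOPS_PER_RUN; u++) {
                sink = random ? randomPuts(buffer, values, indices) : sequentialPuts(buffer, values);
            }
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (1000000.0 * runs * LOOPS_PER_RUN);
    }

    public static void main(String[] args) {
        Random rand = new Random(1023);
        float[] values = new float[SIZE];
        int[] indices = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            values[i] = rand.nextFloat() * 100000.0f;
            indices[i] = rand.nextInt(SIZE);
        }
        FloatBuffer buffer = ByteBuffer.allocateDirect(SIZE * 4)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();

        // Tip 1: untimed warm-up runs so the JIT has a chance to compile both tests first.
        averageMillis(false, buffer, values, indices, WARMUP_RUNS);
        averageMillis(true, buffer, values, indices, WARMUP_RUNS);

        System.out.println("sequential puts: " + averageMillis(false, buffer, values, indices, TIMED_RUNS) + " ms");
        System.out.println("random puts:     " + averageMillis(true, buffer, values, indices, TIMED_RUNS) + " ms");
    }
}
```

The numbers this prints are only meaningful relative to each other on the same machine and VM; a proper benchmarking framework would control for more than this sketch does.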

Spasi
« Reply #8 - Posted 2010-12-14 16:36:49 »

Btw, I have developed a clone of the NIO buffer API that uses sun.misc.Unsafe, supports direct buffers only and allows disabling all bounds checks (ala org.lwjgl.util.NoChecks=true). For random access it's several times faster than normal NIO and it's just as fast for the easy cases. I'm using it with a private LWJGL build that has been modified to support it.

kappa
« Reply #9 - Posted 2010-12-14 16:55:37 »

Ah, Spasi's secret LWJGL Ninja Edition :)

Eli Delventhal
« Reply #10 - Posted 2010-12-14 21:23:29 »

Quote from Spasi: "Btw, I have developed a clone of the NIO buffer API that uses sun.misc.Unsafe, supports direct buffers only and allows disabling all bounds checks (ala org.lwjgl.util.NoChecks=true). For random access it's several times faster than normal NIO and it's just as fast for the easy cases. I'm using it with a private LWJGL build that has been modified to support it."
Oooh, I would love to use that.

Good point on the smaller array size. I didn't get to the point of averaging runs together, so I was just increasing the array size to try to get more "accurate" data. Like I said, quick and dirty (especially crap like copy-pasting System.nanoTime() a bunch instead of using a for loop). :)

But in hindsight it's definitely best to have an accurate benchmark, or you're just wasting your time. Looks like lhkbob's results are more in line with what we would expect.

Eli Delventhal
« Reply #11 - Posted 2010-12-14 22:18:10 »

I hacked up the code a bit more so that the timer gets "more warmed up" and everything is now called through functions. And it prints the time in ms, and there are smaller arrays and more iterations.

With a 50,000 item array and 50,000 iterations:
Generating array of 50000 values to use...
    Done.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.21014882 milliseconds.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.42391662 milliseconds.
Using individual puts (using existing FloatBuffer)...
    Took 0.20898886 milliseconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.58338562 milliseconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.26990588 milliseconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.6380724 milliseconds.


With 40,000 items (10,000 quads equivalent) and 1,000,000 iterations:
Generating array of 40000 values to use...
    Done.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.119918364 milliseconds.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.254571916 milliseconds.
Using individual puts (using existing FloatBuffer)...
    Took 0.110950286 milliseconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.381005605 milliseconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.155216281 milliseconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.412920993 milliseconds.


New code:
import java.nio.FloatBuffer;
import java.nio.ByteBuffer;

public class FloatBufferProfiler
{
   public static final int ARRAY_SIZE = 40000;
   public static final double NANO_TO_MILLI = 1000000.0;
   public static final int NUM_TESTS = 1000000;
   
   public static void main(String[] args)
   {
      float[] valueArray = generateValueArray();
      warmUpClock(10000);
      bulkPut(true, valueArray);
      bulkPut(false, valueArray);
      singlePut(true, valueArray);
      singlePut(false, valueArray);
      singlePutIndexed(true, valueArray);
      singlePutIndexed(false, valueArray);
   }
   
   private static float[] generateValueArray()
   {
      System.out.println("Generating array of " + ARRAY_SIZE + " values to use...");
      float[] valueArray = new float[ARRAY_SIZE];
      for (int i = 0; i < valueArray.length; i++)
      {
         valueArray[i] = (float) (Math.random() * 100000);
      }
      System.out.println("    Done.");
      return valueArray;
   }
   
   private static void warmUpClock(int iterations)
   {
      long timePoll = System.nanoTime();
      for (int i = 0; i < iterations; i++)
      {
         timePoll = System.nanoTime();
      }
   }
   
   private static void bulkPut(boolean existingBuffer, float[] valueArray)
   {
      System.out.println("Copying all values into a new array, then bulk putting (using " + (existingBuffer ? "existing" : "nonexisting") + " array and FloatBuffer)...");
     
      float[] newValues = null;
      FloatBuffer buffer = null;
     
      if (existingBuffer)
      {
         newValues = new float[ARRAY_SIZE];
         buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
      }
     
      long timePoll = System.nanoTime();
     
      for (int u = 0; u < NUM_TESTS; u++)
      {
         if (existingBuffer)
         {
            buffer.clear();
         }
         else
         {
            newValues = new float[ARRAY_SIZE];
            buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
         }
         
         for (int i = 0; i < valueArray.length; i++)
         {
            newValues[i] = valueArray[i];
         }
         buffer.put(newValues);
         buffer.flip();
      }
     
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / (NANO_TO_MILLI * NUM_TESTS)) + " milliseconds.");
   }
   
   private static void singlePut(boolean existingBuffer, float[] valueArray)
   {
      System.out.println("Using individual puts (using " + (existingBuffer ? "existing" : "nonexisting") + " FloatBuffer)...");
     
      FloatBuffer buffer = null;
     
      if (existingBuffer)
      {
         buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
      }
     
      long timePoll = System.nanoTime();
     
      for (int u = 0; u < NUM_TESTS; u++)
      {
         if (existingBuffer)
         {
            buffer.clear();
         }
         else
         {
            buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
         }
         
         for (int i = 0; i < valueArray.length; i++)
         {
            buffer.put(valueArray[i]);
         }
         buffer.flip();
      }
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / (NANO_TO_MILLI * NUM_TESTS)) + " milliseconds.");
   }
   
   private static void singlePutIndexed(boolean existingBuffer, float[] valueArray)
   {
      System.out.println("Using index-based puts (using " + (existingBuffer ? "existing" : "nonexisting") + " FloatBuffer)...");
     
      FloatBuffer buffer = null;
     
      if (existingBuffer)
      {
         buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
      }
     
      long timePoll = System.nanoTime();
     
      for (int u = 0 ; u < NUM_TESTS; u++)
      {
         if (existingBuffer)
         {
            buffer.clear();
         }
         else
         {
            buffer = ByteBuffer.allocateDirect(ARRAY_SIZE * 4).asFloatBuffer();
         }
         
         for (int i = 0; i < valueArray.length; i++)
         {
            buffer.put(i, valueArray[i]);
         }
         buffer.flip();
      }
      System.out.println("    Took " + ((System.nanoTime() - timePoll) / (NANO_TO_MILLI * NUM_TESTS)) + " milliseconds.");
   }
}


Here is a table of fasty-ness!
Using individual puts (using existing FloatBuffer)
    is the fastest!
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)
    takes 8% longer.
Using index-based puts (using existing FloatBuffer)
    takes 40% longer.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)
    takes 129% longer (229% of the fastest time).
Using individual puts (using nonexisting FloatBuffer)
    takes 243% longer (343% of the fastest time).
Using index-based puts (using nonexisting FloatBuffer)
    takes 272% longer (372% of the fastest time).
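
For anyone who wants to recompute the table, here is a small sketch of the arithmetic, reading "takes X% longer" as 100 * (time / fastest - 1). The class and method names are made up for illustration; the timings are copied from the 40,000-item run above.

```java
final class RelativeTimings {
    /** Percent by which 'time' exceeds 'fastest': 100 * (time / fastest - 1). */
    static double percentLonger(double time, double fastest) {
        return 100.0 * (time / fastest - 1.0);
    }

    public static void main(String[] args) {
        String[] labels = {
            "individual, existing", "bulk, existing", "indexed, existing",
            "bulk, nonexisting", "individual, nonexisting", "indexed, nonexisting"
        };
        // Milliseconds per iteration from the 40,000-item, 1,000,000-iteration run.
        double[] millis = {
            0.110950286, 0.119918364, 0.155216281,
            0.254571916, 0.381005605, 0.412920993
        };
        double fastest = millis[0];
        for (int i = 0; i < millis.length; i++) {
            System.out.printf("%-26s %4.0f%% longer%n", labels[i], percentLonger(millis[i], fastest));
        }
    }
}
```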


Interestingly enough, this means that just using a plain old put() is the fastest method (with this number of vertices). I'm not sure why index-based puts would be slower, but they appear to be. The difference between the bulk put and the single non-indexed puts is pretty minor, but it looks like you should definitely avoid index-based puts. Especially notable: absolutely keep your FloatBuffer in memory and just clear() it every time you want to put new stuff in it. In these runs that is roughly two to three and a half times faster than allocating a new buffer every pass.
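
As a rough illustration of the "keep it in memory and clear() it" pattern, here is a minimal sketch. The class `ReusedVertexBuffer` and its `begin`/`vertex`/`end` methods are hypothetical names, not part of any library; only the `java.nio` calls are real. Note that clear() does not free or zero anything, it only resets position and limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ReusedVertexBuffer {
    private final FloatBuffer buffer;   // allocated once, reused every frame

    ReusedVertexBuffer(int capacityInFloats) {
        buffer = ByteBuffer.allocateDirect(capacityInFloats * 4)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
    }

    /** Starts a new frame: position = 0, limit = capacity. */
    void begin() {
        buffer.clear();
    }

    /** Appends one 2D vertex with relative puts, which advance the position. */
    void vertex(float x, float y) {
        buffer.put(x).put(y);
    }

    /** Ends the frame: limit = number of floats written, position = 0, ready to hand off for reading. */
    FloatBuffer end() {
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        ReusedVertexBuffer vb = new ReusedVertexBuffer(1024);
        for (int frame = 0; frame < 3; frame++) {
            vb.begin();
            vb.vertex(0f, 0f);
            vb.vertex(1f, 0f);
            vb.vertex(1f, 1f);
            System.out.println("frame " + frame + ": floats ready = " + vb.end().remaining());
        }
    }
}
```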

Here's a run with only 5,000 values. Note that the array (bulk put) method becomes the fastest in this case.
Generating array of 5000 values to use...
    Done.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.013873423 milliseconds.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.028957902 milliseconds.
Using individual puts (using existing FloatBuffer)...
    Took 0.014773814 milliseconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.0437836 milliseconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.018534608 milliseconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.047237627 milliseconds.

lhkbob
« Reply #12 - Posted 2010-12-14 22:58:51 »

Another takeaway is that high-level graphics engines aren't really required to use FloatBuffers in their interfaces. They can have geometry, etc. represented as plain old arrays and then keep a cached buffer around behind the scenes for when they need to talk to OpenGL. This is nice because you don't have to worry about the user screwing up the direct-ness or endianness of the buffer anymore.
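
A minimal sketch of that idea, assuming a hypothetical engine-side helper called `CachedUploadBuffer`: callers only ever pass a float[], and the helper owns a direct, native-order buffer that it reuses and regrows only when the array no longer fits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class CachedUploadBuffer {
    private FloatBuffer cache;   // lazily allocated, reused across calls

    /** Copies the caller's array into the cached direct buffer and returns it ready for reading. */
    FloatBuffer toDirect(float[] vertices) {
        if (cache == null || cache.capacity() < vertices.length) {
            // The engine decides direct-ness and byte order here, so the caller cannot get them wrong.
            cache = ByteBuffer.allocateDirect(vertices.length * 4)
                    .order(ByteOrder.nativeOrder()).asFloatBuffer();
        }
        cache.clear();
        cache.put(vertices);     // bulk put from the plain array
        cache.flip();
        return cache;
    }

    public static void main(String[] args) {
        CachedUploadBuffer upload = new CachedUploadBuffer();
        FloatBuffer quad = upload.toDirect(new float[] {0f, 0f, 1f, 0f, 1f, 1f, 0f, 1f});
        System.out.println("direct=" + quad.isDirect() + " order=" + quad.order()
                + " floats=" + quad.remaining());
    }
}
```

The returned buffer is only valid until the next toDirect() call, which is the usual trade-off of a shared cache.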

Here are the updated timings from my Mac (10.6.5, 2.53 GHz i5 with Java 1.6_22):
Generating array of 40000 values to use...
    Done.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.12576384 milliseconds.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.2772573 milliseconds.
Using individual puts (using existing FloatBuffer)...
    Took 0.14233947 milliseconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.41132515 milliseconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.16847472 milliseconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.44021002 milliseconds.


and here is the same benchmark run on Ubuntu 8.04 Hardy (Intel Duo 3.16 GHz, Java 1.6_21):
Generating array of 40000 values to use...
    Done.
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
    Took 0.1619271662 milliseconds.
Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...
    Took 0.29461064228 milliseconds.
Using individual puts (using existing FloatBuffer)...
    Took 0.12362605117 milliseconds.
Using individual puts (using nonexisting FloatBuffer)...
    Took 0.2110052842 milliseconds.
Using index-based puts (using existing FloatBuffer)...
    Took 0.16837437908 milliseconds.
Using index-based puts (using nonexisting FloatBuffer)...
    Took 0.24990385835 milliseconds.


Both of these used 40,000-element arrays/buffers and only 100,000 test runs because I got bored :/ I don't know why there was such a huge performance hit for individual and index puts with nonexisting buffers on the Mac. Either way, using existing arrays + bulk puts seems to be a viable option on both OSes.

Spasi
« Reply #13 - Posted 2010-12-15 02:24:17 »

Your benchmark code is flawed. You are not warming up the test code and you aren't setting the native ByteOrder on the buffer you use. This is the modified code:

package org.lwjgl;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

public class FloatBufferProfiler {

   private static final int RANDOM_SEED = 1023;
   private static final int ARRAY_SIZE = 40000;
   private static final double NANO_TO_MILLI = 1000000.0;

   private static final int WARMUP_RUNS = 5;
   private static final int TEST_RUNS = 10;
   private static final int LOOPS_PER_RUN = 10000;

   public static void main(String[] args) {
      FloatBuffer buffer = generateBuffer();

      System.out.println("FloatBuffer implementation: " + buffer.getClass().getName());

      System.out.println("\n---------------------\n");

      System.out.println("Warming up...");
      System.out.println("\tClock warmed up: " + warmUpClock(10000));
      runTest(WARMUP_RUNS, true);
      System.out.println("\tDone.");

      System.out.println("\n---------------------\n");

      runTest(TEST_RUNS, false);
   }

   private static void runTest(final int runs, final boolean warmup) {
      float[] values = generateValues();
      float[] newValues = generateNewValues();
      FloatBuffer buffer = generateBuffer();

      long bulkPutOld = 0,
         singlePutOld = 0,
         singlePutIndexedOld = 0;

      /*
      long bulkPutNew = 0,
         singlePutNew = 0,
         singlePutIndexedNew = 0;
      */


      for ( int i = 0; i < runs; i++ ) {
         bulkPutOld += bulkPut(true, values, newValues, buffer);
         //bulkPutNew += bulkPut(false, values, newValues, null);
         singlePutOld += singlePut(true, values, buffer);
         //singlePutNew += singlePut(false, values, null);
         singlePutIndexedOld += singlePutIndexed(true, values, buffer);
         //singlePutIndexedNew += singlePutIndexed(false, values, null);
      }

      if ( !warmup ) {
         System.out.println("Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...");
         printTime(bulkPutOld / runs);

         //System.out.println("Copying all values into a new array, then bulk putting (using nonexisting array and FloatBuffer)...");
         //printTime(bulkPutNew / runs);

         System.out.println("Using individual puts (using existing FloatBuffer)...");
         printTime(singlePutOld / runs);

         //System.out.println("Using individual puts (using nonexisting FloatBuffer)...");
         //printTime(singlePutNew / runs);

         System.out.println("Using index-based puts (using existing FloatBuffer)...");
         printTime(singlePutIndexedOld / runs);

         //System.out.println("Using index-based puts (using nonexisting FloatBuffer)...");
         //printTime(singlePutIndexedNew / runs);
      }
   }

   private static void printTime(long time) {
      System.out.println("\tTook " + Double.toString((time) / (NANO_TO_MILLI * LOOPS_PER_RUN)) + " milliseconds.");
   }

   private static float[] generateValues() {
      float[] valueArray = new float[ARRAY_SIZE];
      Random rand = new Random(RANDOM_SEED);
      for ( int i = 0; i < valueArray.length; i++ ) {
         valueArray[i] = rand.nextFloat() * 100000.0f;
      }
      return valueArray;
   }

   private static float[] generateNewValues() {
      return new float[ARRAY_SIZE];
   }

   private static FloatBuffer generateBuffer() {
      return ByteBuffer.allocateDirect(ARRAY_SIZE * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
   }

   private static long warmUpClock(int iterations) {
      long timePoll = System.nanoTime();
      for ( int i = 0; i < iterations; i++ ) {
         timePoll += System.nanoTime();
      }
      return timePoll;
   }

   private static long bulkPut(boolean existingBuffer, float[] values, float[] newValues, FloatBuffer buffer) {
      long timePoll = System.nanoTime();

      for ( int u = 0; u < LOOPS_PER_RUN; u++ ) {
         if ( !existingBuffer ) {
            newValues = generateNewValues();
            buffer = generateBuffer();
         }

         //System.arraycopy(values, 0, newValues, 0, values.length);
         for ( int i = 0; i < values.length; i++ )
            newValues[i] = values[i];

         buffer.put(newValues);
         buffer.flip();
      }

      return System.nanoTime() - timePoll;
   }

   private static long singlePut(boolean existingBuffer, float[] values, FloatBuffer buffer) {
      long timePoll = System.nanoTime();

      for ( int u = 0; u < LOOPS_PER_RUN; u++ ) {
         if ( !existingBuffer )
            buffer = generateBuffer();

         buffer.position(0);
         buffer.limit(values.length);
         for ( int i = 0; i < values.length; i++ )
            buffer.put(values[i]);

         buffer.flip();
      }

      return System.nanoTime() - timePoll;
   }

   private static long singlePutIndexed(boolean existingBuffer, float[] values, FloatBuffer buffer) {
      long timePoll = System.nanoTime();

      for ( int u = 0; u < LOOPS_PER_RUN; u++ ) {
         if ( !existingBuffer )
            buffer = generateBuffer();

         for ( int i = 0; i < values.length; i++ )
            buffer.put(i, values[i]);
      }

      return System.nanoTime() - timePoll;
   }

}


I have commented out the tests that recreate the buffer, since I found that the re-allocations take the majority of the time. Feel free to un-comment and test them if you want. In my tests, index-based puts were the fastest, followed by bulk puts, followed by individual puts. Results:

FloatBuffer implementation: java.nio.DirectFloatBufferU
---------------------
Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
        Took 0.0421260103 milliseconds.
Using individual puts (using existing FloatBuffer)...
        Took 0.0820538328 milliseconds.
Using index-based puts (using existing FloatBuffer)...
        Took 0.0326686016 milliseconds.
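
On the byte order point, here is a small sketch for checking what a given buffer actually uses. A new ByteBuffer always starts out BIG_ENDIAN, and a FloatBuffer view keeps whatever order the byte buffer had when the view was created. The class and method names are made up for illustration. On the Sun/Oracle JDK the printed class name should end in U (unswapped) or S (swapped), but that is an implementation detail rather than something the API promises.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ByteOrderCheck {
    /** Direct buffer left at the ByteBuffer default order, which is always BIG_ENDIAN. */
    static FloatBuffer defaultOrderBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * 4).asFloatBuffer();
    }

    /** Direct buffer switched to the platform's native order before the view is created. */
    static FloatBuffer nativeOrderBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer a = defaultOrderBuffer(16);
        FloatBuffer b = nativeOrderBuffer(16);
        System.out.println("platform order: " + ByteOrder.nativeOrder());
        System.out.println("default view:   " + a.order() + " " + a.getClass().getName());
        System.out.println("native view:    " + b.order() + " " + b.getClass().getName());
    }
}
```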


An interesting thing I noticed was that if you do the following:

buffer.position(0);
buffer.limit(values.length);


before the individual put loop, you get a 20-25% speed-up. With those two lines un-commented:

Copying all values into a new array, then bulk putting (using existing array and FloatBuffer)...
        Took 0.0421146698 milliseconds.
Using individual puts (using existing FloatBuffer)...
        Took 0.0607214925 milliseconds.
Using index-based puts (using existing FloatBuffer)...
        Took 0.0327218355 milliseconds.


So basically you can easily get rid of the bounds checks, but it's still slower than index-based puts because you're updating the current buffer position on every put (see the package-private nextPutIndex() in java.nio.Buffer).
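
The position bookkeeping is easy to see in a small sketch (all class and method names here are made up for illustration). Relative put(float) advances the position on every call, while absolute put(int, float) leaves it alone. One side effect worth knowing: calling flip() after a loop of absolute puts, as the index-based tests earlier in the thread do, sets the limit to 0, so the limit has to be set explicitly instead.

```java
import java.nio.FloatBuffer;

final class PutPositionDemo {
    /** Fills the buffer with relative puts and returns the resulting position. */
    static int positionAfterRelativePuts(FloatBuffer b, float[] v) {
        b.clear();
        for (int i = 0; i < v.length; i++) {
            b.put(v[i]);          // advances position by one each time
        }
        return b.position();
    }

    /** Fills the buffer with absolute puts and returns the resulting position. */
    static int positionAfterAbsolutePuts(FloatBuffer b, float[] v) {
        b.clear();
        for (int i = 0; i < v.length; i++) {
            b.put(i, v[i]);       // position is not touched
        }
        return b.position();
    }

    /** After absolute puts, marks the first 'count' floats as readable without using flip(). */
    static void markFilled(FloatBuffer b, int count) {
        b.limit(count);
        b.position(0);
    }

    public static void main(String[] args) {
        float[] values = {1f, 2f, 3f};
        FloatBuffer buffer = FloatBuffer.allocate(8);
        System.out.println("relative puts -> position " + positionAfterRelativePuts(buffer, values));
        System.out.println("absolute puts -> position " + positionAfterAbsolutePuts(buffer, values));
        buffer.flip();
        System.out.println("flip() after absolute puts -> limit " + buffer.limit());
        markFilled(buffer, values.length);
        System.out.println("markFilled -> remaining " + buffer.remaining());
    }
}
```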