Java-Gaming.org    
  FloatBuffer.put (int index, float f) expensive  (Read 9285 times)
Offline lsgames

Senior Newbie





« Posted 2010-06-02 14:38:10 »

I am working on a particle system using OpenGL on Android 2.1. A FloatBuffer is used to pass the data to OpenGL. It is allocated as such:

buffer = ByteBuffer.allocateDirect(FLOAT_SIZE * size * 2).order(ByteOrder.nativeOrder()).asFloatBuffer();

used as such:

buffer.put(index, f)

I have noticed that buffer.put() takes at least 10 times as long as assigning to an ordinary float array. This becomes a real bottleneck and the limiting factor for how many particles I can have.

Has anyone noticed this problem or have any suggestions as to how to get around it?

Thanks,

Martin
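
A minimal sketch of how the comparison described in this post could be timed, assuming a plain JVM rather than the Android 2.1 device from the thread. The class name PutTimingSketch and its methods are hypothetical and not taken from the poster's code; the absolute numbers will differ by device and runtime, so none are claimed here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class PutTimingSketch {
    static final int FLOAT_SIZE = 4; // bytes per float

    /** Allocates a direct, native-order FloatBuffer holding the given number of floats. */
    static FloatBuffer newDirectFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(FLOAT_SIZE * floats)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Returns elapsed nanoseconds for `rounds` full passes of plain array stores. */
    static long timeArrayWrites(float[] target, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < target.length; i++) {
                target[i] = r + i;
            }
        }
        return System.nanoTime() - start;
    }

    /** Returns elapsed nanoseconds for `rounds` full passes of absolute put(index, f) calls. */
    static long timeBufferPuts(FloatBuffer target, int rounds) {
        int n = target.capacity();
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < n; i++) {
                target.put(i, r + i);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int size = 1000;
        int rounds = 10_000;
        float[] array = new float[size];
        FloatBuffer buffer = newDirectFloatBuffer(size);

        // One untimed pass of each so that JIT warm-up does not dominate the result.
        timeArrayWrites(array, rounds);
        timeBufferPuts(buffer, rounds);

        System.out.println("float[] stores:    " + timeArrayWrites(array, rounds) / 1e6 + " ms");
        System.out.println("put(index, float): " + timeBufferPuts(buffer, rounds) / 1e6 + " ms");
    }
}
```

Running it outside a profiler or debugger matters, since instrumentation can change the ratio between the two loops.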
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #1 - Posted 2010-06-02 14:48:53 »

Write everything to a float[] and use FloatBuffer.put(float[]) ?
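
A minimal sketch of this suggestion, assuming the particle data is first written into a reusable staging array. The names BulkPutSketch, upload, staging and count are hypothetical; only FloatBuffer.put(float[], int, int), clear() and flip() are standard java.nio calls.

```java
import java.nio.FloatBuffer;

class BulkPutSketch {
    /**
     * Copies the first `count` floats of `staging` into `target` with a single
     * bulk put, then flips the buffer so it is ready to be read from.
     */
    static void upload(float[] staging, int count, FloatBuffer target) {
        target.clear();                 // position = 0, limit = capacity
        target.put(staging, 0, count);  // one bulk call instead of `count` single puts
        target.flip();                  // limit = count, position = 0
    }
}
```

Whether the bulk call is actually faster depends on the runtime's implementation, which is exactly what the rest of this thread investigates.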

Hi, appreciate more people! Σ ♥ = ¾
Offline lsgames

Senior Newbie





« Reply #2 - Posted 2010-06-02 16:00:46 »

Yes, tried that. It had no effect. So I found the source code and noticed that that method just iterates over the array and calls put(index, float) on each element.
Offline lsgames

Senior Newbie





« Reply #3 - Posted 2010-06-02 16:04:28 »

Was wondering if there could be an alternative way to construct the FloatBuffer for OpenGL. Haven't really been able to think one up though. From the source code it appears that what takes 10 times as long is a number of checks and function calls. Nothing much, but when applied 1000 times per frame in a particle system it really adds up.
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #4 - Posted 2010-06-02 16:07:56 »

Create one FloatBuffer and slice() it in 1000 buffers.

But ehm... why would you want 1000 buffers per frame? Can't you store all particles in the same buffer?
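
For reference, a minimal sketch of what slicing one buffer into per-particle views could look like. SliceSketch, viewOf and floatsPerParticle are hypothetical names; duplicate(), limit(), position() and slice() are standard java.nio calls. Each view shares memory with the parent buffer, so no data is copied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class SliceSketch {
    /** Allocates a direct, native-order FloatBuffer holding the given number of floats. */
    static FloatBuffer allocate(int floats) {
        return ByteBuffer.allocateDirect(4 * floats)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Returns a view covering only the floats that belong to one particle. */
    static FloatBuffer viewOf(FloatBuffer all, int particle, int floatsPerParticle) {
        FloatBuffer window = all.duplicate();              // same memory, own position/limit
        window.limit((particle + 1) * floatsPerParticle);  // end of this particle's floats
        window.position(particle * floatsPerParticle);     // start of this particle's floats
        return window.slice();                             // index 0 is the particle's first float
    }
}
```

A write such as viewOf(all, 1, 2).put(0, 9f) shows up at index 2 of the parent buffer.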

Offline lsgames

Senior Newbie





« Reply #5 - Posted 2010-06-02 17:34:11 »

No, I meant the put is done many times per frame. There is only one FloatBuffer. But as the particles move each frame, I have to update all positions in the FloatBuffer.
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #6 - Posted 2010-06-02 17:39:53 »

Quote from lsgames: "No, I meant the put is done many times per frame. There is only one FloatBuffer. But as the particles move each frame, I have to update all positions in the FloatBuffer."

Well, you said 'construct'.

According to:
http://apistudios.com/hosted/marzec/badlogic/wordpress/?p=478

Heap-floatbuffers have better performance. Upon copying to a VBO, the 'driver' seems to make its own (fast) copy.
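
To make the distinction concrete, here is a minimal sketch of the two kinds of buffer being compared. HeapVsDirectSketch and its method names are hypothetical; FloatBuffer.allocate, ByteBuffer.allocateDirect, isDirect() and hasArray() are standard java.nio calls. It only shows how each kind is created and identified, and makes no claim about which is faster on a given device.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class HeapVsDirectSketch {
    /** Heap buffer: backed by an ordinary float[] managed by the garbage collector. */
    static FloatBuffer heapBuffer(int floats) {
        return FloatBuffer.allocate(floats);
    }

    /** Direct buffer: backed by native memory, viewed as floats in native byte order. */
    static FloatBuffer directBuffer(int floats) {
        return ByteBuffer.allocateDirect(4 * floats)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Summarises the properties that distinguish the two kinds. */
    static String describe(FloatBuffer b) {
        return "direct=" + b.isDirect() + ", hasArray=" + b.hasArray() + ", order=" + b.order();
    }

    public static void main(String[] args) {
        System.out.println("heap:   " + describe(heapBuffer(16)));
        System.out.println("direct: " + describe(directBuffer(16)));
    }
}
```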

Offline EgonOlsen
« Reply #7 - Posted 2010-06-02 21:04:39 »

Quote from Riven: "Write everything to a float[] and use FloatBuffer.put(float[]) ?"
This actually helps tremendously on Android. I've no idea why it doesn't in your case...

Offline lsgames

Senior Newbie





« Reply #8 - Posted 2010-06-03 08:01:07 »

Hmmm, which implementation of the put method you get is probably hardware specific. I am running on a Nexus One. After putting in breakpoints, I could see that the implementation of put(float[]) I got was one that just traversed the array and called put(float) on each element.
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #9 - Posted 2010-06-03 10:58:45 »

Quote from lsgames: "Hmmm, which implementation of the put method you get is probably hardware specific. I am running on a Nexus One. After putting in breakpoints, I could see that the implementation of put(float[]) I got was one that just traversed the array and called put(float) on each element."

This wouldn't be the first time a profiler alters the optimisation of an application.

It'd be safe to assume you ran it without a profiler too?

Offline lsgames

Senior Newbie





« Reply #10 - Posted 2010-06-03 12:24:04 »

Yes, I wrote a small test app that times the different methods (one at a time, float[], and FloatBuffer.wrap(float[])). One at a time is fastest, then float[]; the slowest is passing it a wrapped FloatBuffer.
Offline EgonOlsen
« Reply #11 - Posted 2010-06-03 12:41:49 »

Why not upload this test somewhere so that we can benchmark on various platforms!? My experience is that putting one float[] is between 500 and 600% faster than single puts...at least on 1.5. Maybe they have worked on single puts in 2.1 or something.

BTW: Are you sure that you are measuring direct buffer performance? You can't create a direct buffer by using wrap(...), can you?
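
On the wrap(...) question: FloatBuffer.wrap returns a heap buffer backed by the given array, not a direct one. A minimal sketch of the "wrapped array to direct buffer" path, with the hypothetical names WrapSketch and copyViaWrap, assuming the destination buffer has room for the whole array:

```java
import java.nio.FloatBuffer;

class WrapSketch {
    /**
     * Wraps `source` (no copy is made) and bulk-copies it into `direct`.
     * Returns the wrapper so the caller can inspect it.
     */
    static FloatBuffer copyViaWrap(float[] source, FloatBuffer direct) {
        FloatBuffer wrapped = FloatBuffer.wrap(source); // heap buffer sharing `source`
        direct.clear();
        direct.put(wrapped);                            // buffer-to-buffer bulk put
        direct.flip();
        return wrapped;
    }
}
```

Here wrapped.isDirect() is false and wrapped.array() is the very same array object, so in this path only the destination buffer is direct.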

Offline lsgames

Senior Newbie





« Reply #12 - Posted 2010-06-03 13:02:38 »

Good idea Egon.

You can download source here:

http://games.martineriksen.net/PerformanceTest.zip

In the "bin" folder there is the APK which you can install using adb install. Then you can run the test by starting the PerformanceTest app on your phone. The test will output the data in LogCat - these are the interesting lines (seen on a Nexus One 2.1):


06-03 14:56:01.445: INFO/System.out(22956): time: 247.8s >> vertex buffer single puts
06-03 14:56:01.445: INFO/System.out(22956): time: 254.2s >> vertex buffer single puts with specified positions
06-03 14:56:01.445: INFO/System.out(22956): time: 264.3s >> vertex buffer full array puts
06-03 14:56:01.445: INFO/System.out(22956): time: 0.3s >> vertex buffer wrapping
06-03 14:56:01.445: INFO/System.out(22956): time: 285.3s >> wrapped array to vertex buffer

Here is the interesting code that runs this part of the test:

               FloatBuffer nativeDirectFloatBuffer = OpenGlMemoryUtil.makeFloatBuffer(FLOAT_BUFFER_SIZE);
               float[] floatArray = new float[FLOAT_BUFFER_SIZE];
               
               for (int i = 0; i < FLOAT_BUFFER_SIZE; i++) {
                       floatArray[i]=0.5f;
               }
               
               time = print("Going VertexBuffers");        
               for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
                       nativeDirectFloatBuffer.position(0);
                       for (int k = 0; k < FLOAT_BUFFER_SIZE; k++) {
                               nativeDirectFloatBuffer.put(0.5f);
                       }                        
               }
               time = PerfLogUtil.logTime(time, "vertex buffer single puts", logindex++, TESTSIZE_THOUSANDS);
               
               time = print("Going VertexBuffers");        
               for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
                       for (int k = 0; k < FLOAT_BUFFER_SIZE; k++) {
                               nativeDirectFloatBuffer.put(k,0.5f);
                       }                        
               }
               time = PerfLogUtil.logTime(time, "vertex buffer single puts with specified positions", logindex++, TESTSIZE_THOUSANDS);

               time = print("Going VertexBuffers");        
               for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
                       for (int k = 0; k < FLOAT_BUFFER_SIZE; k++) {
                               floatArray[k]=.5f;
                       }                        
                       nativeDirectFloatBuffer.position(0);
                       nativeDirectFloatBuffer.put(floatArray);
               }
               time = PerfLogUtil.logTime(time, "vertex buffer full array puts", logindex++, TESTSIZE_THOUSANDS);
               
               FloatBuffer floatBufferWrappedArray = FloatBuffer.wrap(floatArray);
               time = PerfLogUtil.checkPoint();
               for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
                       floatBufferWrappedArray = FloatBuffer.wrap(floatArray);
               }
               time = PerfLogUtil.logTime(time, "vertex buffer wrapping", logindex++, TESTSIZE_THOUSANDS);
               
               for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
                       for (int k = 0; k < FLOAT_BUFFER_SIZE; k++) {
                               floatArray[k]=.5f;
                       }                        
                       nativeDirectFloatBuffer.position(0);
                       floatBufferWrappedArray.position(0);
                       nativeDirectFloatBuffer.put(floatBufferWrappedArray);
               }
               time = PerfLogUtil.logTime(time, "wrapped array to vertex buffer", logindex++, TESTSIZE_THOUSANDS);
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #13 - Posted 2010-06-03 13:16:51 »

Please put your code between [ code ] and [/ code ], otherwise [ i ] will be converted into italic-styled text.

Offline lsgames

Senior Newbie





« Reply #14 - Posted 2010-06-03 14:02:35 »

Done :-) Thanks for the info Riven.
Offline EgonOlsen
« Reply #15 - Posted 2010-06-03 15:08:52 »

Tried it on my Samsung Galaxy with Android 1.5. The results are similar (but slower of course):

06-03 16:49:07.297: INFO/System.out(2166): time: 1137.7s >> vertex buffer single puts
06-03 16:49:07.297: INFO/System.out(2166): time: 1079.4s >> vertex buffer single puts with specified positions
06-03 16:49:07.297: INFO/System.out(2166): time: 1175.3s >> vertex buffer full array puts
06-03 16:49:07.297: INFO/System.out(2166): time: 1.4s >> vertex buffer wrapping
06-03 16:49:07.297: INFO/System.out(2166): time: 1220.6s >> wrapped array to vertex buffer


However, this changes once you add a variable instead of 0.5f, i.e. do something like this:

for (int i = 0; i < TESTSIZE_MILLIONTH; i++) {
   nativeDirectFloatBuffer.position(0);
   float val=0;
   for (int k = 0; k < FLOAT_BUFFER_SIZE; k++) {
      nativeDirectFloatBuffer.put(val);
      val+=0.1f;
   }        
}


This results in:

06-03 17:03:43.307: INFO/System.out(2782): time: 1387.5s >> vertex buffer single puts
06-03 17:03:43.307: INFO/System.out(2782): time: 1308.4s >> vertex buffer single puts with specified positions
06-03 17:03:43.307: INFO/System.out(2782): time: 1187.9s >> vertex buffer full array puts
06-03 17:03:43.317: INFO/System.out(2782): time: 1.4s >> vertex buffer wrapping
06-03 17:03:43.317: INFO/System.out(2782): time: 1192.8s >> wrapped array to vertex buffer


I'm still not sure why it helped that much more in my code (which is a bit more complex than this simple benchmark, of course) to go with float[]s... Dalvik is strange...slow and strange...

Offline EgonOlsen
« Reply #16 - Posted 2010-06-03 19:17:53 »

I've reverted my own stuff to use single puts to see what happens then... I have a loop with 6 "puts" in each iteration filling two different buffers. With float[] instead of single puts, this is 3 times faster on my device.

Offline lsgames

Senior Newbie





« Reply #17 - Posted 2010-06-04 11:56:54 »

OK - is your code in a format that you can send so I can try and replicate your test run?

Also, after instrumenting and further analysis, I found that Traceview had exaggerated the cost of the buffer puts in relation to the whole programme execution. Traceview said that the buffer puts were 17% of all time spent, whereas they are actually only 3%. So, as Riven suggested, the profiler was not quite truthful. It is still the case that the puts take 10 times longer than array puts, but the overall impact is lower than I thought.

Still it would be great to find a way to write the buffers faster.

/Martin
Offline EgonOlsen
« Reply #18 - Posted 2010-06-04 12:39:03 »

Sure. Code looks like this:

int ix=0;
for (c = 0; c < endII; c++) {
   vcoords[ix] = x[c];
   ncoords[ix++] = nx[c];
   vcoords[ix] = y[c];
   ncoords[ix++] = ny[c];
   vcoords[ix] = z[c];
   ncoords[ix++] = nz[c];
}

...

vertices.put(vcoords);
normals.put(ncoords);


Single put code looks the same except that in the loop i'm doing 3 puts into vertices and 3 into normals instead of filling the array.
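
A self-contained sketch of the two variants being compared, limited to the vertex buffer for brevity. InterleaveSketch and its method names are hypothetical, and the real code also fills a normals buffer in the same loop. Both methods leave the buffer with identical contents, so they can be swapped to time one against the other.

```java
import java.nio.FloatBuffer;

class InterleaveSketch {
    /** Variant 1: three relative single puts per vertex. */
    static void fillWithSinglePuts(float[] x, float[] y, float[] z, FloatBuffer vertices) {
        vertices.clear();
        for (int c = 0; c < x.length; c++) {
            vertices.put(x[c]);
            vertices.put(y[c]);
            vertices.put(z[c]);
        }
        vertices.flip();
    }

    /** Variant 2: interleave into a reusable float[] first, then one bulk put. */
    static void fillWithBulkPut(float[] x, float[] y, float[] z,
                                float[] vcoords, FloatBuffer vertices) {
        int ix = 0;
        for (int c = 0; c < x.length; c++) {
            vcoords[ix++] = x[c];
            vcoords[ix++] = y[c];
            vcoords[ix++] = z[c];
        }
        vertices.clear();
        vertices.put(vcoords, 0, ix); // copies only the floats written above
        vertices.flip();
    }
}
```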

Offline ryanm

Senior Member


Projects: 1
Exp: 15 years


Used to be bleb


« Reply #19 - Posted 2010-09-06 17:34:22 »

I've just run into the bulk-put problem, and have noticed that IntBuffers do not suffer the same fate - bulk put( int[] ) calls are very quick. I'm seeing a x10 speedup using this for 10000-element arrays, and about x2 for 10 elements.
   private static int[] intArray = new int[ 0 ];

   /**
    * Work-around for crappy {@link FloatBuffer#put(float[])}
    * performance
    *
    * @param buff
    * @param data
    */

   public static void put( IntBuffer buff, float[] data )
   {
      if( intArray.length < data.length )
      {
         intArray = new int[ data.length ];
      }

      for( int i = 0; i < data.length; i++ )
      {
         intArray[ i ] = Float.floatToIntBits( data[ i ] );
      }

      buff.put( intArray, 0, data.length );
   }
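
A usage sketch for the workaround above, assuming the IntBuffer and the FloatBuffer are two views of the same direct ByteBuffer, created after the byte order is set so that both use the same order. The class name IntViewPutSketch and the scratch field are hypothetical; the put method mirrors the one posted above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

class IntViewPutSketch {
    // Reused between calls to avoid per-frame allocation; this makes put() not thread-safe.
    private static int[] scratch = new int[0];

    /** Writes the bit patterns of `data` through the int view with one bulk put. */
    static void put(IntBuffer buff, float[] data) {
        if (scratch.length < data.length) {
            scratch = new int[data.length];
        }
        for (int i = 0; i < data.length; i++) {
            scratch[i] = Float.floatToIntBits(data[i]);
        }
        buff.put(scratch, 0, data.length);
    }

    public static void main(String[] args) {
        float[] data = {1.5f, -2.0f, 0.25f};
        ByteBuffer bytes = ByteBuffer.allocateDirect(4 * data.length)
                .order(ByteOrder.nativeOrder());
        IntBuffer ints = bytes.asIntBuffer();       // view used for writing
        FloatBuffer floats = bytes.asFloatBuffer(); // view of the same memory, for reading

        put(ints, data);
        for (int i = 0; i < data.length; i++) {
            System.out.println(floats.get(i));      // reads back the original float values
        }
    }
}
```

Float.floatToIntBits collapses all NaN values into one canonical bit pattern; Float.floatToRawIntBits could be used instead if exact NaN payloads ever mattered.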
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #20 - Posted 2010-09-06 20:51:31 »

Wow. Nice find.

Maybe time to write a bug-report?

Offline ryanm

Senior Member


Projects: 1
Exp: 15 years


Used to be bleb


« Reply #21 - Posted 2010-09-07 10:58:37 »

I reckon so.

[Chart not preserved.] Note the logarithmic scale. IntBuffer bulk puts are essentially free, and I can't see any reason why FloatBuffers can't do the same.
Benchmark code here. Can anyone spot any problems with this?
Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #22 - Posted 2010-09-07 13:04:51 »

Quote from ryanm: "Benchmark code here. Can anyone spot any problems with this?"
Looks good enough.

Seems like floats are 266x slower than ints. It probably goes through the FPU instead of a memcpy.

Offline ryanm

Senior Member


Projects: 1
Exp: 15 years


Used to be bleb


« Reply #23 - Posted 2010-09-07 17:47:50 »

Add your stars to Issue 11078.
edit: actually don't bother, it's been fixed in Gingerbread already apparently

Also, there's a more-or-less drop-in replacement for FloatBuffer over here. It'll automatically convert float arrays that you give it, and also allow you to pass in pre-converted int arrays
Offline EgonOlsen
« Reply #24 - Posted 2010-09-07 21:02:59 »

Quote from ryanm: "edit: actually don't bother, it's been fixed in Gingerbread already apparently"
Too bad that 3.0 has some hefty hardware requirements and most likely won't make it to a lot of current phones...

Offline badlogicgames
« Reply #25 - Posted 2010-09-13 03:11:46 »

I'm late to the party but I wanted to follow up on this. First off: thanks Ryan for posting that bug report. I went with bulk puts and never thought about testing single puts (old nio habit...). I wrote a quick JNI method which is even faster than your int[] array trick. You can find more info at http://apistudios.com/hosted/marzec/badlogic/wordpress/?p=904.

http://www.badlogicgames.com - musings on Android and Java game development