Java-Gaming.org    
  Minor optimisation for vertex arrays and a request  (Read 2378 times)
Offline Markus_Persson

JGO Wizard


Medals: 14
Projects: 19


Mojang Specifications


« Posted 2006-01-25 18:41:22 »

for (int i=0; i<len; i++)
{
    vertexBuffer.put(x).put(y).put(z);
}


Is significantly slower than

float[] temp = new float[len*3];
for (int i=0; i<len; i++)
{
    temp[i*3+0] = x;
    temp[i*3+1] = y;
    temp[i*3+2] = z;
}
vertexBuffer.put(temp);


My test application renders a 3000 poly model at 90 fps with the first method, and 550 fps with the second. (I have to rebuild the buffers every time as I'm doing skinned animation)
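A minimal sketch of the two approaches side by side, for trying this outside a JOGL app. The class name `VertexUpload`, the helper names and the `xs`/`ys`/`zs` input arrays are made up for illustration; only the `java.nio` calls are real API. Reusing one staging array across frames is an assumption on top of the snippet above, which allocates `temp` on every rebuild.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class VertexUpload {
    /** Allocates a direct, native-order FloatBuffer with room for `floats` values. */
    static FloatBuffer newDirectFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** First method: three relative put() calls per vertex. */
    static void fillPerElement(FloatBuffer dst, float[] xs, float[] ys, float[] zs) {
        dst.clear();
        for (int i = 0; i < xs.length; i++) {
            dst.put(xs[i]).put(ys[i]).put(zs[i]);
        }
        dst.flip();
    }

    /** Second method: write into a float[] first, then copy with one bulk put(). */
    static void fillBulk(FloatBuffer dst, float[] staging,
                         float[] xs, float[] ys, float[] zs) {
        for (int i = 0; i < xs.length; i++) {
            staging[i * 3]     = xs[i];
            staging[i * 3 + 1] = ys[i];
            staging[i * 3 + 2] = zs[i];
        }
        dst.clear();
        dst.put(staging, 0, xs.length * 3);
        dst.flip();
    }

    public static void main(String[] args) {
        float[] xs = {0f, 1f}, ys = {0f, 1f}, zs = {0f, 1f};
        FloatBuffer buf = newDirectFloatBuffer(xs.length * 3);
        fillBulk(buf, new float[xs.length * 3], xs, ys, zs);
        System.out.println(buf.remaining() + " floats ready for upload");
    }
}
```

Both methods leave the buffer flipped with identical contents, so they can be swapped freely when timing them; how large the gap is will depend on the VM and hardware.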

In light of this, it would be nice to have a glVertexPointer (and color-, and the other pointers) method that takes a float[] and sends it to native itself behind the scenes.

Play Minecraft!
Offline Markus_Persson



« Reply #1 - Posted 2006-01-25 18:43:07 »

Eh, never mind, that won't work. JOGL won't know when it's safe to release that buffer.

Carry on. ;-)

Offline Riven
« League of Dukes »

JGO Overlord


Medals: 783
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #2 - Posted 2006-01-25 18:50:59 »

Oh well, I thought this was common knowledge :)

I've had a wrapper-class for this that would only put everything in a native-buffer when the data was about to be rendered and was not changed.

With the 1.4 server-vm (or 1.6 client-vm for that matter) things really start to fly, as arrays get really fast.
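One way such a wrapper could look, as a rough sketch only. It assumes the idea is "edit a plain float[], and copy it into the native buffer right before rendering, but only if something changed since the last copy". The class `LazyVertexData` and all of its methods are hypothetical names, not taken from any posted code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical wrapper: edits go to a float[]; the direct buffer is refreshed lazily. */
final class LazyVertexData {
    private final float[] data;
    private final FloatBuffer nativeBuffer;
    private boolean dirty = true; // true until the first upload
    private int uploads = 0;      // counts bulk copies, for illustration only

    LazyVertexData(int floats) {
        data = new float[floats];
        nativeBuffer = ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Cheap array write; only marks the native copy as stale. */
    void set(int index, float value) {
        data[index] = value;
        dirty = true;
    }

    /** Call right before rendering; does the bulk copy only if something changed. */
    FloatBuffer bufferForRendering() {
        if (dirty) {
            nativeBuffer.clear();
            nativeBuffer.put(data);
            nativeBuffer.flip();
            dirty = false;
            uploads++;
        } else {
            nativeBuffer.rewind(); // same contents, position back at 0
        }
        return nativeBuffer;
    }

    int uploadCount() {
        return uploads;
    }
}
```

With this shape, static geometry pays for one copy in total, while geometry that changes every frame pays for one bulk put per frame instead of one put per float.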

Hi, appreciate more people! Σ ♥ = ¾
Offline Markus_Persson


« Reply #3 - Posted 2006-01-25 18:52:57 »

I knew it was faster, but not.. you know.. five times faster. ;)

I thought the put methods would get inlined.

Offline Riven


« Reply #4 - Posted 2006-01-25 18:56:11 »

Only effectively with the server-vm, but even then it's slow.

If you want to get really cranky, *abuse* the Unsafe class and do your own pointer-arithmetic, it's about 10% faster than field-access (!!) in the server-vm. You should search the forum for that topic.

* Riven shuts up already...

Offline Markus_Persson


« Reply #5 - Posted 2006-01-25 19:13:21 »

10% is just barely worth optimising for. 500% definitely is. ;)

The reason I've never run into this before is that I never have to rebuild the vertices for each frame, so it's never been a bottleneck. I'm going to go change a lot of code now. :D

(also, my main reason for posting this was to request the not-so-thought-through addition of the new method. But that won't work)

Offline Ken Russell

JGO Coder




Java games rock!


« Reply #6 - Posted 2006-01-29 04:23:21 »

Use the absolute put() methods instead of the relative ones for better performance. See the slides from Sven Goethel's and my JavaOne 2002 talk on the JOGL web page.
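For readers who haven't used them: the absolute variants take an explicit index and leave the buffer's position alone. A small sketch of filling vertex data that way follows; the class name `AbsolutePutExample` and the `xs`/`ys`/`zs` arrays are illustrative assumptions, and this is not code from the slides.

```java
import java.nio.FloatBuffer;

final class AbsolutePutExample {
    /** Writes x, y, z triples with put(index, value) instead of relative put(value). */
    static void fillAbsolute(FloatBuffer dst, float[] xs, float[] ys, float[] zs) {
        dst.clear(); // position = 0, limit = capacity, so every index below is in range
        for (int i = 0, j = 0; i < xs.length; i++, j += 3) {
            dst.put(j, xs[i]);
            dst.put(j + 1, ys[i]);
            dst.put(j + 2, zs[i]);
        }
        // Absolute puts never move the position, so only the limit needs setting.
        dst.limit(xs.length * 3);
    }

    public static void main(String[] args) {
        FloatBuffer buf = FloatBuffer.allocate(6);
        fillAbsolute(buf, new float[] {1f, 4f}, new float[] {2f, 5f}, new float[] {3f, 6f});
        System.out.println(buf.remaining() + " floats written");
    }
}
```

The buffer must have a capacity of at least three floats per vertex; there is no flip() at the end because the position was never advanced.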

As an aside, please do not use sun.misc.Unsafe directly. You should be able to get all of the performance you need out of the NIO direct buffer classes. If you can't, please file a bug.
Offline Riven


« Reply #7 - Posted 2006-01-29 16:10:50 »

Quote
As an aside, please do not use sun.misc.Unsafe directly. You should be able to get all of the performance you need out of the NIO direct buffer classes. If you can't, please file a bug.

As you have participated in this thread...
http://www.java-gaming.org/forums/index.php?topic=11112.msg88496#msg88496

Just one of my benchmarks: (running server vm 1.5)
Running benchmark with 2048 3d vecs...
math on Vec3[]:          66.4ms      30800 / sec <---
math on FloatBuffer:    299.4ms       6800 / sec
math on unsafe buffer:   58.9ms      34700 / sec <---
math on unsafe struct:  107.0ms      19100 / sec


299.4 ms / 58.9 ms ≈ 5x faster: FloatBuffer is really slow, and Unsafe is even faster than field access.


And I'm not going to file a bug, as all previous bug reports have been bluntly ignored so far. Not doing it again.

Offline Ken Russell


« Reply #8 - Posted 2006-01-29 18:10:53 »

Could you please post complete source code for this benchmark or email it to me at kbr at dev.java.net?
Offline swpalmer

JGO Coder




Where's the Kaboom?


« Reply #9 - Posted 2006-01-29 18:15:25 »

Quote
And I'm not going to file a bug, as all previous bug reports have been bluntly ignored so far.

How do you know? Just because you haven't got any feedback or the bug hasn't been fixed doesn't mean it was ignored.

Quote
Not doing it again.

If you don't file a bug, you will have no right to complain.  I would file bugs just so I can say to Sun later on, "I told you so."

Offline Riven


« Reply #10 - Posted 2006-01-29 20:29:36 »

swpalmer: I'm not complaining :) Just proving that a statement is wrong.

Ken Russell: you can find the source code in the referenced thread.

Offline Ken Russell


« Reply #11 - Posted 2006-01-30 01:38:59 »

Quote
Ken Russell: you can find the source code in the referenced thread.

I don't see an obvious link to a complete, compilable class or set of classes. Could you please either point me to it or generate it and attach it here?
Offline Riven


« Reply #12 - Posted 2006-01-30 08:40:02 »

Okay, it wasn't really compilable, but I expected it to be enough :)

No problem, I'll make a set of classes tonight (GMT+1), as I'm at work now.

Offline Markus_Persson


« Reply #13 - Posted 2006-01-30 08:46:59 »

Ooh, statistics!

Hmm, unsafe buffers are 13% faster.. that's borderline worth optimising for. Definitely so if vertex transfer is a bottleneck.
I wish it were slightly more kosher so I'd dare to use it. sun.* packages are a big no-no.

Offline Riven


« Reply #14 - Posted 2006-01-30 20:27:27 »

To get any serious results, use the server VM:

Self-contained, compilable source code, with the following results:

duration of last 8 runs:
---> arr:   264ms
---> buf:   1445ms
---> pnt:   240ms


arr = float[]-based
buf = FloatBuffer-based (~5.5x slower than float[])
pnt = pointer-based (~10% faster than float[])



Update:
It computes a simple weighted blend of two data sources and stores the result in a third:
c = a*x + b*(1-x)


I've tried to optimize all three ways for best performance, by trial-and-error.



float[]
      // unrolling this loop makes it slower
      for (int i = 0; i < a.length; i++)
         c[i] = aMul * a[i] + bMul * b[i];



FloatBuffer
      while(fbA.hasRemaining())
      {
         fbC.put(aMul * fbA.get() + bMul * fbB.get());
         fbC.put(aMul * fbA.get() + bMul * fbB.get());
         fbC.put(aMul * fbA.get() + bMul * fbB.get());
      }



pointer-arithmetic
      for (int i = -4; i < bytes;)
      {
         unsafe.putFloat((i += 4) + c, unsafe.getFloat(i + a) * aMul + unsafe.getFloat(i + b) * bMul);
         unsafe.putFloat((i += 4) + c, unsafe.getFloat(i + a) * aMul + unsafe.getFloat(i + b) * bMul);
         unsafe.putFloat((i += 4) + c, unsafe.getFloat(i + a) * aMul + unsafe.getFloat(i + b) * bMul);
      }



Update 2:
Using pointer-arithmetic, cache-misses kick in much much later - when using large data-sets.
When processing   512 vertices   (18KB)    float[] gets 125M/s, pointers get 133M/s.
When processing  1024 vertices   (36KB)    float[] gets 125M/s, pointers get 133M/s.
When processing  2048 vertices   (72KB)    float[] gets 125M/s, pointers get 133M/s.
When processing  4096 vertices  (144KB)    float[] gets  92M/s, pointers get 133M/s. <--
When processing  8192 vertices  (288KB)    float[] gets  42M/s, pointers get 133M/s. <--
When processing 16384 vertices  (576KB)    float[] gets  42M/s, pointers get  42M/s.
When processing 32768 vertices (1152KB)    float[] gets  31M/s, pointers get  31M/s.


Makes you wonder what happens under the hood... :)
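The KB figures in the table are consistent with three buffers (a, b and c) of three 4-byte floats per vertex, i.e. 36 bytes per vertex. That breakdown is an inference from the numbers, not something stated in the post. A quick sketch of the arithmetic, with `WorkingSet` as a made-up class name:

```java
final class WorkingSet {
    /** Bytes touched per pass, assuming 3 floats per vertex in each of 3 buffers. */
    static long bytes(int vertices) {
        final int floatsPerVertex = 3; // x, y, z
        final int buffers = 3;         // a, b and c in c = a*x + b*(1-x)
        return (long) vertices * floatsPerVertex * Float.BYTES * buffers;
    }

    public static void main(String[] args) {
        // Same vertex counts as the table: 512 up to 32768, doubling each step.
        for (int v = 512; v <= 32768; v *= 2) {
            System.out.println(v + " vertices -> " + bytes(v) / 1024 + " KB");
        }
    }
}
```

Comparing those sizes with the cache sizes of the test machine is left to the reader, since the thread does not give them.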

Offline Ken Russell


« Reply #15 - Posted 2006-01-30 22:32:43 »

You aren't comparing apples to apples. When using the relative get() and put() methods on Buffers, this implies more work: incrementing indices in the Buffer objects after each loop iteration. If you fix the benchmark to use the absolute get() and put() methods, as discussed in the JavaOne 2002 talk I referenced above, the buffer-based version is faster than the array-based version:
duration of last 8 runs:
---> arr:       649ms   51M vertices/s
---> buf:       582ms   57M vertices/s
---> pnt:       467ms   71M vertices/s


This is on a Pentium M 1.4 GHz with 5.0u6 and -server. Revised benchmark is attached.
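The attached benchmark itself is not reproduced in the thread, so the following is only a reconstruction of what the relative and absolute versions of the blend kernel might look like. The class `BlendKernels` and its method names are hypothetical; the array version mirrors the float[] loop posted earlier.

```java
import java.nio.FloatBuffer;

final class BlendKernels {
    /** Reference version on plain arrays: c = a*x + b*(1-x). */
    static void blendArray(float[] a, float[] b, float[] c, float x) {
        final float aMul = x, bMul = 1f - x;
        for (int i = 0; i < a.length; i++) {
            c[i] = aMul * a[i] + bMul * b[i];
        }
    }

    /** Relative get()/put(): each call also advances that buffer's position. */
    static void blendRelative(FloatBuffer a, FloatBuffer b, FloatBuffer c, float x) {
        final float aMul = x, bMul = 1f - x;
        a.clear();
        b.clear();
        c.clear();
        while (a.hasRemaining()) {
            c.put(aMul * a.get() + bMul * b.get());
        }
    }

    /** Absolute get(i)/put(i, v): the loop index is the only counter. */
    static void blendAbsolute(FloatBuffer a, FloatBuffer b, FloatBuffer c, float x) {
        final float aMul = x, bMul = 1f - x;
        final int n = a.limit(); // b and c are assumed to have at least this limit
        for (int i = 0; i < n; i++) {
            c.put(i, aMul * a.get(i) + bMul * b.get(i));
        }
    }
}
```

All three produce the same values; which one runs fastest is exactly what the numbers in this thread disagree on across VMs and machines, so measure on the target setup.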

Please DO NOT reference sun.misc.Unsafe directly in your classes, or at least in your products. By doing so you're hiding potential performance issues with the public java.nio classes and only providing ammunition to parties within Sun who want to more severely restrict access to that class.
Offline Riven


« Reply #16 - Posted 2006-01-30 22:47:23 »

Quote
When using the relative get() and put() methods on Buffers, this implies more work, incrementing indices in the Buffer objects after each loop iteration. If you fix the benchmark to use the absolute get() and put() methods as is discussed in the JavaOne 2002 talk I referenced...

Sorry, I quickly read that post, and (stupidly) read it the other way around...



Quote
Please DO NOT reference sun.misc.Unsafe directly in your classes, or at least in your products.

Do you mean making it accessible by that public-static method in my test-case, or just in the general case? (not referencing it anywhere == not using it)



Quote
By doing so you're <snip> only providing ammunition to parties within Sun who want to more severely restrict access to that class.

Anyway, I think, once the problems are solved, access to Unsafe actually should be severely restricted.



After I patched my code with your snippet, I got the following results:
duration of last 8 runs:
---> arr:   265ms   126M vertices/s
---> buf:   347ms   96M vertices/s
---> pnt:   244ms   137M vertices/s


P4 2.4 @1.8GHz (533 @400MHz FSB)
512MB PC2700

The difference between buf and pnt is 42%. So there is still some room for improvement on the implementation of FloatBuffers on (at least) my hardware-configuration.

Offline Riven


« Reply #17 - Posted 2006-01-30 23:16:43 »

Okay, I removed the ByteBuffer->FloatBuffer code from the loop, which turned out to consume quite a few CPU cycles.
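The benchmark source isn't shown here, so this is only one plausible shape of the change: creating the FloatBuffer view of a ByteBuffer once, outside the timed loop, instead of on every pass. `ViewHoisting` and its methods are illustrative names, and the summing loop is a stand-in for the real kernel.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ViewHoisting {
    /** Creates a fresh FloatBuffer view on every pass (the pattern being removed). */
    static float sumWithViewInLoop(ByteBuffer bytes, int passes) {
        float sum = 0f;
        for (int p = 0; p < passes; p++) {
            FloatBuffer fb = bytes.asFloatBuffer(); // new view object each pass
            for (int i = 0; i < fb.limit(); i++) {
                sum += fb.get(i);
            }
        }
        return sum;
    }

    /** Creates the view once and reuses it for every pass. */
    static float sumWithHoistedView(ByteBuffer bytes, int passes) {
        final FloatBuffer fb = bytes.asFloatBuffer(); // shares storage with `bytes`
        float sum = 0f;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < fb.limit(); i++) {
                sum += fb.get(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(3 * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        bytes.putFloat(0, 1f).putFloat(4, 2f).putFloat(8, 3f);
        System.out.println(sumWithViewInLoop(bytes, 2) == sumWithHoistedView(bytes, 2));
    }
}
```

Because the view shares its storage with the ByteBuffer, hoisting it does not change the results, only how many view objects get created.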


Results:

duration of last 8 runs:
---> arr:   369ms   90M vertices/s
---> buf:   333ms   100M vertices/s
---> pnt:   246ms   136M vertices/s




I uploaded the last version.

After my update, your (Ken's) percentages and mine are about the same. If you apply the update too, you'll have increased performance on buffers.

I'm very curious about that.

Offline Ken Russell


« Reply #18 - Posted 2006-01-31 00:29:48 »

Look again at the numbers. The issue isn't that the buffer case got significantly faster, but that the array case got significantly slower. I also see very high variation in the array case on my machine. I'm not sure whether that's because of data placement or because of differences in the generated machine code.

In general I would avoid reading too much into the results of microbenchmarks. The take-home point here, in my opinion, is that direct buffers are not significantly slower than arrays, at least when used properly. There are also more optimization opportunities possible in the HotSpot JVM which we will investigate (such as making the earlier version of the benchmark using the relative get/put methods perform identically to the one using the absolute versions).

Regarding sun.misc.Unsafe, I mean not referencing it at all. I think it's fine to do so when writing performance benchmarks like this one to try to prove or disprove a performance issue, but not when writing any sort of publicly released library or application.
Offline Riven


« Reply #19 - Posted 2006-01-31 00:42:23 »

Yes, I noticed the math on the float[] lost about one third of its performance, after changing the FloatBuffer code.

The JIT compiler is an impressive piece of art, apparently invalidating the results of quite solid microbenchmarks.



I was mainly pointing at the buffer/pointer difference, as the float[] was getting way off.
But well, without knowing why the float[] performance varies, the performance of buffer vs. pointer might be influenced by "unknown forces" too, obviously.


At least I now know the fastest code for all three ways of accessing data :)

* Riven heads off to do important stuff... ;)

Offline mekong

Junior Newbie





« Reply #20 - Posted 2006-02-01 20:38:31 »

Ouch! I cannot assume that my users will switch vm's
and this is what I see with client vm 1.5.0.06 on linux
when using absolute buffer access:
---> arr:       703ms   47M vertices/s
---> buf:       1596ms  21M vertices/s
---> pnt:       1240ms  27M vertices/s

Roughly 100% difference!
$%#! Marketing!! :-(
Offline Ken Russell



« Reply #21 - Posted 2006-02-01 23:28:38 »

While the peak performance of the HotSpot client compiler generally isn't as good as with the server compiler, I have found that for non-trivial inner loops it is possible to get very good performance from it rivalling C/C++ speed. The VertexArrayRange demo in the jogl-demos workspace does exactly this and even back in 2002 achieved 90% of C++ speed. In 2003 we showed a skinning algorithm ported from C++ to Java running at something like 85% of the speed of C++. All of these presentations are archived on the JOGL home page. I wouldn't make snap judgements based on microbenchmarks.
Offline mekong






« Reply #22 - Posted 2006-02-03 14:51:23 »

Sorry, didn't want to make you repeat yourself :)

I got your point, but it happens that in my application I am basically doing
just this: a = a*b, and I do this or something comparably simple a few times
on big sequential data, before I shove it to GL.
So I thought buffers must be *ideal* for this, and I was just very surprised,
that these ops do *not* get optimized into the imho pretty obvious x86 asm.
Apparently it is better with server vm, and so I will consider sticking to buffers
and hope that v6 client will eventually set buffers on steroids...
*praying to the sun god* ... ::) ;)
Offline Riven


« Reply #23 - Posted 2006-02-03 14:55:37 »

SIMD will probably dwarf any results possible these days in Java.

4 MULs at the price of 1...

Somebody write me a wrapper :)
