Java-Gaming.org    
Featured games (91)
games approved by the League of Dukes
Games in Showcase (579)
games submitted by our members
Games in WIP (500)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  How to do object pooling right?  (Read 3846 times)
0 Members and 1 Guest are viewing this topic.
Offline Varkas
« Posted 2013-01-22 12:31:40 »

Since my raytracer needs to create a lot of vector objects, I thought it might be a good idea to create a vector pool. Surprisingly, it wasn't. Or maybe I'm just too stupid.

I tried an ArrayList as pool container and pool access like this (V3 is a math. 3d vector type):

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
    private static final ArrayList <V3> pool = new ArrayList<V3>(8192);
   
    public static V3 make(final V3 other)
    {
        V3 v= null;
        synchronized(pool)
        {
            if(pool.size() > 0)
            {
                v= pool.remove(pool.size()-1);
            }
        }

        if(v== null)
        {
            v= new V3(other);
        }
       
        return v;
    }
   
    public static void put(final V3 v)
    {
        synchronized(pool)
        {
            pool.add(v);
        }        
    }


This turned out to be much slower than just "new()"ing all the objects and rely on the gc to clean up well. Has object pooling become obsolete with modern JVMs? Am I doing it wrong? Is there a sort of pool container which work without synchronization?

if (error) throw new Brick(); // Blog (german): http://gedankenweber.wordpress.com
Offline Roquen
« Reply #1 - Posted 2013-01-22 12:51:19 »

Do you have numbers to show you need to think about this?  Remember that we now have escape analysis & scalar replacement (as least for trivial cases...which I'd except is what most of your vector allocations are going to look like).
Offline deepthought
« Reply #2 - Posted 2013-01-22 12:55:09 »

Try kappa's bag. It'll work a lot faster than an array list.

jocks rule the highschools. GEEKS RULE THE WORLD MWAHAHAHA!!
captain failure test game
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Grunnt

JGO Wizard


Medals: 55
Projects: 9
Exp: 5 years


Complex != complicated


« Reply #3 - Posted 2013-01-22 12:58:59 »

Am I doing it wrong? Is there a sort of pool container which work without synchronization?

I assume you are using threads, right? Otherwise I wouldn't bother with synchronization. I also think that removing objects from an ArrayList is actually relatively slow, and you might be better off using a Bag for this (as order does not matter for an object pool), e.g. the one used in the Artemis framework or as described here on the forum.

Offline Varkas
« Reply #4 - Posted 2013-01-22 13:00:54 »

Do you have numbers to show you need to think about this?  Remember that we now have escape analysis & scalar replacement (as least for trivial cases...which I'd except is what most of your vector allocations are going to look like).

The old state was 24 FPS.

Then I had a need to create a new vector object as a scaled copy of another vector in one of the innermost loops. Well, two copies of two vectors. With the new + copy operation it dropped to 18 FPS.

I figured it's the object creation (but I didn't profile it) and implemented the pool. I first thought I had made some sort of mistake, FPS dropped to 2. I then modified the code to have no pooling, but calls to the pool methods:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
    public static V3 make(final V3 other)
    {
        V3 v= null;
        /*
        synchronized(pool)
        {
            if(pool.size() > 0)
            {
                mine = pool.remove(pool.size()-1);
            }
        }
        */

        // System.err.println("Vectors in pool: " + pool.size());
       
        if(mine == null)
        {
            v= new V3(other);
        }
        else
        {
            v.set(other);
        }
       
        return v;
    }
   
    public static void put(final V3 v)
    {
        /*
        synchronized(pool)
        {
            pool.add(v);
        }
        */

    }


With that I was back at 18 FPS.

So my findings:

- 2x new is slow, and makes my FPS drop from 24 to 18 in the used scene
- pooling is slow, due to the synchronized blocks.

I'm looking for a faster pool concept to get rid of the cost for 2x new() of the vector in the inner loop, to get back to about 24 FPS again (minus the actual cost to scalke the vectors).


@Grunnt: Yes, the raytracer uses threads, currently 8 threads are using the vector pool in parallel. I could give each thread a pool, maybe that works better, but it complicates the code somewhat. Removing elements from the array list is slow unless it's the last element. At least I think so. That's why my pool takes elements from the back of the list.

if (error) throw new Brick(); // Blog (german): http://gedankenweber.wordpress.com
Offline Roquen
« Reply #5 - Posted 2013-01-22 13:22:01 »

If you're going to pool, you want one pool allocator per thread...not shared.  Two reasons: lower probability that memory within each pool are near to each other and no concurrency issues (and no currency ops).  Oh yeah...and small objects suck, so the fastest solution would be to have a flat array per thread and just work in-place.

EDIT: And there is the middle-ground of fake structs a la Riven's lib
Offline theagentd
« Reply #6 - Posted 2013-01-22 13:45:31 »

If you're going to pool, you want one pool allocator per thread...not shared.  Two reasons: lower probability that memory within each pool are near to each other and no concurrency issues (and no currency ops).  Oh yeah...and small objects suck, so the fastest solution would be to have a flat array per thread and just work in-place.

EDIT: And there is the middle-ground of fake structs a la Riven's lib
Ditch the synchronization and use ThreadLocal!

I think the main reason for using pooling is to reduce the load on the GC. This is especially important for games since huge pauses all the time are unacceptable. The performance hit is an acceptable cost for that.

Myomyomyo.
Offline Roquen
« Reply #7 - Posted 2013-01-22 13:49:20 »

Yes to no sync ... but I don't see a reason to use threadlocal...accessing isn't free...just have the manager in the worker thread instance.
Offline ClickerMonkey

JGO Coder


Medals: 20


Game Engineer


« Reply #8 - Posted 2013-01-22 14:07:23 »

Here's my memory pool:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
public class MemoryPool<T> implements Memory<T>
{

   private Factory<T> factory;
   private T[] pool;
   private int size;
   private int maximum;
   private int increase;

   public MemoryPool( T[] pool, Factory<T> factory, int maximum, int increase, boolean fill )
   {
      this.pool = pool;
      this.factory = factory;
      this.maximum = maximum;
      this.increase = increase;
     
      if ( fill ) {
         while ( size < pool.length ) {
            pool[size++] = factory.create();
         }
      }
   }
   
   
   @Override
   public T alloc()
   {
      return (size == 0 ? factory.create() : pool[--size]);
   }

   @Override
   public void free( T item )
   {
      if (size < pool.length) {
         pool[size++] = item;
      } else if (size < maximum) {
         pool = Arrays.copyOf( pool, size + increase );
         pool[size++] = item;
       }
   }

}


If I have different threads, I declare a staticly accessible pool per thread so I don't share between threads or need to use synchronize. ThreadLocal can be fine, as far as performance goes it has been improved on, but it won't be better than just placing/passing an instance of the Pool around appropriately.

Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #9 - Posted 2013-01-22 14:09:44 »

General rule of thumb - don't use pools unless construction is expensive (that is, you do work in the constructor), or the class size is "large" (say, over 1kb per instance), or you are running on Android. Otherwise you'll find you are more or less duplicating the work of JVM.

Cas Smiley

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline ClickerMonkey

JGO Coder


Medals: 20


Game Engineer


« Reply #10 - Posted 2013-01-22 14:12:48 »

General rule of thumb - don't use pools unless construction is expensive (that is, you do work in the constructor), or the class size is "large" (say, over 1kb per instance), or you are running on Android. Otherwise you'll find you are more or less duplicating the work of JVM.

Cas Smiley

I would have to agree with this in most cases, however there are instances that once you allocate a number of objects within a time frame it will cause GC unnecessarily (something like that). JBullet (the Java port to Bullet) was experiencing this so they created their own pooling mechanism (using bytecode injection, so they could always guarantee an object would be put back on the pool without the programmer needing to explicitly do it). They pool vectors, matrices, quaternions, etc. It all depends on the size of the object in memory and how many you want each frame (in essence).


Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #11 - Posted 2013-01-22 14:14:38 »

Yes, you're right; if you are creating over 100kb* of instances per frame, use a pool.

Cas Smiley

* (some arbitrary figure but you get the idea - profile first!)

Offline Varkas
« Reply #12 - Posted 2013-01-22 14:27:58 »

Yes to no sync ... but I don't see a reason to use threadlocal...accessing isn't free...just have the manager in the worker thread instance.

My tests support this. With a ThreadLocal pool I was at 13 FPS (worse than with a direct new() and pure gc) and ThreadLocal.get() was a fairly expensive call in this scenario.

Yes, you're right; if you are creating over 100kb* of instances per frame, use a pool.

'tis happening "pixels x scene objects x light sources" which can result in a few million per frame. But it still seems new() and gc are the fastest of the tested scenarios. Keeping the pool as part of the WorkerThread and handing a reference to all methods which need to call V3.make() seems unwieldy though, and I think I stick with a plain new() instead. Vectors are quite cheap to generate.

Some day I assume I'll make V3 instances references into an array of doubles. That might be the best solution overall.


if (error) throw new Brick(); // Blog (german): http://gedankenweber.wordpress.com
Offline Roquen
« Reply #13 - Posted 2013-01-22 14:48:59 »

As I mentioned earlier...there's a good chance the 'new' and initialization isn't occurring in a fair number of cases due to escape analysis and scalar replacement.
Offline Varkas
« Reply #14 - Posted 2013-01-22 14:53:38 »

[...] escape analysis and scalar replacement.

I don't know what these terms mean. Can you explain briefly, or give me pointers to explanations? Thanks Smiley

if (error) throw new Brick(); // Blog (german): http://gedankenweber.wordpress.com
Offline ClickerMonkey

JGO Coder


Medals: 20


Game Engineer


« Reply #15 - Posted 2013-01-22 15:16:12 »

I would still suggest using an array as a stack approach like I recommended... since newing in java is a relatively cheap operation you need to ensure your alloc and free logic are as efficient as possible. Even if your pool takes a little more time to alloc when compared to newing in raw time remember you save time by avoiding calling the GC (well, frequency of GC can drastically decreases).

Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #16 - Posted 2013-01-22 16:03:45 »

[...] escape analysis and scalar replacement.

I don't know what these terms mean. Can you explain briefly, or give me pointers to explanations? Thanks Smiley

The very latest Java 7 Server VM can detect objects which only live in the scope of, say, a for() loop ("escape analysis"), and they are never allocated on the heap at all, but are instead flattened on to the stack, just like a C++ programmer might choose to do manually ("scalar replacement"). It makes for some great speedups with GC-bound algorithms.

Escape analysis also removes synchronizeds where it can, too. It's pretty nifty.

BUT: As I say... latest Java 7 Server VM only. Oh, and Excelsior JET.

Cas Smiley

Offline Roquen
« Reply #17 - Posted 2013-01-22 16:05:10 »

More info:

First off to see if either of these are helping you, try running with them disabled: -XX:-DoEscapeAnalysis.  Also you might want to periodically run from the command line instead of directly through whatever IDE...cause you want to see what default (or you're chosen) options are giving you instead what you might be inheriting from the IDE.

Very inexact descriptions:  Escape analysis attempts to determine if it's impossible for an object to out-live the stack frame of when it's created.  If an object doesn't escaped, then it cannot outlive the creating frame and further transforms are possible.  The most common is to not use the standard heap to allocate and use a very cheap 'stack allocator' (potentially the call stack) instead.  This along with initialization are the creation overhead.  Further hotspot is an aggressive inliner so for many simple classes escaped instances will have initialization inlined dropping overhead to close to what one would have if all the fields of the object were simply local variables.

Scalar replacement is another potential transform that can be applied to an escaped instance.  That's where it's determined that the object itself can disappear and be replace by the equivalent of just its (used) fields on the stack.

EDIT: It's too bad that nobody distributes a precompiled debug version of hotspot, as it has useful diagnostics...like dumping info about what has been transformed. (such as -XX:+PrintEliminateAllocations in this case).
Offline sproingie
« Reply #18 - Posted 2013-01-22 16:42:07 »

BUT: As I say... latest Java 7 Server VM only. Oh, and Excelsior JET.

It's in 1.6 but required the  -XX:+DoEscapeAnalysis option, whereas I believe it's on by default in 1.7.  And yes, server VMs only, and not Dalvik at all.
Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #19 - Posted 2013-01-22 16:48:11 »

I'm not sure it actually did much in 1.6 but I stand to be corrected.

Cas Smiley

Offline ClickerMonkey

JGO Coder


Medals: 20


Game Engineer


« Reply #20 - Posted 2013-01-22 23:08:15 »

EDIT: It's too bad that nobody distributes a precompiled debug version of hotspot, as it has useful diagnostics...like dumping info about what has been transformed. (such as -XX:+PrintEliminateAllocations in this case).

Sure you can!

Here are several options I use when debugging:
-server
-XX:+PrintGCDetails
-XX:-PrintGCTimeStamps
-XX:+TraceClassUnloading
-XX:+PrintCompilation                           <-- shows when java byte code has been transformed to machine code

Offline Varkas
« Reply #21 - Posted 2013-01-22 23:24:00 »

Thanks a lot for the info on escape analysis and scalar replacement. In combination they will exactly get rid of those unwanted new()s of local helper vector variables.

I believe that I ran my tests with the client vm (=default in my IDE), but I'll do more tests soon.

Thanks again, I didn't know that the hotspot compiler is able to do that!

if (error) throw new Brick(); // Blog (german): http://gedankenweber.wordpress.com
Offline Roquen
« Reply #22 - Posted 2013-01-23 07:36:52 »

@ClickerMonkey:  The debug build you have to compile yourself and has MORE options.  For this discussion the release build doesn't have the option that spews out what "new"s are found to not escape their calling stack frame, not the one that tells you which have had scalar replacement applied.  Of course you could poor over the assem output...but that's much more work. (Oh and that requires linux or compile the plug-in yourself.)
Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #23 - Posted 2013-01-23 11:18:51 »

Also, who cares about the output! The idea of all the fancy things the JVM does is so you can write code that looks sane, and it takes care of trying to make it fast. If it's not fast, it's likely a different approach is needed, rather than poring over machine code output; particularly as the behaviour will simply change from one JVM to the next.

Cas Smiley

Offline Roquen
« Reply #24 - Posted 2013-01-23 11:54:40 »

I general I agree, my point being before spending time making stuff faster, first insure that what you're attempt to address is part of the performance problem..and if indeed it is, then check that the plan you have in mind has a reasonable expectation of being a win and evaluate if the expected engineering cost is deemed to be worthwhile. When you're thinking about arm wrestling you're compiler you need to know what you're up against and where to focus your efforts.
Offline Kerai

Junior Member


Medals: 4
Projects: 1



« Reply #25 - Posted 2013-05-20 12:12:59 »

Server VM (and therefore EscapeAnalysis) is not present in client VM - which is the only one in 32-bit JRE on Windows.

This also does not apply to Dalvik. They did testing on escape analysis but I don't know why didn't they include it. It gives soo much performance gain, ugh.

Thankfully, on Linux machines (and probably Mac too) server VM is present (with -server switch) and on 64-bit JRE on Windows server VM is also default. (Sadly almost no-one has 64 bit JVM on Windows)

Good thing is (if you are not targeting Android or iOS) you can ship JRE with server VM on Windows with your app legally. Bad thing is it weights much... You can remove some things from it, but rt.jar still weights a f**kton...
http://www.oracle.com/technetwork/java/javase/jre-7-readme-430162.html

Again, good thing is, There are unofficial OpenJDK builds for Windows - and you can strip them off of everything that is unnecessary. Including contents of rt.jar - but you should be careful not to remove something important.
https://github.com/alexkasko/openjdk-unofficial-builds

edit: see posts below
Offline Nate

JGO Kernel


Medals: 129
Projects: 3
Exp: 14 years


Esoteric Software


« Reply #26 - Posted 2013-05-20 15:34:32 »

I bundle an OpenJDK JRE with Spine. Weighs ~13MB.
https://code.google.com/p/libgdx/wiki/BundlingJRE

Offline Cero
« Reply #27 - Posted 2013-05-20 15:48:25 »

also server vm is just one dll to copy  persecutioncomplex

Offline princec

JGO Kernel


Medals: 282
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #28 - Posted 2013-05-21 00:15:59 »

Behold!

http://downloads.puppygames.net/SetupUltratron.exe
http://downloads.puppygames.net/Ultratron_MacOSX.zip
http://downloads.puppygames.net/Ultratron.tar.gz

That's your baseline sizes pretty much. I couldn't get them much smaller without "cheating".

<edit>And if you were wondering without having to delve into the .sh files, the commandline args look like:
    -server \
    -XX:+TieredCompilation \
    -XX:Tier2CompileThreshold=70000 \
    -XX:+DoEscapeAnalysis \
    -Xms64m \
    -Xmx256m \
    -Xincgc \

Cas Smiley

Offline Kerai

Junior Member


Medals: 4
Projects: 1



« Reply #29 - Posted 2013-05-21 15:14:38 »

A few more fine jvm options, that you should at least try
1  
2  
3  
4  
5  
6  
-XX:MaxPermSize=20M // this or 30m, typically my apps never needed more for perm size
-XX:MaxInlineSize=512
-XX:FreqInlineSize=512
-XX:InlineSmallCode=2000
-XX:+UseFastAccessorMethods
-XX:-DontCompileHugeMethods


back to OpenJDK

For Linux: OpenJDK already contains server VM. Should be pre-installed. If not, user can install it with one command line, no need to bundle it.

For MacOSX: Probably contains server VM already, no need to bundle.

For Window: 32 bit constains only client vm, you should bundle openjdk.
If user somehow has 64 bit jvm, then it has server vm only, no need to bundle - but users don't usually have 64 bit vm...

http://dwn.keraj.net/OpenJRE-7u6-windows-i386.7z - 8mb
- rt.jar is not compressed - 24mb - so 7zip can compress it more efficiently. (resulting in 8mb package)
Pages: [1] 2
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

xsi3rr4x (42 views)
2014-04-15 18:08:23

BurntPizza (38 views)
2014-04-15 03:46:01

UprightPath (54 views)
2014-04-14 17:39:50

UprightPath (36 views)
2014-04-14 17:35:47

Porlus (53 views)
2014-04-14 15:48:38

tom_mai78101 (75 views)
2014-04-10 04:04:31

BurntPizza (134 views)
2014-04-08 23:06:04

tom_mai78101 (234 views)
2014-04-05 13:34:39

trollwarrior1 (195 views)
2014-04-04 12:06:45

CJLetsGame (203 views)
2014-04-01 02:16:10
List of Learning Resources
by SHC
2014-04-18 03:17:39

List of Learning Resources
by Longarmx
2014-04-08 03:14:44

Good Examples
by matheus23
2014-04-05 13:51:37

Good Examples
by Grunnt
2014-04-03 15:48:46

Good Examples
by Grunnt
2014-04-03 15:48:37

Good Examples
by matheus23
2014-04-01 18:40:51

Good Examples
by matheus23
2014-04-01 18:40:34

Anonymous/Local/Inner class gotchas
by Roquen
2014-03-11 15:22:30
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!