  Tiny object performance overhead  (Read 7741 times)
Online Riven
« League of Dukes »

JGO Overlord


Medals: 742
Projects: 4
Exp: 16 years


Hand over your head.


« Posted 2006-09-22 21:34:59 »

I read in some article* of a JVM engineer that creating new objects was 'almost at the cost of shifting a pointer'.

* I tried hard to find the article, but sometimes java.sun.com is kinda hard to wade through


Further, the GC is considered so intelligent and efficient, that its effect should be 'noise' even in performance-critical code.

Combining these two would almost make you think that allocating and discarding tiny objects is nearly free, or at least has only a small impact.


I decided to give it a test, in a real-world application which has its bottleneck in some sphere<->triangle method.
Basic vector-math (Vec3) was implemented like:

public static final Vec3 add(Vec3 a, Vec3 b) {
    return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}


When I was writing this code it seemed horribly inefficient.

The following code shows the algorithm:

      Vec3 ba = sub(b, a);
      Vec3 ca = sub(c, a);
      Vec3 pa = sub(p, a);
      float snom = dot(pa, ba);
      float tnom = dot(pa, ca);
      if (snom <= 0.0f && tnom <= 0.0f)
         return a;

      Vec3 cb = sub(c, b);
      Vec3 pb = sub(p, b);
      float unom = dot(pb, cb);
      float sdenom = dot(pb, sub(a, b));
      if (sdenom <= 0.0f && unom <= 0.0f)
         return b;

      Vec3 pc = sub(p, c);
      float tdenom = dot(pc, sub(a, c));
      float udenom = dot(pc, sub(b, c));
      if (tdenom <= 0.0f && udenom <= 0.0f)
         return c;

      Vec3 n = cross(ba, ca);

      Vec3 ap = sub(a, p);
      Vec3 bp = sub(b, p);
      float vc = dot(n, cross(ap, bp));
      if (vc <= 0.0f && snom >= 0.0f && sdenom >= 0.0f)
         return add(a, mul(snom / (snom + sdenom), ba));

      Vec3 cp = sub(c, p);
      float va = dot(n, cross(bp, cp));
      if (va <= 0.0f && unom >= 0.0f && udenom >= 0.0f)
         return add(b, mul(unom / (unom + udenom), cb));

      float vb = dot(n, cross(cp, ap));
      if (vb <= 0.0f && tnom >= 0.0f && tdenom >= 0.0f)
         return add(a, mul(tnom / (tnom + tdenom), ca));

      float u = va / (va + vb + vc);
      float v = vb / (va + vb + vc);
      float w = 1.0f - u - v;

      return add(add(mul(u, a), mul(v, b)), mul(w, c));


The following is the version where all Vec3 methods are inlined:

      float bax = b.x - a.x;
      float bay = b.y - a.y;
      float baz = b.z - a.z;

      float cax = c.x - a.x;
      float cay = c.y - a.y;
      float caz = c.z - a.z;

      float pax = p.x - a.x;
      float pay = p.y - a.y;
      float paz = p.z - a.z;

      float snom = pax * bax + pay * bay + paz * baz;
      float tnom = pax * cax + pay * cay + paz * caz;
      if (snom <= 0.0f && tnom <= 0.0f)
         return a;

      float abx = a.x - b.x;
      float aby = a.y - b.y;
      float abz = a.z - b.z;

      float cbx = c.x - b.x;
      float cby = c.y - b.y;
      float cbz = c.z - b.z;

      float pbx = p.x - b.x;
      float pby = p.y - b.y;
      float pbz = p.z - b.z;

      float unom = pbx * cbx + pby * cby + pbz * cbz;
      float sdenom = pbx * abx + pby * aby + pbz * abz;
      if (sdenom <= 0.0f && unom <= 0.0f)
         return b;

      float pcx = p.x - c.x;
      float pcy = p.y - c.y;
      float pcz = p.z - c.z;

      float acx = a.x - c.x;
      float acy = a.y - c.y;
      float acz = a.z - c.z;

      float bcx = b.x - c.x;
      float bcy = b.y - c.y;
      float bcz = b.z - c.z;

      float tdenom = pcx * acx + pcy * acy + pcz * acz;
      float udenom = pcx * bcx + pcy * bcy + pcz * bcz;
      if (tdenom <= 0.0f && udenom <= 0.0f)
         return c;

      float nx = bay * caz - baz * cay;
      float ny = baz * cax - bax * caz;
      float nz = bax * cay - bay * cax;

      float apx = a.x - p.x;
      float apy = a.y - p.y;
      float apz = a.z - p.z;

      float bpx = b.x - p.x;
      float bpy = b.y - p.y;
      float bpz = b.z - p.z;

      float APBPx = apy * bpz - apz * bpy;
      float APBPy = apz * bpx - apx * bpz;
      float APBPz = apx * bpy - apy * bpx;

      float vc = nx * APBPx + ny * APBPy + nz * APBPz;
      if (vc <= 0.0f && snom >= 0.0f && sdenom >= 0.0f)
      {
         Vec3 r = new Vec3();
         float t = snom / (snom + sdenom);
         r.x = bax * t + a.x;
         r.y = bay * t + a.y;
         r.z = baz * t + a.z;
         return r;
      }

      float cpx = c.x - p.x;
      float cpy = c.y - p.y;
      float cpz = c.z - p.z;

      float BPCPx = bpy * cpz - bpz * cpy;
      float BPCPy = bpz * cpx - bpx * cpz;
      float BPCPz = bpx * cpy - bpy * cpx;

      float va = nx * BPCPx + ny * BPCPy + nz * BPCPz;
      if (va <= 0.0f && unom >= 0.0f && udenom >= 0.0f)
      {
         Vec3 r = new Vec3();
         float t = unom / (unom + udenom);
         r.x = cbx * t + b.x;
         r.y = cby * t + b.y;
         r.z = cbz * t + b.z;
         return r;
      }

      float CPAPx = cpy * apz - cpz * apy;
      float CPAPy = cpz * apx - cpx * apz;
      float CPAPz = cpx * apy - cpy * apx;

      float vb = nx * CPAPx + ny * CPAPy + nz * CPAPz;
      if (vb <= 0.0f && tnom >= 0.0f && tdenom >= 0.0f)
      {
         Vec3 r = new Vec3();
         float t = (tnom / (tnom + tdenom));
         r.x = cax * t + a.x;
         r.y = cay * t + a.y;
         r.z = caz * t + a.z;
         return r;
      }

      float u = va / (va + vb + vc);
      float v = vb / (va + vb + vc);
      float w = 1.0f - u - v;

      Vec3 r = new Vec3();
      r.x = u * a.x + v * b.x + w * c.x;
      r.y = u * a.y + v * b.y + w * c.y;
      r.z = u * a.z + v * b.z + w * c.z;
      return r;



After warming both loops for several seconds, allowing the JVM to inline and optimize, these are the results:

Objects: 1548ms, 1553ms, 1551ms
Inlined: 505ms, 500ms, 558ms

This is clearly not 'noise' anymore (timing difference wise).
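
The measurement procedure described above (warm both variants up, then time them) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `AllocBenchSketch`, the methods `sumAllocating` and `sumInlined`, and the tiny workload are all hypothetical and are not the sphere<->triangle routine that produced the numbers above. A newer JVM may also optimize the temporary objects away, so the timings need not resemble the ones posted here.

```java
// Hypothetical harness: class/method names and the workload are illustrative,
// not the code that produced the numbers in this thread.
final class AllocBenchSketch {

    static final class Vec3 {
        final float x, y, z;
        Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    static Vec3 add(Vec3 a, Vec3 b) {
        return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    // Object variant: allocates two short-lived Vec3 instances per iteration.
    static float sumAllocating(int n) {
        Vec3 b = new Vec3(1f, 2f, 3f);
        float acc = 0f;
        for (int i = 0; i < n; i++) {
            Vec3 a = new Vec3(i & 7, 1f, 2f);
            Vec3 r = add(a, b);
            acc += r.x + r.y + r.z;
        }
        return acc;
    }

    // Inlined variant: same arithmetic on plain floats, no allocation.
    static float sumInlined(int n) {
        float acc = 0f;
        for (int i = 0; i < n; i++) {
            float rx = (i & 7) + 1f;
            float ry = 1f + 2f;
            float rz = 2f + 3f;
            acc += rx + ry + rz;
        }
        return acc;
    }

    public static void main(String[] args) {
        final int n = 20_000_000;
        float sink = 0f;
        // Warm-up: give the JIT a chance to compile both variants before timing.
        for (int round = 0; round < 3; round++) {
            sink += sumAllocating(n) + sumInlined(n);
        }
        long t0 = System.nanoTime();
        sink += sumAllocating(n);
        long t1 = System.nanoTime();
        sink += sumInlined(n);
        long t2 = System.nanoTime();
        System.out.println("Objects: " + (t1 - t0) / 1_000_000 + "ms");
        System.out.println("Inlined: " + (t2 - t1) / 1_000_000 + "ms");
        System.out.println("(sink " + sink + ")"); // keeps the results observably used
    }
}
```

Both variants perform the same float operations in the same order, so they return identical sums; only the allocation behaviour differs.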

Some of you guys (to be honest, including me) would say: doh! - but I kinda started to believe they really reduced the overhead of objects. Sadly this doesn't seem to be the case as of yet.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Riven


« Reply #1 - Posted 2006-09-22 21:48:07 »

Found Jeffs remarks on this topic:

http://wiki.java.net/bin/view/Games/JeffOnPerformance#Do_I_need_to_avoid_garbage_colle
Quote
This means you are free today to create objects just to pass in and out of method calls
or hold temporary values, a practice which makes your code a whole lot neater, less buggy,
and simpler to maintain.



I'll continue my search for the article about the pointer-shift...
I found a quote of it, on another website:
Quote
Garbage Collection

The garbage collector has been greatly improved: creating a new object is now an incredibly
cheap operation, in most cases equivalent to shifting a pointer in memory. Don't necessarily be
afraid of creating many short-lived objects, they will be garbage-collected very efficiently.

Offline Matzon

JGO Knight


Medals: 19
Projects: 1


I'm gonna wring your pants!


« Reply #2 - Posted 2006-09-23 02:17:28 »

are you sure that the methods are inlined? - else you'd have a method overhead in the object test

Offline kappa
« League of Dukes »

JGO Kernel


Medals: 74
Projects: 15


★★★★★


« Reply #3 - Posted 2006-09-23 02:56:49 »

You know, I had the exact same impression that creating small objects was free. Just today I was trying to decide whether to pass 9 floats as objects or directly as floats.

public void someMethod(float vec1x, float vec1y, float vec1z,
                          float vec2x, float vec2y, float vec2z,
                          float vec3x, float vec3y, float vec3z){
}


or wrap the values in Vector3f objects

public void someMethod(Vector3f a, Vector3f b, Vector3f c) {
}


Clearly the second version is much nicer and cleaner, but it requires creating 3 more objects (of Vector3f). So according to your test, would the first version perform better?

Riven


« Reply #4 - Posted 2006-09-23 10:44:06 »

Quote
are you sure that the methods are inlined? - else you'd have a method overhead in the object test

Yup, running Xprof shows no sign of these methods anymore; they are interpreted a few times, then disappear (0.4% of the ticks).

Offline CommanderKeith
« Reply #5 - Posted 2006-09-23 11:18:43 »

Very interesting stats.

Try Java 6, apparently 'small object creation' has become much more efficient.  see:

http://www.javalobby.org/java/forums/t66270.html

PS: I'm sure you know, but to avoid warming up loops, try the VM with the -server option (only works on Windows with the JDK VM, however).

Riven


« Reply #6 - Posted 2006-09-23 11:19:11 »

Quote
You know, I had the exact same impression that creating small objects was free. Just today I was trying to decide whether to pass 9 floats as objects or directly as floats.

public void someMethod(float vec1x, float vec1y, float vec1z,
                          float vec2x, float vec2y, float vec2z,
                          float vec3x, float vec3y, float vec3z){
}


or wrap the values in Vector3f objects

public void someMethod(Vector3f a, Vector3f b, Vector3f c) {
}


Clearly the second version is much nicer and cleaner, but it requires creating 3 more objects (of Vector3f). So according to your test, would the first version perform better?




New Object Loop
            int p = values.length - 1;
            while (p > 12)
            {
               Vec3 a = new Vec3(values[p--], values[p--], values[p--]);
               Vec3 b = new Vec3(values[p--], values[p--], values[p--]);
               Vec3 c = new Vec3(values[p--], values[p--], values[p--]);
               r += fancyCalc(a, b, c);
            }


Used Object Loop
            int p = values.length - 1;
            while (p > 12)
            {
              a.load(values[p--], values[p--], values[p--]);
              b.load(values[p--], values[p--], values[p--]);
              c.load(values[p--], values[p--], values[p--]);
              r += fancyCalc(a, b, c);
            }


Many Floats Loop
            int p = values.length - 1;
            while (p > 12)
            {
               r += fancyCalc(values[p--], values[p--], values[p--], values[p--], values[p--], values[p--], values[p--], values[p--], values[p--]);
            }


update: Float Array Loop
            int p = values.length - 1;
            while (p > 12)
            {
               r += fancyCalc(values, p -= 9);
            }




                    Client VM 1.4   Server VM 1.4   |   Client VM 1.5   Server VM 1.5   |   Client VM 1.6   Server VM 1.6
New Object Loop        2266ms          1453ms       |      2188ms          1354ms       |      1427ms          1094ms
Used Object Loop       1404ms           656ms       |      1326ms           447ms       |       588ms           281ms
Many Floats Loop       1265ms           328ms       |      1278ms           246ms       |       420ms           230ms
Float Array Loop          ?ms             ?ms       |      1206ms           250ms       |       310ms           219ms


Fancy calc
   private static final float fancyCalc(float ax, float ay, float az, float bx, float by, float bz, float cx, float cy, float cz)
   {
      float dotAB = ax * bx + ay * by + az * bz;
      float dotBC = bx * cx + by * cy + bz * cz;
      float dotCA = cx * ax + cy * ay + cz * az;

      return (dotAB + dotBC) * dotCA + (1.0f - dotCA);
   }

   private static final float fancyCalc(Vec3 a, Vec3 b, Vec3 c)
   {
      float dotAB = a.x * b.x + a.y * b.y + a.z * b.z;
      float dotBC = b.x * c.x + b.y * c.y + b.z * c.z;
      float dotCA = c.x * a.x + c.y * a.y + c.z * a.z;

      return (dotAB + dotBC) * dotCA + (1.0f - dotCA);
   }

   private static final float fancyCalc(float[] buf, int off)
   {
      float ax = buf[off + 0];
      float ay = buf[off + 1];
      float az = buf[off + 2];

      float bx = buf[off + 3];
      float by = buf[off + 4];
      float bz = buf[off + 5];

      float cx = buf[off + 6];
      float cy = buf[off + 7];
      float cz = buf[off + 8];

      float dotAB = ax * bx + ay * by + az * bz;
      float dotBC = bx * cx + by * cy + bz * cz;
      float dotCA = cx * ax + cy * ay + cz * az;

      return (dotAB + dotBC) * dotCA + (1.0f - dotCA);
   }


Of course the body of fancyCalc is a bit too large to measure only the overhead of the way it is invoked, but it's more 'real world' this way, instead of yet another 'micro benchmark'.
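
The "Used Object Loop" calls a `load` method on Vec3 that is not shown anywhere in this thread. A plausible minimal version might look like the sketch below, assuming `load` simply overwrites the three fields in place. The wrapper class `ReusableVec3Sketch` and the method `sumReusing` are hypothetical names added for illustration.

```java
// Sketch only: load() is an assumption about what the "Used Object Loop" calls.
final class ReusableVec3Sketch {

    static final class Vec3 {
        float x, y, z;

        // Overwrites the fields in place instead of allocating a new instance.
        Vec3 load(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
            return this;
        }
    }

    // Same arithmetic as the Vec3 overload of fancyCalc posted above.
    static float fancyCalc(Vec3 a, Vec3 b, Vec3 c) {
        float dotAB = a.x * b.x + a.y * b.y + a.z * b.z;
        float dotBC = b.x * c.x + b.y * c.y + b.z * c.z;
        float dotCA = c.x * a.x + c.y * a.y + c.z * a.z;
        return (dotAB + dotBC) * dotCA + (1.0f - dotCA);
    }

    static float sumReusing(float[] values) {
        // The three vectors are allocated once, outside the hot loop.
        Vec3 a = new Vec3(), b = new Vec3(), c = new Vec3();
        float r = 0f;
        int p = values.length - 1;
        while (p > 12) { // loop bound copied from the loops above
            a.load(values[p--], values[p--], values[p--]);
            b.load(values[p--], values[p--], values[p--]);
            c.load(values[p--], values[p--], values[p--]);
            r += fancyCalc(a, b, c);
        }
        return r;
    }
}
```

The only difference from the "New Object Loop" is that the loop body rewrites existing instances, so nothing is allocated per iteration.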

Riven


« Reply #7 - Posted 2006-09-23 12:01:18 »

Quote
Very interesting stats.

Try Java 6, apparently 'small object creation' has become much more efficient.  see:

http://www.javalobby.org/java/forums/t66270.html

PS: I'm sure you know, but to avoid warming up loops, try the VM with the -server option (only works on Windows with the JDK VM, however).


The Server VM takes even longer to warm up. Anyway, I'm giving the VM more than enough time to warm up, so which VM is used doesn't really matter.

Offline CommanderKeith
« Reply #8 - Posted 2006-09-23 12:26:19 »

Wow, quick reply!

That is disappointing, I thought hotspot would turn the 'new object' code into the 'direct' code.    Well at least object creation has gotten better in Java 6.  How badly do these bottlenecks affect you, because in all of my games it's the blitting to the screen that takes most of the time.

I wonder why the 1.6 Client VM is so much quicker than the 1.5 equivalent when doing the 'direct' method? 

PS: oops, I thought the server VM did all possible native code compilation AND inlining.  So inlining must still be done dynamically at runtime by the server VM


Riven


« Reply #9 - Posted 2006-09-23 12:39:42 »

Offtopic:

I found out that
FloatBuffer.get(int) is more than twice as slow in 6.0 (compared to 1.5) (both client VM)

fancyCalc FloatBuffer client 1.5: ~1000ms
fancyCalc FloatBuffer client 1.6: ~2400ms <--- ?? serious regression

fancyCalc FloatBuffer server 1.5: ~285ms
fancyCalc FloatBuffer server 1.6: ~290ms

FloatBuffer = direct, native-ordered buffer


Offline Martin Strand

Junior Member





« Reply #10 - Posted 2006-09-23 13:00:08 »

Quote
PS: oops, I thought the server VM did all possible native code compilation AND inlining.  So inlining must still be done dynamically at runtime by the server VM

It does more aggressive inlining but still needs to warm up. You can use -Xcomp to have methods compiled the first time they're invoked, that would get rid of the warmup loop.
Riven


« Reply #11 - Posted 2006-09-23 13:04:59 »

Quote
PS: oops, I thought the server VM did all possible native code compilation AND inlining.  So inlining must still be done dynamically at runtime by the server VM
It does more aggressive inlining but still needs to warm up. You can use -Xcomp to have methods compiled the first time they're invoked, that would get rid of the warmup loop.

Indeed, yet it often results in crappy optimized code, as the VM didn't have enough time to properly analyze the code-paths and adjust the optimizing process with that data.

Offline g666

Junior Member





« Reply #12 - Posted 2006-09-23 18:46:46 »

Well, I never saw any examples showing that you could now create small objects for not much more than the cost of reusing them, so I never believed it. I'm not sure why anyone did. :)

desperately seeking sanity
Offline CommanderKeith
« Reply #13 - Posted 2006-09-24 05:49:34 »

Well I've been told many times here (& read elsewhere) that object pooling doesn't give any performance boost - since the gc is so efficient & object creation is swift.

So object pooling can still be a good idea (if object creation is causing the bottleneck). 

And what will you do with your vector-math code Riven - persevere with temporary objects or use primitives or object pooling?

Riven


« Reply #14 - Posted 2006-09-24 10:43:44 »

I wrote this vector-math code for this test only. I always had a gut-feeling it would be dead-slow, so this was the only place that created all those objects.

Next test I'll do will be with an ObjectPool. I have my doubts about your statement that object pooling is still worthwhile. We'll see.

Offline Linuxhippy

Senior Member


Medals: 1


Java games rock!


« Reply #15 - Posted 2006-09-24 17:06:42 »

However object pooling time is quite consistent and also does not hurt scalability on many CPUs.
I am working for a larger company and my job is to tune the stuff other (cheaper *lol*) programmers produce - if you're running on a 32-64 CPU machine, generating garbage is VERY expensive and HURTS concurrency a lot. However, managing memory yourself means ... well, you have to take care ;)
Have a look at javolution, a nice framework for fast object pooling :-)

lg Clemens
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #16 - Posted 2006-09-25 10:19:55 »

Um, are you sure you've got the right end of the stick here?

I thought the claim was that the garbage collection of small objects is now practically free, as small as shifting a pointer.

Off the top of my head, it is clearly impossible for *object creation* to be that cheap - you have to initialise lots of data in memory (think how much data an object actually contains under the hood if it contains merely a simple float)

malloc will be first against the wall when the revolution comes...
Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #17 - Posted 2006-09-25 21:07:33 »

Quote
Off the top of my head, it is clearly impossible for *object creation* to be that cheap - you have to initialise lots of data in memory (think how much data an object actually contains under the hood if it contains merely a simple float)

Think about it in terms of allocation versus initialization blah^3. Allocation is reserving address space for the object, and initialization is assigning actual values to fields, etc... So, allocation can indeed be as fast as a pointer bump.

I've done this in C++ code where I've written custom allocators for a routine. The routine allocates some millions of nodes over its relatively short (1 second) execution time. The allocator has a pre-allocated memory pool. When it needs to allocate a node, it simply bumps a pointer. No nodes get deallocated until the very end of the routine, at which point they are all "deallocated" by simply resetting the pointer to the top of the pool. This reduced allocation/deallocation times to just about nil.

You can do something similar in Java by creating an object pool, but those objects are still something that the gc is aware of.
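
A rough Java analogue of the pool described above might look like this sketch: every slot is created up front, "allocating" hands out the next slot by bumping an index, and everything is released at once by resetting that index. The class `Vec3ArenaSketch` and its methods are hypothetical names, not an existing library, and the pooled objects remain ordinary heap objects that the gc still knows about.

```java
// Hypothetical arena-style pool; not taken from any library.
final class Vec3ArenaSketch {

    static final class Vec3 {
        float x, y, z;
    }

    private final Vec3[] slots; // pre-allocated once, reused afterwards
    private int top;            // index of the next free slot (the "bumped pointer")

    Vec3ArenaSketch(int capacity) {
        slots = new Vec3[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = new Vec3();
        }
    }

    // Hands out the next pre-built instance and overwrites its fields.
    Vec3 next(float x, float y, float z) {
        if (top == slots.length) {
            throw new IllegalStateException("arena exhausted");
        }
        Vec3 v = slots[top++];
        v.x = x;
        v.y = y;
        v.z = z;
        return v;
    }

    // "Deallocates" everything at once; previously returned instances get reused.
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

A caller would typically call `reset()` once per frame or per routine, and must not hold on to a vector across a reset, since the same instance will be handed out again.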

About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
rreyelts


« Reply #18 - Posted 2006-09-25 21:19:00 »

Quote
However object pooling time is quite consistent and also does not hurt scalability on many CPUs.

It depends on what your pools look like. If they're MT-safe, that definitely hurts scalability. For example, Java heaps are tuned to be extremely fast for multi-threaded allocations. (They blow the bog-standard C++ allocators out of the water). They can do all sorts of dirty tricks like segmenting different areas of heap address space per thread to reduce contention. There are tricks you can do with object pools too (like creating threadlocal pools, but the cost of a threadlocal lookup isn't zero), but they aren't trivial and do involve other kinds of overhead.

Most people can just forget about allocation and pooling unless they're creating millions of objects / second, or using a class that has heavyweight initialization (e.g. database connections).
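
The thread-local trick mentioned above could be sketched like this, assuming a single scratch vector per thread is enough. `ThreadLocalScratchSketch` and its methods are hypothetical names. Note that every call to `scratch()` pays for a thread-local lookup, which is the non-zero cost referred to above.

```java
// Hypothetical per-thread scratch object; avoids locking, but not lookup cost.
final class ThreadLocalScratchSketch {

    static final class Vec3 {
        float x, y, z;
    }

    // One scratch vector per thread, so threads never contend for it.
    private static final ThreadLocal<Vec3> SCRATCH = ThreadLocal.withInitial(Vec3::new);

    static Vec3 scratch() {
        return SCRATCH.get();
    }

    // Uses the scratch vector as a temporary instead of allocating one.
    static float lengthSquaredOfSum(float ax, float ay, float az,
                                    float bx, float by, float bz) {
        Vec3 t = scratch();
        t.x = ax + bx;
        t.y = ay + by;
        t.z = az + bz;
        return t.x * t.x + t.y * t.y + t.z * t.z;
    }
}
```

In a hot loop it is usually better to fetch the scratch object once and pass it along than to look it up on every call.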

Offline princec

JGO Kernel


Medals: 339
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #19 - Posted 2006-09-26 12:02:17 »

Pools are for objects that are expensive to construct and/or initialise.

Cas :)

Riven


« Reply #20 - Posted 2006-09-26 12:11:02 »

Apparently this object is expensive to construct and/or initialize:

public class Vec3
{
  public float x,y,z;
}


See my benchmark: compare "new object" (1354ms) and "used object" (447ms).

Offline Jeff

JGO Coder




Got any cats?


« Reply #21 - Posted 2006-10-18 17:52:08 »

Ya gotta be *really* careful with micro-benchmarks in Java.  It's very, very easy to get meaningless results.

I'm swamped now but if I find time later I'll take a look at this particular example.

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  Its likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Riven


« Reply #22 - Posted 2006-10-18 17:56:18 »

I'm fully aware of that. I ran into this in real-world applications and turned it into a benchmark so I could share the results with you guys without uploading large packages of code with at least a dozen dependencies.

Even in real-world cases the results of the changes in architecture were very similar, so please don't think of it as yet-another-benchmark that the JVM isn't handling properly yet.

Offline pepijnve

Junior Member




Java games rock!


« Reply #23 - Posted 2006-10-19 08:54:17 »

I second Riven's comment on the expensive Vec3 construction. I recently reworked some C++ code that used some vector math classes like the Vec3 class. All the Vec3 operators (+-*/) allocated new Vec3 instances (stack allocation). Initially I converted this to 'new Vec3()' in the Java code, but the performance was terrible. The algorithm in question was causing lots of very short-lived Vec3 instances to be allocated inside inner loops. I then reworked the code to reuse Vec3 instances as much as possible. This improved performance a lot, but the elegance of the code dropped :) Unfortunately I can't seem to find my test results...
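
One common way to do the kind of rework described above is to give each operation a destination parameter, so the caller decides where the result lives. The sketch below is only an illustration of that style: `InPlaceVecMathSketch`, the three-argument `add`, and `sum` are hypothetical names and not the code from the post.

```java
// Hypothetical destination-parameter style; not the poster's actual code.
final class InPlaceVecMathSketch {

    static final class Vec3 {
        float x, y, z;

        Vec3() {
        }

        Vec3(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Elegant but allocating: every call creates a new Vec3.
    static Vec3 add(Vec3 a, Vec3 b) {
        return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    // Reusing: writes into 'out'. Because add is component-wise, 'out' may
    // safely be the same instance as 'a' or 'b' (this is not true for cross).
    static Vec3 add(Vec3 a, Vec3 b, Vec3 out) {
        out.x = a.x + b.x;
        out.y = a.y + b.y;
        out.z = a.z + b.z;
        return out;
    }

    // Accumulates all points into 'out' without allocating inside the loop.
    static Vec3 sum(Vec3[] points, Vec3 out) {
        out.x = 0f;
        out.y = 0f;
        out.z = 0f;
        for (Vec3 p : points) {
            add(out, p, out);
        }
        return out;
    }
}
```

The loss of elegance is visible at the call site: `add(out, p, out)` reads worse than `out = add(out, p)`, and the caller has to think about aliasing.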
blahblahblahh


« Reply #24 - Posted 2006-10-19 10:09:48 »

Quote
Off the top of my head, it is clearly impossible for *object creation* to be that cheap - you have to initialise lots of data in memory (think how much data an object actually contains under the hood if it contains merely a simple float)

Think about it in terms of allocation versus initialization blah^3. Allocation is reserving address space for the object, and initialization is assigning actual values to fields, etc... So, allocation can indeed be as fast as a pointer bump.

I've done this in C++ code where I've written custom allocators for a routine. The routine allocates some millions of nodes over its relatively short (1 second) execution time. The allocator has a pre-allocated memory pool. When it needs to allocate a node, it simply bumps a pointer. No nodes get deallocated until the very end of the routine, at which point they are all "deallocated" by simply resetting the pointer to the top of the pool. This reduced allocation/deallocation times to just about nil.

You can do something similar in Java by creating an object pool, but those objects are still something that the gc is aware of.

Bad description on my part, but what I was trying to do was allude to the fact that a java object that merely contains a float also contains many other bytes imposed by the language. Whereas you have to re-initialize only the float with pooling, with new'ing you have to initialize a bunch of other data.

So, I took "object creation" as used in this discussion to mean "allocation + initialization of required JVM/language/platform data".

No? Yes? Maybe?

princec


« Reply #25 - Posted 2006-10-19 13:52:33 »

Once again, it is probably the case that escape analysis and stack allocation will cure most of this. Due in Java 7 isn't it? I've seen it working in Jet and it pretty much does the trick performance wise.

Cas :)

Offline walter_bruce

Junior Newbie




Performance matters.


« Reply #26 - Posted 2006-12-05 17:32:56 »

Escape analysis will help and be a nice addition but it is no panacea.  It should easily handle the trivial cases shown in typical microbenchmarks where an object is created, used once, and thrown away all within a single method.  For harder cases, for example where the temporary object is used to marshal arguments for a possibly polymorphic method call, it remains to be seen how often escape analysis can handle this for real code in large projects.  And of course, if the object has any significant lifespan, for example if the object is part of a larger object, then escape analysis cannot help.  It does not allow objects to be inlined into other objects.

The small object overhead is still significant and while escape analysis is a good thing, it only fixes one aspect of a wider problem.
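
The two situations contrasted above can be illustrated with the following sketch. It only shows the shape of the code in each case. Whether a particular JVM actually removes the allocation in the first method depends on its JIT compiler and inlining decisions, and nothing here verifies that. All names (`EscapeSketch`, `dotLocal`, `Particle`, `place`) are hypothetical.

```java
// Illustrates escaping vs. non-escaping temporaries; it does not measure anything.
final class EscapeSketch {

    static final class Vec3 {
        final float x, y, z;

        Vec3(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Trivial case: both temporaries are created, used once and dropped inside
    // this method, so they never escape. A compiler with escape analysis is
    // free to keep their fields in registers instead of on the heap.
    static float dotLocal(float ax, float ay, float az,
                          float bx, float by, float bz) {
        Vec3 a = new Vec3(ax, ay, az);
        Vec3 b = new Vec3(bx, by, bz);
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // Long-lived case: the Vec3 becomes part of a larger object.
    static final class Particle {
        Vec3 position;
    }

    // The new Vec3 escapes into a field, so it has to be a real heap object;
    // escape analysis cannot inline it into the Particle.
    static void place(Particle p, float x, float y, float z) {
        p.position = new Vec3(x, y, z);
    }
}
```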
Offline CommanderKeith
« Reply #27 - Posted 2007-04-30 05:22:44 »

Quote
I read in some article* of a JVM engineer that creating new objects was 'almost at the cost of shifting a pointer'.

* I tried hard to find the article, but sometimes java.sun.com is kinda hard to wade through

I think CaptainJester just found the article you were talking about:

Quote
Check out this article on how the garbage collector works: http://www-128.ibm.com/developerworks/java/library/j-jtp01274.html


The 1.0 and 1.1 JDKs used a mark-sweep collector, which did compaction on some -- but not all -- collections, meaning that the heap might be fragmented after a garbage collection. Accordingly, memory allocation costs in the 1.0 and 1.1 JVMs were comparable to that in C or C++, where the allocator uses heuristics such as "first-fit" or "best-fit" to manage the free heap space. Deallocation costs were also high, since the mark-sweep collector had to sweep the entire heap at every collection. No wonder we were advised to go easy on the allocator.

In HotSpot JVMs (Sun JDK 1.2 and later), things got a lot better -- the Sun JDKs moved to a generational collector. Because a copying collector is used for the young generation, the free space in the heap is always contiguous so that allocation of a new object from the heap can be done through a simple pointer addition, as shown in Listing 1. This makes object allocation in Java applications significantly cheaper than it is in C, a possibility that many developers at first have difficulty imagining. Similarly, because copying collectors do not visit dead objects, a heap with a large number of temporary objects, which is a common situation in Java applications, costs very little to collect; simply trace and copy the live objects to a survivor space and reclaim the entire heap in one fell swoop. No free lists, no block coalescing, no compacting -- just wipe the heap clean and start over. So both allocation and deallocation costs per object went way down in JDK 1.2.


in this thread:

http://www.java-gaming.org/forums/index.php?topic=16512.msg130580;topicseen#msg130580
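
The "Listing 1" that the quoted article refers to is not reproduced in this thread. The idea of allocating by pointer addition in a contiguous free space can be modelled with a toy like the one below. This is a simulation in plain Java, not how any JVM is implemented: `BumpAllocatorSketch` is a hypothetical name, the byte array merely stands in for a young generation, and "addresses" are offsets into it.

```java
// Toy model of pointer-bump allocation; offsets stand in for addresses.
final class BumpAllocatorSketch {

    private final byte[] heap; // stands in for a contiguous young generation
    private int top;           // offset of the first free byte

    BumpAllocatorSketch(int size) {
        heap = new byte[size];
    }

    // Returns the offset of the new block, or -1 when the space is full
    // (the point at which a real VM would run a collection).
    int allocate(int size) {
        if (size < 0 || size > heap.length - top) {
            return -1;
        }
        int address = top;
        top += size; // the whole allocation is one bounds check and one addition
        return address;
    }

    // Models reclaiming the entire space in one step, as a copying collector
    // does after the live objects have been moved elsewhere.
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

Note that this models allocation only. It says nothing about the cost of writing the object header or initializing fields, which is the part the benchmarks in this thread appear to be paying for.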

Riven


« Reply #28 - Posted 2007-04-30 13:15:29 »

Thanks for backing up that statement.

It simply shows there is more to object-creation than just allocation. Even an object with an 'empty' constructor has significant overhead.  At least the object-header has to be written (as it's not a struct) which might require fetching the class-id, or something else entirely...

Offline t_larkworthy

Senior Member


Medals: 1
Projects: 1


Google App Engine Rocks!


« Reply #29 - Posted 2007-04-30 14:16:55 »

Object pooling in JOODE has sped it up by an order of magnitude.

Runesketch: an Online CCG built on Google App Engine where players draw their cards and trade. Fight, draw or trade yourself to success.