Java-Gaming.org    
1  Game Development / Performance Tuning / Re: Tiny object performance overhead on: 2006-12-05 17:32:56
Escape analysis will help and be a nice addition but it is no panacea.  It should easily handle the trivial cases shown in typical microbenchmarks where an object is created, used once, and thrown away all within a single method.  For harder cases, for example where the temporary object is used to marshal arguments for a possibly polymorphic method call, it remains to be seen how often escape analysis can handle this for real code in large projects.  And of course, if the object has any significant lifespan, for example if the object is part of a larger object, then escape analysis cannot help.  It does not allow objects to be inlined into other objects.

The small object overhead is still significant and while escape analysis is a good thing, it only fixes one aspect of a wider problem.
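To make the three cases above concrete, here is a minimal sketch. The Vec3, Shape, Origin and Particle types are hypothetical and exist only for illustration. Whether a given JVM actually removes any of these allocations depends on the JVM version and on its inlining decisions, so treat the comments as expectations rather than guarantees.

```java
public class EscapeSketch {
    // Hypothetical minimal vector type, used only for this illustration.
    static final class Vec3 {
        final double x, y, z;
        Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // Trivial case: the temporary is created, used once and dropped inside
    // one method, so escape analysis has a good chance of eliminating it.
    static double lengthSquared(double x, double y, double z) {
        Vec3 tmp = new Vec3(x, y, z);
        return tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z;
    }

    // Harder case: the temporary marshals arguments for a call that may be
    // polymorphic. If the call is not inlined, the object escapes into it.
    interface Shape {
        double distanceTo(Vec3 p);
    }

    static final class Origin implements Shape {
        public double distanceTo(Vec3 p) {
            return Math.sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
        }
    }

    static double distance(Shape s, double x, double y, double z) {
        return s.distanceTo(new Vec3(x, y, z));
    }

    // Case where escape analysis cannot help: the Vec3 lives as long as the
    // Particle and stays a separate heap object. It is not inlined into it.
    static final class Particle {
        final Vec3 position;
        Particle(double x, double y, double z) { position = new Vec3(x, y, z); }
    }
}
```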
2  Game Development / Performance Tuning / Re: Microbenchmark - new vs. reuse on: 2003-08-25 21:44:40
--- Field (4) faster than local variables (1) under -client
I've profiled the code using VTune which has the side benefit of allowing one to view the assembly code produced by the hotspot compiler.  I've posted the resulting assembly code for the local variable and field routines here (sorry for the strange formatting):
http://www.graphics.cornell.edu/~bjw/CPTLocalVarClient.txt
http://www.graphics.cornell.edu/~bjw/CPTFieldClient.txt
I'm not an x86 assembly expert but perhaps there is an expert out there who can analyze the differences.  One thing I noticed is that local variable code computes the results using fp registers and then copies the results using int registers while the field code uses fp registers throughout.

--- Field (4) much slower than local variables (1) under -server
This turns out to be an inlining effect.  Hotspot inlines method (1) by default but not method (4).  If I disable all inlining (using the -XX:MaxInlineSize=1 -XX:FreqInlineSize=1 flags) then the local var method slows down to 0.038 us (or just a hair faster than field).  However I could not find any parameter setting that would convince hotspot to inline the field method (4) the same way it is inlining method (1) by default.

Incidentally, I don't think it's actually any more difficult to generate the SSE/2 instructions instead of x87 for the floating point code (in fact the SSE/2 code is simpler and probably easier to generate).  Hotspot -server does not use the SIMD parts of SSE/2, just the scalar instructions, as shown in the assembly code linked below.  I think SSE/2 fp code is a feature that is likely to migrate down into the client JVM in the next version, especially if enough people request it.
http://www.graphics.cornell.edu/~bjw/CPTLocalVarServer.txt
3  Game Development / Performance Tuning / Re: Microbenchmark - new vs. reuse on: 2003-08-22 20:09:33
I ran some more tests to see why the cost of synchronization seemed to vary so much, and it seems to depend on whether you are running on a single processor or dual processor machine.  My previous tests were run on a dual processor machine, which I didn't mention because it didn't seem relevant since the test is entirely single threaded (i.e. the other processor just sits idle).  However, I've gone back and redone my timings (with fewer other applications open) on single and dual processor machines which are otherwise similar, and here are the results:

JVM 1.4.2            client   client  |  server   server
1.7GHz P4            single   dual    |  single   dual
(1) Local var        0.077    0.076   |  0.012    0.012
(2) New              0.070    0.141   |  0.056    0.124
(3) ThreadLocal      0.102    0.100   |  0.043    0.039
(4) Field            0.043    0.042   |  0.045    0.043
(5) Field sync       0.057    0.231   |  0.055    0.178
(6) TempStack        0.121    0.128   |  0.045    0.047
(7) TempStack param  0.053    0.072   |  0.016    0.016

  • Most results are similar, except that the costs of new (2) and synchronization (5) are much higher on a dual processor machine.
  • Adding synchronized to a method is virtually free on a single processor (assuming no contention), but fairly expensive on a dual processor.
  • Using new (2) on a single processor machine under the client JVM seems to be reasonably fast.  Only the field (4), field sync (5), and TempStack param (7) methods are faster, and not by that much.  However, on a dual processor or when using -server, there are other techniques that are much faster than using new.
  • TempStack param (7) also seems to slow down somewhat on a dual processor for reasons I don't understand, but only under -client, not -server.
  • Would we see the same slowdowns on a single processor machine with HyperThreading enabled?  (It was not enabled in any of my tests.)  I'll try to test this if I can find a suitable machine.

I wonder if the JVM actually generates different code on a single vs. dual processor machine or if there is something else going on here (cache effects? context switching?).
4  Game Development / Performance Tuning / Re: Microbenchmark - new vs. reuse on: 2003-08-21 14:45:36
Thanks for the additional data points.  A few observations:
  • All three client JVMs seem to preserve the oddity that using a field (4) is cheaper than using local variables (1).  I still don't understand why this is the case.
  • The relative cost of a synchronized method (5) seems to be lower on MacOSX and much lower under Redhat as compared to my Windows results.
  • The relative cost of new and garbage collection seems slightly lower under MacOSX but significantly higher under Redhat.  It is still large enough in all cases to be a potential bottleneck in truly performance-critical code.
  • Under MacOSX, -client and -server are not significantly different which is not surprising since my understanding was that -server is ignored under the current MacOSX JVM.
5  Game Development / Performance Tuning / Microbenchmark - new vs. reuse on: 2003-08-20 19:53:49
I've created a little microbenchmark to test the relative costs of a few different ways of getting access to a temporary object inside a short method, and I thought the results might be of interest to other people.

My test computes the cross product of two 3D vectors and returns the result in the first vector.  This is a small but real-world-useful operation, and it requires some temporary space for the cross product.  I coded up multiple versions that used different techniques to get the needed temporary space as follows:
    [1] Local var.  This method just used local double variables for its temporary space.  This is the only version that does not use an object (a Vector3d) for its temporary space.
    [2] New.  Allocate a new Vector3d each time the method is called.
    [3] ThreadLocal.  Get a temporary object using a ThreadLocal object.
    [4] Field.  Get a temporary object stored in a private field.
    [5] Field sync.  Synchronized method which gets its temporary from a private field as in [4].
    [6] TempStack.  Get a temporary object from a TempStack, which is essentially an object pool where objects must be returned in the reverse order in which they were obtained.  The TempStack is obtained using a ThreadLocal.
    [7] TempStack param.  Use a TempStack passed in an explicit extra parameter.  Ugly in that it requires an extra parameter but can be relatively fast.
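
For readers who don't want to open the full source, here is a rough sketch of what techniques [1] through [5] might look like. It is not the code from the linked CrossProductTest.java: the Vector3d class below is a hypothetical minimal stand-in, the method names are made up, and ThreadLocal.withInitial needs a much newer JVM than the 1.4.2 used for the timings.

```java
public class CrossProductSketch {
    // Hypothetical minimal stand-in for the Vector3d used in the post.
    static final class Vector3d {
        double x, y, z;
        Vector3d() {}
        Vector3d(double x, double y, double z) { set(x, y, z); }
        void set(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // Writes a x b into out. out must not be a or b, because a and b are
    // still being read while out is written; hence the temporary.
    private static void crossInto(Vector3d a, Vector3d b, Vector3d out) {
        out.x = a.y * b.z - a.z * b.y;
        out.y = a.z * b.x - a.x * b.z;
        out.z = a.x * b.y - a.y * b.x;
    }

    // [1] Local var: plain doubles, no temporary object at all.
    static void crossLocal(Vector3d a, Vector3d b) {
        double x = a.y * b.z - a.z * b.y;
        double y = a.z * b.x - a.x * b.z;
        double z = a.x * b.y - a.y * b.x;
        a.set(x, y, z);
    }

    // [2] New: allocate a fresh temporary on every call.
    static void crossNew(Vector3d a, Vector3d b) {
        Vector3d t = new Vector3d();
        crossInto(a, b, t);
        a.set(t.x, t.y, t.z);
    }

    // [3] ThreadLocal: one temporary per thread.
    private static final ThreadLocal<Vector3d> TEMP = ThreadLocal.withInitial(Vector3d::new);

    static void crossThreadLocal(Vector3d a, Vector3d b) {
        Vector3d t = TEMP.get();
        crossInto(a, b, t);
        a.set(t.x, t.y, t.z);
    }

    // [4] Field: one temporary per instance, not thread-safe.
    private final Vector3d temp = new Vector3d();

    void crossField(Vector3d a, Vector3d b) {
        crossInto(a, b, temp);
        a.set(temp.x, temp.y, temp.z);
    }

    // [5] Field sync: same as [4], but synchronized on the instance.
    synchronized void crossFieldSync(Vector3d a, Vector3d b) {
        crossInto(a, b, temp);
        a.set(temp.x, temp.y, temp.z);
    }
}
```

In [3], [4] and [5] every call shares a single temporary, so a nested call that needed its own scratch space would overwrite the outer call's temporary.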

Method 4 is not thread-safe and methods 3, 4, and 5 cannot be used in recursive methods.  Method 2 is the cleanest of the object-based methods, but how does its performance compare to the others?  Here are some timings from my 1.7GHz Pentium 4 machine:

Test                 JVM 1.4.2 -client   JVM 1.4.2 -server
(1) Local var        0.076               0.015
(2) New              0.144               0.120
(3) ThreadLocal      0.100               0.039
(4) Field            0.048               0.040
(5) Field sync       0.205               0.216
(6) TempStack        0.127               0.047
(7) TempStack param  0.069               0.016

Times are in microseconds per method call and you can get the complete source code here http://www.graphics.cornell.edu/~bjw/CrossProductTest.java
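
For anyone wanting to take similar measurements, a per-call time in microseconds can be obtained with a loop along these lines. This is only a sketch under my own assumptions, not the harness from the linked source: the names microsPerCall and sink are made up, and System.nanoTime and lambdas did not exist in JVM 1.4.2.

```java
public class TimingSketch {
    // Written to by the benchmark body so the JIT cannot discard the work.
    static double sink;

    // Returns the average cost of one call to body, in microseconds.
    static double microsPerCall(Runnable body, int warmupCalls, int timedCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            body.run(); // give the JIT a chance to compile the code first
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) {
            body.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1000.0 / timedCalls;
    }

    public static void main(String[] args) {
        final double[] a = {1, 2, 3};
        final double[] b = {0, 0, 1}; // unit vector, so a stays bounded
        double us = microsPerCall(() -> {
            // a = a x b, using local doubles as in method (1)
            double x = a[1] * b[2] - a[2] * b[1];
            double y = a[2] * b[0] - a[0] * b[2];
            double z = a[0] * b[1] - a[1] * b[0];
            a[0] = x; a[1] = y; a[2] = z;
            sink = x + y + z;
        }, 1_000_000, 10_000_000);
        System.out.println(us + " us per call");
    }
}
```

The usual microbenchmark warnings apply: the numbers move with warmup length, JVM flags and whatever else is running on the machine.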

A few things to note from the results:
  • As others have noted, under 1.4.2 -server is much faster for floating point code than -client.
  • The difference between (1) and (2) gives the approximate cost of allocating and garbage collecting the temporary Vector3d object.  Allocation increases the cost of the cross product method by a factor between 2 (client) and 8 (server), so it is still a very significant cost in this case.
  • The synchronized method is the most expensive in all cases, so it is still best to avoid synchronization when possible.
  • We use a technique similar to (7) in performance-critical sections of our own code, and I would happily change the code to something cleaner like (2), if the cost was small.   However the cleaner object-based techniques are still significantly slower.
  • Although I expected (1) to be the fastest, under -client it turns out to be actually slower than (4) and (7) for reasons I don't understand.
  • The field method (4) seems to be relatively slow under -server, for reasons I also don't understand.
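
The TempStack idea behind (6) and (7) can be sketched roughly as follows. This is a hypothetical reconstruction, not the TempStack from the linked source: the class layout, the get/release/inUse names and the out-of-order check are all assumptions made for illustration.

```java
import java.util.ArrayList;

public class TempStackSketch {
    // Hypothetical minimal stand-in for the Vector3d used in the post.
    static final class Vector3d {
        double x, y, z;
    }

    // LIFO pool: objects must be released in the reverse order of get().
    static final class TempStack {
        private final ArrayList<Vector3d> pool = new ArrayList<>();
        private int top = 0; // number of objects currently handed out

        Vector3d get() {
            if (top == pool.size()) {
                pool.add(new Vector3d()); // grow only when the pool is exhausted
            }
            return pool.get(top++);
        }

        void release(Vector3d v) {
            if (top == 0 || pool.get(top - 1) != v) {
                throw new IllegalStateException("temporaries released out of order");
            }
            top--;
        }

        int inUse() {
            return top;
        }
    }

    // (6) TempStack: the stack is looked up through a ThreadLocal.
    private static final ThreadLocal<TempStack> STACKS = ThreadLocal.withInitial(TempStack::new);

    static void cross(Vector3d a, Vector3d b) {
        cross(a, b, STACKS.get());
    }

    // (7) TempStack param: the caller passes the stack explicitly.
    static void cross(Vector3d a, Vector3d b, TempStack stack) {
        Vector3d t = stack.get();
        t.x = a.y * b.z - a.z * b.y;
        t.y = a.z * b.x - a.x * b.z;
        t.z = a.x * b.y - a.y * b.x;
        a.x = t.x;
        a.y = t.y;
        a.z = t.z;
        stack.release(t);
    }
}
```

Because each get() hands out a distinct object until it is released, nested or recursive calls each get their own temporary, which the single shared temporary of (3), (4) and (5) cannot offer.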

Caveat: This is a microbenchmark and performance may be different in real applications. I think garbage collection is a wonderful thing, and I do not advocate abandoning it for object pools except when really necessary for performance reasons (preferably after profiling your code first).  Comments and critiques are welcome.