I've created a little microbenchmark to test the relative costs of a few different ways of getting access to a temporary object inside a short method, and I thought the results might be of interest to other people.
My test computes the cross product of two 3D vectors and returns the result in the first vector. This is a small but real-world-useful operation, and it requires some temporary space for the cross product. I coded up multiple versions that used different techniques to get the needed temporary space as follows:
(1) Local var. This method just uses local double variables for its temporary space. This is the only version that does not use an object (a Vector3d) for its temporary space.
(2) New. Allocate a new Vector3d each time the method is called.
(3) ThreadLocal. Get a temporary object using a ThreadLocal object.
(4) Field. Get a temporary object stored in a private field.
(5) Field sync. A synchronized method which gets its temporary from a private field as in (4).
(6) TempStack. Get a temporary object from a TempStack, which is essentially an object pool where objects must be returned in the reverse order from that in which they were obtained. The TempStack is obtained using a ThreadLocal.
(7) TempStack param. Use a TempStack passed in as an explicit extra parameter. Ugly in that it requires an extra parameter, but it can be relatively fast.
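To make variants (1) through (5) concrete, here is a minimal sketch of how they might look. This is not the code from my test: the `Vector3d` class is a bare stand-in, the names `CrossProductSketch`, `crossUsing`, and the `cross...` methods are made up for illustration, and it uses newer Java features (generics, `ThreadLocal.withInitial`) that are not available under 1.4.2.

```java
// Minimal stand-in for Vector3d (assumption: not the class used in the real test).
final class Vector3d {
    double x, y, z;

    Vector3d() {}

    Vector3d(double x, double y, double z) { set(x, y, z); }

    void set(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical class and method names, for illustration only.
final class CrossProductSketch {
    // (3) One temporary per thread, created lazily on first use.
    private static final ThreadLocal<Vector3d> THREAD_TEMP =
            ThreadLocal.withInitial(Vector3d::new);

    // (4) and (5) One temporary per instance, shared by every caller.
    private final Vector3d fieldTemp = new Vector3d();

    // (1) Local var: plain doubles, no temporary object at all.
    static void crossLocal(Vector3d a, Vector3d b) {
        double x = a.y * b.z - a.z * b.y;
        double y = a.z * b.x - a.x * b.z;
        double z = a.x * b.y - a.y * b.x;
        a.set(x, y, z);
    }

    // Shared body for the object-based variants: compute a x b into t, then copy to a.
    private static void crossUsing(Vector3d t, Vector3d a, Vector3d b) {
        t.set(a.y * b.z - a.z * b.y,
              a.z * b.x - a.x * b.z,
              a.x * b.y - a.y * b.x);
        a.set(t.x, t.y, t.z);
    }

    // (2) New: allocate a fresh temporary on every call.
    static void crossNew(Vector3d a, Vector3d b) {
        crossUsing(new Vector3d(), a, b);
    }

    // (3) ThreadLocal: thread-safe, but the temporary is shared within a thread.
    static void crossThreadLocal(Vector3d a, Vector3d b) {
        crossUsing(THREAD_TEMP.get(), a, b);
    }

    // (4) Field: not thread-safe, since all callers share fieldTemp.
    void crossField(Vector3d a, Vector3d b) {
        crossUsing(fieldTemp, a, b);
    }

    // (5) Field sync: same as (4), but serialized on this instance.
    synchronized void crossFieldSync(Vector3d a, Vector3d b) {
        crossUsing(fieldTemp, a, b);
    }
}
```

All five variants leave the same result in the first vector; they differ only in where the temporary comes from.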
Method 4 is not thread-safe, and methods 3, 4, and 5 cannot be used in recursive methods. Method 2 is the cleanest of the object-based methods, but how does its performance compare to the others? Here are some timings from my 1.7GHz Pentium 4 machine:
| Test | JVM 1.4.2 -client | JVM 1.4.2 -server |
|---|---|---|
| (1) Local var | 0.076 | 0.015 |
| (5) Field sync | 0.205 | 0.216 |
| (7) TempStack param | 0.069 | 0.016 |
Times are in microseconds per method call. You can get the complete source code here: http://www.graphics.cornell.edu/~bjw/CrossProductTest.java
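For readers who want to try something similar, a per-call time in microseconds can be obtained roughly as follows. This is only a sketch under my own assumptions, not the harness from the linked source: `TimingSketch` and `microsPerCall` are hypothetical names, and `System.nanoTime` requires a newer JVM than 1.4.2.

```java
// Hypothetical timing helper: average cost of one call, in microseconds.
final class TimingSketch {
    static double microsPerCall(Runnable body, int warmupCalls, int timedCalls) {
        // Warm-up runs give the JIT compiler a chance to compile the body first.
        for (int i = 0; i < warmupCalls; i++) {
            body.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) {
            body.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Nanoseconds to microseconds, then divide by the number of timed calls.
        return elapsedNanos / 1000.0 / timedCalls;
    }
}
```

The body should do something whose result is actually used afterwards, otherwise the JIT may optimize the work away and the number becomes meaningless.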
A few things to note from the results:
- As others have noted, under 1.4.2, -server is much faster than -client for floating-point code.
- The difference between (1) and (2) gives the approximate cost of allocating and garbage collecting the temporary Vector3d object. Allocation increases the cost of the cross product method by a factor between 2 (client) and 8 (server), so it is still a very significant cost in this case.
- The synchronized method is the most expensive in all cases, so it is still best to avoid synchronization when possible.
- We use a technique similar to (7) in performance-critical sections of our own code, and I would happily change the code to something cleaner like (2) if the cost were small. However, the cleaner object-based techniques are still significantly slower.
- Although I expected (1) to be the fastest, under -client it actually turns out to be slower than (4) and (7), for reasons I don't understand.
- The field method (4) seems to be relatively slow under -server, for reasons I also don't understand.
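For anyone curious what the TempStack idea behind (6) and (7) looks like, here is a minimal sketch. It is an illustration under my own assumptions rather than the class from the linked source: the `get`/`release`/`depth` method names, the growth policy, and the `TempStackCross` wrapper are all hypothetical, `Vector3d` is a bare stand-in, and the syntax is newer than Java 1.4.2.

```java
import java.util.Arrays;

// Minimal stand-in for Vector3d (assumption: not the class used in the real test).
final class Vector3d {
    double x, y, z;

    Vector3d() {}

    Vector3d(double x, double y, double z) { set(x, y, z); }

    void set(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical pool whose objects must be released in reverse order of acquisition.
final class TempStack {
    private Vector3d[] pool = new Vector3d[8];
    private int top = 0; // number of temporaries currently handed out

    Vector3d get() {
        if (top == pool.length) {
            pool = Arrays.copyOf(pool, top * 2); // grow when the stack is full
        }
        if (pool[top] == null) {
            pool[top] = new Vector3d(); // allocate only the first time a slot is used
        }
        return pool[top++];
    }

    void release(Vector3d v) {
        if (top == 0 || pool[top - 1] != v) {
            throw new IllegalStateException("temporaries must be released in reverse order");
        }
        top--;
    }

    int depth() { return top; }
}

final class TempStackCross {
    private static final ThreadLocal<TempStack> STACK =
            ThreadLocal.withInitial(TempStack::new);

    // (7) TempStack param: the caller supplies the stack explicitly.
    static void cross(Vector3d a, Vector3d b, TempStack temps) {
        Vector3d t = temps.get();
        try {
            t.set(a.y * b.z - a.z * b.y,
                  a.z * b.x - a.x * b.z,
                  a.x * b.y - a.y * b.x);
            a.set(t.x, t.y, t.z);
        } finally {
            temps.release(t); // always hand the temporary back
        }
    }

    // (6) TempStack: look the stack up through a ThreadLocal on every call.
    static void cross(Vector3d a, Vector3d b) {
        cross(a, b, STACK.get());
    }
}
```

Because each call takes the next free slot instead of a single shared object, this scheme keeps working when the method is called recursively, which a single shared temporary cannot do.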
Caveat: This is a microbenchmark and performance may be different in real applications. I think garbage collection is a wonderful thing, and I do not advocate abandoning it for object pools except when really necessary for performance reasons (preferably after profiling your code first). Comments and critiques are welcome.