You should take a look at the Java3D internals and the way it works. Both statements are entirely accurate and not contradictory. I'll try a simple example to show you why.
Java3D works with a message queue internally. Say you do a setCoordinate() call. That takes your geometry, makes a local copy and generates an object to be placed on the queue. This involves a copy and some simple checks (e.g. enough coords for the vertex count). The code then exits back to the runtime code. This part is relatively fast and efficient. There is some synchronisation here to make sure that two objects don't clobber each other as they're being placed on the queue, but it is nothing that a simple synchronized keyword on the add method and a synchronised block around the queue itself (in case the render thread is pulling objects off the internal queue at the same time) can't handle.
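To make the application-side half concrete, here is a minimal sketch of the check, copy and enqueue steps described above. It is not Java3D's actual internal code: the class, the CoordinateUpdate type and the method names are all made up for illustration, and the real implementation does a good deal more.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical names throughout; this only illustrates the idea,
// it is not how Java3D is actually written.
class UpdateQueueSketch {
    /** One pending change, holding a private copy of the caller's data. */
    static final class CoordinateUpdate {
        final float[] coords;
        CoordinateUpdate(float[] coords) { this.coords = coords; }
    }

    private final Queue<CoordinateUpdate> queue = new ArrayDeque<>();
    private final int vertexCount;

    UpdateQueueSketch(int vertexCount) { this.vertexCount = vertexCount; }

    /** Called from application code: cheap check, copy, enqueue, return. */
    void setCoordinates(float[] coords) {
        if (coords.length < vertexCount * 3) {
            throw new IllegalArgumentException("not enough coords for vertex count");
        }
        // Copy so the caller can reuse or modify its own array afterwards.
        CoordinateUpdate update = new CoordinateUpdate(coords.clone());
        synchronized (queue) { // guards against other callers and the render thread
            queue.add(update);
        }
    }

    /** Called from the render thread; returns null when nothing is pending. */
    CoordinateUpdate poll() {
        synchronized (queue) {
            return queue.poll();
        }
    }

    /** Number of updates waiting to be processed. */
    int pending() {
        synchronized (queue) {
            return queue.size();
        }
    }
}
```

The point is that the caller only pays for a bounds check, an array copy and a short lock. All the expensive work is deferred to whoever drains the queue.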
At some point the render thread kicks in. It loops over the items on the message queue and processes them one at a time until there is nothing left. There's a lot of processing that goes on here. For example, for almost all the geometry it will pull the arrays, generate interleaved versions and do a lot of other work behind the scenes. After all of these are done, the rest of the cull/draw render cycle takes place. However, due to the multi-threaded nature of that portion, it's quite possible (though not definite, as it depends on the number of CPUs and/or the setting of some system properties) that there are more synchronised blocks to go through while it is doing culling, state/transparency sorting etc.
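The render-thread half can be sketched the same way: drain every pending message first, then cull and draw. Again, this is only an illustration with invented names. It uses java.util.concurrent.ConcurrentLinkedQueue for brevity rather than the synchronized blocks described above, and the "cull" and "draw" steps are just log entries standing in for the real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a drain-then-render frame; not Java3D's real code.
class RenderLoopSketch {
    // Thread-safe queue, so application threads can post at any time.
    private final ConcurrentLinkedQueue<Runnable> messages = new ConcurrentLinkedQueue<>();

    // Records the order of work within a frame; only touched by the render thread.
    final List<String> log = new ArrayList<>();

    /** Called from application code at any time. */
    void post(Runnable message) {
        messages.add(message);
    }

    /** One frame on the render thread. */
    void renderFrame() {
        // 1. Process messages one at a time until there is nothing left.
        //    In the real thing this is where arrays get pulled, interleaved, etc.
        Runnable message;
        while ((message = messages.poll()) != null) {
            message.run();
        }
        // 2. Only then does the rest of the cull/draw cycle run.
        log.add("cull");
        log.add("draw");
    }
}
```

Every update posted since the last frame is paid for here, on the render thread, before a single thing is drawn. That is where the frame time goes.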
In addition to this very basic management, there's an extra layer of complexity dealing with the behaviour system. At the start of the frame it potentially has to do a lot of work to find out which behaviours are to be triggered. That may mean having to clear the message queue to pick up any transform updates above the behaviour locations and/or the active viewplatform. I haven't checked into the picking code yet, but I suspect it too may have to flush the message queue before each pick.
That's why the outside code can be quite fast, yet it causes a much slower execution speed overall. The reason that Xith3D and AV3D are much faster than J3D is that there is no queue in the middle. Neither of these scene graphs permits updates at any time other than at a very specific point in the app-cull-draw cycle. Java3D does not have this restriction. That means it has to be very careful about when those updates are placed on the queue and when they're pulled off.
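For contrast, here is a rough sketch of the "updates only at one fixed point" model. It is not the Xith3D or AV3D API, and every name in it is invented. It only shows why a scene graph with that restriction needs no queue, no copy and no lock on the update path.

```java
// Hypothetical sketch of a scene graph that only accepts changes during
// a dedicated update phase; not the real Xith3D or AV3D API.
class FixedUpdatePointSketch {
    private boolean inUpdatePhase = false;
    private float[] coords = new float[0];

    /** Only legal while the frame loop is running the application's updates. */
    void setCoordinates(float[] newCoords) {
        if (!inUpdatePhase) {
            throw new IllegalStateException("scene may only be changed during the update phase");
        }
        coords = newCoords; // applied directly: no copy, no queue, no lock
    }

    float[] coordinates() {
        return coords;
    }

    /** One frame: application updates first, then cull and draw. */
    void renderFrame(Runnable appUpdates) {
        inUpdatePhase = true;
        try {
            appUpdates.run();
        } finally {
            inUpdatePhase = false;
        }
        // Cull and draw would follow here, knowing nothing can change underneath them.
    }
}
```

The trade-off is that the application has to fit its changes into that window, which is exactly the restriction Java3D chose not to impose.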
As for the code, simple - head to http://www.xj3d.org
and download the current version of Xj3D. Install. Open any VRML or X3D file using the included Xj3DBrowser. If you are using a Win32 box, two icons are installed - one for Java3D, one for OpenGL. Take a look at the contents and you'll see the only difference is the commandline switch. There's an FPS counter in the lower left corner. That will give you the simplest check for the basic numbers. Go with as small or as large a file as you like. We run everything from a simple rotating box (thus only a transform update, no geometry), right up to full cityscapes that have about 120 megs of raw geometry/animation data and over a gig of textures.