While browsing the Java3D sources I noticed the IMPRESSIVE (read: nightmarish) copyDataToSurfacexxx() function in D3dUtil.cpp, which is used to copy from a Direct3D surface to the offscreen buffer.
I suspect this function is the bottleneck of offscreen rendering with Java3D on DirectX.
The OpenGL implementation uses readPixels() instead, so it is probably faster because the pixel copy is left to the OpenGL driver.
Why not reimplement it using DirectX's own hardware-accelerated copy functions?
I found something interesting here: http://www.geocities.com/foetsch/d3d8screenshot/d3d8screenshot.htm
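To illustrate only the CPU side of such an approach: once the image is in a lockable surface, LockRect gives a base pointer and a row pitch (the pBits and Pitch fields of D3DLOCKED_RECT), and the rows can be copied in bulk instead of pixel by pixel. The helper below is a hypothetical sketch (copyPitchedRows is not part of Java3D); it assumes a 32-bit pixel format such as A8R8G8B8 that already matches the destination layout, so no per-pixel conversion is needed, and it does not call Direct3D itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not Java3D code): copies a locked 32-bit surface whose
// rows are `pitch` bytes apart into a tightly packed destination buffer.
// Assumes source and destination share the same pixel format.
void copyPitchedRows(const std::uint8_t* src, std::size_t pitch,
                     std::size_t width, std::size_t height,
                     std::uint8_t* dst) {
    const std::size_t rowBytes = width * 4;  // 4 bytes per 32-bit pixel
    if (pitch == rowBytes) {
        // No row padding: the whole image is one contiguous block.
        std::memcpy(dst, src, rowBytes * height);
        return;
    }
    // Drivers may pad rows, so copy one row at a time and skip the padding.
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * rowBytes, src + y * pitch, rowBytes);
    }
}
```

The design choice is simply to let memcpy move whole rows; if the formats differ, a conversion step would still be needed, and the real gain would depend on how fast the driver can get the pixels into system memory in the first place.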