None of that is actually "I/O" in the context of threads. In fact, threading will normally slow these operations down considerably because of cache pollution. glGet might conceivably block, but it typically only needs to be done once per frame and it's really very fast. Nowhere near enough to give you a 25% framerate boost.
I agree those wouldn't be nearly enough to explain a 25% boost. I only thought they might be part of it, along with the buffer swap. I can't see how transferring megabytes of data per second could fail to incur some wasted CPU time, unless there is some magic in Jogl and NIO that I don't know about.

That's dead right - in fact this is probably where you're losing all your efficiency. The buffer swap should be the very last thing you do before you start doing "logic" again. During "logic" there should be no need to call any GL API commands. Not all GL commands actually block, though - client-side GL commands can be executed without any trouble.
If vsync is enabled the swap may block, though, and that is certainly somewhere you might waste a considerable number of cycles doing nothing, unless the vsync has been implemented correctly.
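A rough sketch of that ordering, in case it helps. The three Runnables are hypothetical placeholders for your own logic, rendering and swap code (none of them is a real Jogl call); the only point is where the swap sits in the loop.

```java
final class FrameLoopSketch {
    // Runs the given number of frames. All three Runnables are hypothetical
    // placeholders supplied by the caller; none of them is a real Jogl call.
    static void runFrames(int frames, Runnable logic, Runnable render, Runnable swap) {
        for (int i = 0; i < frames; i++) {
            logic.run();   // game calculations only, no GL API calls in here
            render.run();  // issue all GL commands for this frame
            swap.run();    // buffer swap goes last; with vsync on it may block
        }
    }

    public static void main(String[] args) {
        StringBuilder trace = new StringBuilder();
        runFrames(2,
                () -> trace.append("logic "),
                () -> trace.append("render "),
                () -> trace.append("swap "));
        // Prints the order in which the placeholder stages ran.
        System.out.println(trace.toString().trim());
    }
}
```

With this shape, any blocking inside the swap happens after all GL work for the frame has been issued, and the next thing to run is pure logic.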
Cas

The non-threaded version does the buffer swap, then makes the call to the dispatcher for calculations, just as it should to avoid the potential swap blocking.
I just did another experiment that proves there are definite waits going on within the Jogl and/or NIO calls.
I removed all calculations and reduced the game to nothing but scene rendering. Just geometry pumping.
Non-Threaded:
FPS 187
Threaded:
FPS 187
This means there have to be lost CPU cycles that can only be regained by using another thread. In both cases I removed the call to begin dispatching for calculations. If there were no loss, then the FPS rates would be equal when the calculations were done as well, but they aren't.
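For anyone who wants to see the shape of what I mean by regaining those cycles, here is a minimal sketch, not my actual game code. It assumes the calculations for the next frame never touch GL and only read a snapshot of the current state. `logicStep` and `renderAndSwap` are made-up stand-ins, and the `Thread.sleep` merely imitates a wait inside the driver.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OverlapSketch {
    // Hypothetical stand-in for the game calculations: advances a counter.
    static int logicStep(int state) {
        return state + 1;
    }

    // Hypothetical stand-in for rendering plus buffer swap; the sleep
    // imitates time spent waiting inside the driver, not real GL work.
    static void renderAndSwap(int state, long blockMillis) throws InterruptedException {
        Thread.sleep(blockMillis);
    }

    // Computes frame N+1 on a worker thread while frame N is rendered.
    static int runFrames(int frames, long blockMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            int state = 0;
            for (int i = 0; i < frames; i++) {
                final int current = state;
                Callable<Integer> task = () -> logicStep(current);
                Future<Integer> next = worker.submit(task); // logic starts now
                renderAndSwap(current, blockMillis);        // may sit idle here
                state = next.get();                         // join before next frame
            }
            return state;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("final state: " + runFrames(10, 2));
    }
}
```

If the render thread really does sit idle inside the swap or the transfer calls, the worker gets that time for free; if it doesn't, the two threads just compete for the CPU and you would expect no gain.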