Pretend that you have three depots. One is called Java, another GL, and the last one Video Card.
If you imagine every function call as a single lorry travelling from one depot to the next, able to carry an unlimited amount of data, you get a pretty good idea of what the efficiency will be like. Especially if you picture a 20-mile drive between the depots, with room for only one lorry on the road at a time.
IOW, if you want to construct a rotation matrix, consider this:
You first have to create it on the Java side, then you have to send it to GL; then you have to send it to the video card.
Alternatively, you can get GL to create it and send it to the card directly, saving you a 20-mile lorry drive.
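
To make the two routes concrete, here's a rough sketch of what "creating it on the Java side" involves for a rotation about the Z axis. RotationSketch and its methods are made-up names for illustration, not part of any binding, and the GL calls are only mentioned in comments because the exact Java method names depend on which binding and version you're using.

```java
// Hypothetical sketch: class and method names are illustrative only.
final class RotationSketch {

    /**
     * Route 1: build the 4x4 rotation about the Z axis on the Java side.
     * The array is column-major, which is the layout OpenGL expects.
     * You would still have to copy it into a buffer and hand it to GL
     * (for example with the binding's equivalent of glMultMatrixf),
     * which is the extra lorry trip.
     */
    static float[] rotationZ(float degrees) {
        double r = Math.toRadians(degrees);
        float c = (float) Math.cos(r);
        float s = (float) Math.sin(r);
        return new float[] {
             c,  s, 0f, 0f,   // column 0
            -s,  c, 0f, 0f,   // column 1
            0f, 0f, 1f, 0f,   // column 2
            0f, 0f, 0f, 1f    // column 3
        };
    }

    /** Applies a column-major 4x4 matrix to the point (x, y, 0, 1); returns {x', y'}. */
    static float[] transform(float[] m, float x, float y) {
        return new float[] {
            m[0] * x + m[4] * y + m[12],
            m[1] * x + m[5] * y + m[13]
        };
    }

    // Route 2: skip all of the above and let GL build the same matrix itself
    // with a single call, i.e. the binding's equivalent of
    //     glRotatef(degrees, 0f, 0f, 1f);
}
```

With Route 1, RotationSketch.rotationZ(90f) gives you a matrix that turns the point (1, 0) into roughly (0, 1), but it's still sitting in the Java depot. With Route 2 the matrix never exists on the Java side at all; only the angle and axis make the trip.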
Cas

(under influence of champagne again)