This is a tutorial covering how to get SLI (and presumably Crossfire) working with your OpenGL-based game!
SLI and Crossfire are technologies from Nvidia and AMD respectively that let users cram multiple GPUs into their computers and have them work in parallel to give (hopefully) almost linear scaling with the number of GPUs. In practice there are some problems, for example driver compatibility issues and micro-stuttering, but there sure are some heavy FPS wins possible. For good performance, the one thing you have to avoid is data dependency between frames. Since I have an Nvidia GeForce GTX 295, I can't tell you exactly how to do this with a Crossfire setup, but it should be doable with AMD drivers as well.
Enabling SLI for your game is easy as pie with Nvidia cards and can be done without any third-party programs like Nvidia Inspector.
1. Double-click the Nvidia icon in your system tray to fire up the Nvidia Control Panel.
2. In the left-most panel, go to "Manage 3D settings" (under "3D settings").
Now, do either 3a or 3b:
3a. (the lazy way) In the Global Settings tab, find "SLI performance mode" and set it to "Force alternate frame rendering 2". This will enable SLI for all programs using DirectX or OpenGL, so remember to disable it after you're done!
3b. (the right way) Go to Program Settings and create a profile for java.exe and javaw.exe. Then find "SLI performance mode" and set it to "Force alternate frame rendering 2". This will only enable it for Java programs, but it might still screw up Java2D OpenGL acceleration if you're unlucky...
4. Run your Java program and enjoy much higher frame rates!
For most of you, that's all you need to do. However, if you're using FBOs for anything (shadow maps, offscreen rendering, whatever), you'll notice that instead of a 90% performance boost, you got a 75% performance penalty! Surely there's something fishy going on here.
The problem is that since Nvidia hasn't hand-made an SLI profile for your game, the driver makes assumptions. In our case, bad assumptions. Rendering to an FBO effectively makes your GPU generate a texture on its own. Alternate frame rendering (AFR) makes each GPU work on every other frame. The driver can't know whether the next frame will read those generated textures, so it plays it safe: when we finish a frame, all render target textures are synchronized between all GPUs to ensure that they all have the generated textures, and there goes our parallelism. Not only can the GPUs not work in parallel at all, the copying overhead completely kills our frame rate.
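To make the reasoning above concrete, here is a toy timing model of AFR in plain Java. It is only a sketch under made-up assumptions: the class name AfrModel, the 10 ms render time and the 30 ms copy cost are all invented for illustration and are not measurements from any real driver. Frame i runs on GPU i % gpus; if frames are treated as dependent, a frame cannot start until the previous frame has finished and its render targets have been copied over.

```java
/** Toy model of alternate frame rendering. All timings are hypothetical. */
final class AfrModel {
    /**
     * Returns the time (in ms) at which each frame finishes rendering.
     *
     * @param frames    number of frames to simulate
     * @param gpus      number of GPUs; frame i is rendered on GPU i % gpus
     * @param renderMs  assumed time one GPU needs to render one frame
     * @param syncMs    assumed time to copy render targets to the other GPUs
     * @param dependent true if the driver assumes each frame needs the
     *                  previous frame's render targets
     */
    static double[] finishTimes(int frames, int gpus, double renderMs,
                                double syncMs, boolean dependent) {
        double[] gpuFreeAt = new double[gpus]; // when each GPU becomes idle
        double[] finish = new double[frames];
        for (int i = 0; i < frames; i++) {
            int gpu = i % gpus;
            double start = gpuFreeAt[gpu];
            if (dependent && i > 0) {
                // Wait for the previous frame plus the cross-GPU copy.
                start = Math.max(start, finish[i - 1] + syncMs);
            }
            finish[i] = start + renderMs;
            gpuFreeAt[gpu] = finish[i];
        }
        return finish;
    }
}
```

With two GPUs and these invented numbers, independent frames finish two at a time every 10 ms, while dependent frames finish one at a time every 40 ms. That is slower than a single GPU rendering one frame every 10 ms, which is the kind of penalty described above.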
Our savior is an OpenGL extension called ARB_invalidate_subdata (ARBInvalidateSubdata in LWJGL)! This extension allows you to invalidate the contents of a texture, buffer or framebuffer! At the end of the frame, just before we do the buffer swap, we invalidate all textures that we've rendered to that aren't needed in the next frame! That tells the driver that those textures won't need to be synchronized between GPUs, so the GPUs can work in parallel again and we get our scaling back!
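Here is a minimal sketch of how the end-of-frame invalidation could be organized. The names TexImageInvalidator and TransientRenderTargets are hypothetical helpers, not part of any library. The actual GL call is hidden behind a small interface so the sketch compiles without an OpenGL binding; in a real program I'd assume you plug in your binding's version of glInvalidateTexImage(texture, level), which is the entry point this extension defines for textures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical hook standing in for the real glInvalidateTexImage(texture, level). */
@FunctionalInterface
interface TexImageInvalidator {
    void invalidate(int texture, int level);
}

/** Tracks render-target textures whose contents are NOT needed in the next frame. */
final class TransientRenderTargets {
    // GL texture name -> number of mipmap levels to invalidate
    private final Map<Integer, Integer> mipLevelsByTexture = new LinkedHashMap<>();
    private final TexImageInvalidator invalidator;

    TransientRenderTargets(TexImageInvalidator invalidator) {
        this.invalidator = invalidator;
    }

    /** Registers a texture that is completely regenerated every frame. */
    void register(int texture, int mipLevels) {
        if (mipLevels < 1) {
            throw new IllegalArgumentException("mipLevels must be >= 1");
        }
        mipLevelsByTexture.put(texture, mipLevels);
    }

    /** Stops invalidating a texture, e.g. if a later frame starts reading it. */
    void unregister(int texture) {
        mipLevelsByTexture.remove(texture);
    }

    /**
     * Call once per frame, right before the buffer swap.
     * Returns the number of invalidate calls that were issued.
     */
    int invalidateAll() {
        int calls = 0;
        for (Map.Entry<Integer, Integer> entry : mipLevelsByTexture.entrySet()) {
            for (int level = 0; level < entry.getValue(); level++) {
                invalidator.invalidate(entry.getKey(), level);
                calls++;
            }
        }
        return calls;
    }
}
```

Assuming an LWJGL-style binding, usage would look roughly like creating it once with new TransientRenderTargets(GL43::glInvalidateTexImage), registering your shadow map and other per-frame FBO textures, and calling invalidateAll() right before the swap. Check that the driver actually reports the extension (or OpenGL 4.3) before calling it, and don't register anything the next frame reads, such as a previous-frame buffer used for temporal effects.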