Sorry for the late reply. This is a great tool, thanks theagentd!
Is there any way to modify this for a graphics card that only supports OpenGL 2.1?
Not really. You could check specifically for the ARB_timer_query extension, which is supported by Nvidia's OGL 2.1 GPUs but not by AMD's or *shiver* Intel's older GPUs. It's worth noting that according to the Steam Hardware Survey, only 2.22% of users have GPUs that don't support OGL3, and a large chunk of those either don't have powerful enough hardware to run your game anyway or have one of Nvidia's OGL 2.1 GPUs.
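For what it's worth, here is a rough sketch of such a check. It works on the plain strings you would get from glGetString(GL_VERSION) and glGetString(GL_EXTENSIONS) on a desktop OpenGL 2.1 context, so it does not depend on any particular binding. The class and method names are made up for illustration and are not part of the profiler; it also assumes a desktop GL version string that starts with "major.minor" (timer queries became core in OpenGL 3.3).

```java
import java.util.Arrays;

// Hypothetical helper, not part of the profiler discussed in this thread.
final class TimerQuerySupport {

    // Returns true if timer queries should be usable: either the context is
    // OpenGL 3.3 or newer (where they are core), or the driver advertises
    // GL_ARB_timer_query in its extension string.
    static boolean hasTimerQuery(String glVersion, String glExtensions) {
        // Assumption: a desktop GL_VERSION string such as "2.1.2 NVIDIA 340.108",
        // i.e. it begins with "<major>.<minor>".
        String[] parts = glVersion.trim().split("[. ]");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major > 3 || (major == 3 && minor >= 3)) {
            return true;
        }
        // Compare whole tokens; a substring search could match a longer name.
        return Arrays.asList(glExtensions.trim().split("\\s+"))
                     .contains("GL_ARB_timer_query");
    }

    public static void main(String[] args) {
        // Example strings only; real values come from glGetString().
        System.out.println(hasTimerQuery("2.1.2 NVIDIA 340.108",
                "GL_ARB_multitexture GL_ARB_timer_query"));
    }
}
```

If the check fails, the simplest fallback is probably to disable GPU profiling altogether rather than try to emulate it.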
btw, @agentd: did you consider double/triple-buffering? i figured, query-objects can stall when their results are read too early. could be a nice addition to your tool.
it's also nice to still track the cpu-times together with the timer-query, which gives us a better feel for cpu/gpu work balance.
The profiler already avoids stalls completely by checking if the result is already available, so it never has to freeze the CPU thread. It buffers query objects for as long as it needs before reading them back, which is generally 1 or 2 frames.
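To illustrate the idea (this is only a sketch, not necessarily how the profiler is written): keep submitted queries in a FIFO and, once per frame, read back only those whose result is already available. The QueryBackend interface below is a hypothetical stand-in for the two OpenGL reads involved, glGetQueryObject with GL_QUERY_RESULT_AVAILABLE and with GL_QUERY_RESULT; all names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the GL query reads, so the logic runs without a GL context.
interface QueryBackend {
    boolean isResultAvailable(int queryId); // stands in for GL_QUERY_RESULT_AVAILABLE
    long getResult(int queryId);            // stands in for GL_QUERY_RESULT (nanoseconds)
}

final class PendingQueries {
    private final QueryBackend backend;
    private final ArrayDeque<Integer> pending = new ArrayDeque<>();

    PendingQueries(QueryBackend backend) {
        this.backend = backend;
    }

    // Remember a query that has just been issued to the GPU.
    void submit(int queryId) {
        pending.addLast(queryId);
    }

    // Call once per frame. Reads results oldest-first and stops at the first
    // query that is not ready yet, so the CPU never waits on the GPU.
    List<Long> poll() {
        List<Long> results = new ArrayList<>();
        while (!pending.isEmpty() && backend.isResultAvailable(pending.peekFirst())) {
            results.add(backend.getResult(pending.removeFirst()));
        }
        return results;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With this shape the queue simply grows to however many frames the GPU is behind, so there is no fixed double- or triple-buffer size to pick.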
It makes no sense to combine this with a CPU time profiler, because 1) that information is useless if we're GPU limited (random OGL commands will stall, giving inaccurate results) and 2) we're usually not interested in profiling CPU and GPU performance at the same granularity. It makes no sense to do GPU profiling of your CPU-side game logic. I do have a separate CPU profiler which does the same thing for CPU time, minus the delayed readback, and my threading library has per-thread profiling of tasks.
here's a question: did you get the timer-query to work with multithreading? i get all kinds of funky results, even more so when using double-buffering.
I have no idea why this is relevant. You most likely only make your OpenGL calls from a single thread, so why worry about multithreading? You insert these timestamp queries into OpenGL's command queue so that you can find out at what time THE GPU reaches that part of the queue, so even if you make your queries from multiple OpenGL contexts/threads, you'd still be inserting all these queries into the same command queue. I don't get what your problem is. Of course, there's no guarantee of the ordering of OpenGL calls made from different threads unless you force it using glFinish() or similar.