Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (524)
Games in Android Showcase (127)
games submitted by our members
Games in WIP (593)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Proper GPU profiling with LWJGL  (Read 2242 times)
0 Members and 1 Guest are viewing this topic.
Offline theagentd

« JGO Bitwise Duke »


Medals: 361
Projects: 2
Exp: 8 years



« Posted 2014-05-11 00:46:38 »

Profiling is important when optimizing games. When using OpenGL through LWJGL, it's important to know that the CPU and GPU work asynchronously on very different tasks, so depending on which one of them that is your bottleneck you may get very confusing results. You will not get correct values if you simply measure the time it takes to run an OpenGL command since the command is just added to the OpenGL queue until the queue is full, which is very fast. If the GPU is unable to keep up with the commands submitted, it'll eventually block the application until the GPU has caught up and there's room in the queue. That means that OpenGL may block on random OpenGL commands or even on Display.update(), which means that you could get a huge CPU time spike for something as simple as glBindTexture() or a similar command.

The solution is to go directly to the source and ask the GPU how long time stuff took to execute for the GPU. This is tricky to do without affecting performance negatively. One idea is to call an OpenGL command, then use glFinish() to force the CPU to block while the GPU executes all queued up commands, but this causes both a CPU stall and then a GPU stall since the command queue will be completely empty once glFinish() returns leading to massively reduced performance. A better solution is to use OpenGL asynchronous queries to query the time at which the GPU reaches certain points in the command queue. We then delay reading back the result until the GPU has finished the frame we're profiling.

When profiling on the CPU we can do the following:
1  
2  
3  
4  
5  
long startTime = System.nanoTime();

//run code to profile

long timeTaken = System.nanoTime() - startTime; //Time taken in nano seconds


We can do the same thing with GPU queries, but the asynchronous nature of queries complicates it all a bit:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
int startQuery = glGenQueries();
glQueryCounter(startQuery , GL_TIMESTAMP);

//Run OpenGL commands to profile

int endQuery = glGenQueries();
glQueryCounter(endQuery, GL_TIMESTAMP);

//Make sure enough time passes so the above OpenGL commands have been processed by the GPU

if(glGetQueryObjectui(endQuery, GL_QUERY_RESULT_AVAILABLE) == GL_TRUE){ //The result is available
   
    long startTime = glGetQueryObjectui64(startQuery, GL_QUERY_RESULT);
    long endTime = glGetQueryObjectui64(endQuery, GL_QUERY_RESULT);
    long timeTaken = endTime - startTime;
}


I have developed two small static classes that can take care of GPU profiling. The use of these two classes is as follows (extra brackets added for clarity):

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
//No special initialization required

public void render(){
   
    GPUProfiler.startFrame();
   
    GPUProfiler.start("Render game");
    {
        GPUProfiler.start("Render tiles");
        renderTiles();
        GPUProfiler.end();

        GPUProfiler.start("Render objects");
        renderObjects();
        GPUProfiler.end();
    }
    GPUProfiler.end();

    GPUProfiler.start("Render UI");
    renderUI();
    GPUProfiler.end();

    GPUProfiler.endFrame();

    /*
    The GPUProfiler keeps track of previous frames until
    they've been rendered by the GPU and their results are
    ready to be read back. The following line dumps the
    timings of all completed frames to System.out.
    */


    GPUTaskProfile tp;
    while((tp = GPUProfiler.getFrameResults()) != null){
       
        tp.dump(); //Dumps the frame to System.out.
        //or use the functions of GPUTaskProfile to extract information about the frame:
        //getName(), getStartTime(), getEndTime(), getTimeTaken(), getChildren()
        //Then you can draw the result as fancy graphs or something.

        GPUProfiler.recycle(tp); //Recycles GPUTaskProfile instances and their OpenGL query objects.
    }
}


Here's some example output of dump() when using my GPU profiler with the engine I'm writing. Note that the CPU time taken to execute all my OpenGL functions is around 1.5ms, with Display.update() blocking for around 25 ms after each frame making it impossible to tell what's actually keeping the GPU so busy.
Quote
Frame 510 : 29.112ms
    WSWGraphics.render() : 29.105ms
        Render scene : 5.953ms
            Clear G-buffer : 0.061ms
            Render tiles : 2.81ms
            Render models : 1.623ms
            Render ocean : 0.07ms
            Render decals : 0.048ms
            Render glowing models : 0.936ms
        Linearize depth : 0.085ms
        Scene lighting : 17.37ms
            Unshadowed point lights : 0.0ms
            Unshadowed cone lights : 0.0ms
            Unshadowed directional lights : 0.0ms
            Shadowed point lights : 17.345ms
                Shadow map pass 0 : 2.038ms
                    Shadow map initializing : 0.044ms
                    Shadow map rendering : 0.88ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 1.11ms
                Shadow map pass 1 : 5.408ms
                    Shadow map initializing : 0.037ms
                    Shadow map rendering : 4.701ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 0.666ms
                Shadow map pass 2 : 1.834ms
                    Shadow map initializing : 0.036ms
                    Shadow map rendering : 1.152ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 0.641ms
                Shadow map pass 3 : 1.303ms
                    Shadow map initializing : 0.034ms
                    Shadow map rendering : 1.053ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 0.213ms
                Shadow map pass 4 : 4.298ms
                    Shadow map initializing : 0.036ms
                    Shadow map rendering : 3.143ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 1.116ms
                Shadow map pass 5 : 2.458ms
                    Shadow map initializing : 0.034ms
                    Shadow map rendering : 1.757ms
                    Shadow map prefiltering : 0.0ms
                    Light rendering : 0.662ms
            Shadowed cone lights : 0.0ms
            Shadowed directional lights : 0.0ms
        Render transparency : 0.442ms
            Render effects : 0.267ms
                StreakRenderer : 0.011ms
                ParticleRenderer : 0.011ms
                ShockwaveRenderer : 0.0ms
            Resolve : 0.173ms
        Render distortions : 0.219ms
            Render effects : 0.012ms
            Blur : 0.206ms
        Postprocessing : 2.988ms
            MergeProcessor : 0.438ms
            MotionBlurProcessor : 0.333ms
            BloomRenderer : 0.84ms
            ToneMapper : 0.416ms
            FXAAProcessor : 0.52ms
            TemporalSSAAProcessor : 0.436ms
        Screen copy : 0.141ms
    Render UI : 0.005ms
        Render UI elements : 0.003ms
        Distort UI : 0.0ms
Seems like I have too many shadow mapped lights. =P

Here are the two classes:
GPUProfiler: http://www.java-gaming.org/?action=pastebin&id=930
GPUTaskProfile: http://www.java-gaming.org/?action=pastebin&id=931

Myomyomyo.
Offline ra4king

JGO Kernel


Medals: 355
Projects: 3
Exp: 5 years


I'm the King!


« Reply #1 - Posted 2014-06-28 21:56:17 »

I'd like to note that this feature is for OpenGL 3.3+ only.

Very invaluable feature though, thank you for bringing it to my attention and thank you for your utility classes!

EDIT: You used your engine.etc.pool.Pool class for this, I'm assuming this is a normal object pool with a set number of initial instances?
Secondly, what's the difference between java.util.ArrayList and your FastArrayList?

EDIT2: More researched has led me to the GL_TIME_ELAPSED query. Why not use that instead of getting the time before and after?

Offline theagentd

« JGO Bitwise Duke »


Medals: 361
Projects: 2
Exp: 8 years



« Reply #2 - Posted 2014-06-29 01:09:42 »

EDIT: You used your engine.etc.pool.Pool class for this, I'm assuming this is a normal object pool with a set number of initial instances?
Secondly, what's the difference between java.util.ArrayList and your FastArrayList?
Shiiit, I had no idea I had those dependencies.

FastArrayList is functionally identical to ArrayList, but has a few performance pros. The most important one is that clear() does not null out each element (it works similarly to ByteBuffer.clear() in that it just sets the size to 0). It's definitely not required in any way, and it should be possible to simply change all FastArrayLists to normal ArrayLists as the method names are identical.

Pool and Poolable simply provide a normal object pool to avoid generating too much garbage, and should be easy to implement. Again, not necessary; it's just a (possibly premature) optimization.

EDIT2: More researched has led me to the GL_TIME_ELAPSED query. Why not use that instead of getting the time before and after?
GL_TIME_ELAPSED has a big problem. You cannot run nested GL_TIME_ELAPSED queries.
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
glBeginQuery(GL_TIME_ELAPSED, query);

...

    glBeginQuery(GL_TIME_ELAPSED, nestedQuery); //ILLEGAL!
    ...
    glEndQuery(GL_TIME_ELAPSED);

...

glEndQuery(GL_TIME_ELAPSED);


Hence, emulating GL_TIME_ELAPSED with GL_TIMESTAMP is much more flexible.

Myomyomyo.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline PandaMoniumHUN

JGO Coder


Medals: 32
Exp: 3 years


White-bearded OGL wizard


« Reply #3 - Posted 2014-06-29 09:23:58 »

This is awesome, thank you very much for your work theagentd! Wink
One question though, what about the license? Can I integrate these into my engine if I give credit?

My Blog | Jumpbutton Studio - INOP Programmer
Can't stress enough: Don't start game development until you haven't got the basics of programming down! Pointing
Offline theagentd

« JGO Bitwise Duke »


Medals: 361
Projects: 2
Exp: 8 years



« Reply #4 - Posted 2014-06-29 13:12:07 »

This is awesome, thank you very much for your work theagentd! Wink
One question though, what about the license? Can I integrate these into my engine if I give credit?
It's here to be used, knock yourself out. It's not much, but if you feel like giving me credit then I'm not gonna say no to ending up on a credits screen. =P

Myomyomyo.
Online BurntPizza

« JGO Bitwise Duke »


Medals: 271
Exp: 5 years



« Reply #5 - Posted 2014-06-29 13:26:59 »

Maybe the Don't Ask Me About It LicenseTongue
Offline ra4king

JGO Kernel


Medals: 355
Projects: 3
Exp: 5 years


I'm the King!


« Reply #6 - Posted 2014-06-29 20:06:01 »

I really like the WTFPL Smiley

Offline roland
« Reply #7 - Posted 2014-11-09 21:16:51 »

Sorry for the late reply. This is a great tool, thanks theagentd!
Is there any way to modify this for a graphics card that only supports OpenGL 2.1?
Offline basil_

« JGO Bitwise Duke »


Medals: 70
Exp: 12 years



« Reply #8 - Posted 2014-11-09 22:26:22 »

nice! i too did not see this post. Smiley

glGetQueryObjecti()
is available since opengl 1.5 but will return integer-values while the new tokens
GL_TIME_ELAPSED
and
GL_TIMESTAMP
are ment to return 64bit values. they work only with
glGetQueryObjecti64()
which is gl33+, sorry.

btw, @agentd : did you consider double/tripple-buffering ? i figured, query-objects can stall when used too early. could be a nice addition to your tool.

it's also nice to still track the cpu-times together with the timer-query, which gives us a better feel for cpu/gpu work balance.

here's a question : did you get the timer-query to work with multithreading ? i get all kind of funky results, even more when using doublebuffering. persecutioncomplex
Offline theagentd

« JGO Bitwise Duke »


Medals: 361
Projects: 2
Exp: 8 years



« Reply #9 - Posted 2014-11-10 00:22:55 »

Sorry for the late reply. This is a great tool, thanks theagentd!
Is there any way to modify this for a graphics card that only supports OpenGL 2.1?
Not really. You could check for support for the ARB_timer_query extension specifically, which is supported by Nvidia's OGL 2.1 GPUs, but not AMD's or *shiver* Intel's older GPUs. It's worth noting that according to the Steam Hardware Survey, the number of people having GPUs that don't support OGL3 is 2.22%, and a large chunk of those either don't have powerful enough hardware to run your game and/or have Nvidia's OGL 2.1 GPUs.

btw, @agentd : did you consider double/tripple-buffering ? i figured, query-objects can stall when used too early. could be a nice addition to your tool.

it's also nice to still track the cpu-times together with the timer-query, which gives us a better feel for cpu/gpu work balance.
The profiler already avoids stalls completely by checking if the result is already available, so it never has to freeze the CPU thread. It buffers query objects for as long as it needs before reading them back, which is generally 1 or 2 frames.

It makes no sense to combine this with a CPU time profiler, as that information is 1) useless if we're GPU limited (random OGL commands will stall giving inaccurate results) and 2) we're usually not interested in profiling both CPU and GPU performance at the same granularity. It makes no sense to do GPU profiling of your CPU-side game logic. I do have a separate CPU profiler which does the same but for CPU time, minus the delayed readback, and my threading library has per-thread profiling of tasks.

here's a question : did you get the timer-query to work with multithreading ? i get all kind of funky results, even more when using doublebuffering. persecutioncomplex
I have no idea why this is relevant. You most likely only make your OpenGL calls from a single thread, so why worry about multithreading? You insert these time stamp queries into the command queue of OpenGL so that you can get at what time THE GPU reaches that part of the command queue, so even if you make your queries from multiple OpenGL contexts/threads, you'd still be inserting all these queries into the same command queue. I don't get what your problem is. Of course, there's no guarantee of the ordering of OpenGL calls made from different threads unless you force it using glFinish() or so.

Myomyomyo.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline basil_

« JGO Bitwise Duke »


Medals: 70
Exp: 12 years



« Reply #10 - Posted 2014-11-10 00:48:02 »

i'm not talking about sending queries form different threads to a shared que, that's pretty much pointless and asking for trouble. a shared gl-context has it's own command que. same point profiling that as any other.

ofc, profiling the cpu at the same time is only interesting when you do cpu heavy stuff side by side, or chase performance hogs.
Offline theagentd

« JGO Bitwise Duke »


Medals: 361
Projects: 2
Exp: 8 years



« Reply #11 - Posted 2014-11-10 00:51:49 »

You can have thread-local profiler instances in that case.

Myomyomyo.
Pages: [1]
  ignore  |  Print  
 
 

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

toopeicgaming1999 (21 views)
2014-11-26 15:22:04

toopeicgaming1999 (19 views)
2014-11-26 15:20:36

toopeicgaming1999 (7 views)
2014-11-26 15:20:08

SHC (24 views)
2014-11-25 12:00:59

SHC (24 views)
2014-11-25 11:53:45

Norakomi (24 views)
2014-11-25 11:26:43

Gibbo3771 (22 views)
2014-11-24 19:59:16

trollwarrior1 (36 views)
2014-11-22 12:13:56

xFryIx (74 views)
2014-11-13 12:34:49

digdugdiggy (52 views)
2014-11-12 21:11:50
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!