How can I tell if my JOGL code is being hardware accelerated?
Offline ags1

JGO Wizard


Medals: 78
Projects: 3
Exp: 5 years


Make code not war!


« Posted 2012-05-20 20:46:15 »

I have some JOGL code that is looking a bit slow... is there a way to detect if the code is being executed by the GPU?

Alternatively, I am doing lots of calls to glVertex3f() - could this be a reason for the slowdown?

Offline ra4king

JGO Kernel


Medals: 356
Projects: 3
Exp: 5 years


I'm the King!


« Reply #1 - Posted 2012-05-20 23:51:07 »

Immediate mode is "slow" compared to the other ways you can render with OpenGL. However, it is by no means too slow to process a couple hundred sprites (as long as you have a decent graphics card). How many times are you calling glVertex3f?

Offline ags1



« Reply #2 - Posted 2012-05-21 10:36:20 »

Thanks.

In the display method I call glVertex about 350 times. I am drawing a grid of translucent quads in a loop and rotating the view. I'm getting 80 FPS with a Radeon 3450 card.

But I was getting jerky animation with just one quad. Oddly enough, the performance smooths out if I force the animator to use the screen refresh rate. Perhaps something on the graphics card is choking that doesn't choke if I force a slower frame rate. Basically I know nuffing.

I didn't know about 'immediate mode'. I've just been hacking the sample on wikipedia as my starting point. Can you suggest a mode that would be smoother?

Offline gouessej
« Reply #3 - Posted 2012-05-21 16:31:00 »

Hi

Use GLProfile.isHardwareRasterizer() (only in JOGL 2.0) to know whether your OpenGL implementation is really hardware accelerated.

ra4king is right: immediate mode is slow, but the difference from retained mode (vertex arrays, VBOs, VAOs) is "small" when you draw only a few things. The example I put into Wikipedia uses immediate mode as far as I know.

Are you sure v-sync is disabled? Maybe there is nothing wrong in your case.

You should look at the demos and the examples, some of them use VBOs:
https://github.com/sgothel/jogl-demos
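
As a rough cross-check alongside GLProfile.isHardwareRasterizer(), the renderer string that OpenGL reports through glGetString(GL_RENDERER) can be compared against the names of well-known software rasterizers. The sketch below is only a heuristic in plain Java: the class name, the method name and the list of markers are assumptions made for illustration, the list is not exhaustive, and the string is assumed to have been read from a current GL context elsewhere.

```java
import java.util.Locale;

/** Heuristic check of an OpenGL renderer string (hypothetical helper, not part of JOGL). */
final class RendererCheck {
    // Substrings reported by common software implementations; not an exhaustive list.
    private static final String[] SOFTWARE_MARKERS = {
        "gdi generic",            // Microsoft's fallback OpenGL 1.1 implementation
        "llvmpipe",               // Mesa's LLVM-based software rasterizer
        "softpipe",               // Mesa's reference software rasterizer
        "software rasterizer",    // older Mesa software path
        "apple software renderer" // software renderer on macOS
    };

    /** Returns true if the renderer string matches a known software rasterizer. */
    static boolean looksLikeSoftwareRenderer(String renderer) {
        if (renderer == null) {
            return false; // nothing to judge; the caller should treat this as "unknown"
        }
        String lower = renderer.toLowerCase(Locale.ROOT);
        for (String marker : SOFTWARE_MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeSoftwareRenderer("GDI Generic"));
        System.out.println(looksLikeSoftwareRenderer("ATI Mobility Radeon HD 3450"));
    }
}
```

A match is a strong hint that rendering is happening on the CPU; no match proves nothing, so the JOGL call remains the better source.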

Offline ags1



« Reply #4 - Posted 2012-05-21 16:46:29 »

Thanks for the sample on Wikipedia - it has been very helpful. I have now found out what retained mode rendering is and am converting my code to use buffers. I'm currently getting a jerky 79FPS, so I will report back how much better the buffered implementation runs.

Shall I update Wikipedia with a buffered version when I understand what I am doing? Although the article clearly states that the rendering method is only for demo purposes, it would be better if it just demonstrated the right way.

I'm trying to see how scenes and effects influence the frame rate, so I have v-sync switched off intentionally, but when I switch it on I do get a smoother ride.

Offline gouessej
« Reply #5 - Posted 2012-05-21 18:37:48 »

My example in Wikipedia is not wrong, it is just a very simple one. If people need better examples, they should look at ours.

Actually, when v-sync is on, the frame rate should be close to 60. If you disable it, the frame rate won't be capped. If you get 79 frames per second with v-sync off, it is a bit worrying.

Offline theagentd

« JGO Bitwise Duke »


Medals: 366
Projects: 2
Exp: 8 years



« Reply #6 - Posted 2012-05-21 18:42:18 »

Getting 79 FPS WITH v-sync is even more worrying in my opinion. ^_^'

Myomyomyo.
Offline ags1



« Reply #7 - Posted 2012-05-21 21:01:24 »

Sorry, I didn't mean to imply the Wikipedia example was wrong, just that it would be better and more informative if it showed a scalable way of doing things. The sample was very helpful and certainly got me started. But a quick search for JOGL samples almost always turns up dozens of 'basic' samples showing immediate mode, whereas it looks like the retained mode versions would only be a few more lines and ultimately so much more useful.

When I set v-sync I do get 60FPS, so it is behaving as one would expect. With one quad, rendering is jerky at 270FPS (without v-sync). With 225 quads the rate drops to 79FPS and is still jerky. (My earlier statement about 350 vertices was obviously a little inaccurate!)
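
A high average frame rate and jerky animation are not contradictory: the average hides individual long frames. A minimal sketch of how that could be measured, assuming timestamps are recorded once per frame with something like System.nanoTime() (the class and method names here are hypothetical):

```java
/** Summarises frame pacing from per-frame timestamps (hypothetical helper). */
final class FramePacing {
    /** Average frames per second over the whole capture. */
    static double averageFps(long[] timestampsNanos) {
        requireTwoSamples(timestampsNanos);
        int frames = timestampsNanos.length - 1;
        long elapsed = timestampsNanos[frames] - timestampsNanos[0];
        return frames * 1_000_000_000.0 / elapsed;
    }

    /** Longest gap between two consecutive frames, in milliseconds. */
    static double worstFrameMillis(long[] timestampsNanos) {
        requireTwoSamples(timestampsNanos);
        long worst = 0;
        for (int i = 1; i < timestampsNanos.length; i++) {
            worst = Math.max(worst, timestampsNanos[i] - timestampsNanos[i - 1]);
        }
        return worst / 1_000_000.0;
    }

    private static void requireTwoSamples(long[] timestampsNanos) {
        if (timestampsNanos.length < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
    }

    public static void main(String[] args) {
        // Three quick 2 ms frames followed by one 34 ms stall.
        long[] stamps = {0L, 2_000_000L, 4_000_000L, 6_000_000L, 40_000_000L};
        System.out.println(averageFps(stamps) + " FPS average");
        System.out.println(worstFrameMillis(stamps) + " ms worst frame");
    }
}
```

Logging the worst frame time next to the FPS figure makes it easier to tell a genuinely slow renderer from one that stalls occasionally.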

Online Riven
« League of Dukes »

« JGO Overlord »


Medals: 848
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #8 - Posted 2012-05-22 16:54:19 »

I had my old CRT monitor set to 80Hz for ages. V-sync doesn't automatically mean 60Hz.

Offline Cero
« Reply #9 - Posted 2012-05-22 16:58:08 »



Absolutely, I can't even look at CRTs at 60Hz, I get an instant headache.
I used at least 75Hz, 100 if possible.

I hope not many people still use CRTs, though...

Offline gouessej
« Reply #10 - Posted 2012-05-22 19:58:41 »

"Sorry, I didn't mean to imply the wikipedia example was wrong, just that it would be better and more informative if it showed a scalable way of doing things. The sample was very helpful and certainly got me started. But quick googling of JOGL samples almost always shows dozens of 'basic' samples showing immediate mode, whereas it looks like the retained mode versions would only be a few more lines and ultimately so much more useful."
I see what you mean, but that is true of OpenGL examples in general, which is why using retained mode in all the basic examples would be confusing: some people would conclude that JOGL is more complicated than plain OpenGL in C.

"When I set vsynch I do get 60FPS so it is behaving as one would expect. With one quad, rendering is jerky at 270FPS (without vsynch). With 225 quads the rate drops to 79FPS and still jerky. (my statement about 350 vertices previously was obviously a little inaccurate!)"
I have obtained more than 500 FPS with hundreds of thousands of quads in the TUER alpha version on a similar graphics card, so maybe something is wrong with your driver.

Offline ags1



« Reply #11 - Posted 2012-05-22 20:18:24 »

I have changed to using retained mode and actually the performance seems to have slowed - the retained mode version is 20FPS slower. I will have a look at some more samples and see if I can get a new driver for my Radeon.

I am getting a GT430 card next week so that is another route to test this.

Anyway my display method does this:

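        // Enable the client-side position and color arrays.
        gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
        gl.glEnableClientState(GL2.GL_COLOR_ARRAY);

        // Three floats per position, four floats (RGBA) per color, tightly packed.
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertexBuffer);
        gl.glColorPointer(4, GL.GL_FLOAT, 0, colorBuffer);

        // Draw 2500 vertices starting at index 0, four per quad.
        gl.glDrawArrays(GL2.GL_QUADS, 0, 2500);

        gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
        gl.glDisableClientState(GL2.GL_COLOR_ARRAY);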
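
The snippet does not show how vertexBuffer is filled. A minimal sketch of one way to do it with plain java.nio, assuming a flat grid of unit quads in the z = 0 plane (the QuadGrid class, its method and the grid layout are hypothetical and not taken from the post):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Builds position data for a grid of unit quads (hypothetical helper). */
final class QuadGrid {
    static final int FLOATS_PER_VERTEX = 3;  // x, y, z, matching glVertexPointer(3, ...)
    static final int VERTICES_PER_QUAD = 4;  // GL_QUADS, no shared vertices

    /** Returns a direct, native-order buffer that is flipped and ready to read. */
    static FloatBuffer buildVertices(int cols, int rows) {
        int floats = cols * rows * VERTICES_PER_QUAD * FLOATS_PER_VERTEX;
        FloatBuffer buf = ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                float x = col;
                float y = row;
                // Four corners in counter-clockwise order.
                buf.put(x).put(y).put(0f);
                buf.put(x + 1).put(y).put(0f);
                buf.put(x + 1).put(y + 1).put(0f);
                buf.put(x).put(y + 1).put(0f);
            }
        }
        buf.flip(); // limit = number of floats written, position = 0
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer grid = buildVertices(15, 15);
        System.out.println(grid.remaining() / FLOATS_PER_VERTEX + " vertices");
    }
}
```

Because the buffer is flipped, remaining() divided by the floats per vertex gives the vertex count, which avoids hard-coding that number in the draw call.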

Offline theagentd




« Reply #12 - Posted 2012-05-23 00:12:18 »

"I had my old CRT monitor set to 80Hz for ages. V-sync doesn't automatically mean 60Hz."
I know that, but how many CRT monitors refresh at 79Hz? And it should be a safe assumption that most people use a monitor that doesn't flicker your eyes to oblivion nowadays, especially if they have used computers enough to do programming on them.


glDrawArrays takes the number of vertices you have; are you sure you have 2500 of them (= 625 quads)?
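
One way to avoid guessing the count is to derive it from the buffers themselves. A minimal sketch in plain Java, assuming tightly packed client arrays with 3 floats per position and 4 per color as in the posted code (the DrawCount class and its methods are hypothetical):

```java
import java.nio.FloatBuffer;

/** Derives a safe glDrawArrays count from the data actually present (hypothetical helper). */
final class DrawCount {
    /** Number of complete vertices available in both arrays. */
    static int safeVertexCount(FloatBuffer positions, int floatsPerPosition,
                               FloatBuffer colors, int floatsPerColor) {
        int fromPositions = positions.remaining() / floatsPerPosition;
        int fromColors = colors.remaining() / floatsPerColor;
        return Math.min(fromPositions, fromColors);
    }

    /** GL_QUADS consumes four vertices per quad; an incomplete trailing quad is not drawn. */
    static int quadsDrawn(int vertexCount) {
        return vertexCount / 4;
    }

    public static void main(String[] args) {
        FloatBuffer positions = FloatBuffer.allocate(900 * 3);
        FloatBuffer colors = FloatBuffer.allocate(900 * 4);
        int count = safeVertexCount(positions, 3, colors, 4);
        System.out.println(count + " vertices, " + quadsDrawn(count) + " quads");
    }
}
```

Passing a count larger than the data in either array asks the driver to read past the end of the buffer, which at best wastes time and at worst crashes.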

Myomyomyo.
Offline ags1



« Reply #13 - Posted 2012-05-23 10:24:28 »

"glDrawArrays take how many vertices you have, are you sure you have 2500 of them (= 625 quads)?"

You are right - I was overestimating the quad count by 30%. Adding a counter to my init method to actually verify the number of vertices I was generating was the obvious solution. This has brought the performance back to the level of the immediate mode version. I will scale up my quad count to see if I can start seeing a boost from retained mode, and I will also try replacing my quads with pairs of triangles - perhaps my ageing graphics card would work better with old-fashioned triangles. And next week I will have a second graphics card to test on.

BTW, my monitor is definitely set to 60Hz.

Offline ags1



« Reply #14 - Posted 2012-05-26 13:14:09 »

I have worked on my code some more and made the grid of quads scalable. I find my Radeon 3450 drops to 1FPS when I get to 6,860,000 vertices (a filled cube of 70 x 70 x 70 units, each made of five translucent quads). Java throws IllegalAccess errors if I try to draw a bigger cube. It is a bit surprising - I would have thought the 256MB graphics card could store larger arrays than that.

A few little questions - I allocate an array on the card for each scene, which I then run. Obviously allocating arrays on the card will consume resources, so how do I deallocate the arrays I no longer need?

What is faster - a display list or a vertex array?
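
The size of those arrays can be estimated with simple arithmetic. The sketch below assumes the layout from the earlier draw code, 3 floats of position plus 4 floats of color per vertex with no vertex sharing; the class and method names are hypothetical.

```java
/** Back-of-the-envelope size of the vertex data for the filled cube (hypothetical helper). */
final class VertexMemory {
    /** Vertices in a side x side x side cube of units, each built from the given number of quads. */
    static long cubeVertices(int side, int quadsPerUnit) {
        long units = (long) side * side * side;
        return units * quadsPerUnit * 4; // four vertices per quad, none shared
    }

    /** Bytes needed to store the given number of vertices as 32-bit floats. */
    static long bytesFor(long vertices, int floatsPerVertex) {
        return vertices * floatsPerVertex * Float.BYTES;
    }

    public static void main(String[] args) {
        long vertices = cubeVertices(70, 5);
        long bytes = bytesFor(vertices, 3 + 4);
        System.out.printf("%d vertices, %.1f MiB%n", vertices, bytes / (1024.0 * 1024.0));
    }
}
```

Under these assumptions the 70 x 70 x 70 cube needs 6,860,000 vertices at 28 bytes each, roughly 192 MB, which is already a large share of a 256MB card before the framebuffer is counted. This does not identify the cause of the Java error, but it suggests the scene is close to the card's limit.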


Offline davedes
« Reply #15 - Posted 2012-05-26 14:30:19 »

6.8 million vertices...

Offline sproingie

JGO Kernel


Medals: 202



« Reply #16 - Posted 2012-05-26 14:42:56 »

A VA is faster, especially if you turn it into a VBO, which uses almost exactly the same API. You deallocate a VBO with glDeleteBuffers (found in GL15), then free the client-side resources (which in Java you do by simply no longer referencing them).
Offline ags1



« Reply #17 - Posted 2012-05-26 17:39:03 »

OK, I'll look at VBOs, thanks for the pointer.

Offline theagentd




« Reply #18 - Posted 2012-05-26 23:52:55 »

"6.8 million vertices..."
I have an Nvidia GTX 460M and I can render around 7 million vertices forming 3.5 million triangles (I'm rendering quads, 4 vertices and 2 triangles per quad) at 60 FPS. I'm using static VBOs+IBOs.

Myomyomyo.
Offline ags1



« Reply #19 - Posted 2012-05-27 00:24:01 »

My card is very old (and is a cut-down laptop version too), and my code is very naive, but I do not get that performance. I am also deliberately adding features to make the card work harder - vertices are not reused, everything is translucent and so on. I converted my code from VAs to VBOs tonight and compared the results between the two methods.

Vertices   VA (FPS)  VBO (FPS)
7500       74        75
60000      35        36
202500     23        22
480000     16        17
937500     13        13
1620000    10        11
3840000    7         8
7500000    5         5
12960000   3         x

The first column is the number of vertices, the second is the frame rate with VA rendering and the third is the frame rate with VBO rendering. The VBO performance might be slightly faster, but it doesn't feel like my programming efforts have been rewarded. My theory is that the processing on the card is so slow that the benefit of putting the data on the card is masked by the vast processing time.

But if you get 60FPS on a GT460, 5 FPS on a Mobility 3450 actually sounds quite reasonable.
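
Frame rates at different scene sizes are easier to compare when converted to vertices per second. A small sketch using the vertex-array column of the table above (the Throughput class is hypothetical; the numbers are the ones posted):

```java
/** Converts the posted benchmark rows into vertices per second (hypothetical helper). */
final class Throughput {
    static long verticesPerSecond(long verticesPerFrame, int fps) {
        return verticesPerFrame * fps;
    }

    public static void main(String[] args) {
        long[] vertices = {7_500, 60_000, 202_500, 480_000, 937_500,
                           1_620_000, 3_840_000, 7_500_000, 12_960_000};
        int[] vaFps = {74, 35, 23, 16, 13, 10, 7, 5, 3};
        for (int i = 0; i < vertices.length; i++) {
            System.out.printf("%,12d vertices -> %,12d vertices/s%n",
                    vertices[i], verticesPerSecond(vertices[i], vaFps[i]));
        }
    }
}
```

Seen this way, throughput climbs from about 0.56 million vertices per second in the smallest scene to about 38.9 million in the largest, which suggests that something other than raw vertex count (per-frame overhead or fill rate, for instance) dominates the small scenes. The FPS values are whole numbers, so the large-scene figures are coarse.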

Offline theagentd




« Reply #20 - Posted 2012-05-27 02:57:11 »

It's a GTX 460M, not a GTX 460. It's a laptop, the M should stand for mobile or something. For reference, a brand new GTX 680 has around 7 times the raw processing power of my card. Your comparison still kind of holds, since your card is so old.

Mobility Radeon 3450: 40 GFLOPS, 6.4GB/sec bandwidth
GTX 460M: 518.4 GFLOPS, 60GB/sec bandwidth.

Gee. I wonder why my card is faster. Go figure. xD
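
The quoted specifications can be set against the observed frame rates at roughly 7 million vertices (60 FPS on the GTX 460M, 5 FPS on the Mobility Radeon 3450). A minimal sketch of that arithmetic follows; the class name is hypothetical, and the two measurements come from different programs and scenes, so the comparison is only indicative.

```java
/** Compares two cards using the figures quoted in the thread (hypothetical helper). */
final class CardComparison {
    /** How many times larger the first value is than the second. */
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        System.out.printf("compute:   %.2fx%n", ratio(518.4, 40.0));
        System.out.printf("bandwidth: %.2fx%n", ratio(60.0, 6.4));
        System.out.printf("observed:  %.2fx%n", ratio(60.0, 5.0));
    }
}
```

The compute ratio is about 13 and the bandwidth ratio about 9.4, while the observed frame rates differ by a factor of 12, so the gap between the two cards is in line with their specifications.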

Myomyomyo.
Online Riven


« Reply #21 - Posted 2012-05-27 02:59:10 »

"a filled cube of 70 x 70 x 70 units each made of five translucent quads"
fillrate!

Offline theagentd




« Reply #22 - Posted 2012-05-27 03:01:58 »

Last time I checked the development of monitors we hadn't invented holographic 3D monitors yet so a "70x70x70 unit cube" says nothing about how many pixels they actually cover on the screen after transformation, but yeah, it might be a bottleneck in this case (though the raw performance numbers disagree).

Myomyomyo.
Online Riven


« Reply #23 - Posted 2012-05-27 03:24:29 »

You (condescendingly...) prove my point. I never said it was a decisive factor, just that it was a factor. Comparing vertex throughput is meaningless if you don't share information about fragment throughput.

Offline theagentd




« Reply #24 - Posted 2012-05-27 03:38:13 »

Ah, sorry... I always manage to sound a lot more mean than I intend to... ._.

Myomyomyo.
Offline ra4king



« Reply #25 - Posted 2012-05-27 04:17:35 »

Welcome to the Internet, where emotion doesn't exist....

Offline ags1



« Reply #26 - Posted 2012-05-27 13:26:57 »

I think the fillrate is an issue, at least I intend it to be! I get dramatic differences in the frame rate by altering the density of the cube (lower density means the cube is bigger so more elements are offscreen and not rendered). It suggests that the test is not increasing cubically in its demand but linearly as I am just seeing a small section of the cube that gets deeper with each run, not wider.

My objective is to create a scene generator that scales itself until it finds the maximum performance of a given card. It looks like throwing polygons at the card is not enough - a modern card will run out of memory before it runs out of processing capacity. Possibly making my cube superdense might solve the issue (so I get cubic not linear scaling), but maybe I need to add more effects, like specular lighting (that sounds expensive...?).

Offline ags1



« Reply #27 - Posted 2012-05-27 20:06:59 »

I corrected my test to increase the density of the cube (so the visible vertices go up as the cube of the scale), and I get more sensible results (old constant density compared to constant volume):

Vertices  Constant density (FPS)  Constant volume (FPS)
7500      74                      69
60000     35                      13
202500    23                      5
480000    16                      2


Offline theagentd




« Reply #28 - Posted 2012-05-28 00:04:22 »


If you're measuring fill-rate, the number of vertices has no meaning whatsoever. I can fill my screen with 20 fullscreen sized quads = 80 vertices and get 60 FPS. You should be measuring overdraw instead, which is basically how many times a single pixel is drawn to. It's difficult to get a number on it, but you can visualize it by simply enabling blending and additively blending in a single color, for example a weak red, so pixels with lots of overdraw will have a stronger red color.
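
The additive-blend trick counts, per pixel, how many primitives touched it. The same idea can be sketched on the CPU for intuition: the class below is a hypothetical stand-in that uses axis-aligned screen rectangles in place of transformed quads, so it illustrates the measurement rather than replacing the GPU technique.

```java
/** CPU-side overdraw counter for axis-aligned screen rectangles (hypothetical helper). */
final class OverdrawMap {
    private final int width;
    private final int height;
    private final int[] hits; // one counter per pixel, like an additive color channel

    OverdrawMap(int width, int height) {
        this.width = width;
        this.height = height;
        this.hits = new int[width * height];
    }

    /** Adds one layer over the half-open rectangle [x0, x1) x [y0, y1), clipped to the screen. */
    void drawRect(int x0, int y0, int x1, int y1) {
        int left = Math.max(0, x0);
        int right = Math.min(width, x1);
        int top = Math.max(0, y0);
        int bottom = Math.min(height, y1);
        for (int y = top; y < bottom; y++) {
            for (int x = left; x < right; x++) {
                hits[y * width + x]++;
            }
        }
    }

    /** Mean number of times each screen pixel was written. */
    double averageOverdraw() {
        long total = 0;
        for (int h : hits) {
            total += h;
        }
        return total / (double) hits.length;
    }

    /** Highest write count of any single pixel. */
    int maxOverdraw() {
        int max = 0;
        for (int h : hits) {
            max = Math.max(max, h);
        }
        return max;
    }
}
```

Drawing 20 fullscreen rectangles into such a map gives an average overdraw of 20 from only 80 vertices, which is the point made above: fill cost follows covered pixels, not vertex count.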

Another tip is back-face culling if you haven't already enabled it.

Myomyomyo.
Offline ags1



« Reply #29 - Posted 2012-05-28 17:20:40 »

Yes, I am trying to maximize the number of floating point operations done per pixel. So I have color and alpha gradients on each triangle (I've stopped using quads) and I draw transparent objects from back to front to force overdraw each and every time. On level 2 of my test, each pixel should be redrawn about 30 times. On level 3 this is nearer 50 redraws, and so on. The redraws go up linearly, but the drop in performance is closer to cubic now, so the number of vertices seems to be relevant.
