bk0 (JGO n00b, Posts: 10)
« on: 2012-01-28 22:23:47 »
Hi everybody. I'm a bit stumped. It seems as if my program resends the contents of my VBO each frame.

Specs:
- AsRock M3N78D, nForce 720D, AM3
- AMD Phenom II X6 1100T Black Edition, 3.3GHz
- Radeon HD 6950 2GB, PCI-E x16 2.0, DP

Software:
- Updated drivers (CPU/Chipset as well as GPU)
- JOGL 2.0 RC5
- Eclipse (shouldn't matter, though)

Program setup:
- FPSAnimator is supposed to call display() as often as possible (up to 1000 FPS).
- The game uses an Octree; Occlusion Culling as well as Frustum Culling are implemented (on said Octree).
- All to-be-rendered triangles are stored in a VBO.

What I'd like to do: Render as many triangles as possible (just for starters). Right now I render cubes.

Benchmarking: Speed is directly influenced by the size of the VBO.
- 3 072 MB: I allocate using integer (32 Bit), so I get an overflow (=crash).
- 1 536 MB: 1 FPS (I get graphical errors, such as lines across the entire screen.)
- 768 MB: 2 FPS (From here on all looks nice)
- 384 MB: 4 FPS
- 192 MB: 8 FPS
- 96 MB: 16 FPS
- 48 MB: 32 FPS
- 24 MB: 60 FPS
- 12 MB: 118 FPS
- 6 MB: 200 FPS
- 3 MB: 350 FPS
- 1.5 MB: 510 FPS
- 750 KB: 810 FPS

The VBO is constructed ONCE and then no longer updated (I disabled updating for now). Clearly, this stuff should run faster. Could you have a look and maybe see a bug I've overlooked? Maybe something simple? I mean, 24 MB = 360k triangles surely isn't anywhere near the limit of my hardware.

My rendering code looks like this:

    public void display(GLAutoDrawable drawable) {
        Now = System.nanoTime();
        MSSinceLastFrame = (double) (Now - LastCall) / 1000000;
        LastCall = Now;
        System.out.println("MS since last call: " + MSSinceLastFrame + " FPS: " + 1000/MSSinceLastFrame);

        do_look();
        do_move(MSSinceLastFrame);
        GL2 currGL = drawable.getGL().getGL2();
        currGL.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);
        currGL.glLoadIdentity();
        glu.gluLookAt(posx, posy, posz, lookatx, lookaty, lookatz, 0, 1, 0);
        currGL.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo_handle);
        currGL.glEnableClientState(GL2.GL_VERTEX_ARRAY);
        currGL.glEnableClientState(GL2.GL_TEXTURE_COORD_ARRAY);
        currGL.glEnable(GL.GL_TEXTURE_2D);
        currGL.glBindTexture(GL.GL_TEXTURE_2D, texture.getTextureObject(currGL));

        currGL.glVertexPointer(3, GL.GL_FLOAT, 5 * 4, 0);
        currGL.glTexCoordPointer(2, GL.GL_FLOAT, 5 * 4, 3 * 4);
        currGL.glDrawArrays(GL.GL_TRIANGLES, 0, 4 * Buffer.capacity());
        currGL.glDisableClientState(GL2.GL_VERTEX_ARRAY);
        currGL.glDisableClientState(GL2.GL_TEXTURE_COORD_ARRAY);

        currGL.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
        currGL.glBindTexture(GL.GL_TEXTURE_2D, 0);
        double tmp = System.nanoTime();
        double mspassedwhilerendering = (tmp - LastCall) / 1000000;
        System.out.println("MS Passed on CPU: " + mspassedwhilerendering + " FPS (CPU): " + 1000/mspassedwhilerendering);

        currGL.glFinish();
        drawable.swapBuffers();
    }

This is how I initialize things:

    @Override
    public void init(GLAutoDrawable drawable) {
        drawable.setAutoSwapBufferMode(false);
        GL2 gl = drawable.getGL().getGL2();
        glu = new GLU();
        gl.glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
        gl.glClearDepth(1.0f);
        gl.glShadeModel(GL2.GL_SMOOTH);
        gl.glEnable(GL.GL_DEPTH_TEST);
        gl.glDepthFunc(GL.GL_LEQUAL);
        gl.glEnable(GL.GL_TEXTURE_2D);

        glu.gluLookAt(posx, posy, posz, lookatx, lookaty, lookatz, 0, 1, 0);
        glu.gluPerspective(45.0, SCREEN_WIDTH / SCREEN_HEIGHT, 1, 100);
        if (vbo_handle <= 0) {
            int[] tmp = new int[1];
            gl.glGenBuffers(1, tmp, 0);
            vbo_handle = tmp[0];
        }
        Buffer = Buffers.newDirectFloatBuffer(90*65536);
        int numBytes = Buffer.capacity() * 4;
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo_handle);
        gl.glBufferData(GL.GL_ARRAY_BUFFER, numBytes, null, GL.GL_DYNAMIC_DRAW);
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
    }

And for each Leaf in my Octree, I do this (just to be clear: this is done exactly once per leaf, and never repeated):

    int numBytes = 30 * 3 * cubes * 4;
    WWV.gl.glBindBuffer(GL.GL_ARRAY_BUFFER, WWV.vbo_handle);
    WWV.gl.glBufferSubData(GL.GL_ARRAY_BUFFER, myPos * 4, numBytes, WWV.Buffer);
    WWV.gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);

Do you see anything wrong? Or is there an example which I could look at? All tips/suggestions welcome, and thanks for the help.
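
The 3 072 MB entry in the benchmark list crashes because the byte size no longer fits in a 32-bit int. A minimal sketch of that arithmetic, assuming "MB" means 2^20 bytes and 4 bytes per float; the class and method names are hypothetical and not part of the poster's code:

```java
final class BufferSizeOverflow {
    /** Byte size of floatCount floats, computed in 64 bits; throws ArithmeticException on long overflow. */
    static long sizeInBytes(long floatCount) {
        return Math.multiplyExact(floatCount, 4L);
    }

    public static void main(String[] args) {
        int floats = 805_306_368;            // 3 072 MB worth of 4-byte floats
        int wrapped = floats * 4;            // 32-bit multiplication silently wraps around
        long exact = sizeInBytes(floats);    // 64-bit multiplication keeps the real value

        System.out.println("int result:  " + wrapped);
        System.out.println("long result: " + exact);
    }
}
```

Computing sizes as long values at least makes the overflow visible before the number reaches an allocation call. Whether a buffer of that size can be allocated at all is a separate question that this sketch does not address.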

theagentd (JGO Wizard, Posts: 1392, Medals: 88)
« Reply #1 on: 2012-01-28 22:52:50 »
Your card "only" has 2GB of memory. You're probably not gonna get realtime performance with over 1GB of vertices. >_>
The whole point of VBOs is that the data should be stored in VRAM, unless GL_STREAM_DRAW is passed when the memory is allocated/uploaded with glBufferData in which case the driver is allowed to not store the data in VRAM. I experimented with this value on my NVidia GPU and there was no difference at all, regardless of what I chose, so it seems like at least NVidia ignores this value.
I'm pretty sure your problem is not a vertex bottleneck but a fragment bottleneck. Try to make your objects cover a smaller area (scale them down or something), disable anti-aliasing and disable lighting etc. to reduce the per-pixel cost. My laptop's GPU can draw around 1.4 million triangles per frame at 60 FPS without any texturing or anything. Considering you have a desktop computer and a desktop GPU, I'd estimate it to be 3-4x as fast, so around 5 million triangles per frame at 60 FPS would make sense. This again points at a fragment bottleneck.

There is no god.

bk0 (JGO n00b, Posts: 10)
« Reply #2 on: 2012-01-28 23:21:28 »
Of course I wasn't expecting real-time performance with 1.5GB VRAM occupied.

Okay, so we agree that the data is/should be residing in VRAM, that's good. I used GPU-Z 0.5.8 to look at my VRAM usage: The correct amount is used. But this doesn't show whether it resides there and there is a fragment bottleneck, or whether it is retransmitted and there is a bug in my implementation...

Now, I don't really know anything about a "fragment bottleneck". Could you give me some link? Currently I use textured triangles, without any lighting or transparency or normals etc. With deactivated textures the 24 MB scenario improves from 60 to 63 FPS. Not really what I hoped^^ No triangles overlap (they do touch, though). But there are many triangles hidden behind other triangles, could this cause problems? I always imagined not rendering triangles would be faster than rendering them^^

Maybe you could also give me some tips on how to implement your suggestions?
- Cover a smaller area: You mean a smaller area on the screen? The cubes are the same measurements as in Minecraft, and I run Minecraft without stutters.
- I never enabled VSync or anything else. I just did:

    public Main() {
        super("");
        GLProfile glp = GLProfile.get(GLProfile.GL2);
        caps = new GLCapabilities(glp);
        canvas = new GLCanvas(caps);

        canvas.addGLEventListener(render_VBO);
        canvas.addKeyListener(render_VBO);
        canvas.addMouseListener(render_VBO);
        canvas.addMouseMotionListener(render_VBO);

        getContentPane().add(canvas);

        setResizable(false);
        if (!isDisplayable())
            setUndecorated(true);
        setExtendedState(JFrame.MAXIMIZED_BOTH);
        Toolkit tk = Toolkit.getDefaultToolkit();
        Image image = tk.createImage("");
        Point point = new Point(0, 0);
        Cursor cursor = tk.createCustomCursor(image, point, "");
        setCursor(cursor);

        setVisible(true);
        canvas.requestFocus();
        canvas.requestFocusInWindow();

        FPSAnimator animator = new FPSAnimator(canvas, 1000);
        animator.add(canvas);
        animator.start();
    }

Could it be that VSync or anything else is active by default and needs to be deactivated? If so, is there a list of what starts as active?

Thanks again for the tips. When googling for fragment bottleneck, I found this: http://developer.download.nvidia.com/assets/gamedev/docs/Graphics_Performance_Optimization.pdf
I'll read through it and let you know.

Greetings

lhkbob (JGO Neuromancer, Posts: 1174, Medals: 35)
« Reply #3 on: 2012-01-28 23:29:33 »
The count argument to glDrawArrays() represents the number of elements to be combined into primitives, not the number of bytes.
I think since you've packed 2-tuple tex coords and 3-tuple vertices, the number of elements to render is Buffer.capacity() / 5.
If this is indeed an error, it is likely the cause of your underwhelming performance. I have often experienced undefined and non-deterministic performance when there are errors like this. Some examples include weird slow downs, missing vertices, segfaults, etc. It all depends on where the memory is, and what the GPU tries to do to prevent invalid accesses, and how it recovers.
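
The element count described above can be sketched as follows. This is only an illustration under the assumption that the buffer holds nothing but interleaved vertices of 3 position floats plus 2 texture-coordinate floats, as the pointer setup in the first post suggests; the class and method names are hypothetical:

```java
final class InterleavedVertexCount {
    // Layout assumed from the glVertexPointer/glTexCoordPointer calls in the thread:
    // 3 position floats followed by 2 texture-coordinate floats per vertex.
    static final int FLOATS_PER_VERTEX = 3 + 2;

    /** Number of whole vertices stored in a buffer that holds floatCapacity floats. */
    static int vertexCount(int floatCapacity) {
        if (floatCapacity < 0 || floatCapacity % FLOATS_PER_VERTEX != 0) {
            throw new IllegalArgumentException(
                    "capacity is not a whole number of vertices: " + floatCapacity);
        }
        return floatCapacity / FLOATS_PER_VERTEX;
    }

    public static void main(String[] args) {
        int floats = 90 * 65536;             // capacity allocated in init()
        int correct = vertexCount(floats);   // the count glDrawArrays expects
        int requested = 4 * floats;          // the count the original display() passed

        System.out.println("vertices in buffer: " + correct);
        System.out.println("vertices requested: " + requested);
        System.out.println("overshoot factor:   " + requested / correct);
    }
}
```

With the numbers from the first post, the draw call asked for 20 times as many vertices as the buffer contains, which would fit the undefined behaviour described in this reply.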

lhkbob (JGO Neuromancer, Posts: 1174, Medals: 35)
« Reply #4 on: 2012-01-28 23:32:26 »
Quote: "Now, I don't really know anything about a "fragment bottleneck". Could you give me some link? Currently I use textured Triangles, without any lighting or transparency or normals etc. With deactivated textures the 24 MB scenario improves from 60 to 63 FPS. Not really what I hoped^^ No Triangles overlap (they do touch, though). But there are many triangles hidden behind other triangles, could this cause problems? I always imagined not rendering triangles would be faster than rendering them^^"

Years ago I remember running into a problem where pushing 1 million overlapping triangles (i.e. they were hidden behind other triangles) was much slower than when the triangles were more evenly distributed around the screen. I think the GPUs have a fast path for performing quick depth checks when there aren't that many on the same pixel, or if the depths are far apart. If all of your triangles are packed together, it might have to go into slower, more accurate floating point checks.

bk0 (JGO n00b, Posts: 10)
« Reply #5 on: 2012-01-29 01:17:34 »
@lhkbob: YES. That was it. (with the /5 instead of *4) Silly me, not reading the doc! And I'll make sure to use a special algorithm to ensure no triangles are hidden. Already have something in mind^^ Thanks again guys, I think that should allow me to continue forward with my game 

lhkbob (JGO Neuromancer, Posts: 1174, Medals: 35)
« Reply #6 on: 2012-01-29 01:55:07 »
Well, just make sure that whatever special algorithm you're using isn't more expensive than relying on the GPU. In my story about hidden triangles causing slow downs, it was 1 million triangles packed into a 200x200 area in a larger window.
That situation is pretty contrived and probably wouldn't show up in a real game.

theagentd (JGO Wizard, Posts: 1392, Medals: 88)
« Reply #7 on: 2012-01-29 05:10:34 »
Okay, approximating the cost of drawing fragments is pretty easy:
- "Fragments" that are outside the screen cost nothing since the triangle is clipped to the screen edges.
- Triangles that do not cover any pixels (or MSAA sample positions) do not cost anything.
- Triangles that pass the depth test have a cost depending on what shader/fixed functionality you're running.
- Triangles that do NOT pass the depth test still have a cost:
  - This cost mostly depends on whether Early-Z was used. Shaders that output a custom depth value per fragment have this disabled, meaning that the shader has to be run before the depth test. In this case the cost is almost the same as if it had passed the depth test.
  - With Early-Z the cost is lower, but still not free. I'd estimate it to about half the cost of simple shading.

Your GPU can fill a huge number of pixels per frame. My little laptop can handle around 79 million colored pixels per frame at 60 FPS, but this number drops insanely fast if you add texturing, lighting, etc. For reference, a 1920x1080 screen is approximately 2 million pixels, so with 4 million triangles they should cover just a few pixels each for the total number of fragments to be low enough to not be a bottleneck, so overdraw is something you want to avoid. And again, your GPU is around 3-4 times faster than mine. xd
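
The figures in this post can be turned into a rough per-triangle budget. A back-of-the-envelope sketch, assuming the quoted 79 million plain colored pixels per frame as the whole fill budget and ignoring texturing, Early-Z savings and everything else; the class and method names are hypothetical:

```java
final class FragmentBudget {
    /** Average number of fragments each triangle may produce within the budget. */
    static double fragmentsPerTriangle(long fragmentBudget, long triangles) {
        return (double) fragmentBudget / triangles;
    }

    /** How many times the whole screen could be filled within the budget. */
    static double screenFills(long fragmentBudget, int width, int height) {
        return (double) fragmentBudget / ((long) width * height);
    }

    public static void main(String[] args) {
        long budget = 79_000_000L;     // colored pixels per frame at 60 FPS (laptop figure from the post)
        long triangles = 4_000_000L;   // triangle count used in the post

        System.out.println("fragments per triangle: " + fragmentsPerTriangle(budget, triangles));
        System.out.println("full-screen fills:      " + screenFills(budget, 1920, 1080));
    }
}
```

Under these assumptions each triangle may cover roughly 20 pixels on average before fill rate alone uses up the frame, and that is before texturing or lighting lowers the budget.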

bk0 (JGO n00b, Posts: 10)
« Reply #8 on: 2012-02-24 13:40:19 »
Hello again. So far everything runs great and fast, too. Unless it crashes, that is. The error report is here: http://pastebin.de/23611
According to this report, the error happens in renderBuffer(). Here's the source:

    void renderBuffer() {
        glBindTexture(GL2.GL_TEXTURE_2D, WWV.tex_handles[LODLevel]);
        glBindBuffer(GL.GL_ARRAY_BUFFER, vbo_handle);
        System.out.print("Rendering Chunk " + start_x + " " + start_y + " " + start_z);
        System.out.println(" vbo_handle: " + vbo_handle + " Buffer.capacity(): " + Buffer.capacity() + " WWV.tex_handles[LODLevel]: " + WWV.tex_handles[LODLevel]);
        Buffer.clear();
        glGetBufferSubData(GL.GL_ARRAY_BUFFER, 0, Buffer.capacity()*4, Buffer);
        for (int i = 0; i < Buffer.capacity(); i++) {
            System.out.print(Buffer.get(i) + " ");
        }
        System.out.println();

        glVertexPointer(3, GL.GL_FLOAT, 5 * 4, 0);
        glTexCoordPointer(2, GL.GL_FLOAT, 5 * 4, 3 * 4);
        System.out.println("Before drawing");
        glDrawArrays(GL.GL_TRIANGLES, 0, Buffer.capacity() / 5);
        System.out.println("Done");
    }

Buffer is of type FloatBuffer, and is used to initially create the VBO. The crashes always happen when Buffer.capacity() == 180 and WWV.tex_handles[LODLevel] == 5. Following is one output (crashes happen randomly):

    Rendering Chunk 352 768 448 vbo_handle: 11378 Buffer.capacity(): 180 WWV.tex_handles[LODLevel]: 5
    368.0 784.0 448.0 0.0 0.0 368.0 768.0 448.0 0.0 0.0625 368.0 768.0 464.0 0.0625 0.0625
    368.0 784.0 448.0 0.0 0.0 368.0 784.0 464.0 0.0625 0.0 368.0 768.0 464.0 0.0625 0.0625
    352.0 784.0 448.0 0.0 0.0 352.0 768.0 448.0 0.0 0.0625 352.0 768.0 464.0 0.0625 0.0625
    352.0 784.0 448.0 0.0 0.0 352.0 784.0 464.0 0.0625 0.0 352.0 768.0 464.0 0.0625 0.0625
    368.0 768.0 464.0 0.0 0.0625 368.0 784.0 464.0 0.0 0.0 352.0 768.0 464.0 0.0625 0.0625
    368.0 784.0 464.0 0.0 0.0 352.0 768.0 464.0 0.0625 0.0625 352.0 784.0 464.0 0.0625 0.0
    368.0 768.0 448.0 0.0 0.0625 368.0 784.0 448.0 0.0 0.0 352.0 768.0 448.0 0.0625 0.0625
    368.0 784.0 448.0 0.0 0.0 352.0 768.0 448.0 0.0625 0.0625 352.0 784.0 448.0 0.0625 0.0
    368.0 784.0 448.0 0.0 0.0625 352.0 784.0 448.0 0.0 0.0 368.0 784.0 464.0 0.0625 0.0625
    352.0 784.0 448.0 0.0 0.0 368.0 784.0 464.0 0.0625 0.0625 352.0 784.0 464.0 0.0625 0.0
    368.0 768.0 448.0 0.0 0.0625 352.0 768.0 448.0 0.0 0.0 368.0 768.0 464.0 0.0625 0.0625
    352.0 768.0 448.0 0.0 0.0 368.0 768.0 464.0 0.0625 0.0625 352.0 768.0 464.0 0.0625 0.0
    Before drawing
    #
    # A fatal error has been detected by the Java Runtime Environment:

(Rest of the error message, see pastebin)

Does anyone have an idea what could cause this?
- It isn't the texture, that works on other VBOs (I draw about 1800 VBOs each frame, about 1500 of them use this texture).
- It isn't the VBO itself, the data I grabbed seems okay (or do you see something wrong?).
- It shouldn't be the draw command, right?

Fun fact: As long as I just look around in my world, the game never crashes. Only when I move does the program sometimes crash (as I said, it is random. It's not movement = crash). If that could be the problem: I use KeyListener, and if some key is pressed, a boolean is set to true, and when it is released it is set to false. There is no actual movement during the rendering of the scene. It is strictly one Thread performing movement and view, and THEN starting to render the world... This should not lead to any problems with each other, right?

Any ideas? Maybe something I do completely wrong? Thanks again for any hints and tips.

theagentd (JGO Wizard, Posts: 1392, Medals: 88)
« Reply #9 on: 2012-02-24 14:28:46 »
Seems like an access violation problem. The functions are slightly different in JOGL, but shouldn't

    glGetBufferSubData(GL.GL_ARRAY_BUFFER, 0, Buffer.capacity()*4, Buffer);

be

    glGetBufferSubData(GL.GL_ARRAY_BUFFER, 0, Buffer.capacity(), Buffer);

since that method takes a FloatBuffer (?), so it will automatically multiply it by 4 bytes per float. That could produce a crash at random times.

lhkbob (JGO Neuromancer, Posts: 1174, Medals: 35)
« Reply #10 on: 2012-02-24 14:54:25 »
From the JVM crash, it looks like the call to glDrawArrays() is what is causing the problem, not glGetBufferSubData.
Usually a JVM crash during a call to glDrawArrays or glDrawElements is because you are attempting to reference a vertex that is outside of the valid range in the vertex attribute VBOs or arrays.
The arguments in your pointer setup look fine, so the only thing I can think of is that when you allocate the VBO in vbo_handle, it has a size smaller than 4 * 180 (which is what you'd expect if you called glBufferData with Buffer). Looking at your original code, it appears as though you are packing multiple octree leaf data into a large VBO, so there is a chance that this is screwed up.
I would also recommend using a DebugGL wrapper around your GL to check for errors.

bk0 (JGO n00b, Posts: 10)
« Reply #11 on: 2012-02-24 15:33:43 »
The VBOs are no longer packed together. But I followed your advice with DebugGL, and the call to glGetBufferSubData caused an error: GL_INVALID_OPERATION. If I read the specs right, this is only thrown in this case: "GL_INVALID_OPERATION is generated if the reserved buffer object name 0 is bound to target."
http://www.opengl.org/sdk/docs/man/xhtml/glGetBufferSubData.xml
It's already late in the evening, so my brain is half asleep, but I still can't figure out why the reserved buffer object name 0 should be bound to GL.GL_ARRAY_BUFFER!?

Hm, on second thought I removed all the "debug" code, and this is what remained:

    glBindTexture(GL2.GL_TEXTURE_2D, WWV.tex_handles[LODLevel]);
    glBindBuffer(GL2.GL_ARRAY_BUFFER, vbo_handle);
    glVertexPointer(3, GL2.GL_FLOAT, 5 * 4, 0);
    glTexCoordPointer(2, GL2.GL_FLOAT, 5 * 4, 3 * 4);

    glDrawArrays(GL2.GL_TRIANGLES, 0, Buffer.capacity() / 5);

Clearly, this will no longer throw an error on glGetBufferSubData, but it should still fail (this was my original code before debugging, btw). And: It still fails, but WITHOUT an error message from DebugGL. Any ideas why OpenGL would fail so hard that not even the debugger gets a glimpse of what went wrong? As always, thanks for any tips.

EDIT: @theagentd: Even if I remove the *4 in glGetBufferSubData I still get the same error. The error seems to be caused by the state of the VBO or something like that, not the call itself...

lhkbob (JGO Neuromancer, Posts: 1174, Medals: 35)
« Reply #12 on: 2012-02-24 15:45:02 »
Offsets, strides, and the contents of VBOs are not checked by the debugger. As an example of a contrived situation, I can have a texture vbo with half the elements of the vertex vbo. If both are configured as vertex attributes, and I attempt to render all of the vertices, the driver will start pulling in "texture" information from past the end of the shorter texture vbo.
Depending on how the VBOs are laid out, you can walk into garbage VBO information, or get access violations which cause the JVM to crash. That is the case you're seeing, and why it tends to be unpredictable. Although that's the cause, I unfortunately don't have much advice for solving it except to very carefully walk through the rendering and make sure the values passed to OpenGL are what you'd expect.
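
Since the driver does not validate offsets and strides, one way to "carefully walk through" the values is to check them on the CPU before drawing. A minimal sketch of such a check, assuming the application tracks the byte size it passed when creating the VBO; the class, method and parameter names are hypothetical and not part of JOGL:

```java
final class DrawRangeCheck {
    /**
     * Returns true if reading `count` vertices starting at vertex `first` stays
     * inside a buffer of `bufferBytes` bytes for one attribute that begins at
     * `offsetBytes`, repeats every `strideBytes` and is `attributeBytes` wide.
     */
    static boolean fitsInBuffer(long bufferBytes, long offsetBytes, int strideBytes,
                                int attributeBytes, int first, int count) {
        if (count <= 0) {
            return true; // nothing is read
        }
        long lastVertex = (long) first + count - 1;
        long endOfLastRead = offsetBytes + lastVertex * strideBytes + attributeBytes;
        return endOfLastRead <= bufferBytes;
    }

    public static void main(String[] args) {
        int stride = 5 * 4;            // 3 position + 2 texcoord floats per vertex
        long fullBuffer = 180 * 4;     // 180 floats, as in the crashing chunk
        long shortBuffer = 90 * 4;     // the same draw against a buffer half that size

        // Texture coordinates start 3 floats into each vertex and are 2 floats wide.
        System.out.println(fitsInBuffer(fullBuffer, 3 * 4, stride, 2 * 4, 0, 36));
        System.out.println(fitsInBuffer(shortBuffer, 3 * 4, stride, 2 * 4, 0, 36));
    }
}
```

A check like this only helps if the recorded buffer size matches what is actually stored under the handle, so it complements rather than replaces a debug wrapper.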

bk0 (JGO n00b, Posts: 10)
« Reply #13 on: 2012-02-24 16:10:18 »
Ah, thank you. That at least explains why I don't get an error message. I'll try working out why I can't get data with glGetBufferSubData(); the error is probably hidden inside the VBO data in this case (as you said). I'll let you know what I screwed up when I find it.

bk0 (JGO n00b, Posts: 10)
« Reply #14 on: 2012-02-25 09:37:27 »
Found it. Due to a logic bug, 2 chunks in my world got the same VBO handle (say, 292). One chunk put x bytes into the VBO, the other y bytes. Of course this crashed. Thanks again for the invaluable help. Especially that thingy with DebugGL, I have a feeling this will help me much in the future...
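
A bug of this kind could be caught early with a small ownership table. The following is only a sketch of that idea, not the poster's actual fix; the class name, method names and chunk labels are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

final class VboHandleRegistry {
    // Maps each VBO handle to a label describing the chunk that owns it.
    private final Map<Integer, String> owners = new HashMap<>();

    /** Records that `owner` uses `handle`; fails fast if another owner already holds it. */
    void claim(int handle, String owner) {
        String previous = owners.putIfAbsent(handle, owner);
        if (previous != null && !previous.equals(owner)) {
            throw new IllegalStateException(
                    "VBO handle " + handle + " is already owned by " + previous);
        }
    }

    /** Forgets the handle, for example after the buffer has been deleted. */
    void release(int handle) {
        owners.remove(handle);
    }

    public static void main(String[] args) {
        VboHandleRegistry registry = new VboHandleRegistry();
        registry.claim(292, "chunk 352/768/448");
        try {
            registry.claim(292, "chunk 368/768/448"); // second chunk, same handle
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling claim() wherever a chunk stores its handle turns the silent sharing into an immediate exception that names both chunks.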