Java-Gaming.org    
PBO performance problems

mabraham (Junior Member)
« Posted 2005-12-22 23:43:29 »

Hi,

I want to improve the texture update speed of my GL renderer by using PBOs.  Right now, with standard glTexSubImage2D() updates on a 1024x1024 BGRA texture, I'm getting around 125 fps, which isn't too bad.  However, when I use a PBO as a pixel unpack buffer, map it to a ByteBuffer, and copy the contents into the mapped buffer, I get only 20 fps!  The bizarre thing is that the same PBO path coded in C++ performs extremely well (up to 250 fps), so what have I done wrong in the Java/JOGL version?
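
To put those frame rates in perspective, here is a rough sketch of the upload bandwidth they imply. It assumes 4 bytes per pixel (BGRA with 8 bits per channel, which the post does not state explicitly), and the class and method names are made up for this example.

```java
// Back-of-the-envelope upload bandwidth for a full texture update per frame.
final class TextureUploadBandwidth {
    // Bytes transferred by one full update of a width x height texture.
    static long bytesPerFrame(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    // Sustained throughput in MiB/s when one full update happens every frame.
    static double mebibytesPerSecond(long bytesPerFrame, double fps) {
        return bytesPerFrame * fps / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Assumption: BGRA with 8 bits per channel, so 4 bytes per pixel.
        long frame = bytesPerFrame(1024, 1024, 4);
        for (double fps : new double[] {20, 125, 250}) {
            System.out.printf("%.0f fps -> %.0f MiB/s%n",
                    fps, mebibytesPerSecond(frame, fps));
        }
    }
}
```

Under that assumption each update moves 4 MiB, so the three frame rates correspond to very different sustained transfer rates. The 20 fps figure is far below what a plain memory copy would normally be expected to limit.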

If someone could help, that would be fab.  I have attached the GLEventListener implementation to this post.

PS: I'm using JSR-231 beta 01 - October 27...

Ken Russell (JGO Coder)
« Reply #1 - Posted 2005-12-23 06:34:46 »

You shouldn't create a new DebugGL object each time; make one in your init() method and call drawable.setGL(). That will ensure it is used in your reshape, display, etc. methods.

I don't think that should be the cause of the slowdown though. From your example you're using static texture data, so there shouldn't be any issue with the setup of the data taking longer in Java than in C++. Are you 100% sure you've set up all of the pixel unpack modes, etc. which might be needed to ensure your texture data isn't undergoing any conversion after you send it down each time? I'm pretty ignorant of PBOs so I apologize if this suggestion is meaningless.

If you have side-by-side C++ and Java code with a significant performance difference, could you please zip it up, file a bug with the JOGL Issue Tracker (you'll probably need to be an Observer of the project to do so), and attach your test cases?

mabraham (Junior Member)
« Reply #2 - Posted 2005-12-23 10:52:33 »

Hi Ken,

First of all, I'm absolutely amazed by the effort you (and other forum members) put into supporting us users!

> You shouldn't create a new DebugGL object each time; make one in your init() method and call drawable.setGL(). That will ensure it is used in your reshape, display, etc. methods.
>
> I don't think that should be the cause of the slowdown though. From your example you're using static texture data, so there shouldn't be any issue with the setup of the data taking longer in Java than in C++. Are you 100% sure you've set up all of the pixel unpack modes, etc. which might be needed to ensure your texture data isn't undergoing any conversion after you send it down each time? I'm pretty ignorant of PBOs so I apologize if this suggestion is meaningless.

OK, so I've changed the DebugGL creation as per your advice, but you were quite right: this was not the cause of the performance problem.  In terms of setting up unpack modes, I don't think there's any difference between PBOs and traditional glTexSubImage2D; the way the data is interpreted should be exactly the same.  Also, there's my C++ test case, which performs so well.  I get the performance hit the moment the PBO is mapped via glMapBuffer().  It almost feels like JOGL is taking a copy from VRAM (despite the GL_WRITE_ONLY)...

> If you have side-by-side C++ and Java code with a significant performance difference, could you please zip it up, file a bug with the JOGL Issue Tracker (you'll probably need to be an Observer of the project to do so), and attach your test cases?

https://jogl.dev.java.net/issues/show_bug.cgi?id=188

Cheers,
Matt.

PS: I tried the latest nightly build, makes no difference.

Ken Russell (JGO Coder)
« Reply #3 - Posted 2005-12-24 03:47:11 »

Thanks for the report. I will try to look at it probably soon after the holidays, earlier if possible, but in the meantime I would strongly encourage you to look for any differences between your C++ and Java source. There is very little work being done by JOGL and unless something like vsync (see GL.setSwapInterval(0)) is getting in the way then there should be no performance difference between JOGL and C++. There is absolutely no weird behind-the-scenes memory management being done by JOGL.

mabraham (Junior Member)
« Reply #4 - Posted 2005-12-24 10:33:45 »

I've got the swap interval set (see attachment to original post), and also in the NVidia driver property page.  The C++ and Java versions issue identical OpenGL commands, so I really don't see why there is a difference in speed.  I've now tested this on two different PCs (one has an NVidia GeForce 4 Ti 4600 and the other has the 7800GT), and both show the same behaviour.  I would be very grateful if you could look into this.

This may be a stupid question but would there be any difference in the way the default OpenGL context is set up between JOGL and the C GLUT library?

Ken Russell (JGO Coder)
« Reply #5 - Posted 2005-12-26 00:17:43 »

Yes, the pixel format selection code in JOGL is completely different than that in GLUT; however in most if not all cases it should produce identical results. It delegates down to the platform's pixel format selection routine like ChoosePixelFormat by default.

I don't think I can reproduce the slowdown on my machine (Quadro FX Go700, 81.85 drivers, current JOGL from the CVS repository). Here are the results:

C++ version:
Average frame time: 16.7571ms
Average frame time: 16.7511ms
Average frame time: 16.757ms
Average frame time: 17.3289ms
Average frame time: 16.7543ms
Average frame time: 16.7545ms
Average frame time: 16.7536ms
Average frame time: 16.7542ms
Average frame time: 16.7563ms
Average frame time: 16.7517ms

Java version (1.5.0_06):
Average frame time: 14.457143ms
Average frame time: 14.098592ms
Average frame time: 14.521739ms
Average frame time: 14.940298ms
Average frame time: 14.521739ms

So on my machine it looks like the Java version is actually faster than the C++ version. What driver version, etc. are you running? Could you try the current JOGL nightly build?
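
For readers comparing these frame times with the frame rates quoted earlier in the thread, a small conversion sketch follows. The class name is made up for this example, and the two sample values are simply the first line of each run above. The C++ times work out to roughly 60 fps, which happens to match a common display refresh rate; whether vsync was capping that run is not stated in the thread.

```java
// Converts the "Average frame time" values above into frames per second.
final class FrameRate {
    // Frames per second for a given average frame time in milliseconds.
    static double fpsFromMillis(double frameMillis) {
        if (frameMillis <= 0) {
            throw new IllegalArgumentException("frame time must be positive");
        }
        return 1000.0 / frameMillis;
    }

    public static void main(String[] args) {
        String[] labels = {"C++", "Java"};
        double[] firstSamples = {16.7571, 14.457143}; // first line of each run
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %.4f ms -> %.1f fps%n",
                    labels[i], firstSamples[i], fpsFromMillis(firstSamples[i]));
        }
    }
}
```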

mabraham (Junior Member)
« Reply #6 - Posted 2005-12-26 18:34:02 »

I took my numbers using the latest NVidia drivers, 81.98 (81.95), but also with the 78.01.

Just to confirm, does your card support PBOs?  I will try running with the latest nightly build in a moment.

Also, dunno how important this might be but I am running a dual display setup.  One display is set to 1920x1200x32bpp, the other to 1600x1200x32bpp.  Having said that, the "other" machine with the Geforce 4600 is with just one display (1600x1200x32).

Thanks,
Matt.

mabraham (Junior Member)
« Reply #7 - Posted 2005-12-26 19:11:43 »

Guess what: on the dual display machine, turning off the second display (under Display Properties) helped.  I now get similar timings between Java/JOGL and C++/GLUT, for both PBO and traditional glTexSubImage...  Unfortunately I will need to be able to support dual display configurations!

Ken, you mentioned a difference in the way that JOGL and GLUT choose their pixel formats.  I'm not experienced in this, but maybe this is where the problem lies?

Note I'm using the JOGL nightly build that is currently available, dated 20th December.

Ken Russell (JGO Coder)
« Reply #8 - Posted 2005-12-26 23:50:30 »

Try printing out glGetString(GL_VENDOR), GL_VERSION, and GL_RENDERER for the two programs in dual-head mode. Is there any difference in e.g. the renderers? Do you have one or two cards in the dual-head machine? I suspect that there is more of a difference between how GLUT and the AWT set up a window than between how GLUT and JOGL choose pixel formats.

You can specify -Djogl.debug.WindowsGLDrawable to see which pixel format JOGL chooses; I don't know how you could get the same information out of GLUT.

mabraham (Junior Member)
« Reply #9 - Posted 2005-12-28 10:29:41 »

OK so here's what I get with the Java side (see attached updated code):

dual-head (1920x1200;1600x1200):

AWT-EventQueue-0: Using ChoosePixelFormat because multisampling not requested
AWT-EventQueue-0: Chosen pixel format (6):
GLCapabilities [DoubleBuffered: true, Stereo: false, HardwareAccelerated: true, DepthBits: 24, StencilBits: 0, Red: 8, Green: 8, Blue: 8, Alpha: 0, Red Accum: 16, Green Accum: 16, Blue Accum: 16, Alpha Accum: 16 ]
GLEventHandler.init(): GL_VERSION = 2.0.1
GLEventHandler.init(): GL_VENDOR = NVIDIA Corporation
GLEventHandler.init(): GL_RENDERER = GeForce 7800 GT/PCI/SSE2
GLEventHandler.init(): streaming texture image using PBO
Average frame time: 49.095238ms
Average frame time: 49.857143ms
Average frame time: 50.75ms
Average frame time: 50.8ms
Average frame time: 51.55ms


single-head (1920x1200):

AWT-EventQueue-0: Using ChoosePixelFormat because multisampling not requested
AWT-EventQueue-0: Chosen pixel format (6):
GLCapabilities [DoubleBuffered: true, Stereo: false, HardwareAccelerated: true, DepthBits: 24, StencilBits: 0, Red: 8, Green: 8, Blue: 8, Alpha: 0, Red Accum: 16, Green Accum: 16, Blue Accum: 16, Alpha Accum: 16 ]
GLEventHandler.init(): GL_VERSION = 2.0.1
GLEventHandler.init(): GL_VENDOR = NVIDIA Corporation
GLEventHandler.init(): GL_RENDERER = GeForce 7800 GT/PCI/SSE2
GLEventHandler.init(): streaming texture image using PBO
Average frame time: 4.932039ms
Average frame time: 4.8846154ms
Average frame time: 4.7652583ms
Average frame time: 4.815166ms
Average frame time: 4.787736ms
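
To quantify the gap between the two runs, here is a minimal sketch that pulls the "Average frame time" values out of log text like the above and compares the means. The class and method names are made up for this example; the sample strings are copied from the two runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts "Average frame time: <n>ms" samples from log text and averages them.
final class FrameTimeLog {
    private static final Pattern LINE =
            Pattern.compile("Average frame time: ([0-9.]+)ms");

    // Returns every frame time found in the log, in milliseconds.
    static List<Double> parse(String log) {
        List<Double> samples = new ArrayList<>();
        Matcher m = LINE.matcher(log);
        while (m.find()) {
            samples.add(Double.parseDouble(m.group(1)));
        }
        return samples;
    }

    // Arithmetic mean of the samples; rejects an empty list.
    static double mean(List<Double> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.size();
    }

    public static void main(String[] args) {
        String dual = "Average frame time: 49.095238ms\nAverage frame time: 49.857143ms\n"
                + "Average frame time: 50.75ms\nAverage frame time: 50.8ms\n"
                + "Average frame time: 51.55ms\n";
        String single = "Average frame time: 4.932039ms\nAverage frame time: 4.8846154ms\n"
                + "Average frame time: 4.7652583ms\nAverage frame time: 4.815166ms\n"
                + "Average frame time: 4.787736ms\n";
        double dualMean = mean(parse(dual));
        double singleMean = mean(parse(single));
        System.out.printf("dual-head %.2f ms, single-head %.2f ms, ratio %.1fx%n",
                dualMean, singleMean, dualMean / singleMean);
    }
}
```

On these samples the dual-head run is about ten times slower per frame than the single-head run, with identical pixel format and renderer strings in both logs.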

Ken Russell (JGO Coder)
« Reply #10 - Posted 2005-12-28 20:05:22 »

Well as you can see there is no difference in the renderer, pixel format, etc. being chosen when your system is in single- vs. dual-head mode. This points to something lower-level going on, possibly in the AWT. I'm not really sure how to best diagnose this problem further. Ideally we would go into the GLUT sources, instrument them similarly to make sure the same pixel format is being chosen, and then check to see if there are significant differences in how the HWND is being set up. Do you have the time / interest in digging deeper into this? It may take some time but it would probably help others doing dual-head work.

Can you still reproduce the slowdown on the single-head machine? Could you print out the same output from that machine? It might be easier to figure out what's going wrong there.

mabraham (Junior Member)
« Reply #11 - Posted 2006-01-04 22:40:08 »

Hey,

I have now managed to hack GLUT to output its PIXELFORMATDESCRIPTOR, in a format similar to what JOGL outputs with '-Djogl.debug.WindowsGLDrawable'.

Chosen pixel format (7):
   DoubleBuffered: 1
   Stereo: 0
   HardwareAccelerated: 0
   DepthBits: 24
   StencilBits: 0
   Red: 8
   Green: 8
   Blue: 8
   Alpha: 0
   Red Accum: 16
   Green Accum: 16
   Blue Accum: 16
   Alpha Accum: 16

The only difference I can make out is HardwareAccelerated=0 for GLUT, =1 for JOGL.

Ken Russell (JGO Coder)
« Reply #12 - Posted 2006-01-05 03:02:15 »

That's great. Have you checked both the JOGL and hacked GLUT code to see whether they are really using exactly the same PIXELFORMATDESCRIPTOR? JOGL reports it is using index 6 while GLUT reports 7, but I think JOGL is using 0-based indices while GLUT is using the Windows default 1-based indices.

My only suggestion here would be to write your own GLCapabilitiesChooser to force JOGL to use exactly the same index as GLUT and see if it changes the behavior. However I suspect that both are already really using the same pixel format.
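
The forced-index idea can be sketched roughly as follows. JOGL is not on the classpath here, so CapsChooser is a stand-in interface written for this example; it only mimics the general shape of JOGL's GLCapabilitiesChooser callback (desired capabilities, array of available ones, the window system's recommended index), and the real signature should be checked against the JOGL javadoc. The 1-based to 0-based conversion follows the guess above about GLUT and JOGL indices and is an assumption, not a confirmed fact.

```java
// Chooser that always returns one fixed index when that index is usable.
final class ForcedIndexChooser<C> implements CapsChooser<C> {
    private final int forcedIndex; // 0-based index into the available array

    ForcedIndexChooser(int forcedIndex) {
        this.forcedIndex = forcedIndex;
    }

    // Converts a 1-based Windows pixel format index to a 0-based array index.
    static int fromWindowsIndex(int oneBased) {
        if (oneBased < 1) {
            throw new IllegalArgumentException("Windows indices start at 1");
        }
        return oneBased - 1;
    }

    @Override
    public int chooseCapabilities(C desired, C[] available, int recommended) {
        // Fall back to the recommended choice if the forced slot is missing.
        boolean usable = forcedIndex >= 0
                && forcedIndex < available.length
                && available[forcedIndex] != null;
        return usable ? forcedIndex : recommended;
    }

    public static void main(String[] args) {
        String[] available = new String[10];
        for (int i = 0; i < available.length; i++) {
            available[i] = "format " + i; // placeholder capability descriptions
        }
        // GLUT reported index 7; under the 1-based assumption that is slot 6.
        ForcedIndexChooser<String> chooser =
                new ForcedIndexChooser<>(fromWindowsIndex(7));
        System.out.println(chooser.chooseCapabilities("desired", available, 2));
    }
}

// Hypothetical stand-in for the JOGL callback interface (not the real type).
interface CapsChooser<C> {
    int chooseCapabilities(C desired, C[] available, int windowSystemRecommendedChoice);
}
```

Falling back to the recommended index keeps the sketch from returning a slot that does not exist on a given machine, which matters when the same code runs on systems with different pixel format lists.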

I really don't know what the issue could be. It might be related to properties in the window class set up by the AWT or may have something to do with how the AWT handles its multi-monitor support. If you already have a hacked GLUT then maybe the easiest thing to do would be to modify it to look more like the AWT's window setup code and see if you can induce the slowdown in your modified version of GLUT. Hacking the JDK is less easy, although in theory doable. You can find the current Mustang JDK sources at http://mustang.dev.java.net/ and the native code in question is in src/windows/native/sun/windows/ . If you get a chance to download and briefly look at these sources I can probably help you tweak the GLUT code if you need it.

mabraham (Junior Member)
« Reply #13 - Posted 2006-01-05 11:00:32 »

I tried various PFDs in the hacked GLUT code, and all of them performed well (apart from occasional single-buffer screen flicker problems).  I also added a custom CapsChooser to my GLCanvas ctor, returning various values, none of them improved performance in the slightest.

I have since been wondering whether this could be a threading issue in the NVidia drivers.  After all, I am actually having severe stability issues with this setup too, but that's entirely OT.  In case you're interested, NVidia drivers don't like dual-core CPUs at all (http://www.google.co.uk/search?q=forceware+dual+core+problem&start=0&ie=utf-8&oe=utf-8).  Anyway, I am not convinced this is causing the original problem: instead of using an animator I tried ticking the scene from the event dispatch thread, and that made it no faster.

I haven't got the time just now to dig into the JDK sources.  I never tried building it, for starters.  Is that relatively easy?  By the way, GLUT's Win32 WNDCLASS setup is quite straightforward, I didn't see anything out of the ordinary in there.

If I had a decent ATI gfx card I'd give it a try but the one I have (Radeon 7000) won't do PBOs.  Anyone out there willing to help?

mabraham (Junior Member)
« Reply #14 - Posted 2006-01-05 20:19:56 »

I went to do some timings on "the other" machine with just one display attached to it, running off a GeForce 4 Ti 4600.  On that setup JOGL and GLUT achieve comparable PBO timings of 10 to 12 ms.  This agrees with my findings on my primary machine when that is configured for single display mode (except that the primary machine is much faster).

However, I couldn't resist attaching a second display to that "other" box, and guess what: the JOGL PBO timings go bad, same story as on my primary box.  I have attached the output of running the GLUT and JOGL programs for this scenario, including the PFD attributes.

I have just downloaded Mustang build 65, will install that first and see if it makes any difference.  If not then I'm going to try and build it from source, and then study the area in there that you pointed me at.

Ken Russell (JGO Coder)
« Reply #15 - Posted 2006-01-05 23:46:30 »

On the GeForce 4600 the chosen pixel formats are definitely different which you can see just from looking at the attributes. Could you please check to see whether you have indices which match between the two pieces of code? I think JOGL's is probably zero-based while that being printed from GLUT is one-based. If so, please change the GLUT one to be zero-based by subtracting 1 before printing it so we know what we're talking about. It would be instructive to write your own GLCapabilitiesChooser which forcibly chooses a particular pixel format so you can make JOGL's match GLUT's.

Again, I don't think that's the root cause of the slowdown.

I also doubt that multithreading issues are the cause of the slowdown. JOGL explicitly forces all OpenGL work onto one thread internally (when using the GLEventListener callback model) because of stability problems with multiple vendors' drivers in the face of multithreading. I think it's probably something going on in the AWT.

mabraham (Junior Member)
« Reply #16 - Posted 2006-01-06 10:29:16 »

I did write my own GLCapsChooser, forcibly choosing any one of the available PFDs.  Admittedly not on the GeForce 4600 system, but the 7800GT one.  Anyway it makes no difference to performance when using PBOs.

I tried building Mustang but am getting swamped here with strange error messages.  I don't want to bore you with the details but just in case you're interested: somehow, I can only run 'make' from within Cygwin's bash shell (not from cmd.exe directly) even though everything should be on the PATH.  I have also set up all other env vars correctly (I think), but am not getting past the sanity check.  Also it is trying to build the 64-bit target (am running XP x64 here) whereas I really only want the Win32 target (haven't had time to try and build JOGL for 64-bit).  Needless to say that VC7 won't build 64-bit binaries so god knows why it's trying to do that.

Apologies if all this sounds very incoherent...

Ken Russell (JGO Coder)
« Reply #17 - Posted 2006-01-06 17:40:19 »

I'm not sure that building Mustang is the most expedient way to track down what's going on, but I'd be glad to try to help you get it working. You might also consider posting on the Mustang feedback forum. I think it is to be expected that you have to run make from within the Cygwin shell on Windows. Please post the output from the sanity check failure. I think you can override which architecture the builds produce with "make ARCH_DATA_MODEL=32". I think you will need to first build the HotSpot sub-portion of the workspace (unless you started with the "control" build, which I have less experience with).

mabraham (Junior Member)
« Reply #18 - Posted 2006-01-06 21:39:43 »

Cool.  As far as I can see, AwtComponent (src/windows/native/sun/windows/awt_Component) is what I should be looking at.  Roughly 8,000 lines of C(++) code!

There is AwtComponent::FillClassInfo(WNDCLASSEX *) which looks vaguely familiar, and other stuff too.  But I clearly lack the expertise to even remotely understand where the problem could be.  You mentioned you could help me to hack the GLUT code.  Do you have any concrete ideas?

I am tempted to write a non-GLUT Win32 testbed for my PBO problem, to have one code base less to worry about.

Ken Russell (JGO Coder)
« Reply #19 - Posted 2006-01-06 23:04:51 »

If you can point me either to your hacked GLUT or where you started from I can look at it with you. I think the first thing to try is to add all of the flags from the AWT's WNDCLASS to GLUT's to see if it's one of those which is causing the problem.

mabraham (Junior Member)
« Reply #20 - Posted 2006-01-07 10:50:49 »

glut_init.c, method __glutOpenWin32Connection().  I have changed this to look as follows:

  /* Clear (important!) and then fill in the window class structure. */
  memset(&wc, 0, sizeof(WNDCLASS));
  wc.style         = 0;//CS_OWNDC;
  wc.lpfnWndProc   = (WNDPROC)__glutWindowProc;
  wc.hInstance     = hInstance;
  wc.hIcon         = LoadIcon(hInstance, "GLUT_ICON");
  wc.hCursor       = 0;//LoadCursor(hInstance, IDC_ARROW);
  wc.hbrBackground = NULL;
  wc.lpszMenuName  = NULL;
  wc.lpszClassName = classname;


The AWT code (Mustang b65, awt_Component.cpp line 384 method FillClassInfo) looks like this:

void AwtComponent::FillClassInfo(WNDCLASSEX *lpwc)
{
    lpwc->cbSize        = sizeof(WNDCLASSEX);
    lpwc->style         = 0L;//CS_OWNDC;
    lpwc->lpfnWndProc   = (WNDPROC)::DefWindowProc;
    lpwc->cbClsExtra    = 0;
    lpwc->cbWndExtra    = 0;
    lpwc->hInstance     = AwtToolkit::GetInstance().GetModuleHandle(),
    lpwc->hIcon         = AwtToolkit::GetInstance().GetAwtIcon();
    lpwc->hCursor       = NULL;
    lpwc->hbrBackground = NULL;
    lpwc->lpszMenuName  = NULL;
    lpwc->lpszClassName = GetClassName();
    //Fixed 6233560: PIT: Java Cup Logo on the title bar of top-level windows look blurred, Win32
    lpwc->hIconSm       = AwtToolkit::GetInstance().GetAwtIconSm();
}


My change to the GLUT WNDCLASS structure didn't change the performance...

As far as dumping the GLUT-chosen PIXELFORMATDESCRIPTOR, check out win32_glx.c method glxChooseVisual(), and insert the following at line 227:
    fprintf( stderr, "Chosen pixel format (%d):\n", pf );
    fprintf( stderr, "   DoubleBuffered: %d\n"
                     "   Stereo: %d\n"
                     "   HardwareAccelerated: %d\n"
                     "   DepthBits: %d\n"
                     "   StencilBits: %d\n"
                     "   Red: %d\n"
                     "   Green: %d\n"
                     "   Blue: %d\n"
                     "   Alpha: %d\n"
                     "   Red Accum: %d\n"
                     "   Green Accum: %d\n"
                     "   Blue Accum: %d\n"
                     "   Alpha Accum: %d\n",
                     (match->dwFlags & PFD_DOUBLEBUFFER) != 0,
                     (match->dwFlags & PFD_STEREO) != 0,
                     (match->dwFlags & PFD_GENERIC_ACCELERATED) != 0,
                     match->cDepthBits,
                     match->cStencilBits,
                     match->cRedBits,
                     match->cGreenBits,
                     match->cBlueBits,
                     match->cAlphaBits,
                     match->cAccumRedBits,
                     match->cAccumGreenBits,
                     match->cAccumBlueBits,
                     match->cAccumAlphaBits );
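
A note on reading the HardwareAccelerated line in that dump: it is derived from PFD_GENERIC_ACCELERATED alone. As far as I understand the Windows pixel format flags, a format served by a full vendor driver (ICD) has both PFD_GENERIC_FORMAT and PFD_GENERIC_ACCELERATED clear, so a 0 there does not by itself mean software rendering. The sketch below, in Java for consistency with the JOGL side, classifies a dwFlags value along those lines. The class and method names are made up for illustration; the flag values are the ones I believe wingdi.h defines, and the dwFlags sample in main is invented rather than taken from the thread.

```java
// Classifies a PIXELFORMATDESCRIPTOR.dwFlags value by driver type.
final class PfdAcceleration {
    // Flag values as defined (to my knowledge) in the Windows SDK's wingdi.h.
    static final int PFD_GENERIC_FORMAT = 0x00000040;
    static final int PFD_GENERIC_ACCELERATED = 0x00001000;

    static String classify(int dwFlags) {
        boolean generic = (dwFlags & PFD_GENERIC_FORMAT) != 0;
        boolean accelerated = (dwFlags & PFD_GENERIC_ACCELERATED) != 0;
        if (!generic && !accelerated) {
            return "ICD: full vendor driver, hardware accelerated";
        }
        if (generic && accelerated) {
            return "MCD: generic format with partial acceleration";
        }
        if (generic) {
            return "software: generic implementation, no acceleration";
        }
        return "unknown: accelerated flag without generic flag";
    }

    public static void main(String[] args) {
        // Invented sample: DRAW_TO_WINDOW | SUPPORT_OPENGL | DOUBLEBUFFER only.
        int sample = 0x00000004 | 0x00000020 | 0x00000001;
        System.out.println(classify(sample));
    }
}
```

If that reading is right, the HardwareAccelerated=0 printed by the hacked GLUT and the HardwareAccelerated: true printed by JOGL could describe the same accelerated format.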


Oh in case I haven't said this, I'm using the latest glut-3.7 libraries.

mabraham (Junior Member)
« Reply #21 - Posted 2006-01-10 00:19:09 »

Just to add that I have been able to reproduce the performance problem on a Dell Precision workstation with an NVidia Quadro 980XGL, again dual display configuration.  Driver version 81.65 (WHQL) if I remember correctly.

I'm beginning to wonder if this is a bug in the NVidia drivers, uncovered maybe by some peculiar setup of the AWT.  For what it's worth I should probably consider posting a link to this thread on some NVidia developers forum.

Ken Russell (JGO Coder)
« Reply #22 - Posted 2006-01-10 00:57:50 »

Please do raise this issue on the NVidia forums. It is very unlikely there is anything in JOGL specifically which is triggering this problem, which would probably make it difficult to find a workaround even if we tracked down the root cause. I'm pretty swamped at the moment so can't promise a quick response on helping to debug the problem from our end.

mabraham (Junior Member)
« Reply #23 - Posted 2006-04-16 15:19:04 »

I'm pleased to report that the latest NVidia Forceware drivers (84.25) for Windows XP Professional 64-bit have resolved this issue; PBO works as expected, on dual-head dual-core CPU configurations!  In general, I found this to be the best driver in conjunction with JOGL so far.  I haven't had a chance yet to confirm this with the latest 32-bit XP drivers.

Ken Russell (JGO Coder)
« Reply #24 - Posted 2006-04-16 19:17:21 »

Thanks for following up. This is a big relief as I didn't see any obvious differences in how JOGL and LWJGL were managing their OpenGL contexts and further couldn't see how any differences could manifest in this behavior. Please post whether the 32-bit drivers fix the problem there as well.

mabraham (Junior Member)
« Reply #25 - Posted 2006-04-17 15:03:39 »

More good news: I can confirm the problems did exist under Win32 (x86) with NVidia driver version 81.98 and have disappeared since I installed version 84.21!

Ken Russell (JGO Coder)
« Reply #26 - Posted 2006-04-17 21:02:31 »

Thanks again. While there is probably something different between how JOGL and LWJGL manage their OpenGL contexts I have looked through and instrumented the source code for both libraries and don't see any obvious differences which could result in this. I'm glad to hear NVidia's latest drivers solve the problem and since the root cause is obviously there I have closed the associated JOGL bug as "won't fix".