Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
on: 2003-08-28 15:33:10
Well, it's time to start optimising my graphics. I've got my lights and objects rendering nice and pretty, but performance gets terribly slow when I attempt anything other than a tiny test level. And since I'm not sure what's going to be the best option, I'm asking for favoured methods to speed things up. First stop: the profiler.
VecriptProfile.txt: Exclusive Method Times (CPU) (virtual times)
    1923807 sun.awt.windows.WToolkit.eventLoop
    540070 net.java.games.jogl.impl.windows.WGL.SwapBuffers
    266499 com.vecript.math.ConvexHull2D.renderShadowHull
    240943 java.lang.StrictMath.acos
    197238 net.java.games.jogl.impl.windows.WindowsGLImpl.glBegin
    110363 net.java.games.jogl.impl.windows.WindowsGLImpl.glVertex3f
    79736 com.vecript.core.entity.PointLight.getPenumbraVector
    69594 net.java.games.jogl.impl.windows.WindowsGLImpl.glColor4f
    67792 com.vecript.core.entity.PointLight.getUmbraVector
    61921 net.java.games.jogl.impl.windows.WindowsGLImpl.glTexCoord2f
    38634 java.util.ArrayList.get
    31694 com.vecript.core.entity.PointLight.getDisplacedCenter
    21952 net.java.games.jogl.impl.windows.WindowsGLImpl.glBindTexture
    21752 com.vecript.math.ConvexHull2D.renderSolid
    19150 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnd
    18082 com.vecript.math.Vector2f.angle
    13812 net.java.games.jogl.impl.windows.WindowsGLImpl.glDisable
    11543 com.vecript.core.renderer.GameRenderer.drawGeometryPass
    10276 com.vecript.math.ShadowFin.renderFin
    9875 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnable
    7607 com.vecript.core.renderer.GameRenderer.mergeShadowHulls
    7006 java.util.prefs.WindowsPreferences.windowsAbsolutePath
    5338 java.util.prefs.WindowsPreferences.WindowsRegOpenKey
    3603 java.util.prefs.WindowsPreferences.WindowsRegQueryValueEx
    2602 net.java.games.jogl.impl.windows.WindowsGLImpl.glClear
    2602 java.security.AccessController.doPrivileged
    1802 com.vecript.core.renderer.GameRenderer.findVisibleLights
    1735 com.vecript.core.renderer.GameRenderer.render
    1735 com.vecript.core.Shader.bind
    1535 com.vecript.core.DebugConsole.setFramesPerSecondText
    1468 net.java.games.jogl.impl.windows.WindowsGLImpl.glLightfv
    1468 com.vecript.math.Circle.renderLightAlpha
    1401 java.lang.Thread.currentThread
    1268 java.lang.Math.acos
    1268 java.awt.image.ComponentColorModel.getDataElements
    1201 java.lang.StringBuffer.<init>
    1134 java.util.prefs.WindowsPreferences.toWindowsName
    1068 java.lang.FloatingDecimal.dtoa
    1068 java.lang.StringBuffer.toString
The above was grabbed with the -Xprof option for the VM, and HP's handy-dandy JMeter to extract the information. Now the most obvious here are the ConvexHull and PointLight methods, as well as immediate mode gl calls sticking out like a sore thumb. (Ignore .SwapBuffers, I'm running at ~16fps so that's probably a red herring.)
At the moment the light and shadow rendering is the bottleneck, with the shadow geometry being built on the fly every frame and rendered in immediate mode. Shadow generation is quite lengthy, and at the moment is brute force with no attempt to cull out any geometry. Rendering requires a z-fill & ambient pass, then a pass for every light.
Optimisations that spring to mind are:
- Some sort of spatial tree. Pretty obvious, since I should be able to cull both invisible geometry and geometry out of a light's range in each pass.
- Getting rid of immediate mode in favor of something better. Plain old vertex arrays are probably going to be the best bet (no point caching results in VRAM if they're going to change all the time).
- Some method of caching shadow geometry. Tricky, since every geometry-light pair has an associated shadow geometry. I can't think of a good way to do this that actually sounds like it'd be faster.
- Optimise my shadow geometry creation. Ugh, probably my least favorite. The method is non-optimal in terms of new'd objects, but it's some complicated maths and I don't like the idea of obfuscating it..
- Something else?
Spatial tree sounds good, but it's more the getting rid of immediate mode I'm worried about - how on earth do I create this geometry on the fly in a nice and efficient manner? Have one big buffer and fill it up as needed? Shadow geometry is rendered half without textures and half using a single texture; would a buffer for each be a good idea? It seems like I'd have to make it much bigger than needed and waste memory to get something efficient.. Any pointers appreciated.
|
|
|
|
shawnkendall
JGO Ninja    Posts: 691 Medals: 2
Apathy Error: Don't bother striking any key.
Reply #1 on: 2003-08-28 17:59:31
That "java.lang.StrictMath.acos" isn't helping matters. What are you using it for?
I mean, if you are creating new rotation matrices every frame for rotation that is one thing (but it can still be minimized), but why "StrictMath"?
|
|
|
|
swpalmer
JGO Kernel      Posts: 3438 Medals: 4
Where's the Kaboom?
Reply #2 on: 2003-08-28 21:21:47
Some Math methods delegate to StrictMath; maybe this is one of them?
|
|
|
|
arm
JGO n00b  Posts: 13
Java games rock!
Reply #3 on: 2003-08-29 00:24:30
Excerpt from java.lang.Math:
    /**
     * ...
     * A result must be within 1 ulp of the correctly rounded result. Results
     * must be semi-monotonic.
     *
     * ...
     */
    public static double acos(double a) {
        return StrictMath.acos(a); // default impl. delegates to StrictMath
    }
(J2SE) 1.4 math is slower than the corresponding routines in J2SE 1.3.1. You can use JNI to speed up math calculations, as described in http://www.javaworld.com/javatips/jw-javatip141.html
|
|
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #4 on: 2003-08-29 02:03:33
Lemme see.. Math.acos is only called from Vector2f.angle(), but that is used with the shadow generation. Since it's used to find relative light-occluder angles, it's not something that can easily be cached (and ideally I'd just cache the whole shadow generated). It's only used 4 to 8 times per shadow, depending on the positions.
So from the looks of things I need to either draw faster, or draw less and reduce the shadow calculations that way, instead of trying to get the actual individual calculation faster.
|
|
|
|
abies
Sr. Member   Posts: 456
Reply #5 on: 2003-08-29 05:16:22
Please do not interpret java.lang.Math methods literally. As far as I understand, the default implementation of them (the one contained in src.zip) just delegates to StrictMath - but HotSpot can replace Math calls with specialized instructions - probably just replacing a call with a single FPU instruction (as it does not have to be really strict).
Now, I know that 1.4.x has some problems with Math performance - but this has nothing to do with what is inside Math.java class.
|
Artur Biesiadowski
|
|
|
shawnkendall
JGO Ninja    Posts: 691 Medals: 2
Apathy Error: Don't bother striking any key.
Reply #6 on: 2003-08-29 07:18:01
BTW, if you can just remove all your math and render calls completely, you can really get your frame rate up there! ;-)
"Ummm, yeah... if you could just remove the math and render calls, that'd be greaaat. Thanks. Oh, and I'm gonna need you to come in on Saturday..."
|
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #7 on: 2003-08-29 07:44:05
So, er, no one has any practical hints on batching triangles with vertex arrays and byte buffers? Or ideas on how to cache results in some efficient manner? Or am I going to get suggestions like 'use a LUT for cos'?
|
|
|
|
shawnkendall
JGO Ninja    Posts: 691 Medals: 2
Apathy Error: Don't bother striking any key.
Reply #8 on: 2003-08-29 08:34:10
For static geometry, you need Vertex Buffer Objects. They are the fastest thing going for unchanging geometry, but JOGL does not support them without modifications. That is why the first thing I did was rebuild JOGL WITH support for VBOs (with much help from abies).
Also, use vertex arrays whenever possible; they are faster for dynamic geometry than just calling glVertex3f 500 times. Basically, get out of calling loops with gl calls in them. The fastest approach is to move big blocks of geometry at a time with arrays and VBOs.
A quick Google on "glVertexPointer example" yielded http://www.movesinstitute.org/~mcdowell/mv4202/notes/lect14.pdf which is a pretty darn good explanation - in PDF form. :-) You can get to the HTML view from the Google search.
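As a rough illustration of the vertex array advice, here is a minimal sketch of packing vertex data into a direct, native-order buffer, which is the form a Java GL binding generally wants for glVertexPointer. The class and method names (VertexArraySketch, toDirectBuffer) are made up for this example, and the GL calls appear only as comments because the exact binding signatures depend on the JOGL build in use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not part of JOGL or of the code discussed in this thread.
class VertexArraySketch {
    /** Copies xyz triples into a direct, native-order buffer ready to hand to GL. */
    public static FloatBuffer toDirectBuffer(float[] xyz) {
        FloatBuffer buf = ByteBuffer.allocateDirect(xyz.length * 4) // 4 bytes per float
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(xyz);
        buf.flip(); // position = 0, limit = number of floats written
        return buf;
    }

    public static void main(String[] args) {
        float[] quad = { 0, 0, 0,  1, 0, 0,  1, 1, 0,  0, 1, 0 };
        FloatBuffer vertices = toDirectBuffer(quad);
        int vertexCount = vertices.remaining() / 3;
        System.out.println(vertexCount + " vertices, direct=" + vertices.isDirect());
        // With a GL binding, one draw call then replaces the glBegin/glVertex3f loop:
        //   glEnableClientState(GL_VERTEX_ARRAY);
        //   glVertexPointer(3, GL_FLOAT, 0, vertices);
        //   glDrawArrays(GL_QUADS, 0, vertexCount);
    }
}
```

The point of the direct buffer is that the driver can read the data without an extra copy through the Java heap; the per-vertex JNI calls are what the profile above is paying for.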
|
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #9 on: 2003-08-29 08:45:44
Yeah, this was what I was hoping to get hints on: how people actually *practically* use vertex arrays for large amounts of dynamic geometry, particularly since the actual amount per frame may change by quite a bit depending on what's visible. VBOs aren't going to help in this case, I think, since the geometry is not really static.
Am I just going to have to bite the bullet and have a whopping great big byte buffer reused every frame? I feel like I'm going round in circles - everyone screams 'use XYZ!' when I'm asking 'how?'. :-/
Edit: and when I say 'how?' I don't mean just the gl spec, I mean in a way that's actually practical, such as how to organise byte buffers, their size, how many, etc.
|
|
|
|
shawnkendall
JGO Ninja    Posts: 691 Medals: 2
Apathy Error: Don't bother striking any key.
Reply #10 on: 2003-08-29 10:10:15
If the number of elements changes from frame to frame, you will have to do like particle systems do and just make a maximum vertex array that you reuse up to the maximum each time. When you glDrawElements, you can say where to start and stop in that array. That is a fine way to do it :-)
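One way the "maximum array that you reuse" idea might look in practice is sketched below. Everything here is an assumption for illustration: DynamicVertexBatch and its methods are hypothetical names, the draw call is stubbed out as a comment, and the flush counter exists only so the behaviour can be checked without a GL context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: one preallocated vertex buffer that is refilled every frame. */
class DynamicVertexBatch {
    private static final int FLOATS_PER_VERTEX = 3;
    private final FloatBuffer vertices;
    private int flushes; // how many times the batch was handed to the (stubbed) draw call

    /** maxVertices should be a multiple of the primitive size so a flush never splits a triangle. */
    public DynamicVertexBatch(int maxVertices) {
        vertices = ByteBuffer.allocateDirect(maxVertices * FLOATS_PER_VERTEX * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Call at the start of each frame or pass; resets the write position, keeps the allocation. */
    public void begin() {
        vertices.clear();
    }

    /** Appends one vertex, drawing and emptying the buffer first if it is already full. */
    public void add(float x, float y, float z) {
        if (vertices.remaining() < FLOATS_PER_VERTEX) {
            flush();
        }
        vertices.put(x).put(y).put(z);
    }

    public int vertexCount() {
        return vertices.position() / FLOATS_PER_VERTEX;
    }

    public int flushCount() {
        return flushes;
    }

    /** Draws whatever has been queued so far and resets the write position. */
    public void flush() {
        int count = vertexCount();
        if (count == 0) {
            return;
        }
        vertices.flip();
        // Real code would issue something like:
        //   glVertexPointer(3, GL_FLOAT, 0, vertices);
        //   glDrawArrays(GL_TRIANGLES, 0, count);
        flushes++;
        vertices.clear();
    }
}
```

Because a full buffer is flushed rather than grown, the capacity only has to cover a comfortable batch, not the worst-case frame. The untextured and textured halves of the shadow geometry could each get their own instance so that a texture bind never falls in the middle of a batch.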
|
|
|
|
NVaidya
Jr. Member   Posts: 95
Java games rock!
Reply #11 on: 2003-08-29 10:14:36
Don't know if this might help... But if math crunching seems to be one of the bottlenecks, then you might want to try the -server VM option - I've been able to get a 2X speedup even with some fairly complex cases.
|
Gravity Sucks !
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #12 on: 2003-08-29 10:21:03
"If the number of elements changes from frame to frame, you will have to do like particle systems do and just make a maximum vertex array that you reuse up to the maximum each time."
The more I think about this, the more it seems like the only option. It still feels slightly wrong, but if it's a common method I guess it's just a speed vs. memory trade-off..
|
|
|
|
abies
Sr. Member   Posts: 456
Reply #13 on: 2003-08-29 11:45:44
"For static geometry, you need Vertex Buffer Objects. They are the fastest thing going for unchanging geometry."
I have heard that nvidia drivers do not optimize them as much as they should, and VBOs end up being slower than display lists. Is that still the case?
|
Artur Biesiadowski
|
|
|
elias
Reply #14 on: 2003-08-29 13:23:21
VBOs might very well be faster than ordinary arrays and immediate mode. Using VBO mappings you could end up with AGP memory saving yourself the extra copy, just like NV_vertex_array_range.
Regarding the acos: Do you _really_ need the angle and not simply cos(angle)? In which case you can get by with a much cheaper dot product.
- elias
|
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #15 on: 2003-08-29 14:10:55
Well, I'll experiment with VBO when it's in a proper Jogl build; for now anything's gonna be faster than immediate mode. For acos, I'm not sure. Here's the actual bit of code:
    public float clampToEdge(Vector2f edgeVector)
    {
        // Angles from the penumbra vector to the umbra vector and to the edge
        double penVsUm = penumbraVector.angle(umbraVector);
        double penVsEdge = penumbraVector.angle(edgeVector);

        if (penVsUm > penVsEdge)
        {
            // The edge lies inside the penumbra-umbra wedge: clamp the umbra to it
            float ratio = (float)(penVsEdge / penVsUm);
            umbraIntensity = 1f - ratio;
            umbraVector.set(edgeVector);
            return umbraIntensity;
        }
        else
            return 0f;
    }
Now I only really need the ratio of the angles, and the dot product did originally spring to mind. However, the vectors aren't guaranteed to be normalised, so I'm not sure if the dot product would actually work. But first I've got to get my basic quad tree working to see how much that helps..
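On the normalisation worry: the dot product still gives the cosine of the angle for vectors of any non-zero length, as long as it is divided by the product of the two lengths. The sketch below is illustrative only. AngleCompare and its methods are hypothetical, not the Vector2f API from the post. Note that it replaces only the comparison of the two angles; a ratio of cosines is not the same value as the ratio of angles, so the intensity falloff would change if the ratio were swapped too.

```java
// Hypothetical helper for comparing angles without acos.
class AngleCompare {
    /** Cosine of the angle between (ax, ay) and (bx, by); neither has to be unit length. */
    public static double cosAngle(double ax, double ay, double bx, double by) {
        double dot = ax * bx + ay * by;
        // |a| * |b| with a single sqrt; yields NaN below if either vector has zero length.
        double lengths = Math.sqrt((ax * ax + ay * ay) * (bx * bx + by * by));
        return dot / lengths;
    }

    /** True if angle(a, b) is greater than angle(a, c), decided without calling acos. */
    public static boolean angleGreater(double ax, double ay,
                                       double bx, double by,
                                       double cx, double cy) {
        // acos is strictly decreasing on [-1, 1], so a larger angle means a smaller cosine.
        return cosAngle(ax, ay, bx, by) < cosAngle(ax, ay, cx, cy);
    }
}
```

If all three vectors are normalised once up front, the square root disappears from the comparison as well and each test is a plain dot product.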
|
|
|
|
elias
Reply #16 on: 2003-08-30 01:17:53
If you look in the Vector.angle code, you'd probably find that it normalizes the vector...
- elias
|
|
|
|
princec
« League of Dukes » JGO Kernel      Posts: 8089 Medals: 96
Eh? Who? What? ... Me?
Reply #17 on: 2003-08-30 03:20:44
I just said "Dot products work on any vectors, not just normalised ones." but realised that's completely untrue. So ignore me. Cas 
|
|
|
|
Orangy Tang
JGO Kernel      Posts: 2960 Medals: 37
Monkey for a head
Reply #18 on: 2003-08-31 07:42:03
Elias is right: although .angle leaves the vectors unchanged, it divides through by the length, so it's effectively doing the same thing.. I changed .clampToEdge to instead normalise all the vectors beforehand and use the dot product. Profiling this version without acos showed that the time spent in the shadow generation was cut approximately in half. However, the framerate seemed unchanged at ~16fps still.
I also added a quadtree to cull out unneeded geometry. This not only reduces the number of light passes per frame (good, but not effective in the test level, and only 7 lights anyway so not important here) but also the amount of geometry casting a shadow from each light. The profiler suggests that I've cut the time calculating shadows down by a quarter with this on, which is about right considering that most lights cover about a quarter of the test level on average. Annoyingly, enabling the culling only sees an increase from ~16fps to ~18fps. Not the large increase I was hoping for.
Current profile with both these in effect:
VecriptProfileNoCosFullCull.txt: Exclusive Method Times (CPU) (virtual times)
    631997 sun.awt.windows.WToolkit.eventLoop
    532977 net.java.games.jogl.impl.windows.WGL.SwapBuffers
    16998 com.vecript.math.ConvexHull2D.renderShadowHull
    12195 net.java.games.jogl.impl.windows.WindowsGLImpl.glBegin
    5990 com.vecript.core.entity.PointLight.getUmbraVector
    4371 net.java.games.jogl.impl.windows.WindowsGLImpl.glVertex3f
    3346 com.vecript.core.entity.PointLight.getPenumbraVector
    3238 net.java.games.jogl.impl.windows.WindowsGLImpl.glTexCoord2f
    3022 java.util.ArrayList.get
    2968 com.vecript.math.ConvexHull2D.intersectsRect
    2914 com.vecript.core.spatialtree.QuadTreeBranchNode.findVisibleObjects
    2860 com.vecript.core.entity.PointLight.getDisplacedCenter
    2644 java.util.prefs.WindowsPreferences.WindowsRegOpenKey
    2104 com.vecript.math.ConvexHull2D.renderSolid
    1997 net.java.games.jogl.impl.windows.WindowsGLImpl.glColor4f
    1511 java.lang.Thread.currentThread
    1349 net.java.games.jogl.impl.windows.WindowsGLImpl.glDisable
    1295 java.util.prefs.WindowsPreferences.WindowsRegQueryValueEx
    1025 net.java.games.jogl.impl.windows.WindowsGLImpl.glBindTexture
    971 java.util.prefs.WindowsPreferences.windowsAbsolutePath
    917 java.security.AccessController.doPrivileged
    917 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnable
    917 com.vecript.core.renderer.GameRenderer.render
    809 com.vecript.math.ShadowFin.renderFin
    701 java.util.prefs.WindowsPreferences.toWindowsName
    701 net.java.games.jogl.impl.windows.WindowsGLImpl.glClear
    701 java.lang.ClassLoader.defineClass0
    648 com.vecript.core.spatialtree.QuadTreeBranchNode.findVisibleLights
    594 com.vecript.core.renderer.GameRenderer.drawGeometryPass
    540 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnd
    540 java.lang.StringBuffer.toString
    540 sun.awt.windows.WGlobalCursorManager.findHeavyweightUnderCursor
    540 java.awt.image.ComponentColorModel.getDataElements
    486 java.lang.String.substring
    432 java.io.FileOutputStream.writeBytes
    432 net.java.games.jogl.impl.windows.WindowsOnscreenGLContext.swapBuffers
    432 net.java.games.jogl.impl.GLContext.invokeGL
    378 java.lang.StringBuffer.<init>
    378 com.vecript.core.renderer.GameRenderer.mergeShadowHulls
    324 java.util.prefs.WindowsPreferences.WindowsRegCloseKey
    324 net.java.games.jogl.impl.windows.WindowsGLImpl.glLightfv
    324 com.vecript.core.spatialtree.QuadTreeRoot.findVisibleObjects
    270 java.lang.String.toCharArray
    270 com.vecript.core.renderer.GameRenderer.findVisibleObjects
    270 java.util.ArrayList.add
One final test: on top of all this, I commented out the code to actually draw the shadows (but with all the calculations left in) to see what effect all the glBegin etc. calls were having. The fps then jumped to ~30fps from the previous ~18fps. Although 30fps is still slower than I'd like, I'm hoping that switching to vertex arrays is going to show this same kind of speed increase. Anyone got any other ideas?
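For readers wondering what the per-light culling mentioned in this post involves at the lowest level, the test a quadtree node might run against a light's range can be sketched as a circle-versus-box overlap check. This is a guess at the general technique, not the poster's code: LightCull and circleIntersectsBox are hypothetical names, and the post's own ConvexHull2D.intersectsRect may work quite differently.

```java
// Hypothetical culling helper; assumes lights have a circular range and
// tree nodes (or object bounds) are axis-aligned boxes.
class LightCull {
    /** True if a circle (light centre plus radius) overlaps an axis-aligned box. */
    public static boolean circleIntersectsBox(float cx, float cy, float radius,
                                              float minX, float minY,
                                              float maxX, float maxY) {
        // Clamp the centre to the box to find the closest point on the box.
        float nearestX = Math.max(minX, Math.min(cx, maxX));
        float nearestY = Math.max(minY, Math.min(cy, maxY));
        float dx = cx - nearestX;
        float dy = cy - nearestY;
        // Compare squared distances to avoid a square root.
        return dx * dx + dy * dy <= radius * radius;
    }
}
```

A tree traversal would skip any branch whose bounds fail this test for the current light, so occluders outside the light's range never reach the shadow generation at all.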
|
|
|
|
shawnkendall
JGO Ninja    Posts: 691 Medals: 2
Apathy Error: Don't bother striking any key.
Reply #19 on: 2003-08-31 14:44:59
Sounds like you're on track.  Careful analysis and chipping away at the stone....
|
|
|
|
|