Libgdx vs Stumpy: Sprites
Offline StumpyStrust
« Posted 2012-10-19 07:11:47 »

So now that I have my sprite batcher working like a charm Grin, I wanted to put it up against libgdx's sprite batcher and see if it is worth anything.

My sprite batcher simply uses vertex arrays and not VBOs, whereas I think libgdx uses VBOs. I'm also not using libgdx's vector class, which should give libgdx a performance edge, though I don't know how much. I also may not be using the best method for rendering masses of sprites in libgdx: just batch.begin(), render the sprites, batch.end().
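For reference, the libgdx path being timed here is just the stock SpriteBatch API. A minimal sketch of that loop (a fragment of a hypothetical ApplicationListener; the particles list and sizes are placeholders, not the actual test code):

import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.GL10;
import com.badlogic.gdx.graphics.OrthographicCamera;
import com.badlogic.gdx.graphics.g2d.Sprite;
import com.badlogic.gdx.graphics.g2d.SpriteBatch;
import java.util.ArrayList;

// Sketch only: "particles" stands in for whatever sprites the test generates.
SpriteBatch batch;                 // created in create(), once the GL context exists
OrthographicCamera camera;
ArrayList<Sprite> particles = new ArrayList<Sprite>();

public void create() {
   batch = new SpriteBatch();
   camera = new OrthographicCamera(Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
}

public void render() {
   Gdx.gl.glClear(GL10.GL_COLOR_BUFFER_BIT);
   camera.update();
   batch.setProjectionMatrix(camera.combined);

   batch.begin();
   for (int i = 0; i < particles.size(); i++) {
      particles.get(i).draw(batch);   // only writes vertices into the batch's array
   }
   batch.end();                       // the actual GL draw call(s) happen here
}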

I will admit I looked at libgdx's sprite batcher to see how they did things and based mine loosely on theirs (mainly just the rotation and TextureRegion part Wink).

Sorry for the large files but they both come packed with all natives and everything so....

Controls:
G to generate a bunch of particles/sprites
K to kill them.

libgdx test case

http://www.mediafire.com/?ccomdyb7inpvocy

Notes: no delta timing, so it seems slower, but it looks better.

On my desktop, libgdx's performance is about 60fps at 50k sprites. Very good. Once you go past this, though, performance drops considerably: 30fps at 100k. I think that ratio is very nice.

On my laptop with integrated graphics everything is much slower: 50k at 40fps or lower. 100k makes me cry.

Stumpy's test case

http://www.mediafire.com/?wl4w7lenk4f17i7

Notes: uses delta timing, so it seems faster but looks bad, because delta can throw things off at low frame rates. Also, don't use the popup menu to generate particles/sprites, as they will be different colors than in the libgdx test case.

On my desktop, performance is almost identical to libgdx, except at 100k it is 1-3fps faster.

On my laptop, libgdx is much faster than my sprite batcher, by 5-8fps. I think this is because my laptop gets fillrate limited almost instantly. In my sprite batcher I do absolutely nothing to lower the fill rate, and I have no idea exactly what libgdx does, but they are faster.

I would like you all to test the speeds to see what you get.


Offline davedes
« Reply #1 - Posted 2012-10-19 07:55:30 »

At 150,000 particles, both give me 20 FPS. At 5,000 particles, yours is at 50 FPS and LibGDX's is at 40 FPS.

The tests don't seem very consistent; for one, when the particles are clustered together the FPS spikes because my GPU seems to use some sort of culling optimization to reduce fill rate.

I've looked into LibGDX's sprite batch, and there are a few areas that could be improved (not necessarily for LibGDX's needs, but perhaps for a specialized game engine).
  • Mapped VBOs could be used; although Vertex Arrays (the default implementation in LibGDX) may be just as fast if not faster
  • A FloatBuffer could be used instead of putting it all into an array first
  • Calling begin() and setShader(..) will send uniform data to custom shaders, making them not necessarily optimal
  • There are a lot of redundant calls to glEnable, glBlendFunc, glUseProgram, glBindTexture, etc.
  • Advanced features like geometry shader, multi-texturing, multiple render targets, texture arrays, etc. could be employed on desktop.
  • Polygons could be drawn in the same batch as regular sprites since in the end it's all just textured triangles
  • A shader-based approach could be employed, like theagentd has done with his tile renderer.
  • There is no use of z-buffer for depth, so naturally you may run into texture swapping with multiple sprite sheets

Offline Cero
« Reply #2 - Posted 2012-10-19 15:01:51 »

  • Mapped VBOs could be used; although Vertex Arrays (the default implementation in LibGDX) may be just as fast if not faster
I have talked with Mario about that.
We did some benchmarking over a year ago and found that, especially on modern NVidia cards, vertex arrays are horrible in performance; some internet posts suggest it's because they're deprecated as of OpenGL 3 or something. We ended up using VBOs, although it's entirely possible that we screwed it up back then.

  • A shader-based approach could be employed, like theagentd has done with his tile renderer.
I recall that it needs OpenGL 3.

Advanced features like geometry shader, multi-texturing, multiple render targets, texture arrays, etc. could be employed on desktop.
Well yeah, since I have a big desktop game using Libgdx now, I would welcome nice desktop-only features... however the guys have too much to do as it is.
@Nate Still waiting for particle system refactoring and fixing x3

Offline StumpyStrust
« Reply #3 - Posted 2012-10-19 16:36:27 »

Quote
At 150,000 particles, both give me 20 FPS. At 5,000 particles, yours is at 50 FPS and LibGDX's is at 40 FPS.

The tests don't seem very consistent; for one, when the particles are clustered together the FPS spikes because my GPU seems to use some sort of culling optimization to reduce fill rate.

    5,000 particles? I don't think they will let you generate that low.

    Do both libgdx and Stumpy do the clustering FPS thing? If so I really don't see an issue. If it is your GPU doing that I cannot help it.

Yes, it would be great to use modern opengl calls and strategies to speed things up, but that would keep a bunch of people from being able to use the app (my laptop can only use opengl 2.0). Also, libgdx is meant for small devices that only support 2.0.

I tried a VBO version using mapped VBOs and it was slower. The bottleneck is either fillrate or the cpu filling the arrays/updating particles. This seems to be true for both libgdx and mine.

    Offline davedes
    « Reply #4 - Posted 2012-10-19 17:00:47 »

    My computer only supports OpenGL 2.1 but, as with most other computers these days, supports a variety of 3.0+ extensions (GL_EXT_texture_array, GL_EXT_framebuffer_object, GL_EXT_geometry_shader4, GL_ARB_texture_float). In other words, things like theagentd's tile map shader are definitely possible even when 3.0+ is not present.

    That doesn't mean you need to rely on these extensions; but if they are present, you may as well make use of them since the vast majority of your audience will benefit from them. Those on old/shitty systems can turn down the quality settings (which might equate to capping your total particles or something).

    Geometry shaders will reduce CPU load. Texture arrays will reduce texture binds. Passing circles instead of quads/triangles can reduce fill-rate.

    Quote
    I tried a VBO version using mapped VBO and it was slower.
    This could be due to your code (i.e. not re-using buffers) or GPU (as Cero noted). Maybe you should post some of your code and implementation instead of saying "Here is what my sprite renderer looks like next to LibGDX, end of story."
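For what it's worth, "re-using buffers" with mapped VBOs usually means passing the previously returned ByteBuffer back into glMapBuffer so LWJGL can reuse the wrapper instead of allocating a new one every frame. A rough sketch under that assumption (LWJGL 2; field and method names are mine, not from either test case):

import static org.lwjgl.opengl.GL15.*;
import java.nio.ByteBuffer;

// Sketch only: cache the ByteBuffer wrapper returned by glMapBuffer and hand it
// back in next frame so no new object is created (and later garbage collected).
private ByteBuffer mapped; // starts as null, filled in on the first map

void fillVBO(int vboId, float[] vertexData, int sizeInBytes) {
   glBindBuffer(GL_ARRAY_BUFFER, vboId);
   // Orphan the old storage so the driver doesn't stall waiting for the GPU.
   glBufferData(GL_ARRAY_BUFFER, sizeInBytes, GL_STREAM_DRAW);
   mapped = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY, mapped);
   mapped.asFloatBuffer().put(vertexData, 0, sizeInBytes / 4);
   glUnmapBuffer(GL_ARRAY_BUFFER);
}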

    Quote
    Do both libgdx and Stumpy do the clustering FPS thing? If so I really don't see an issue. If it is your GPU doing that I cannot help it.
    It's not really a benchmark if it can be easily influenced by user input (and the results skewed).

    Also, I meant 50,000.

    Offline StumpyStrust
    « Reply #5 - Posted 2012-10-19 19:06:58 »

     
    Quote
    I tried a VBO version using mapped VBO and it was slower.
    This could be due to your code (i.e. not re-using buffers) or GPU (as Cero noted). Maybe you should post some of your code and implementation instead of saying "Here is what my sprite renderer looks like next to LibGDX, end of story."

Well, this is what my current sprite renderer looks like next to libgdx's. Not necessarily end of story; I just wanted to see what other people's results are, as I can only test on 2 computers.
I was reusing buffers, but the main bottleneck, as I said, was filling the arrays with the data. VBOs, I think, also have a little more overhead than vertex arrays, but I may be wrong. Also, my design of particles is not the most optimal, as it is basically just rendering sprites.

    Quote
    Do both libgdx and Stumpy do the clustering FPS thing? If so I really don't see an issue. If it is your GPU doing that I cannot help it.
    It's not really a benchmark if it can be easily influenced by user input (and the results skewed).

How is what your GPU does user input? I don't really know how to make it so your GPU does not do certain things that I do not tell it to do. I am not doing any culling, so... you may just have nice drivers. This was more of a "what do you get with both, given your current hardware" kind of test.

The reason why I am testing mine against libgdx is that libgdx is very fast and optimized for lower hardware specs (from what I know). I am sorry if somehow I am stepping on your toes, but I do not claim to be a pro in anything.


    Offline badlogicgames
    « Reply #6 - Posted 2012-10-19 20:59:13 »

    Cool stuff! Nice to see that we are still doing OK for the most part Smiley

    Couple of comments:

    I've looked into LibGDX's sprite batch, and there are a few areas that could be improved (not necessarily for LibGDX's needs, but perhaps for a specialized game engine).
    • Mapped VBOs could be used; although Vertex Arrays (the default implementation in LibGDX) may be just as fast if not faster
OpenGL ES 2.0 sadly has no VBO mapping. I guess that would probably be the fastest option. As it stands, VAs are tons faster on Android than VBOs. The reason seems to be a bit weird on Android: if you use a single VBO and render multiple batches with it, you'll stall the GPU hard. I tried to fix this by using a pool of VBOs, but whatever I did, VAs always won by a large margin. For this reason we now use VAs exclusively.

    • A FloatBuffer could be used instead of putting it all into an array first
Not on Android. On versions < 3, anything involving a direct Buffer is totally f**ked. JNI overhead kills you on Android as well. And Dalvik can optimize tight array access rather well. Long story short: using an array is a factor of 10-15 faster than using a direct Buffer on Android.
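In other words, the fast path is: write vertices into a plain float[] in a tight loop, then do a single bulk copy into the direct buffer right before drawing. A rough sketch of that pattern with plain java.nio (constants and names are placeholders, not libgdx internals):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch: Dalvik optimizes tight float[] access well, so build vertices in an
// array and pay the direct-buffer/JNI cost only once per flush.
static final int MAX_SPRITES = 1000;          // placeholder batch size
static final int FLOATS_PER_SPRITE = 20;      // e.g. 4 vertices x 5 floats

final float[] vertices = new float[MAX_SPRITES * FLOATS_PER_SPRITE];
final FloatBuffer vertexBuffer = ByteBuffer
      .allocateDirect(vertices.length * 4)
      .order(ByteOrder.nativeOrder())
      .asFloatBuffer();

int idx = 0;

void addSprite(float x, float y, float w, float h) {
   // ... write the sprite's corners into vertices[idx++], vertices[idx++], ...
}

void flush() {
   vertexBuffer.clear();
   vertexBuffer.put(vertices, 0, idx);   // one bulk copy instead of thousands of putFloat()s
   vertexBuffer.flip();
   // glVertexAttribPointer(...) / glDrawArrays(...) using vertexBuffer goes here
   idx = 0;
}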

    • Calling begin() and setShader(..) will send uniform data to custom shaders, making them not necessarily optimal
Jupp, I just couldn't figure out how to let the user decide which uniforms should be sent to the shader. I guess I could disable all uniform setting via a boolean flag though, and have the user set the camera matrices. Hrmm, good input, thanks!

    • There are a lot of redundant calls to glEnable, glBlendFunc, glUseProgram, glBindTexture, etc.
    Yes, that is mostly to guarantee that after begin()/end() we leave OpenGL ES in a clean state. There are probably a few places where we could trim that down. I'd love to get a pull request Cheesy
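For illustration, a minimal sketch of the kind of state cache that could trim those redundant calls (not libgdx's actual internals; plain LWJGL, names are mine):

import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL20;

// Sketch of a trivial state cache: only touch GL when the value actually changes.
public class GLStateCache {
   private int boundTexture = -1;
   private int activeProgram = -1;

   public void bindTexture(int texture) {
      if (texture != boundTexture) {
         GL11.glBindTexture(GL11.GL_TEXTURE_2D, texture);
         boundTexture = texture;
      }
   }

   public void useProgram(int program) {
      if (program != activeProgram) {
         GL20.glUseProgram(program);
         activeProgram = program;
      }
   }

   // Call this when outside code may have changed GL state behind the cache's back.
   public void invalidate() {
      boundTexture = -1;
      activeProgram = -1;
   }
}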

    • Advanced features like geometry shader, multi-texturing, multiple render targets, texture arrays, etc. could be employed on desktop.
Agreed. However, my time budget is only so big, and I'm not sure there could be a 2D game so elaborate that the current method is insufficient on the desktop. Any desktop machine that supports geometry shaders, multiple render targets and texture arrays is likely so beefy that the additional code paths are not worth the time imo.

    • Polygons could be drawn in the same batch as regular sprites since in the end it's all just textured triangles
That is actually true. Polygon support was added by a third party; I didn't look too hard into it. I think I remember there being a bit of an issue with the indices array, which SpriteBatch generates once on startup since it is fixed. Not sure though.

    • A shader-based approach could be employed, like theagentd has done with his tile renderer.
I actually had that at one point; it turned out to be a lot slower than what we currently have, again only on Android.

    • There is no use of z-buffer for depth, so naturally you may run into texture swapping with multiple sprite sheets

Conscious decision not to use the z-buffer with SpriteBatch. There's DecalBatch for that.

Awesome feedback, thanks a bunch. I guess I'll try to fix a few of those issues.

    Offline theagentd
    « Reply #7 - Posted 2012-10-19 22:13:57 »

    Requires OGL 3, spawn with the G key.
    http://www.mediafire.com/?bayay2l6snydi7r

    Offline davedes
    « Reply #8 - Posted 2012-10-19 22:34:25 »

    ... snip ...
    Yep, basically it comes down to this: LibGDX's sprite batch is sufficient for the vast majority of cases, and the performance gain from geometry shaders (or another technique) will likely be negligible and not worth losing portability/flexibility/ease-of-use.

    I would like to get some pull requests in, though, eg: mouse cursors, closeRequested, and other desktop features.

    Offline theagentd
    « Reply #9 - Posted 2012-10-19 22:44:12 »

    ... the performance gain from geometry shaders (or another technique) will likely be negligible and not worth losing portability/flexibility/ease-of-use.
    ...

    Offline StumpyStrust
    « Reply #10 - Posted 2012-10-20 00:32:37 »

How easy is it to render using the geometry shader?

Can you abstract it to

drawImage/renderImage(blah blah blah....location size....blah blah blah)

Huh

If so...sweet Grin. Still won't run on my laptop though....actually everything you have posted won't run on it. Stinkin' integrated chip.

    Offline Nate

    « Reply #11 - Posted 2012-10-20 00:57:35 »

    @Nate Still waiting for particle system refactoring and fixing x3
    I know. Sad I'm neglecting libgdx and kryo a tiny bit lately to try and get some projects done that can make some money. Daddy gotta eat! I'll be back, no worries.

    I would like to get some pull requests in, though, eg: mouse cursors, closeRequested, and other desktop features.
If you switch to LwjglFrame you can use Swing to do a lot of window-related things that LWJGL's window doesn't yet support (getting/setting size, minimized state, cursors, close requested, etc). Likely a small performance hit, but I doubt it'd hurt most apps.

    Offline theagentd
    « Reply #12 - Posted 2012-10-20 01:16:03 »

How easy is it to render using the geometry shader?

Can you abstract it to

drawImage/renderImage(blah blah blah....location size....blah blah blah)

Huh

If so...sweet Grin. Still won't run on my laptop though....actually everything you have posted won't run on it. Stinkin' integrated chip.
Easier, in my opinion, but you have to make a shader that supports all the functions you want. This one just supports a position, a size (width and height) and a color. It then generates texture coordinates in the shader from 0 to 1. You could implement rotation and texture arrays (= as many different textures as you can store in VRAM, which is a LOT), for example. To "render" something, just throw the data needed for a sprite into a buffer and render all of them with glDrawArrays(). It's a lot more efficient since you don't have to upload 4 vertices per sprite.

Performance is the biggest difference. Your two JARs can handle around 75-80k particles at 60 FPS on my comp, and around 60k when I hold down the mouse. Mine runs with 1100k particles at 60 FPS and is only very slightly affected by holding the mouse, but that's because I simply use all cores to update and write the sprite data to the VBO. That makes it RAM limited, since there's so much processing power available. That's cheating, you say? Okay: with one thread, 600k particles at 60 FPS, 425k when I hold down the mouse.

There's no magic on the CPU side. These three functions in my Particle class are the only things I run each frame:

         public void pull(int mouseX, int mouseY){
             
             float dx = mouseX - x, dy = mouseY - y;
             float distSqrd = dx*dx + dy*dy;
             
             //float force = 0.01f / (float)Math.sqrt(distSqrd);
             float force = 2f / distSqrd;
             vx += dx * force;
             vy += dy * force;
          }

          public void update() {

             vx *= 0.999;
             vy *= 0.999;
             
             x += vx;
             y += vy;
             
             if((x < 0 && vx < 0) || (x > WIDTH && vx > 0)){
                vx = -vx;
             }
             
             if((y < 0 && vy < 0) || (y > HEIGHT && vy > 0)){
                vy = -vy;
             }
          }

          public void put(ByteBuffer data) {
             data.
             putFloat(x).putFloat(y).
             putFloat(width).putFloat(height).
             putInt(color);
          }


This one is pretty simple, but you can expand it as much as you want. Texture arrays allow you to use as many textures as you want, with mipmapping; custom texture coordinates let you pick out parts of a texture, not just a whole layer; rotation can be done by the GPU per sprite; coloring, multitexturing, whatever you want. The only thing that you can't do is custom non-rectangular geometry. Rotation is fine, but if you need coordinates per point, there's pretty much no point in using a geometry shader to expand it. You might save a few bytes = gain a few FPS, but the win would be pretty minimal.

Compared to no shaders/OGL 2 shaders, memory usage is reduced a lot since we don't have to duplicate data between vertices. For the above data you'd need a 2D coordinate per corner, and you'd need to duplicate the color data once per vertex. That's 4 x 2 x 4 bytes for the positions + 4 x 4 bytes of color = 32 + 16 = 48 bytes per sprite. I just use 4 floats and 4 bytes = 20 bytes per sprite. Since we're handling so much data, performance increases a lot by simply reducing it. Add to that that we don't need to calculate 4 positions (the 4 corners) on the CPU, since we just drop in the center position and a width and a height. That's 8 saved float additions per sprite. If you want rotation you'd have to do that on the CPU too, since you can't manipulate matrices and stuff between each sprite. Since GPUs are so fast it won't budge an inch from the extra load, but your CPU will take a huge hit if you do it there.

    Offline StumpyStrust
    « Reply #13 - Posted 2012-10-20 02:16:39 »

Hmm, my comp is about 10 fps slower than yours, but on your program it gets less than half of what you get.  Clueless

    Never said multi-threading was cheating.

    When I use the put method in ByteBuffers I lose 50% performance vs put(array)

Hehe, I can have particles that are not perfect squares, but I don't think that is really a big deal. The biggest bottleneck, I think, is again filling the arrays for the gpu. By using the geometry shader it seems that you can, as you have said, reduce the needed vertices from one at each corner to just one. I would really love to try and rewrite my sprite batcher using this method, but I could not run it on my laptop, which is what I code on mostly.

    How would you handle things like blinking particles, growing/shrinking, maxSizes/minSizes, maxFade/minFade, animated etc etc etc. Just have a particle with a bunch of variables? It is what I am doing now but I am worried about memory usage.

    Offline Jimmt
    « Reply #14 - Posted 2012-10-20 02:30:56 »

    LibGDX one - 20-30 fps @150000 particles.
    Yours - 25-35 fps @150000 particles.
    Offline theagentd
    « Reply #15 - Posted 2012-10-20 11:49:07 »

Hmm, my comp is about 10 fps slower than yours, but on your program it gets less than half of what you get.  Clueless

    Never said multi-threading was cheating.

    When I use the put method in ByteBuffers I lose 50% performance vs put(array)
    Uh, how fast was it? You're not very clear... =S Also, what are your specs?

    How would you handle things like blinking particles, growing/shrinking, maxSizes/minSizes, maxFade/minFade, animated etc etc etc. Just have a particle with a bunch of variables? It is what I am doing now but I am worried about memory usage.
    Blinking particles: Update the color on the CPU each frame, or send the time it's been alive and generate the blinking effect there.
    Size changing: Just change the width and height variables? It's the most flexible way at least. If you have the time it's been alive (from blinking particles) you could use that too.
    Fading: Either update the color on the CPU or use the time it's been alive.
    Animated: Upload a time variable and pick out textures based on the time passed (veeery easy with texture arrays). You can loop animations too.
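A minimal sketch of that "time alive" idea, extending the Particle.put() example earlier in the thread with one extra float (spawnTime is an assumed field, and the per-sprite stride would grow from 20 to 24 bytes):

     // Sketch: one extra float per particle lets the shaders derive blinking, fading
     // and animation frames on their own. spawnTime is a hypothetical field set on spawn.
     public void put(ByteBuffer data, float currentTime) {
        data.
        putFloat(x).putFloat(y).
        putFloat(width).putFloat(height).
        putInt(color).
        putFloat(currentTime - spawnTime);   // "time alive", read by the shaders
     }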


EDIT: I tried a second technique: instancing. The idea was to draw the particles as instances of a 4-vertex quad. The data can still be uploaded per instance = per particle. Sadly, this proved to be a really bad idea. First of all it's less flexible since I'm not using a geometry shader (and if I did, there'd be no point), and secondly, performance was horrible. Instancing was obviously not made for meshes with only 4 vertices, more like 500 or more.
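For reference, a rough sketch of what that instanced path looks like in LWJGL (not the actual test code; it needs GL 3.3 or ARB_instanced_arrays, the attribute locations are placeholders, and the per-instance VBO is assumed to be bound and pointed at already):

import static org.lwjgl.opengl.GL11.GL_TRIANGLE_STRIP;
import static org.lwjgl.opengl.GL31.glDrawArraysInstanced;
import static org.lwjgl.opengl.GL33.glVertexAttribDivisor;

// Sketch: the per-particle attributes get a divisor of 1, so they advance once per
// instance instead of once per vertex; the quad itself is just 4 vertices drawn
// particleCount times.
void drawParticlesInstanced(int positionLoc, int sizeLoc, int colorLoc, int particleCount) {
   glVertexAttribDivisor(positionLoc, 1);
   glVertexAttribDivisor(sizeLoc, 1);
   glVertexAttribDivisor(colorLoc, 1);

   glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, particleCount);
}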

    Offline StumpyStrust
    « Reply #16 - Posted 2012-10-20 20:12:51 »

I can do all the stuff I said, I just don't know if it is the most efficient way.

With my batcher I get 13-15fps slower than you (I was off before), but with your app I drop below 60 fps at about 220k.
At 500k I get 30fps or lower.


I have a Q6700 quad core (2.66 GHz), 3 gigs of ram, and a BFG GeForce GTS 250 (overclocked, with 1 gig of vram). Nothing really nice, but I can play most games on high settings so w/e.


    Offline theagentd
    « Reply #17 - Posted 2012-10-20 20:42:43 »

    Okay, I have a GTX 295 (though using only one GPU), so I guess you're hitting a GPU bottleneck. That's not really a bad thing though, since that means you have plenty of CPU time left for other stuff.

    Offline StumpyStrust
    « Reply #18 - Posted 2012-10-20 22:52:54 »

I think it is a cpu bottleneck combined with a fillrate bottleneck, as my cpu is at 100% during the app... and probably the gpu too. I need a new computer; this one is 5 years old...

    Offline davedes
    « Reply #19 - Posted 2012-10-20 22:59:01 »

Another way to improve fill rate for circle-shaped particles is to not use quads (i.e. draw a circle made up of a GL_TRIANGLE_FAN). I haven't tested this in practice, so the tradeoff may not be worth it, but it would be better suited for a geometry shader implementation.
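For illustration, a rough sketch of building such a fan on the CPU (an n-segment polygon approximating a circle; the layout and names are mine):

// Sketch: vertices for GL_TRIANGLE_FAN approximating a circle.
// Returns interleaved x,y pairs: the center first, then segments+1 rim points
// (the last rim point repeats the first to close the fan).
static float[] circleFan(float cx, float cy, float radius, int segments) {
   float[] verts = new float[(segments + 2) * 2];
   verts[0] = cx;
   verts[1] = cy;
   for (int i = 0; i <= segments; i++) {
      double angle = 2.0 * Math.PI * i / segments;
      verts[2 + i * 2]     = cx + (float) (Math.cos(angle) * radius);
      verts[2 + i * 2 + 1] = cy + (float) (Math.sin(angle) * radius);
   }
   return verts;
}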

    Quote
    Performance is the biggest difference. Your two JARs can handle around 75-80k particles at 60 FPS on my comp, and around 60k when I hold down the mouse. Mine runs with 1100k particles at 60 FPS and is only very slightly affected by holding the mouse, but that's because I simply use all cores to update and write the sprite data to the VBO. That makes it RAM limited since there's so much processing power available. That's cheating you say? Okay, with one thread 600k particles at 60 FPS, 425k when I hold down the mouse.
    Impressive. Last time I tested geometry shaders I didn't notice that much of an increase in sprite count; I'll have to give it another go.

Can't run your test since it uses GL 3.0. Would be interested to see a GL 2.0 version (through extensions).

    Offline theagentd
    « Reply #20 - Posted 2012-10-21 00:34:42 »

Can't run your test since it uses GL 3.0. Would be interested to see a GL 2.0 version (through extensions).
That's not possible. There's no OGL 2.0 GPU that supports the extension. When going from DX9 to DX10 there was a huge hardware change. Before that, GPUs had different types of cores for vertex processing and fragment processing, which fits traditional rendering well. However, with DX9 came the possibility of deferred lighting, which means that you first render the geometry and store the data needed for lighting, and then do the lighting with the extra information (normals, material, etc) stored in the first pass. The first pass is limited by vertex processing and ROP performance (lots of vertices, low per-pixel math cost (no lighting), a high amount of data stored per pixel (ROP load)). That means the pixel processors basically sit idle. In the second pass we have the reverse problem: we're drawing simple light geometry (few vertices), but it covers lots of pixels and requires heavy calculations per pixel. In this case the vertex processors sit idle. For this reason GPUs moved over to unified architectures, where there's just one type of core in the GPU but it can process both vertices and pixels. This allows the cores to be used more efficiently when the pixel-vertex load is unbalanced. It also opened up GPU computing and geometry shaders, and removed all the silly limitations on vertex shaders (previously only fragment shaders could do texture reads). Instead of making a whole new core type for geometry shaders, they just made the cores flexible enough to do everything.

    Offline davedes
    « Reply #21 - Posted 2012-10-21 08:30:23 »

    Quote
    There's no OGL 2.0 GPU that supports the extension.
    Are we talking about GL_EXT_geometry_shader4?

    Maybe I should have said GL 2.1+ version. My Mac supports geometry shaders through extension.

    Offline theagentd
    « Reply #22 - Posted 2012-10-21 18:04:56 »

Geometry shaders are only supported by GPUs with a unified shader architecture (introduced with Shader Model 4 in DirectX 10). If your Mac has a GPU that supports OGL 3 but no decent drivers, then that's a different story of course. I'd have to convert the shader to use an old GLSL version, but other than that it should work fine. It just feels bad to hack together stuff because Apple can't be bothered to make decent drivers...

I'll see what I can do, but I'm not really familiar with such old geometry shaders.

    Offline StumpyStrust
    « Reply #23 - Posted 2012-10-24 01:51:15 »

So I got a little excited and am working on making a geometry shader sprite batcher, but all I have to go off of is what theagentd posted. I have it so points go into quads (used your shader, theagentd, sorry), but I do not use premultiplied alpha, so I had to change the frag shader.

For color I use bytes 0-255, which makes things more confusing. Now I am not really sure wth is going on when it comes to color and textures, as I either get a blank screen or everything just takes on 255/1f (rgba). The performance boost is about 10 fps regardless of how many particles there are. Not as big as theagentd's implementation, as he used multithreading.

The reason why I also get an fps drop when you click is because I normalize a vector in the physics, which is a big performance killer.

    Offline theagentd
    « Reply #24 - Posted 2012-10-24 11:57:02 »

    Are you using custom shader attributes or the built-in ones? It's better to use custom ones. Can you post the code that you use to setup your rendering?

    Here's the code I used to set it up:

    Initialization code:
       private static final int PARTICLE_SIZE = 20;
       private static final int POSITION_OFFSET = 0;
       private static final int SIZE_OFFSET = 8;
       private static final int COLOR_OFFSET = 16;
       
       private ShaderProgram geometryShader; //My custom shader program class

            //After shader has been loaded from file:
       int gPositionLocation = geometryShader.getAttribLocation("position"); //Calls GL20.glGetAttribLocation(program, position);
       int gSizeLocation = geometryShader.getAttribLocation("size");
       int gColorLocation = geometryShader.getAttribLocation("color");
         
       geometryShader.bind();
       glUniform2f(geometryShader.getUniformLocation("screenSize"), Display.getWidth(), Display.getHeight());


    Rendering code:
          geometryShader.bind();
         
          glBindBuffer(GL_ARRAY_BUFFER, dataVBO);

          glVertexAttribPointer(gPositionLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, POSITION_OFFSET);
          glVertexAttribPointer(gSizeLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, SIZE_OFFSET);
          glVertexAttribPointer(gColorLocation, 4, GL_UNSIGNED_BYTE, true, PARTICLE_SIZE, COLOR_OFFSET);

          glEnableVertexAttribArray(gPositionLocation);
          glEnableVertexAttribArray(gSizeLocation);
          glEnableVertexAttribArray(gColorLocation);
         
         
          glDrawArrays(GL_POINTS, 0, particles.size());

         
          glDisableVertexAttribArray(gPositionLocation);
          glDisableVertexAttribArray(gSizeLocation);
          glDisableVertexAttribArray(gColorLocation);

    Offline StumpyStrust
    « Reply #25 - Posted 2012-10-28 05:35:01 »

    Sorry for not responding sooner but RL can take time away (got an 80% in discrete math).

I decompiled your program just to see how you were doing everything, and let's just say that I had an epiphany. I think I finally grasp how exactly shaders and such things work in actual code (you know, after seeing some real code). I used your shader and ShaderProgram classes.

This is how I set up the attributes.

    private void render()
       {
          vertBuff.put(vertArray);
          vertBuff.flip();
          colBuff.put(colorArray);
          colBuff.flip();
          texBuff.put(sizeArray);
          texBuff.flip();
         
            //glVertexPointer(2, 0, vertBuff);
            glVertexAttribPointer(gPositionLocation, 2, false, 0, vertBuff);
          //glColorPointer(4,true, 0, colBuff);
            glVertexAttribPointer(gSizeLocation, 2, false, 0, texBuff);
            //glTexCoordPointer(2, 0, texBuff);
            glVertexAttribPointer(gColorLocation, 4, true,false, 0, colBuff);
           
            glEnableVertexAttribArray(gPositionLocation);
            glEnableVertexAttribArray(gSizeLocation);
            glEnableVertexAttribArray(gColorLocation);
           
            glDrawArrays(GL_POINTS, 0, draws);
            vertBuff.clear();
            colBuff.clear();
            texBuff.clear();
            vertIndex = 0;
          colIndex = 0;
          sizeIndex = 0;
          draws = 0;
       }


I figured out why the color was messed up. I am using 0-255 bytes for color, but the shaders expect 0-1 (floats). By dividing the colors in the shader by 255 I get correct colors, but the alpha is still off. I completely got rid of the texture stuff, as the textures were never showing up.

    I also got different sized images working in the geometry shader by using the x and y from the size passed in.

    Here are the shaders.

vert (same as yours)
    #version 330

    #define POSITION 0
    #define SIZE 1
    #define COLOR 2

    layout(location = POSITION) in vec2 position;
    layout(location = SIZE) in vec2 size;
    layout(location = COLOR) in vec4 color;

    out vec2 vPosition;
    out vec2 vSize;
    out vec4 vColor;

    void main(){
       vPosition = position;
       vSize = size;
       vColor = color;
    }



geometry (changed a bit)
    #version 330

    layout(points) in;
    layout(triangle_strip, max_vertices = 4) out;

    uniform vec2 screenSize;

    in vec2[] vPosition;
    in vec2[] vSize;
    in vec4[] vColor;

    out vec2 texCoords;
    out vec4 color;

    vec4 toScreen(vec2 pos){
       vec4 result = vec4(pos * 2 / screenSize - 1, 0, 1);
       result.y = -result.y; //Flip y ftw
        return result;
    }

    void main() {
       //if i divide everything by 255 I get correct colors but not alpha.  
       color = vColor[0];
       
       gl_Position = toScreen(vec2(vPosition[0].x - vSize[0].x/3,vPosition[0].y - vSize[0].y/3));
       texCoords = vec2(0, 0);
       EmitVertex();
       
       gl_Position = toScreen(vec2(vPosition[0].x + vSize[0].x/3,vPosition[0].y - vSize[0].y/3));
   texCoords = vec2(1, 0); //texture coords follow the corner order: (0,0) (1,0) (0,1) (1,1)
       EmitVertex();
       
       gl_Position = toScreen(vec2(vPosition[0].x - vSize[0].x/3,vPosition[0].y + vSize[0].y/3));
   texCoords = vec2(0, 1);
       EmitVertex();
       
       gl_Position = toScreen(vec2(vPosition[0].x + vSize[0].x/3,vPosition[0].y + vSize[0].y/3));
   texCoords = vec2(1, 1);
       EmitVertex();
       
       
        EndPrimitive();
    }


frag (changed)
    #version 330

    uniform sampler2D color_texture;

    in vec4 color;

    out vec4 fragColor;

    void main()
    {
            //just playing with color here
       fragColor =  vec4(.3,.6,color.z*.04,0.1);
    }


I am not using VBOs, just vertex arrays; 3 of them, one each for position, size, and color. Before, it was position, texture coords, and color.

    Here is the full GeomBatch source maybe I am doing something wrong.
     
    http://pastebin.java-gaming.org/c0d2f129c22

I am not sure if I should continue with making a geometry shader sprite batcher, as right now, using just vertex arrays, it is rather fast and works on just about everything. I will say that the performance boost is nice (about 20+ fps at 50k+ sprites), but yours uses multithreading, which makes the results look much better than they are without it, and I kinda....ok, really want to try and implement that in my batcher for giggles.  Cool

FYI, once I get everything a bit more matured I am going to make a tutorial on how to make a sprite batcher, first with just VAs, then VBOs, and if all goes well, geometry shaders. I think it is a big step going from glBegin/glEnd to VBOs, but once you get there, everything is nice.

PS: How did you learn all this opengl stuff? Did you start with C++? Or is there some special, secret cave around here where they keep all the scrolls on this stuff?  Huh


    Offline badlogicgames
    « Reply #26 - Posted 2012-10-28 09:55:26 »

I've never looked into geometry shaders; that looks hilariously simple and fun! Too bad we can't add that to libgdx due to GLES 2.0 being a bit sucky.

    Stumpy, a tutorial would be rather neat Smiley

    Offline theagentd
    « Reply #27 - Posted 2012-10-28 12:53:18 »

    @StumpyStrust

Looks great. You don't really need VBOs; it's still a batcher, since you're drawing all sprites with just one draw call. VBOs are mainly used to store data on the GPU, but if you're reuploading it every frame anyway, you don't really need to use them.

    There's a parameter for glVertexAttribPointer() that controls whether the data should be normalized or not. It's the boolean value in the middle.

    glVertexAttribPointer(gPositionLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, POSITION_OFFSET);
    glVertexAttribPointer(gSizeLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, SIZE_OFFSET);
    glVertexAttribPointer(gColorLocation, 4, GL_UNSIGNED_BYTE, true, PARTICLE_SIZE, COLOR_OFFSET);

    Notice that only gColorLocation is normalized. You might have yet another problem somewhere if your alpha values aren't working out.


    I pretty much naturally started moving over to LWJGL from Java2D once I started scaling my 2D tiles and FPS dropped to around 4. Went over to LWJGL and I got a few hundred. Bought an OpenGL book (The OpenGL Superbible) to get me started. Sadly it was a bit outdated when it comes to shaders and OpenGL 3+, but the latest edition seems to have been updated with OpenGL 3 code. Anyway, after getting the basics I just did some 2D lighting stuff with FBOs for a while and learned about shaders, and THEN things got really interesting! I like books to get me started, but once you're up and running the internet is a much better source of up-to-date information.

    Offline StumpyStrust
    « Reply #28 - Posted 2012-10-29 04:32:54 »

Well, I got everything working.... Grin Grin Grin Grin textures and all. The color was all right; that boolean was for whether it was an unsigned byte. Turns out, I had been doing something with the color in one of my shaders. The ones I posted were not the ones being used.. Cranky

Got to say, though, that now I almost instantly get fillrate limited before the cpu if the sprites are larger than 8+ pixels. So I think I will stay away from multithreading this, as most people will probably get fillrate limited first.

Here are a bunch of screens from testing. FPS drops because of mass particles and particle physics. FYI, if you have dual monitors and have set them up properly, particle physics across them is simply delicious.








I will try writing a tutorial on all this once I clean things up and have time (probably next weekend).


This is for all the buzz from Roquen and DrHalfway about random number generation.
Here is generating 50k particles at 6*6 pixels in size, using Java's Random class for randomizing velocity. If you want me to try using your PRNG algorithms, just ask.  Grin



And thanks a bunch, theagentd.  Cool +9000!!!!!!!!

    Offline theagentd
    « Reply #29 - Posted 2012-10-29 21:14:00 »

You're right about being fill rate limited, and by simply batching you're winning a lot of performance. However, although your particle rendering code is GPU limited, it still uses a fair amount of CPU time, often just waiting for data to be written to or read from RAM, since that's one of the biggest bottlenecks here. It's possible to thread an actual game implementation in a much simpler way. For example, a very simple particle engine without any dynamic geometry collision detection doesn't depend on anything else in the game, so it's easy to just put the particle updating and buffer loading in its own thread and forget about it. Just trigger an update when you start a frame and wait for it to finish at the end of the frame in case you have too many particles, then just draw them all on the normal thread. The win here is that in most cases this will make your particles 100% free when it comes to CPU time, since the particles are updated in parallel with the main game and a huge number of computers have dual cores. It's also dead simple to implement, since you're just calling particleSystem.update() from a different thread (or something like that) plus some basic syncing logic between the main thread and the particle thread.
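A minimal sketch of that one-extra-thread scheme, assuming a hypothetical ParticleSystem with updateAndFillBuffer() and render() methods (updateGame() and renderWorld() are placeholders too):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: particle update + buffer filling runs in parallel with the rest of the
// frame; the render thread only waits right before it actually needs the data.
ExecutorService particleExecutor = Executors.newSingleThreadExecutor();

void frame() {
   Future<?> particleJob = particleExecutor.submit(new Runnable() {
      public void run() {
         particleSystem.updateAndFillBuffer();  // no GL calls in here, only CPU work
      }
   });

   updateGame();    // runs concurrently with the particle update
   renderWorld();

   try {
      particleJob.get();        // sync point: particle data is now ready
   } catch (Exception e) {
      throw new RuntimeException(e);
   }
   particleSystem.render();     // issue the actual draw call on the GL thread
}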

Bonus info: Even hyperthreading helps a lot in this case since we're so memory limited. To simplify things a lot, hyperthreading allows the CPU to have two threads loaded at the same time and to quickly switch between them in case one of them hits a cache miss and needs to wait for data from RAM. It's not two cores, but it allows your one core to be utilized better, and it makes a lot of difference in this specific use case.

There's also another real performance problem for a practical particle engine: creating and removing particles. If you create a lot of particles you might produce a lot of garbage, which might cause hitches when the garbage collector kicks in and has to release megabytes of built-up garbage. For a smooth experience it's better to pool particles. DO NOT use a LinkedList for this; use an ArrayList and use list.remove(list.size() - 1) to pick the last recycled particle, to avoid shifting the whole array like list.remove(0) would. Pooling actually hurts performance very slightly, but a smooth experience is more important than a slightly higher average FPS with 100ms spikes now and then.
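A minimal sketch of such a pool (the Particle type and its reset() method are assumptions):

import java.util.ArrayList;

// Sketch: recycle dead particles instead of allocating new ones.
// Removing from the END of the ArrayList is O(1); remove(0) would shift everything.
public class ParticlePool {
   private final ArrayList<Particle> free = new ArrayList<Particle>();

   public Particle obtain() {
      if (free.isEmpty()) {
         return new Particle();              // pool empty: allocate a fresh one
      }
      return free.remove(free.size() - 1);   // O(1), no array shifting
   }

   public void release(Particle p) {
      p.reset();                             // assumed to clear the particle's state
      free.add(p);
   }
}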

    Another huge performance problem is if you're using an ArrayList to hold your (alive) particles and a random particle in the middle of the list dies. A list.remove(index) causes all following particles to be shifted one step to the left. It's done with a very fast System.arraycopy() call, but it's still really slow if you have thousands of particles and a high number of them dies in a short period of time, which might cause FPS spikes or drops at inconvenient times. Instead use two ArrayLists and copy all surviving particles between them each frame. Please see this excellent example by Cas on how to implement such a system. It's meant to be used for game objects since they suffer the same problem, but the problem is actually even bigger for particles since we usually have a lot more particles than objects.
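And a rough sketch of that two-list update (adapted from the idea described here, not Cas's actual code; isDead() and the pool from the previous sketch are assumptions):

import java.util.ArrayList;

// Sketch: instead of remove(index) on deaths (which shifts the tail of the array),
// copy survivors into a second list each frame and swap the two lists.
private ArrayList<Particle> alive = new ArrayList<Particle>();
private ArrayList<Particle> next  = new ArrayList<Particle>();

void updateParticles() {
   next.clear();
   for (int i = 0, n = alive.size(); i < n; i++) {
      Particle p = alive.get(i);
      p.update();
      if (!p.isDead()) {
         next.add(p);          // survivors are carried over
      } else {
         pool.release(p);      // dead ones go back to the pool (see sketch above)
      }
   }
   // swap roles for the next frame
   ArrayList<Particle> tmp = alive;
   alive = next;
   next = tmp;
}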

That's pretty much all I have for now. Well, you could throw in real multithreading (with X threads instead of just one extra), but it just gets awfully complicated for no good reason due to fill rate limitations. As long as you solve the problem of creating and removing particles you should be fine without any threading at all, and pretty much 100% foolproof with the simple threading idea above. This should give you a particle system with very stable CPU performance in a real game.
