Java-Gaming.org
  Particle Optimization  (Read 6211 times)
Offline StumpyStrust
« Posted 2012-01-20 04:57:02 »

So I have been goofing around with particles for a while. I made a simple system in regular Java 2D and then redid the system in OpenGL using LWJGL.

It's 2D, not 3D, as the original was only 2D.

I am using the fixed-function pipeline (glBegin/glEnd).

I can get about 15k particles with a steady 60fps. At 30k it drops to 30fps.

I have not tried VBOs because I can find very few detailed tutorials on them that use LWJGL and are not outdated.

I have tried using vertex arrays, and they would be faster if the particles did not move. But because the particles move, you have to rebuild the vertex array every frame (at least as I understand it).
Even while doing this I get about the same performance as with glBegin/glEnd. If I do not rebuild the array every frame, I get about 40-50k particles before the frame rate drops to 30 fps and lower.

Any tips or ideas for speeding up the system? Links to good tutorials on how to use VBOs or vertex arrays?

PS: It already looks great, but I just want to speed it up enough that it could run on most computers.

Offline ryanm

Senior Devvie


Projects: 1
Exp: 15 years


Used to be bleb


« Reply #1 - Posted 2012-01-20 05:15:01 »

Vertex arrays should be far, far faster than glVertex calls. How are you using them? Can you describe your render loop?
Offline StumpyStrust
« Reply #2 - Posted 2012-01-20 05:39:34 »

Every frame I create an array of all the particles' positions. The particles are textured triangle strips because, from what I have heard, they are faster than quads.

This means that each particle has 4 points, and each point has 3 coordinates: 4 * 3 = 12 floats.

Once that's done, I create a float buffer, put the array into it, flip the buffer and call drawArrays.

I do not know if making an array of 360k values every frame is what is slowing it down (30k particles * 12 = 360k).

If I don't remake the array every frame, I get close to double the performance. But stationary particles are kinda lame.
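One cost in the loop described above that is easy to rule out is the fresh 360k-element array and the new float buffer created every frame. A minimal sketch, using only java.nio, that allocates one direct buffer up front and overwrites it in place each frame. The class and method names are hypothetical, the layout (4 vertices of 3 floats per particle) follows the post, and the actual LWJGL draw call is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: one preallocated vertex buffer, refilled every frame. */
class ParticleVertexBuffer {
    static final int VERTS_PER_PARTICLE = 4;  // one small triangle strip per particle
    static final int FLOATS_PER_VERTEX = 3;   // x, y, z
    static final int FLOATS_PER_PARTICLE = VERTS_PER_PARTICLE * FLOATS_PER_VERTEX; // 12

    private final FloatBuffer vertices;

    ParticleVertexBuffer(int maxParticles) {
        // Allocated once: a direct buffer in native byte order.
        vertices = ByteBuffer.allocateDirect(maxParticles * FLOATS_PER_PARTICLE * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Rewrites the buffer from particle centres (x[i], y[i]); count must not exceed maxParticles. */
    FloatBuffer fill(float[] x, float[] y, int count, float half) {
        vertices.clear();                      // resets position and limit, keeps the memory
        for (int i = 0; i < count; i++) {
            // Strip order: bottom-left, bottom-right, top-left, top-right.
            put(x[i] - half, y[i] - half);
            put(x[i] + half, y[i] - half);
            put(x[i] - half, y[i] + half);
            put(x[i] + half, y[i] + half);
        }
        vertices.flip();                       // limit = floats written, position = 0
        return vertices;                       // same object every frame, no garbage created
    }

    private void put(float px, float py) {
        vertices.put(px).put(py).put(0f);      // z is always 0 in this 2D system
    }

    public static void main(String[] args) {
        ParticleVertexBuffer vb = new ParticleVertexBuffer(30_000);
        float[] x = new float[30_000];
        float[] y = new float[30_000];
        FloatBuffer frame = vb.fill(x, y, 30_000, 0.5f);
        System.out.println("floats ready to draw: " + frame.remaining());
    }
}
```

The particle update would write into `x` and `y` (or straight into the buffer), and the returned buffer would be handed to the vertex-array call each frame. Whether this alone closes the gap is not something the thread establishes; it only removes the per-frame allocation from the list of suspects.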

Offline Danny02
« Reply #3 - Posted 2012-01-20 16:23:28 »

You can use point sprites, which means you upload only a single vertex per particle.

You would do it either by using OpenGL's point sprite extension, which expands each vertex into a screen-aligned quad for you, or by using a vertex/geometry shader to expand the quad/triangle yourself.
Offline pitbuller
« Reply #4 - Posted 2012-01-20 17:36:32 »

With libGDX, using VBOs is dead simple. Just create a mesh and it's done. You want a static VBO and to update the particles in the shader. A single time uniform should be enough. I get about 50k particles on Android with that setup, and it's fill-rate limited. Textured points are fastest, but support is very flaky. Quads are a lot simpler and better with big particles, while triangles scale better with small particles, but the UV coordinates are damn hard to get right and the texture needs some padding to avoid problems.

The vertex shader looks something like this:
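 // a_* are vertex attributes, v_* are varyings, time and u_projectionViewMatrix are uniforms.
 void main() {
   vec4 pos = a_position;
   v_texCoords = a_texCoord;
   float life = (time - a_time.x);                // seconds since this particle spawned
   if (life > 0.0) {
     float lifeLine = 1.0 - (life * a_time.y);    // a_time.y holds the inverse lifetime
     v_color = a_color * lifeLine;                // fade out as the particle ages
     pos += vec4(a_vel, 0.0, 0.0) * life;         // constant velocity
     pos.y += life * life * -9.81 * 0.5;          // gravity: 0.5 * g * t^2
   }
   gl_Position = u_projectionViewMatrix * pos;
 }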
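The shader above never stores a particle's current position. It recomputes it every frame from the spawn data and the time uniform, which is why the VBO can stay static. A minimal sketch of the same arithmetic in plain Java, handy for checking values by hand. The class and method names are made up for illustration, and it assumes that a_time.x is the spawn time and a_time.y is 1/lifetime, as the shader comment suggests.

```java
/** CPU-side mirror of the vertex shader maths; all names are illustrative. */
class StatelessParticle {
    static final float GRAVITY = -9.81f;

    /** Age of the particle; values <= 0 mean it has not spawned yet. */
    static float life(float now, float spawnTime) {
        return now - spawnTime;
    }

    /** Colour multiplier: 1 at spawn, 0 once life reaches 1 / inverseLifetime. */
    static float fade(float life, float inverseLifetime) {
        return 1f - life * inverseLifetime;
    }

    /** Horizontal position: start plus constant velocity. */
    static float x(float startX, float velX, float life) {
        return startX + velX * life;
    }

    /** Vertical position: start plus constant velocity plus 0.5 * g * t^2. */
    static float y(float startY, float velY, float life) {
        return startY + velY * life + 0.5f * GRAVITY * life * life;
    }

    public static void main(String[] args) {
        float life = life(3.0f, 1.0f);   // spawned at t = 1, sampled at t = 3
        System.out.println("x = " + x(1f, 3f, life));
        System.out.println("y = " + y(0f, 0f, life));
        System.out.println("fade = " + fade(life, 0.25f));
    }
}
```

Because the position is a pure function of time, the only per-frame upload is the single float that carries the current time.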
Offline StumpyStrust
« Reply #5 - Posted 2012-01-20 21:33:48 »

Yes, I was going to try point sprites, but everything I have read about them says that they sometimes work and sometimes don't.

Oh, and I was staying away from shaders because I thought that Android did not support them. I was also staying away from them because they are kinda confusing.

And will libGDX work well with LWJGL?

It seems, for me at least, that the problem is a lack of tutorials on how to use VBOs, shaders, and whatnot with Java. I just don't know enough to be able to convert C++ OpenGL code into LWJGL Java code.

Offline sproingie

JGO Kernel


Medals: 202



« Reply #6 - Posted 2012-01-20 21:48:00 »

Point sprites are very iffy propositions, and you're wise to steer clear of them.  You may find they look fine on one machine, and another one renders them at a fraction of the size you wanted.

Android supports OpenGL ES 2.0, which more or less supports nothing but shaders.  GDX also has an lwjgl backend.  GDX is a medium-level API however, and for sprite manipulation you don't actually have to muck with the shaders and VBOs and whatnot yourself.
Offline pitbuller
« Reply #7 - Posted 2012-01-20 23:57:58 »

   public void render() {
      // Advance the shader clock; the step is clamped so one long frame cannot jump the animation.
      time += Math.min(0.1f, Gdx.graphics.getDeltaTime());
      if (time > 4)
         time = 0;   // restart the effect every 4 seconds

      Gdx.gl20.glClear(GL20.GL_COLOR_BUFFER_BIT);
      Gdx.gl20.glEnable(GL20.GL_TEXTURE_2D);
      Gdx.gl20.glEnable(GL20.GL_BLEND);
      Gdx.gl20.glBlendFunc(GL20.GL_SRC_ALPHA, GL20.GL_ONE);   // additive blending
      texture.bind(0);
      shader.begin();
         shader.setUniformMatrix("u_projectionViewMatrix", camera.combined);
         shader.setUniformf("time", time);   // the only particle state sent per frame
         particleMesh.render(shader, GL20.GL_TRIANGLES);
      shader.end();
   }

This Java code is all that you need to render particles with a VBO + shader. The shader code is less than 10 lines. Initializing the particles into the mesh is about 10 lines too. It's dead simple once you understand the pipeline. The fixed-function pipeline is a lot more confusing.

Just read this http://www.arcsynthesis.org/gltut/ and start doing full-blown shader stuff with libGDX.
Offline theagentd

« JGO Bitwise Duke »


Medals: 365
Projects: 2
Exp: 8 years



« Reply #8 - Posted 2012-01-21 04:47:11 »

If I may tempt you with a screenshot:


1 000 000 particles, multi-threaded, 65 FPS on a laptop i5-2410 at 2.7GHz, and still CPU-limited.

You have a long way to go, young padawan!


Any decent GPU can process 1 000 000 triangles at 60 FPS. The real problem is fill-rate. I draw my particles as simple points. However, once I go over a point size of 4 it starts being GPU limited. This is the same pixel area as a 4x4 quad. Point smoothing further increases this to a 5x5 quad to be able to do its anti-aliasing and increases the cost of each pixel because of the coverage calculations. On top of this we also have blending which further increases the cost of rendering each pixel slightly.

All in all: 5x5 quads = 25 pixels per particle. 25 x 1 000 000 = 25 000 000 pixels to process each frame. For reference a 1920x1080p monitor has about 2 million pixels. I'm pretty much filling all pixels of such a screen 12.5 times.

If I turn off point smoothing with a point size of 4, the GPU load decreases a lot, and I can manage a point size of 6 at 68 FPS. 6x6 (square) points x 1 000 000 particles = 36 000 000 pixels per frame. GPUs are awesome! =D
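The fill-rate figures above are plain multiplication, so they are easy to re-run for other particle sizes and counts. A small sketch of that arithmetic in Java; the class and method names are invented for this example.

```java
/** Back-of-the-envelope fill-rate arithmetic; names are illustrative only. */
class FillRateEstimate {
    /** Pixels touched per frame by square particles of the given side length. */
    static long pixelsPerFrame(int side, long particles) {
        return (long) side * side * particles;
    }

    /** How many times those pixels would cover a screen of the given size. */
    static double screenFills(long pixels, int width, int height) {
        return pixels / (double) ((long) width * height);
    }

    public static void main(String[] args) {
        long smoothed = pixelsPerFrame(5, 1_000_000L);   // 4 px points grown to 5x5 by smoothing
        long unsmoothed = pixelsPerFrame(6, 1_000_000L); // 6 px points without smoothing
        System.out.println("smoothed:   " + smoothed + " pixels per frame");
        System.out.println("unsmoothed: " + unsmoothed + " pixels per frame");
        System.out.println("1080p screens filled: " + screenFills(smoothed, 1920, 1080));
    }
}
```

With the exact 1920 x 1080 = 2 073 600 pixels, the overdraw for the smoothed case comes out near 12.06 rather than the 12.5 obtained by rounding the screen to 2 million pixels.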

Particle rendering benefits a lot from OpenGL 3.0 hardware or hardware supporting the extensions needed from OpenGL 3.0. More specifically you can render your particles as points and then expand your points to quads (triangle strips) in a geometry shader.

So far I've focused on particle count and how to increase it. Just remember that many optimization attempts become worthless if your particles simply cover too large a pixel area. If you have 50x50 smoke particles on the screen, they cover 2 500 pixels each if rendered as quads. Divide the earlier 36 000 000 pixels per frame by that and we get around 14 400 particles per frame. This is of course a very rough estimate, but no amount of geometry shaders or CPU multi-threading is going to increase performance in this case.

The only real optimization that you can do is to use more vertices. Your smoke particle texture is most likely round, so that it does not give the impression of being just a square texture. By approximating a circle using 16-32 vertices you can reduce the pixel area to something closer to the area of a circle (A = PI*r^2) instead of that of a square (side^2). The same 50x50 smoke particle can be rendered as a circle with a radius of 25. 50x50 is still 2 500 pixels, but a circle with r = 25 has the area 3.1415*25^2, which is ~1 964 pixels. Suddenly we can have around 18 000 particles in a single frame!

And yes, the number of vertices increased by 16-32x, but remember that we just pushed millions of vertices when we rendered smaller particles! 18 000 particles, each made of 16 triangles, is still only 288 000 triangles. Additionally, these circles can be rendered using instancing, which prevents the CPU/bandwidth nightmare of having to replicate all the particle data for each vertex.
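The square-versus-circle budget above can be worked through in a few lines. The sketch below is an illustration under one added assumption: the 16-sided shape is taken to be a regular polygon drawn around the circle so that no part of the round texture is clipped, which makes its area slightly larger than the ideal circle. The names are invented for the example.

```java
/** Particle budget for a fixed pixel budget; names and the polygon choice are assumptions. */
class CircleParticleBudget {
    static double squareArea(double side) {
        return side * side;
    }

    static double circleArea(double r) {
        return Math.PI * r * r;
    }

    /** Area of a regular n-gon that fully contains a circle of radius r. */
    static double enclosingPolygonArea(int n, double r) {
        return n * r * r * Math.tan(Math.PI / n);
    }

    /** How many particles of the given pixel area fit into the per-frame pixel budget. */
    static long particlesPerFrame(double pixelBudget, double areaPerParticle) {
        return (long) (pixelBudget / areaPerParticle);
    }

    public static void main(String[] args) {
        double budget = 36_000_000;   // pixels per frame, taken from the 6x6 point measurement
        System.out.println("50x50 quad:   " + particlesPerFrame(budget, squareArea(50)));
        System.out.println("ideal circle: " + particlesPerFrame(budget, circleArea(25)));
        System.out.println("16-gon:       " + particlesPerFrame(budget, enclosingPolygonArea(16, 25)));
    }
}
```

The polygon figure lands a little below the ideal-circle figure, which is consistent with the "around 18 000" estimate being a rough upper bound.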

TL;DR:
 - It's possible to render millions of particles per frame at 60 FPS if they are small enough.
 - The modern replacement for point sprites is a geometry shader that expands a point to a quad (a triangle strip with 2 triangles).
 - Larger particles are severely fill-rate limited, so rendering them as approximated circles reduces the pixel area by about 20%.

Myomyomyo.
Offline StumpyStrust
« Reply #9 - Posted 2012-01-21 06:03:41 »

Yes! Finally! These are the answers I have been looking for. And I know, I still have so much more to learn.

I will probably revamp the whole thing and use shaders.

I understand the fill-rate issue and why 1,000,000 particles is simply ridiculous, but I don't really want to have 1,000,000. That is crazy.
I know that GPUs can do a whole lot more than particles, as in most games the particle system is just one small part of what they are rendering. My GPU load is only about 10% or less with my current particle system.

My current system uses textured triangle strips, with some of the textures being fairly large. I also have some physics involved (not collisions between particles, but things like gravity wells and whatnot).
The physics drops the frame rate by 1-6 fps when in use. I posted what I have in the showcase, as it's pretty fun to play with.

My goal is to make 10k run really smoothly on Android and 50-100k on the desktop. I have a long way to go.

But really, 1,000,000? They're not fancy particles, but still, damn.



Offline theagentd




« Reply #10 - Posted 2012-01-21 07:07:12 »

Quote from StumpyStrust: "But really, 1,000,000? They're not fancy particles, but still, damn."
Well, there's more where that came from:

This is 2 000 000 blended and smoothed particles at 64 FPS running in only one thread on the exact same computer. I also had Ra4king test this version on his desktop computer and it managed 5 000 000 smoothed particles at 64 FPS, and around 9 000 000 unsmoothed particles at ~60 FPS.

Gasp! How did I do it? =D

Myomyomyo.
Offline ra4king

JGO Kernel


Medals: 356
Projects: 3
Exp: 5 years


I'm the King!


« Reply #11 - Posted 2012-01-21 08:24:02 »

Quote from theagentd: "I also had Ra4king test this version on his desktop computer and it managed 5 000 000 smoothed particles at 64 FPS, and around 9 000 000 unsmoothed particles at ~60 FPS."
Those results were from when I had the GTX 570. Running the GPUTest again, the GTX 580 gets me 9 million smoothed particles and 11 million unsmoothed particles at 60 FPS.

Offline StumpyStrust
« Reply #12 - Posted 2012-01-21 08:27:55 »



I think my computer would die at like 500k.

Offline ra4king



« Reply #13 - Posted 2012-01-21 08:29:04 »



Quote from StumpyStrust: "I think my computer would die at like 500k."
Hahahahahahahahahahahahahahahahahaha

Offline StumpyStrust
« Reply #14 - Posted 2012-01-21 08:59:56 »

Oh yeah!!! I present to you 1,000,000,000 particles with ambient occlusion, Depth of Field, Motion Blur, HDR, AAx32 and UberSamplingx9000!!!!!



What now?!?!
Oh, did you see the fps and CPU usage? Yeah, I am a boss...

Offline theagentd




« Reply #15 - Posted 2012-01-21 09:17:57 »

Quote from theagentd: "I also had Ra4king test this version on his desktop computer and it managed 5 000 000 smoothed particles at 64 FPS, and around 9 000 000 unsmoothed particles at ~60 FPS."
Quote from ra4king: "Those results were from when I had the GTX 570. Running the GPUTest again, the GTX 580 gets me 9 million smoothed particles and 11 million unsmoothed particles at 60 FPS."
Grrr... Your graphics card is about as big as my laptop. -_-'


As you might have suspected, the second version offloads everything except particle generation to the GPU. Particles are stored in a few textures and updated by ping-ponging between two sets of these with a shader. Rendering is done by generating a vertex for each possibly alive particle (= for all texels in the texture) and then checking in a geometry shader whether each particle is alive. If it is, the shader samples the needed data from the textures and passes it through to the fragment shader. It's a pretty crude attempt at using OpenGL for computing instead of graphics, and it has some serious drawbacks because of it, but it's still ridiculously fast compared to updating the particles on the CPU and then copying them to the GPU each frame.

For the 11 million unsmoothed particles Ra4king managed to achieve, that's a lot of bandwidth saved. The data needed to render my minimal particles is a 2D position and an RGBA color, with alpha being equal to (lifeLeft / lifeTime) of the particle in question. For the CPU version, 11 million 12-byte particles equal 126 MB of data sent to the GPU each frame. At 60 FPS, that's 7.37 GB per second, which obviously is ridiculous. For the GPU version I just store all the data on the GPU in the first place. I have position and velocity in an RGBA 32-bit float texture (RG = X Y, BA = VX VY), color in a standard RGB texture, and the current life and total lifetime of each particle in an RG16 texture.
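The upload figures for the CPU version can be reproduced with a short calculation. This sketch assumes the 12 bytes per particle break down as two 4-byte floats for the position plus a packed 4-byte RGBA color, which matches the description but is not spelled out in the post. The names are invented for the example, and the units are binary megabytes and gigabytes (1024-based), which is what the quoted 126 and 7.37 correspond to.

```java
/** Upload cost of streaming particles from the CPU each frame; names are illustrative. */
class UploadBandwidth {
    // Assumption: x and y as floats plus one packed RGBA colour = 12 bytes.
    static final int BYTES_PER_PARTICLE = 2 * Float.BYTES + 4;

    static long bytesPerFrame(long particles) {
        return particles * BYTES_PER_PARTICLE;
    }

    /** Bytes expressed in 1024-based megabytes. */
    static double megabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    /** Sustained transfer rate in 1024-based gigabytes per second. */
    static double gigabytesPerSecond(long bytesPerFrame, int fps) {
        return bytesPerFrame * (double) fps / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long perFrame = bytesPerFrame(11_000_000L);
        System.out.printf("per frame:  %.1f MB%n", megabytes(perFrame));
        System.out.printf("at 60 FPS:  %.2f GB/s%n", gigabytesPerSecond(perFrame, 60));
    }
}
```

Keeping the particle state on the GPU removes this transfer entirely, which is the point of the texture ping-pong approach described above.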

Again, this is a huge hack and I just did it for fun. The proper way of doing GPU particles would be to use OpenCL (that's a C, not a G) to process the particles on the GPU. This would be faster, simpler and more flexible, but most importantly it would adapt a LOT better to a varying particle count. The GPU version has to generate a vertex for each texel in the particle textures, regardless of whether the particle is actually alive or not. With OpenCL I could keep the data on the GPU like I do now, but ping-pong between VBOs instead of textures. Throw in atomic counters and everything becomes so clean that it's almost scary. There are two problems with all this though:

 - How many particles are actually alive? I could count them using atomic counters on the GPU, but how do I get this count back to the CPU to tell the GPU how many particles to render without killing performance? And
 - I haven't gotten around to actually trying out OpenCL. xD


@StumpyStrust's latest post
Lol. And sorry for having spent (still am spending) way too much time of my life on optimizing particle engines. xD Also, my version would have had better anti-aliasing since GL_POINT_SMOOTH produces all possible shades = 256xAA, but I forgot your uber-sampling. Also note that 1 billion particles would at a minimum take 16 bytes per particle (2 floats for position, 2 floats for velocity, all have the same color and last forever), equaling almost 15 GBs of data. xDDD

Myomyomyo.
Offline R.D.

Senior Devvie


Medals: 2
Projects: 1


"For the last time, Hats ARE Awesome"


« Reply #16 - Posted 2012-01-21 12:03:52 »

I hate you so much :/ I get like 100.000 particles running around my window and you just do like 5.000.000...
I should just make you my OpenGL guru, I guess.

 
Offline theagentd




« Reply #17 - Posted 2012-01-21 15:54:25 »

Quote from R.D.: "I hate you so much :/ I get like 100.000 particles running around my window and you just do like 5.000.000... I should just make you my OpenGL guru, I guess."
I just happen to like optimizing stuff. That's why I haven't finished a single game yet. OpenGL (sadly) isn't going to get you a complete game, just some fancy colors. Wait, that sounds like drugs...

Myomyomyo.
Offline ra4king



« Reply #18 - Posted 2012-01-21 22:15:16 »

Quote from theagentd: "OpenGL (sadly) isn't going to get you a complete game, just some fancy colors. Wait, that sounds like drugs..."
ROFLMAO

Offline StumpyStrust
« Reply #19 - Posted 2012-01-22 11:19:19 »

So as I said before, in my system there is some physics involved, and when it's active I get a drop of 4-5 fps. Now this is not much, but I want more complex physics, which would mean a greater drop in fps.

Example: all particles testing each other for collisions. A grid-based system would improve performance, but for things like water and whatnot grid-based systems would not look very good.

One idea I have for improving the performance of the calculations is using bitwise operations. Now from what I understand, division is the biggest offender when it comes to speed. So would it be a good idea to cast things to ints so you could use bitwise operations on them? You would lose some accuracy, but for some non-real-world physics simulations I don't think that would be a big problem.

Also, does casting a double/float to an int take that long?

Offline R.D.



« Reply #20 - Posted 2012-01-22 14:58:17 »

Quote from R.D.: "I hate you so much :/ I get like 100.000 particles running around my window and you just do like 5.000.000... I should just make you my OpenGL guru, I guess."
Quote from theagentd: "I just happen to like optimizing stuff. That's why I haven't finished a single game yet. OpenGL (sadly) isn't going to get you a complete game, just some fancy colors. Wait, that sounds like drugs..."

That's one of the reasons I really like you. I already have a question for you, but I'd like to ask that via PM.
Offline theagentd




« Reply #21 - Posted 2012-01-22 16:48:52 »

Quote from StumpyStrust:
"So as I said before, in my system there is some physics involved, and when it's active I get a drop of 4-5 fps. Now this is not much, but I want more complex physics, which would mean a greater drop in fps.

Example: all particles testing each other for collisions. A grid-based system would improve performance, but for things like water and whatnot grid-based systems would not look very good.

One idea I have for improving the performance of the calculations is using bitwise operations. Now from what I understand, division is the biggest offender when it comes to speed. So would it be a good idea to cast things to ints so you could use bitwise operations on them? You would lose some accuracy, but for some non-real-world physics simulations I don't think that would be a big problem.

Also, does casting a double/float to an int take that long?"
Having 1000 particles all affecting each other means doing 1000^2 tests between them. A grid is a very good way of reducing the number of tests to a very low amount for each particle. Why wouldn't water particles work with grids? I've never heard of water with telekinesis...
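The grid idea can be sketched in a few dozen lines: bucket every particle by cell, then test each particle only against the 3x3 block of cells around it instead of against everyone. This is one possible layout, not the implementation used by anyone in the thread. The class name, the hash-map storage and the pair-counting method are all choices made for this example, and it assumes the interaction radius is no larger than the cell size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Uniform grid broad phase for 2D particles; a sketch with invented names. */
class ParticleGrid {
    private final float cellSize;
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    ParticleGrid(float cellSize) {
        this.cellSize = cellSize;
    }

    /** Packs two signed cell indices into one map key. */
    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }

    /** Cell index along one axis; floor keeps negative coordinates consistent. */
    private int cell(float v) {
        return (int) Math.floor(v / cellSize);
    }

    /** Re-buckets all particles; called once per physics step. */
    void rebuild(float[] x, float[] y, int count) {
        cells.clear();
        for (int i = 0; i < count; i++) {
            cells.computeIfAbsent(key(cell(x[i]), cell(y[i])), k -> new ArrayList<>()).add(i);
        }
    }

    /** Counts pairs closer than radius; radius must not exceed cellSize. */
    int countClosePairs(float[] x, float[] y, int count, float radius) {
        float r2 = radius * radius;
        int pairs = 0;
        for (int i = 0; i < count; i++) {
            int cx = cell(x[i]);
            int cy = cell(y[i]);
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    List<Integer> bucket = cells.get(key(cx + dx, cy + dy));
                    if (bucket == null) continue;
                    for (int j : bucket) {
                        if (j <= i) continue;              // visit each pair only once
                        float ddx = x[j] - x[i];
                        float ddy = y[j] - y[i];
                        if (ddx * ddx + ddy * ddy < r2) pairs++;
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        float[] x = {0f, 0.5f, 10f, 10.4f, 50f};
        float[] y = {0f, 0f, 10f, 10.3f, 50f};
        ParticleGrid grid = new ParticleGrid(1f);
        grid.rebuild(x, y, x.length);
        System.out.println("close pairs: " + grid.countClosePairs(x, y, x.length, 1f));
    }
}
```

In a real system the pair counter would be replaced by whatever collision or force response is needed; the saving comes from the lookup being limited to neighbouring cells.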

Casting to an int, doing a bit-shift and then casting back to a float is definitely more expensive than a single float divide. Especially the float->int cast is slow in my experience. Either use floats or ints, don't cast too much between them...

Note that it is very hard to benchmark casting, since it will often be optimized away in minimal tests and it's also pretty hard to measure the difference. Micro benchmarks are really hard to do the right way and should only be used as hints, not facts in my opinion. I've gotta admit that they are funny as hell to do though. xD


Quote from R.D.: "I hate you so much :/"
Quote from R.D.: "That's one of the reasons I really like you. I already have a question for you, but I'd like to ask that via PM."
TS--- TS--- TS--- TSUNDEREEEEEEEEEEEEE!!!!!!!!!!!!!!!!!!!! Obviously R.D. asked me out. I now officially have a tsundere lover. I consider myself the luckiest man on Earth.

(Joke... ._.)

Myomyomyo.
Offline R.D.



« Reply #22 - Posted 2012-01-22 19:27:59 »


Quote from theagentd: "TS--- TS--- TS--- TSUNDEREEEEEEEEEEEEE!!!!!!!!!!!!!!!!!!!! Obviously R.D. asked me out. I now officially have a tsundere lover. I consider myself the luckiest man on Earth."


xD I told you to not do that!!!!! >:|

Okay, enough off-topic ._.
Offline StumpyStrust
« Reply #23 - Posted 2012-02-01 19:41:06 »

OK, so sometimes you want particles to overlap and other times not, in order to get some cool effects. Also, using a grid-based system in 3D is much too great a hassle, and having a grid span a huge world in 3D seems a bit big. I don't know if it really is that hard, because I have never programmed a massive game world, but it seems very hard.

I have tested my particle system with absolutely no rendering, just particle updating/physics. With basic movement I can get 50k-60k particles before the fps drops, but with physics the fps starts to drop at 50k. So rendering is a performance hit, but the particle updating is also a huge bottleneck. So I am wondering how other people do their particle updates and whatnot, to see if maybe my system has some issues in the updating.
