  A particle buffer implementation  (Read 3588 times)
Offline theagentd
« Posted 2011-10-24 21:21:48 »

A few months ago I experimented with CPU particles drawn using OpenGL and tried to get the best possible performance out of them. I added all sorts of features to make it as quick as possible, for example multi-threading support and keeping particle data in "structs" using Riven's MappedObject. After improving it little by little, I decided to tackle its biggest problem so far: the inflexibility of having a large buffer full of randomly dying particles.

Basically the problem is that any one particle can die at any moment due to its life time running out, leaving a gap in the particle buffer. If I want to draw it all with glDrawArrays, I'd either have to compact the buffer on the CPU or discard dead particles in a geometry shader, but in the shader approach, I'd have to send a lot more data to OpenGL each frame, not to mention the insane fragmentation that builds up in the buffer. I decided to try to keep the particle buffer compacted each frame using the CPU instead.

I'm going to simplify the example a little. Pretend I'm using a Particle[] to store my Particle instances. I keep track of its capacity (length) and its current size (used space), similar to a Buffer instance. If the buffer is full and I try to add a new Particle, I double the capacity, similar to an ArrayList.
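A minimal sketch of the simplified buffer described here, assuming a hypothetical DemoParticle class and a tiny initial capacity; the names are illustrative and not taken from the actual implementation:

```java
import java.util.Arrays;

/** Hypothetical particle type; the single field is an assumption for illustration. */
final class DemoParticle {
    float life; // remaining life time
}

/** Growable array that tracks capacity (array length) and size (used slots). */
final class ParticleArray {
    DemoParticle[] data = new DemoParticle[4]; // small initial capacity for the demo
    int size = 0;                              // number of used slots

    int capacity() {
        return data.length;
    }

    /** Appends a particle, doubling the capacity first if the array is full. */
    void add(DemoParticle p) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = p;
    }
}
```

Adding a fifth particle to the four-slot array above grows the capacity to eight while the size becomes five.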

Now, the real difference in how I do things is how I update my particles. Particle updating happens partly in my createParticle() method (!!!). Why? I do the updating there to be able to locate a dying particle and return it instead of just adding a new one to the end. This obviously reduces fragmentation and reuses objects, but not that much. In an optimal scenario where 100 particles die per frame and 100 particles are created per frame, it will stay completely compact, but that is pretty unrealistic for explosions, bursts, etc. Just having random life times pretty much guarantees that the number of dying and created particles will be different each frame.

So after creating all new particles, we have a half-updated Particle array, which is compact up to the last updated Particle. I have a second method (finishUpdate()) which updates the remaining particles. First it just continues to update particles until it encounters a particle that dies. Then it continues updating and counts how many particles die in a row (often just one, but it could be very many too). Then it continues updating and counts how many particles in a row do NOT die. These still-alive particles are copied to a lower index using System.arraycopy() to keep the array compacted. I then repeat these last two steps until all Particles are updated. Haha, I guess that wasn't very clear...

TL;DR: I basically avoid the System.arraycopy() on each remove that would've happened if I used ArrayList.remove(), copying each alive Particle instance at most once (or not at all if the array was kept compact by newly created particles in the first step).
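The run-based compaction can be sketched roughly as follows. This is only an illustration under simplifying assumptions: it uses a hypothetical SimpleParticle class with a single life field, leaves out the createParticle() step and the MappedObject/ByteBuffer side, and simply nulls the unused tail instead of recycling dead instances.

```java
import java.util.Arrays;

final class CompactingBuffer {
    /** Hypothetical particle; update() returns true while it is still alive. */
    static final class SimpleParticle {
        float life;
        SimpleParticle(float life) { this.life = life; }
        boolean update(float dt) { life -= dt; return life > 0f; }
    }

    /**
     * Updates data[0..size) exactly once per particle and moves each run of
     * survivors down with a single System.arraycopy call.
     * Returns the new size.
     */
    static int updateAndCompact(SimpleParticle[] data, int size, float dt) {
        int write = 0; // next free slot in the compacted region
        int i = 0;
        while (i < size) {
            if (!data[i].update(dt)) { // dead particle: leaves a gap
                i++;
                continue;
            }
            int runStart = i++;        // first survivor of a run
            while (i < size && data[i].update(dt)) {
                i++;
            }
            int runLength = i - runStart;
            if (write != runStart) {   // already compact: nothing to copy
                System.arraycopy(data, runStart, data, write, runLength);
            }
            write += runLength;
            if (i < size) {
                i++; // data[i] was already updated and found dead, skip it
            }
        }
        Arrays.fill(data, write, size, null); // clear stale references
        return write;
    }
}
```

Every survivor is moved at most once, and a run that is already in place is not copied at all.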

Obviously things are slightly more complicated compared to how I described it above. I'm using Riven's MappedObject, so I also have to keep track of a ByteBuffer and copy around the data needed by the GPU (I have a Particle instance paired with a MappedObject containing only the data relevant to the GPU as it is much faster).

My original particle test simply created a new particle every time a particle died, which kept the number of particles constant and the buffer compacted. My current test, a firework simulator, creates particles for firework trails/tracers, and also lots of particles when the firework explodes. All particles have very random life times, but I do shoot fireworks at regular intervals, so the amount of reuse is still quite high.

The performance is great, being only marginally slower than my original particle test. The firework test only runs single-threaded at the moment, so I'm comparing it to my old particle test using only one thread too.
 - My old test runs very stable at 72-73 FPS with 510.000 particles.
 - The firework test runs at 69-72 FPS, with particles varying between 500.000 and 525.000.

In short: the performance is excellent compared to my old static test. Oh, did I mention that the fireworks look awesome?

How would you handle a dynamic particle system? Is there a better way? Keeping everything on the graphics card and updating it using draw commands would obviously be a lot faster, but is there any other way of handling the data on the CPU? I thought about not keeping the array compacted, and instead generating a list of indices containing only the currently alive particles. This would however force me to loop over the whole array when updating, even if I only have a few Particles alive, which would make rendering only a few indices slow. I'd also have to build that index array every frame, which would be pretty slow if I have many particles. Finally I'd have to send the whole data buffer to OpenGL each frame, instead of only the used part. I'm pretty sure that would be slower...

EDIT: Paint skills!

Notice how the right-most block isn't moved twice, only once.

Myomyomyo.
Online Riven
« League of Dukes »

JGO Overlord


Medals: 816
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #1 - Posted 2011-10-24 21:37:16 »

Use glDrawElements instead of glDrawArrays. Create a VBO with indices. You don't have to move data around in the ByteBuffer of the MappedObject. You rebuild the entire index buffer every frame, which will be blazing fast.
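As a rough illustration of rebuilding the index buffer each frame, here is a sketch that only covers the CPU side. The class and method names are hypothetical, the particle state is reduced to a float[] of remaining life times, and a heap IntBuffer is used for the demo; uploading it for glDrawElements would need a direct buffer and the usual element-array-buffer setup, which is left out.

```java
import java.nio.IntBuffer;

final class AliveIndexBuilder {
    /**
     * Writes the index of every particle with life > 0 into the buffer
     * and returns how many indices were written (the count to draw).
     */
    static int rebuild(float[] life, int used, IntBuffer indices) {
        indices.clear();
        for (int i = 0; i < used; i++) {
            if (life[i] > 0f) {
                indices.put(i);
            }
        }
        indices.flip(); // position 0, limit = number of alive particles
        return indices.remaining();
    }
}
```

The particle data itself never moves; only the list of indices changes from frame to frame.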

Online Mike

JGO Wizard


Medals: 86
Projects: 1
Exp: 6 years


Java guru wanabee


« Reply #2 - Posted 2011-10-24 22:11:56 »

Quote
In short: the performance is excellent compared to my old static test. Oh, did I mention that the fireworks look awesome?

Screenshot! :)

Mike

My current game, Minecraft meets Farmville and goes online Smiley
State of Fortune | Discussion thread @ JGO
Offline theagentd
« Reply #3 - Posted 2011-10-24 22:34:24 »

Quote
Use glDrawElements instead of glDrawArrays. Create a VBO with indices. You don't have to move data around in the ByteBuffer of the MappedObject. You rebuild the entire index buffer every frame, which will be blazing fast.

I mentioned that in my first post, but that would force me to iterate over lots and lots of dead particles, and it's basically already memory/cache performance limited. I'll try it out though, just not right now. xD


It looks awesome thanks to the HDR and my bloom filter. I also have a twinkling effect on the particles (which is quite slow though, requires a sin() xD). The FPS is extremely limited by my dead slow bloom filter (around 90).


These fireworks spew out 5000 particles each, and one is launched each update (at 60 FPS, 60 per second). I've disabled the bloom filter (T_T), and as you can see, I have 450 000 particles running at 69 FPS.

Myomyomyo.
Online Riven
« Reply #4 - Posted 2011-10-24 22:48:45 »

Quote
Use glDrawElements instead of glDrawArrays. Create a VBO with indices. You don't have to move data around in the ByteBuffer of the MappedObject. You rebuild the entire index buffer every frame, which will be blazing fast.
I mentioned that in my first post, but that would force me to iterate over lots and lots of dead particles, and it's basically already memory/cache performance limited. I'll try it out though, just not right now. xD


Aww. I stopped reading after
Quote
How would you handle a dynamic particle system? Is there a better way?
Emo

Anyway, you have to iterate over them regardless, to determine what to clean up/compact. Actually compacting it requires a lot of I/O, especially if you have a dead slot at index 0 (if you can't fill it with a new particle). At the rate you're producing/losing particles it's fair to assume you're touching most of the data, as your particles have a randomized lifespan. Have you considered moving the data at the end of the VBO to the beginning, instead of pushing everything back? Might save you a lot of I/O at the expense of more complex calculations.

Offline theagentd
« Reply #5 - Posted 2011-10-24 22:57:13 »

Haha, don't worry. My reasons for not using indices are just assumptions. I don't have any real-world data to back them up. Yet. =D

Myomyomyo.
Offline cylab

JGO Ninja


Medals: 52



« Reply #6 - Posted 2011-10-25 08:44:35 »

Quote
I mentioned that in my first post, but that would force me to iterate over lots and lots of dead particles, and it's basically already memory/cache performance limited. I'll try it out though, just not right now. xD

Would you? Don't you only need to iterate over currently active particles?

Mathias - I Know What [you] Did Last Summer!
Offline theagentd
« Reply #7 - Posted 2011-10-25 17:18:12 »

Quote
Anyway, you have to iterate over them regardless, to determine what to clean up/compact. Actually compacting it requires a lot of I/O, especially if you have a dead slot at index 0 (if you can't fill it with a new particle). At the rate you're producing/losing particles it's fair to assume you're touching most of the data, as your particles have a randomized lifespan.

Correct, but I also insert new particles into the buffer before compacting, so I'm trying to avoid this first-particle-dies case. I also copy them in chunks using System.arraycopy() and ByteBuffer.put(ByteBuffer), which apparently is really fast. Everything is also done in a single pass, so it seems to be quite fast. xD

I only have to iterate over the particles that survived the last update. Basically:
Number of updates = the number of alive particles
Data copied = between 0 and the number of surviving particles after updating (worst case scenario obviously, usually much better)

How would putting the data in the end of the VBO help
Quote
Have you considered moving the data at the end of the VBO to the beginning, instead of pushing everything back? Might save you a lot of I/O at the expense of more complex calculations.

Nice idea! Filling in the holes with particles from the end of the list would indeed only have me copy a number of particles based on how many died, not how many survived.
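A minimal sketch of that hole-filling idea, under the assumption that particle order does not matter for rendering. The names are hypothetical and the particle state is reduced to a float[] of remaining life times; with real particle data, each move would copy one particle's worth of fields or bytes instead of a single float.

```java
final class SwapRemove {
    /**
     * Updates life[0..size) and fills each hole with the current last element.
     * Order is not preserved. Returns the new size.
     */
    static int updateAndFill(float[] life, int size, float dt) {
        int i = 0;
        while (i < size) {
            life[i] -= dt;
            if (life[i] > 0f) {
                i++; // still alive, move on
            } else {
                // Pull the last element into the hole. It has not been
                // updated yet, so slot i is processed again next iteration.
                life[i] = life[--size];
            }
        }
        return size;
    }
}
```

The number of moves equals the number of particles that died, regardless of where in the buffer they were.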

Quote
I mentioned that in my first post, but that would force me to iterate over lots and lots of dead particles, and it's basically already memory/cache performance limited. I'll try it out though, just not right now. xD
Would you? Don't you only need to iterate over currently active particles?

Well, obviously, but these active particles are spread out over the whole buffer, so I thought I would have to iterate over the whole buffer to find the active ones. I've now realized that I can just use the indices in the index buffer to find them. This still leaves me with having to send the whole data buffer to OpenGL each frame instead of only the active part, plus an index buffer, which would have to be 32-bit ints due to the number of particles...

I've been trying to profile my little firework test, but MappedObject's fork() makes the whole game a black box! T_T I think I've found out why I get such good performance though: 98-100% of the particles are already compacted after inserting new particles. The majority of all updates require no copying at all, and when I do need to copy, it's only about 6 000 out of 255 000 total particles. Basically the more particles you create, the better the performance becomes. xD If I suddenly stop firing fireworks, the number of copies peaks at 25 000, which isn't much at all in my opinion.

Myomyomyo.
Online Riven
« Reply #8 - Posted 2011-10-25 18:04:26 »

32-bit index buffers are dog slow. I'd advise not even bothering in that direction.

Quote
I've been trying to profile my little firework test, but MappedObject's fork() makes the whole game a black box! T_T

Well yeah, it ain't pretty.

Offline theagentd
« Reply #9 - Posted 2011-10-25 18:12:59 »

I could keep them in batches of 65536 to use short indices. I've also heard bad things about 32-bit indices...
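The batching arithmetic could look something like the sketch below. The class and method names are made up for illustration; the only fixed fact is that a 16-bit index can address 65536 vertices, and that Java's short is signed, so stored values need masking when read back.

```java
final class ShortIndexBatches {
    /** Number of vertices addressable by one unsigned 16-bit index. */
    static final int BATCH_SIZE = 1 << 16; // 65536

    /** Which batch a global particle index falls into. */
    static int batchOf(int globalIndex) {
        return globalIndex >>> 16;
    }

    /** Index within the batch; values >= 32768 appear negative as a Java short. */
    static short localOf(int globalIndex) {
        return (short) globalIndex;
    }

    /** Reads a stored short back as an unsigned value in [0, 65535]. */
    static int toUnsigned(short local) {
        return local & 0xFFFF;
    }

    /** Number of batches (and draw calls) needed for the given particle count. */
    static int batchCount(int particleCount) {
        return (particleCount + BATCH_SIZE - 1) / BATCH_SIZE;
    }
}
```

With the 510.000 particles from the first post this works out to 8 batches, so 8 draw calls with short indices instead of one with 32-bit indices.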

Myomyomyo.
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #10 - Posted 2011-10-25 18:23:02 »

In my profiling, batching buffers with short indices was quite a bit faster than using one buffer with 32-bit indices. It was something like 500k indices IIRC. But this wasn't tested on new hardware. IIRC the 8800GT was the newest card tested, and no ATI.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline SmokeNWrite

Junior Duke


Projects: 1



« Reply #11 - Posted 2011-11-24 01:26:51 »

Might start work on a new particle engine :)

Yeah, I think I might, hey when can you post links?
Offline theagentd
« Reply #12 - Posted 2011-11-24 05:05:13 »

Quote
Might start work on a new particle engine :)

Yeah, I think I might, hey when can you post links?

Did someone say links?

Myomyomyo.