Yet another particle engine update!
Offline matheus23

« Reply #30 - Posted 2012-11-15 15:41:19 »

I wouldn't recommend a sprite batcher on the GPU. You'd need to run your whole game on the GPU to know HOW to move your sprites around.
Wouldn't that be awesome? Haha ;D
And yeah, a game almost fully run on the GPU would be awesome.

It probably doesn't count, but what about all those Game of Life simulators? :)
They're not really games, but there are many implementations that run fully on the GPU.

Offline Sickan

« Reply #31 - Posted 2012-11-15 16:11:53 »

A JVM running on the GPU.
Online Danny02
« Reply #32 - Posted 2012-11-15 17:36:54 »

Some time ago I saw a game posted on glslsandbox which was quite simple, but ran entirely in a single shader. The game state was saved in the same texture that was displayed.
Offline theagentd
« Reply #33 - Posted 2012-11-15 17:47:50 »

A JVM running on the GPU.
:o

Some time ago I saw a game posted on glslsandbox which was quite simple, but ran entirely in a single shader. The game state was saved in the same texture that was displayed.
I've been thinking about porting the Flash game Creeper World to run almost completely on the GPU.


IN OTHER NEWS
I got rid of the 12 additional bytes in the transform feedback version, performance is now 3.55 million particles with transform feedback, up from 3.00 million. Haven't ported the SLI version yet, but it'd most likely hit 7 million particles at 60 FPS.

Myomyomyo.
Offline sproingie

« Reply #34 - Posted 2012-11-15 18:35:55 »

GPUs perform very poorly at heavily-branching code.  If you could get any kind of general-purpose VM running on a GPU at all, I imagine it would perform very poorly on any code that wasn't already suited for GPU execution, i.e. heavily vectorized algorithms.
Offline Sickan

« Reply #35 - Posted 2012-11-15 21:41:32 »

GPUs perform very poorly at heavily-branching code.  If you could get any kind of general-purpose VM running on a GPU at all, I imagine it would perform very poorly on any code that wasn't already suited for GPU execution, i.e. heavily vectorized algorithms.

Solution: don't write smart, efficient programs, write dumb, heavy working programs.

Cheers! :D
Offline theagentd
« Reply #36 - Posted 2012-12-11 04:56:09 »

Instead of creating a new thread, I decided to just necro this one.

To celebrate my exams being over and the start of winter break (well, okay, I do have a basic Java exam left :P), I decided to create a new particle engine thingy. I've been working all day, but I finally got it done! It's a collection of old things I've posted here plus a few new features!

 - It's now completely in 3D with particles bouncing around inside a huge box.
 - Like before, updating is done using OpenGL transform feedback. Nothing new here.
 - The particles are rendered as billboarded sprites using a geometry shader. The 4 vertices are generated in eye space and then sent through the projection matrix.

Nothing really huge here, just pretty much a mix of old stuff. The new feature is particle SORTING! I implemented a (very inefficient) radix sort using transform feedback to sort particles based on their depth. My algorithm needs 2 passes per bit of depth precision, meaning that for 24 bit integer depth I need 48 passes over the particles! Shit!
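
For anyone trying to picture the two-passes-per-bit scheme, here is a minimal CPU sketch in Java. It is not the transform feedback code from this post, only an illustration of the same idea under a few assumptions: the class and method names are made up for the example, and depth is assumed to be already quantized to a non-negative integer key.

```java
import java.util.Arrays;

// Hypothetical CPU illustration of a binary (1 bit per step) LSD radix sort.
// Each bit needs two passes over the data: one collecting the keys whose bit
// is 0, and one collecting the keys whose bit is 1.
final class SplitRadixSort {

    /** Sorts non-negative keys ascending, looking only at the lowest `bits` bits. */
    static int[] sort(int[] keys, int bits) {
        int[] src = Arrays.copyOf(keys, keys.length);
        int[] dst = new int[keys.length];
        for (int bit = 0; bit < bits; bit++) {
            int mask = 1 << bit;
            int out = 0;
            // Pass 1: keys with the current bit clear, in their existing order.
            for (int k : src) {
                if ((k & mask) == 0) dst[out++] = k;
            }
            // Pass 2: keys with the current bit set, in their existing order.
            for (int k : src) {
                if ((k & mask) != 0) dst[out++] = k;
            }
            int[] tmp = src; src = dst; dst = tmp;
        }
        return src;
    }

    /** Number of passes over the data for a given key width. */
    static int passes(int bits) {
        return 2 * bits;
    }

    public static void main(String[] args) {
        int[] sorted = sort(new int[] {5, 3, 7, 0, 2, 6, 3}, 3);
        System.out.println(Arrays.toString(sorted) + " in " + passes(3) + " passes");
    }
}
```

With 24-bit keys, `passes(24)` gives the 48 passes mentioned above. Keeping each pass stable (keys keep their relative order) is what lets the later, more significant bits build on the earlier ones.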

To combat this I decided to also do frustum culling when calculating the depth of each particle. That means that only particles that are actually on the screen are sorted; all of them are still updated, of course. This gave me a big performance boost when only a small number of particles are visible, but that's kind of cheating... =S
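
As a rough illustration of the culling step, the sketch below tests points against six planes on the CPU and keeps only the indices of the visible ones, which is the set that would then go on to sorting. The plane layout (`{nx, ny, nz, d}` with inward-facing normals), the packed `x, y, z` position array and all names are assumptions made for the example, not the shader code used here.

```java
import java.util.Arrays;

// Hypothetical CPU sketch: cull points against six inward-facing planes.
final class FrustumCull {

    /** A point is inside a plane {nx, ny, nz, d} when nx*x + ny*y + nz*z + d >= 0. */
    static boolean isVisible(float[][] planes, float x, float y, float z) {
        for (float[] p : planes) {
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0f) {
                return false; // outside at least one plane
            }
        }
        return true;
    }

    /** Returns the indices of visible particles; positions are packed as x, y, z. */
    static int[] visibleIndices(float[][] planes, float[] positions) {
        int count = positions.length / 3;
        int[] tmp = new int[count];
        int n = 0;
        for (int i = 0; i < count; i++) {
            if (isVisible(planes, positions[3 * i], positions[3 * i + 1], positions[3 * i + 2])) {
                tmp[n++] = i;
            }
        }
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        // A unit box stands in for a real view frustum to keep the numbers simple.
        float[][] box = {
            {1, 0, 0, 1}, {-1, 0, 0, 1},
            {0, 1, 0, 1}, {0, -1, 0, 1},
            {0, 0, 1, 1}, {0, 0, -1, 1}
        };
        float[] positions = {0, 0, 0, 2, 0, 0, 0.5f, -0.5f, 0.9f};
        System.out.println(Arrays.toString(visibleIndices(box, positions)));
    }
}
```

Only the surviving indices need to be sorted; every particle still goes through the update step regardless.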

Anyway, I'm getting 200k sorted particles at 63 FPS (one GPU) at the moment. The particle culling is extremely efficient, improving FPS to 600 when no particles are visible (it's still updating them too). Using OGL4 I could reduce the number of passes needed to sort by a factor of 4 or even 8 at the cost of a small amount of video memory, but for now I'm stuck on OGL3. If anyone knows a more efficient sorting algorithm available in OpenCL or something like that, I'd love to hear about it!

Finally a screenshot!

Myomyomyo.
Offline ra4king

« Reply #37 - Posted 2012-12-11 06:37:29 »

Oh that's cool, get us all excited and all without giving a demo link :P

Offline theagentd
« Reply #38 - Posted 2012-12-11 16:47:01 »

It's still too unoptimized. I need to find an algorithm that doesn't require so many passes... Right now it seems to be a lot slower than Arrays.sort() on the CPU in raw sorting performance. If I disable culling, rendering and updating, I can sort 450 000 particles with 18-bit accuracy at 60 FPS (the algorithm scales linearly with the number of objects), so around 27 000 000 sorted particles per second. Compare that to real OpenCL sorting libraries, which claim performance closer to a billion 32-bit keys sorted per second. I just have no idea how to implement this with OpenCL...

The algorithm also does not sort the particles themselves; it sorts their indices by distance (8-byte keys). Since I have so many particles, I have to use 32-bit indices. It turns out that randomly indexing into the particle array is a lot slower than just drawing them all sequentially. Sorting indices makes it possible to copy around less data when sorting, but it might be a good idea to actually reorder the particle buffer too. Since the particle order changes very slowly, that would keep the indices essentially sequential: few particles would move far enough to cause a cache miss.
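
To make the index-sorting idea concrete, here is a small CPU sketch. The key layout is an assumption for illustration (32 bits of depth in the high half, the 32-bit particle index in the low half, 8 bytes in total), as are the class and method names; it also assumes depths are non-negative, since the raw bits of non-negative floats order the same way as the floats themselves.

```java
import java.util.Arrays;

// Hypothetical sketch: sort particle indices by depth using packed 8-byte keys.
final class IndexDepthSort {

    /** Returns particle indices ordered from farthest to nearest. */
    static int[] backToFront(float[] depth) {
        long[] keys = new long[depth.length];
        for (int i = 0; i < depth.length; i++) {
            // High 32 bits: depth bits. Low 32 bits: particle index.
            long depthBits = Float.floatToIntBits(depth[i]) & 0xFFFFFFFFL;
            keys[i] = (depthBits << 32) | i;
        }
        Arrays.sort(keys); // ascending, so nearest first
        int[] order = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            // Read the sorted keys in reverse; the int cast keeps the low 32 bits.
            order[i] = (int) keys[keys.length - 1 - i];
        }
        return order;
    }

    public static void main(String[] args) {
        float[] depth = {1.0f, 5.0f, 0.5f, 3.0f};
        System.out.println(Arrays.toString(backToFront(depth)));
    }
}
```

The particle data itself never moves here, only the keys do, which is the trade-off described above: less data copied while sorting, but scattered reads when drawing.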

It doesn't look that impressive in still images either. The coolest part is when I move the camera through that smoke cloud, and I can literally only see a few meters ahead. Without sorting, the cube of smoke looks hollow since you get the illusion that you can see inside it due to the incorrect blending of the particles. It'll also handle correct blending of things like fire and smoke.

Myomyomyo.
Offline Riven
« Reply #39 - Posted 2012-12-11 16:59:00 »

Since the order changes very slowly,
It depends on how you orient your sprites.

  • If they are aligned to the camera orientation, then rotating the camera changes the sort order drastically, as the render order is determined by the (infinite line)-point distance, not the real distance to the camera.
  • If they are facing the camera (perpendicular to it), you do indeed have a relatively stable order, but then sprites will intersect each other.

Online Danny02
« Reply #40 - Posted 2012-12-11 17:03:03 »

I did GPU sorting a while back and radix sort is the way to go. There is an open-source OpenCL implementation which is really fast.
The fastest at the time was the CUDA implementation in the Nvidia SDK.

But for so many particles, try an iterative sorting algorithm (bubble sort) and do only a few passes each frame. You won't have perfect sorting in every frame, but it is a very good 99% solution.
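
A minimal sketch of what "a few passes each frame" could look like on the CPU, assuming a persistent index array that is refined a little every frame. The names are hypothetical, and a GPU version would more likely use an odd-even variant so that the comparisons within a pass are independent.

```java
import java.util.Arrays;

// Hypothetical sketch: refine a back-to-front ordering with a bounded number
// of bubble passes per frame instead of fully sorting every frame.
final class IncrementalSort {

    /** Runs at most `passes` bubble passes over `order`; returns the number of swaps. */
    static int refine(int[] order, float[] depth, int passes) {
        int swaps = 0;
        for (int p = 0; p < passes; p++) {
            boolean swapped = false;
            for (int i = 0; i + 1 < order.length; i++) {
                // Back-to-front: the farther particle has to come first.
                if (depth[order[i]] < depth[order[i + 1]]) {
                    int tmp = order[i];
                    order[i] = order[i + 1];
                    order[i + 1] = tmp;
                    swaps++;
                    swapped = true;
                }
            }
            if (!swapped) break; // already sorted, nothing left to do this frame
        }
        return swaps;
    }

    public static void main(String[] args) {
        float[] depth = {1.0f, 5.0f, 0.5f, 3.0f};
        int[] order = {0, 1, 2, 3};
        for (int frame = 0; frame < 3; frame++) {
            int swaps = refine(order, depth, 1); // one pass per frame
            System.out.println("frame " + frame + ": " + Arrays.toString(order) + ", swaps=" + swaps);
        }
    }
}
```

Because the order carries over between frames and particles move only a little each frame, a small pass budget keeps the list nearly sorted most of the time.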

Also, if your particles are static then you have only a fixed number of orderings (2D: 4, 3D: 12)
Offline theagentd
« Reply #41 - Posted 2012-12-11 17:06:34 »

Well, they're separate problems. I currently compute depth the same way as the depth is usually stored, so depth indeed depends highly on the orientation of the screen. It'll be easy to change it to use eye-space distance instead which would remain stable no matter the camera's orientation. How I render them to make them look good is a different problem. =S
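
The difference between the two depth measures can be sketched like this. The function names and the plain float-array vectors are assumptions for the example; `forward` is assumed to be a unit-length view direction.

```java
// Hypothetical sketch comparing two sort keys for the same pair of particles.
final class DepthMetrics {

    /** Depth along the view direction: depends on where the camera is looking. */
    static float viewDepth(float[] cam, float[] forward, float[] p) {
        return (p[0] - cam[0]) * forward[0]
             + (p[1] - cam[1]) * forward[1]
             + (p[2] - cam[2]) * forward[2];
    }

    /** Squared distance to the camera: unaffected by camera rotation. */
    static float distanceSquared(float[] cam, float[] p) {
        float dx = p[0] - cam[0];
        float dy = p[1] - cam[1];
        float dz = p[2] - cam[2];
        return dx * dx + dy * dy + dz * dz;
    }

    public static void main(String[] args) {
        float[] cam = {0f, 0f, 0f};
        float[] a = {0f, 0f, -10f};
        float[] b = {8f, 0f, -2f};
        float[] lookDownZ = {0f, 0f, -1f};
        float[] lookAlongX = {1f, 0f, 0f};
        System.out.println("view depth, looking down -z: a=" + viewDepth(cam, lookDownZ, a)
                + " b=" + viewDepth(cam, lookDownZ, b));
        System.out.println("view depth, looking along +x: a=" + viewDepth(cam, lookAlongX, a)
                + " b=" + viewDepth(cam, lookAlongX, b));
        System.out.println("distance squared: a=" + distanceSquared(cam, a)
                + " b=" + distanceSquared(cam, b));
    }
}
```

With view depth, which of the two particles counts as "farther" flips when the camera turns; with distance to the camera it does not, so the sorted order only changes as particles or the camera position move. The squared distance is enough for sorting since it orders the same way as the distance itself.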

I did GPU sorting a while back and radix sort is the way to go. There is an open-source OpenCL implementation which is really fast.
Link? =D

Myomyomyo.
Offline Roquen
« Reply #42 - Posted 2012-12-11 17:46:24 »

No idea if this is of any interest...but I was just glancing at this: http://timothylottes.blogspot.fr/2012/12/storing-objects-on-gpu.html
Offline theagentd
« Reply #43 - Posted 2012-12-11 21:11:05 »

Ah, yes, that site. Funny that he posted that just when I did my sorting stuff. =S I've been monitoring his blog for TXAA info and he recently deleted everything concerning TXAA (10+ posts), so I was worried that he had been fired or something. ^^' We'll see what he comes up with...

Myomyomyo.
Online Danny02
« Reply #44 - Posted 2012-12-11 21:25:15 »

he deleted that stuff wtf?
Offline pitbuller
« Reply #45 - Posted 2012-12-11 23:56:26 »

Sounds really cool. Any chance you could share the sorting shader?
These raw peak-performance particle engines are quite interesting, but sadly they hardly ever translate directly to game usage.

How about adding features like environment lighting via cubemaps, dynamic lighting, self-shadowing, shadow casting and dynamic force fields? Then you get to the point where the feasible number of particles is much lower than what you currently use, and that changes some things radically.
Also, you can usually cull whole emitters first and sort particles locally. So instead of sorting all particles, you first sort the emitters and then sort the particles per emitter.
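
A rough sketch of the emitter-first approach, with a made-up `Emitter` type that carries a precomputed distance, a coarse visibility flag and its particle depths. Note the built-in assumption: the result is only a correct back-to-front order when emitters do not overlap in depth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cull and sort emitters first, then sort particles
// locally inside each visible emitter.
final class EmitterSort {

    static final class Emitter {
        final float distance;         // camera to emitter centre
        final boolean visible;        // result of a coarse per-emitter cull
        final float[] particleDepths; // depth of each particle in this emitter

        Emitter(float distance, boolean visible, float[] particleDepths) {
            this.distance = distance;
            this.visible = visible;
            this.particleDepths = particleDepths;
        }
    }

    /** Returns particle depths in draw order: far emitters first, far particles first. */
    static List<Float> drawOrder(List<Emitter> emitters) {
        List<Emitter> visible = new ArrayList<>();
        for (Emitter e : emitters) {
            if (e.visible) visible.add(e); // whole emitters are skipped here
        }
        visible.sort((x, y) -> Float.compare(y.distance, x.distance));

        List<Float> order = new ArrayList<>();
        for (Emitter e : visible) {
            float[] depths = e.particleDepths.clone();
            Arrays.sort(depths); // small local sort, ascending
            for (int i = depths.length - 1; i >= 0; i--) {
                order.add(depths[i]);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<Emitter> emitters = Arrays.asList(
            new Emitter(5f, true, new float[] {4.5f, 5.5f, 5.0f}),
            new Emitter(20f, true, new float[] {19f, 21f}),
            new Emitter(10f, false, new float[] {10f}));
        System.out.println(drawOrder(emitters));
    }
}
```

Each local sort touches only one emitter's particles, so the cost is many small sorts plus one sort over the emitter list instead of a single sort over everything.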

I'm currently trying to figure out how to get environment + dynamic lighting and simulated vortices with a couple thousand particles in a mobile title, so GLES 2.0 only. Shadows are a no-go, but luckily the game doesn't even need those.

Any interesting particle papers to share? I'm currently trying to get ideas from this: http://www.bitsquid.se/presentations/practical-particle-lighting.pdf
Offline theagentd
« Reply #46 - Posted 2012-12-12 01:25:39 »

My idea is to just dump all kinds of particles into a huge list and update them on the GPU. The update shader is an uber-shader which allows for lots of particle types, including emitter particles etc. Since the particles are sorted by distance, they will be somewhat grouped together by type, because particles of one type are emitted from the same place, so the branching will be relatively cheap. It's also worth noting that transform feedback has a relatively high bandwidth cost for each vertex processed, so adding more work to the update shader didn't affect performance at all in my tests. If I'm right, multiple vertex streams will also allow me to do frustum culling (just 6 dot products) for multiple lights in one pass and output the indices to other buffers in the same pass.

My current sorting algorithm is an abomination and I really need a more optimized one. To still do this with transform feedback (instead of for example OpenCL or CUDA) I really need support for multiple vertex streams = OGL4 to direct particles into buckets. I'm currently forced to do one pass over all visible particles for each bucket, which means that I have to do twice as many passes as the bit precision of the depth. With multiple vertex streams, I could sort 4 bits per pass using 16 buckets and reduce the number of passes from 48 to 6 for 24 bits of depth, or just 4 buckets and get it done in 12 passes (if VRAM is a limitation). Like I said before, transform feedback is very memory limited, so this is the main bottleneck at the moment.
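
The pass counts above can be checked with a small CPU sketch of a radix sort that handles several bits per pass. This is only an illustration with hypothetical names: on the CPU the bucket offsets come from a counting step, whereas the GPU idea described here would route particles into buckets through separate vertex streams.

```java
import java.util.Arrays;

// Hypothetical CPU sketch of an LSD radix sort with 2^bitsPerPass buckets.
final class BucketRadixSort {

    /** Passes needed when each pass sorts `bitsPerPass` bits of the key. */
    static int passes(int keyBits, int bitsPerPass) {
        return (keyBits + bitsPerPass - 1) / bitsPerPass;
    }

    /** Sorts non-negative keys ascending using the lowest `keyBits` bits. */
    static int[] sort(int[] keys, int keyBits, int bitsPerPass) {
        int buckets = 1 << bitsPerPass;
        int[] src = keys.clone();
        int[] dst = new int[keys.length];
        for (int shift = 0; shift < keyBits; shift += bitsPerPass) {
            // Count the keys per bucket, then turn the counts into start offsets.
            int[] start = new int[buckets + 1];
            for (int k : src) {
                start[((k >>> shift) & (buckets - 1)) + 1]++;
            }
            for (int b = 0; b < buckets; b++) {
                start[b + 1] += start[b];
            }
            // Scatter every key into its bucket, preserving the existing order.
            for (int k : src) {
                dst[start[(k >>> shift) & (buckets - 1)]++] = k;
            }
            int[] tmp = src; src = dst; dst = tmp;
        }
        return src;
    }

    public static void main(String[] args) {
        System.out.println("16 buckets: " + passes(24, 4) + " passes");
        System.out.println("4 buckets: " + passes(24, 2) + " passes");
        int[] keys = {0xABCDEF, 0x000001, 0x123456, 0xFFFFFF, 0x000000};
        System.out.println(Arrays.toString(sort(keys, 24, 4)));
    }
}
```

`passes(24, 4)` and `passes(24, 2)` give the 6 and 12 passes quoted above. The current 48 comes from needing one pass per bucket per bit: 2 buckets times 24 bits.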

I've taken a look at Fourier opacity mapping, and it seems to be an excellent way of doing particle shadowing and self-shadowing. Performance seems good since the resolution of the map can be kept very low while still giving a very good look thanks to the blurry nature of particles. The particles also do not have to be sorted when rendering the opacity map. My only problem is that I have absolutely no idea how it works. That kind of math goes waaaaay over my head. It's definitely somewhere in my todo list though.

Since my current particles are meant to simulate smoke I also had a go with fragment limitations. To get good looking smoke you need a lot of overdraw, and with the current 2 megapixel screens that becomes very expensive. Some games render the particles at half resolution to reduce the number of pixels drastically. Using a special upsampling filter they can preserve sharp edges. Although the particles get slightly blurry, there's not much of a difference since particle effects are inherently blurry. The only possible artifacts are single-pixel errors that won't be visible at all. For 4 times as much overdraw I'd say it's definitely worth it.

Myomyomyo.
Offline pitbuller
« Reply #47 - Posted 2012-12-12 01:46:28 »

http://www.bungie.net/Inside/publications.aspx
In the "Blowing S#!t Up the Bungie Way" paper they present some nice graphics techniques that were used in Halo 3. There is a nice tiling plate texture animation trick that brings more life to particles and can help you reduce particle counts. Basically, there is a bigger tiling texture that has some shape in it, and on top of that they swim the actual particle texture by animating the UVs. It seems so simple yet effective.

Another good trick is to use a grayscale texture and palettize it with a 1x256 texture. This can save some bandwidth, reduce texture packing artifacts and give a lot more variation than simply using a tint color. An addition to this technique would be to pack albedo, spec mask and alpha into one texture. This should give enough variation in the textures that you could render liquids and gases with the same shader. Using world-space normal maps also works like a charm with cube mapping.
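
The palettizing trick is easy to sketch on the CPU: an 8-bit grayscale value is used as an index into a 256-entry color table. In a shader this would be a texture lookup into the 1x256 texture; the Java below is only an illustration, and the linear two-color ramp and all names are assumptions made for the example.

```java
// Hypothetical CPU sketch of palettizing a grayscale image through a
// 256-entry lookup table (the "1x256 texture").
final class PaletteLookup {

    /** Builds a 256-entry ARGB ramp by interpolating each channel linearly. */
    static int[] ramp(int argbFrom, int argbTo) {
        int[] palette = new int[256];
        for (int i = 0; i < 256; i++) {
            int argb = 0;
            for (int shift = 24; shift >= 0; shift -= 8) {
                int from = (argbFrom >>> shift) & 0xFF;
                int to = (argbTo >>> shift) & 0xFF;
                int value = from + (to - from) * i / 255;
                argb |= value << shift;
            }
            palette[i] = argb;
        }
        return palette;
    }

    /** Replaces every grayscale value with the palette entry at that index. */
    static int[] colorize(byte[] gray, int[] palette) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = palette[gray[i] & 0xFF]; // treat the byte as unsigned
        }
        return out;
    }

    public static void main(String[] args) {
        int[] fire = ramp(0xFF200000, 0xFFFFE080); // dark red to pale yellow
        byte[] gray = {0, 64, (byte) 128, (byte) 255};
        for (int c : colorize(gray, fire)) {
            System.out.println(Integer.toHexString(c));
        }
    }
}
```

Swapping the palette changes the look of every particle that uses the grayscale texture without touching the texture itself, which is where the extra variation comes from.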
Offline ra4king

« Reply #48 - Posted 2012-12-12 02:37:42 »

Quick dumb question: why exactly are you sorting your particles?

Offline theagentd
« Reply #49 - Posted 2012-12-12 05:44:56 »

Quick dumb question: why exactly are you sorting your particles?
For correct blending. I have to sort them or the blending won't be applied in the correct order. I can't use normal z-buffering either.
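
A tiny numeric illustration of why the order matters, assuming standard alpha blending (source times alpha plus destination times one minus alpha) on a single color channel. The names are made up for the example.

```java
// Hypothetical sketch: standard alpha blending gives different results
// depending on the order in which particles are drawn.
final class BlendOrder {

    /** Blends a source value over a destination value (one color channel). */
    static float over(float src, float alpha, float dst) {
        return src * alpha + dst * (1f - alpha);
    }

    /** Draws the particles onto the background in the given order. */
    static float composite(float background, float[] color, float[] alpha, int[] order) {
        float result = background;
        for (int i : order) {
            result = over(color[i], alpha[i], result);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] color = {1.0f, 0.0f}; // a white and a black particle
        float[] alpha = {0.5f, 0.5f}; // both half transparent
        System.out.println("white then black: " + composite(0f, color, alpha, new int[] {0, 1}));
        System.out.println("black then white: " + composite(0f, color, alpha, new int[] {1, 0}));
    }
}
```

The same two particles produce two different pixel values depending on which one is drawn last, so overlapping translucent particles have to be drawn back to front to look right.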

Myomyomyo.
Offline Roquen
« Reply #50 - Posted 2013-01-12 13:54:31 »

Ah, yes, that site. Funny that he posted that just when I did my sorting stuff. =S I've been monitoring his blog for TXAA info and he recently deleted everything concerning TXAA (10+ posts), so I was worried that he had been fired or something. ^^' We'll see what he comes up with...
Just noticed that this stuff is supposed to be coming up at:
http://www.geforce.com/landing-page/txaa
https://developer.nvidia.com/content/welcome-game-graphics-technology-blog

from comments found here: http://timothylottes.blogspot.fr/2013/01/toward-practical-real-time-photon.html