Concurrency in 3D engine.
Offline lukasz1985

Senior Newbie

« Posted 2013-09-29 12:41:38 »

Offline xsvenson
« Reply #1 - Posted 2013-09-29 13:26:07 »

From an engineering standpoint, it would be a nice exercise, a rather complex one but still nice.
There are papers out there, on the internet, that talk about various methods and what can be done. Google will help You.

However, if Your goal is to produce something workable as soon as possible, then separating render from update will introduce You to a world of hurt and pain. If You just want to make an engine for others, or an engine for Yourself to make a game with, then You should rather focus on those goals. You can make Your engine better if and when Your performance or other metrics need it.

“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
Offline DrHalfway
« Reply #2 - Posted 2013-09-29 13:37:04 »

I used a similar strategy in one of my experiments. For every loop iteration, a render thread and a logic thread run concurrently. Once the logic is complete, you buffer all the needed data as "read only" for the render thread to access; this way the logic thread moves on to the next loop while the render thread grabs the current frame data and renders it. In case one is running ahead, i.e. the render thread is still rendering the previous frame's data when this frame is complete, the logic thread stalls and waits for the render thread to finish. With this method, the total time taken for a single frame equals the time of whichever thread takes longer to compute.
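A minimal Java sketch of that handoff, assuming a `SynchronousQueue` as the meeting point between the two threads. `FramePipeline`, `FrameData` and `run` are hypothetical names, and appending to a list stands in for actual rendering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

final class FramePipeline {
    /** Read-only data for one frame (hypothetical; a real engine would hold far more). */
    static final class FrameData {
        final int frameNumber;
        final float playerX;

        FrameData(int frameNumber, float playerX) {
            this.frameNumber = frameNumber;
            this.playerX = playerX;
        }
    }

    /** Runs the given number of frames and returns the frame numbers in render order. */
    static List<Integer> run(int frames) {
        SynchronousQueue<FrameData> handoff = new SynchronousQueue<>();
        List<Integer> rendered = new ArrayList<>();

        Thread render = new Thread(() -> {
            try {
                for (int i = 0; i < frames; i++) {
                    FrameData frame = handoff.take(); // waits until logic has a frame ready
                    rendered.add(frame.frameNumber);  // stand-in for drawing the frame
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        render.start();

        try {
            float x = 0f;
            for (int i = 0; i < frames; i++) {
                x += 1.5f;                        // stand-in for the game logic
                handoff.put(new FrameData(i, x)); // stalls while the renderer is still busy
            }
            render.join(); // join() also makes the render thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rendered;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

Because `put` only returns once the render thread calls `take`, the logic thread can compute at most one frame ahead of the renderer in this sketch.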

Offline theagentd
« Reply #3 - Posted 2013-09-29 16:59:56 »

Put the logic in its own thread, not the rendering. OpenGL can only be called from a single thread, so if you put rendering in its own thread you'll also have to initialize everything that uses OpenGL in that thread as well or mess with OpenGL contexts. Logic has no such limitation, so just put the logic in its own thread. It makes almost no difference in how the game works in the end, but it solves so many problems with loading.

To be able to render while updating for the next frame, you'll need to double buffer your data to avoid situations where an update is done while rendering is running causing only some objects to be moved. At best this can cause graphical glitches, at worst it can easily crash your code (remove an object from a list in the logic thread, rendering thread goes out of bounds) and those errors are extremely hard to debug since they happen randomly and infrequently. To solve this, you need to have some kind of synchronization point where you copy all the data needed to render a frame so the rendering thread can safely use this data to render the game without having to worry about the logic thread modifying its data.
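One way to picture such a synchronization point is sketched below, under the assumption that a plain `synchronized` block is acceptable for the short copy. `RenderSnapshotDemo`, `Entity` and `RenderItem` are hypothetical names, not part of any engine mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RenderSnapshotDemo {
    /** Mutable entity owned by the logic thread (hypothetical type). */
    static final class Entity {
        float x;

        Entity(float x) { this.x = x; }
    }

    /** Immutable copy of the fields the renderer needs. */
    static final class RenderItem {
        final float x;

        RenderItem(float x) { this.x = x; }
    }

    private final Object syncPoint = new Object();
    private final List<Entity> entities = new ArrayList<>();

    void spawn(float x) {
        synchronized (syncPoint) {
            entities.add(new Entity(x));
        }
    }

    /** Logic step: moves every entity, then removes those past the limit. */
    void update(float dx, float limit) {
        synchronized (syncPoint) {
            for (Entity e : entities) {
                e.x += dx;
            }
            entities.removeIf(e -> e.x > limit);
        }
    }

    /** Synchronization point: copies everything needed to render one frame. */
    List<RenderItem> snapshot() {
        synchronized (syncPoint) {
            List<RenderItem> copy = new ArrayList<>(entities.size());
            for (Entity e : entities) {
                copy.add(new RenderItem(e.x));
            }
            return Collections.unmodifiableList(copy);
        }
    }
}
```

The render thread would call `snapshot()` once per frame and then draw from the returned list only, so a later `update` that moves or removes entities cannot change what is being drawn.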

I wouldn't say only using two threads is optimal though. The load will most likely be very unbalanced between the two threads, so it won't actually be able to use two cores to their full potential. A more advanced approach would be to divide the engine's internal subroutines into different independent tasks that can be run in parallel. For example particle updating, bone animation updating, physics updating and terrain rendering can all be run at the exact same time on different cores since they do not need any synchronization between each other. Even further, you can even do particle updating on multiple threads assuming particles can not interact with each other. Such a system could use any number of cores assuming there are enough independent tasks, and since some tasks can be split up between any number of cores (100 000 particles can be updated on 1-16 cores or whatever) the engine can scale very well with the number of cores in the system.
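As a rough illustration of splitting one task across cores, here is a sketch that updates non-interacting particles in chunks, assuming positions and velocities are stored in flat arrays. `ParallelParticles` and `update` are hypothetical names, and a real engine would reuse one thread pool instead of creating it per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelParticles {
    /** Advances pos[i] by vel[i] * dt, splitting the index range across `threads` workers. */
    static void update(float[] pos, float[] vel, float dt, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            int chunk = (pos.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int start = t * chunk;
                final int end = Math.min(pos.length, start + chunk);
                tasks.add(() -> {
                    // Each task touches only its own index range, so no locking is needed.
                    for (int i = start; i < end; i++) {
                        pos[i] += vel[i] * dt;
                    }
                    return null;
                });
            }
            pool.invokeAll(tasks); // returns once every chunk has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }
}
```

The same call works with 1 or 16 threads; only the chunk size changes.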

As a shameless plug, I'd like to mention that I've developed a very small library for accomplishing this. It allows you to construct a tree of tasks with dependencies between them. The task tree can then be run using a multithreaded executor which identifies which tasks can be run in parallel and attempts to spread out the available tasks across all available threads. You can find it here: If you have any questions about it, feel free to contact me either here or on Skype (same username).
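The sketch below is not that library. It only illustrates the general idea of tasks with dependencies using the JDK's `CompletableFuture` (Java 8 and later); the task names and the `TaskGraphDemo` class are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TaskGraphDemo {
    /** Runs one "frame" of dependent tasks and returns a string describing the dependency order. */
    static String runFrame() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Three independent tasks: free to run in parallel.
            CompletableFuture<String> physics =
                    CompletableFuture.supplyAsync(() -> "physics", pool);
            CompletableFuture<String> animation =
                    CompletableFuture.supplyAsync(() -> "animation", pool);
            CompletableFuture<String> particles =
                    CompletableFuture.supplyAsync(() -> "particles", pool);

            // Culling depends on physics and animation, but not on particles.
            CompletableFuture<String> culling = physics.thenCombineAsync(
                    animation, (p, a) -> "culling(" + p + "," + a + ")", pool);

            // Rendering depends on culling and particles.
            CompletableFuture<String> render = culling.thenCombine(
                    particles, (c, s) -> "render(" + c + "," + s + ")");

            return render.join(); // wait for the whole graph to finish
        } finally {
            pool.shutdown();
        }
    }
}
```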

Offline Roquen
« Reply #4 - Posted 2013-09-30 16:17:36 »

More cores = more power.  It's better to have 'n' cores at speed 'x' than one core at speed 'n*x'.  And it's even better if the cores have (an equivalent to) HyperThreading.
Offline Roquen
« Reply #5 - Posted 2013-09-30 17:36:27 »

Just the bare minimum brush strokes which will be very inexact.  Say you have two machines with identical CPU/memory specs except one has eight 2 GHz cores and the other has one core @ 16 GHz, neither with HyperThreading to keep it simple.  The 8 core machine can at every CPU cycle be processing 8 threads while the single can process 1.  Each core has its own set of registers and L1 cache (among other things). When a core stalls...nothing effectively happens until the stall is resolved (or the OS swaps threads).  The cost of a core stalling for some external to core communication will be 8x greater on a single core machine.  When a stall occurs on the multi-core, on average the other cores will continue to run.  The single core machine has to service all threads of all running processes including the OS.  The context switch is effectively 8x higher and no forward progress computation occurs during the switch.  The multicore machine, by its nature, will require fewer context switches (on average).
Offline theagentd
« Reply #6 - Posted 2013-09-30 20:40:39 »

Not to mention the insane heat issues of a 16GHz CPU. Heat = X * (clock speed)^2, plus increasing the clock speed requires a higher voltage to maintain stability, meaning that the scaling is probably closer to (clock speed)^3. 8x clock speed = probably over 100x as much heat generated (and power consumed of course).

I'd think that in practice I'd probably prefer a theoretical 16GHz CPU to a 2GHz 8-core CPU considering how horribly badly threaded most game engines are. It's true that without a similar memory overclock (and CAS reduction etc) the clockspeed itself wouldn't help much, but the only engine I've seen that currently runs well on for example AMD's newer 8-core CPUs is the Frostbite engine (= Battlefield games since Bad Company). Those games get a huge boost even from hyperthreading on a dual core, and the engine seems to be able to use any number of cores you throw at it. As CPUs will be getting more and more cores, more engines need to implement proper threading!

Off topic Planetside 2 bash
On the other end of the spectrum we have Planetside 2 which was, and still is, a CPU performance clusterf*ck. A few months ago certain places in the game (huge bases) used to bring down my FPS to sub-15. When there were huge fights at those places, it'd go down to under 10 FPS. Oh, and this was on the continent without trees, since vegetation easily cut your FPS in half, and with everything set to minimum. I now have a better CPU (i7 4770K @ 3.5GHz) and they've gone a long way at making the game run better, so I can usually maintain 50+ FPS now (still on minimum settings), but it still doesn't use more than 2 cores. Hopefully the promised upcoming optimization patches will change that.

Offline sproingie
« Reply #7 - Posted 2013-10-02 00:02:34 »

I have a beast of a CPU and GPU, but the whirlpool room in Borderlands 2 becomes a single-digit-FPS affair once the firefight starts to heat up.  In fact when the game crashes, it's usually there.  The only thing that ever slows down to a similar degree is the end-game loot shower from the Warrior.  I suspect memory management is the culprit in both cases, not CPU efficiency.
Online ags1

JGO Knight

Medals: 29
Projects: 2
Exp: 5 years

Make code not war!

« Reply #8 - Posted 2013-10-17 18:54:03 »

How about a game state buffer? The logic threads read the current game state (which is unchanging) and write to a future game state. The rendering threads read from the current game state and do their OpenGL stuff. When the logic threads have finished updating the future game state, it is swapped in to become the current game state.
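A minimal sketch of that idea, assuming an immutable state object published through an `AtomicReference` and a single logic thread doing the writing. `GameStateBuffer` and `GameState` are hypothetical names, and the state is reduced to two fields for brevity.

```java
import java.util.concurrent.atomic.AtomicReference;

final class GameStateBuffer {
    /** Immutable game state; readers can never observe a half-updated instance. */
    static final class GameState {
        final long tick;
        final float playerX;

        GameState(long tick, float playerX) {
            this.tick = tick;
            this.playerX = playerX;
        }
    }

    private final AtomicReference<GameState> current =
            new AtomicReference<>(new GameState(0, 0f));

    /** Called by render threads: returns the state to draw this frame. */
    GameState current() {
        return current.get();
    }

    /** Called by the logic thread: builds the future state, then swaps it in. */
    void step(float dx) {
        GameState now = current.get();
        GameState future = new GameState(now.tick + 1, now.playerX + dx);
        current.set(future); // the swap: one reference write, visible to all readers
    }
}
```

A renderer that grabbed the old state before the swap simply keeps drawing that consistent, slightly older frame.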

Offline Cero
« Reply #9 - Posted 2013-10-17 23:11:01 »

I have a beast of a CPU and GPU, but the whirlpool room in Borderlands 2 becomes a single-digit-FPS affair once the firefight starts to heat up.  In fact when the game crashes, it's usually there.  The only thing that ever slows down to a similar degree is the end-game loot shower from the Warrior.  I suspect memory management is the culprit in both cases, not CPU efficiency.

Really? I don't have a beast and I didn't notice a thing. Granted, I never have a game on high settings D=

Offline Jeremy
« Reply #10 - Posted 2013-10-18 00:01:38 »

I'm not sure if this is what you're looking for, but I stumbled across this article a couple of months ago; it doesn't seem too difficult to implement:

JevaEngine, Latest Playthrough (This demo is networked with a centralized server model)
Offline cylab

JGO Knight

Medals: 34

« Reply #11 - Posted 2013-10-18 15:00:46 »

Actually, you might reconsider your opinion on copying render states, because not having a frame completely updated before drawing results in annoying glitches. Your argument that it is only milliseconds does not hold, because the glitch stays on screen for at least the frame time. Especially in more demanding scenes you'll get the 3D equivalent of screen tearing, but extended to all kinds of attributes, from color to lighting to textures to geometry. And since you can get completely different glitches in subsequent frames, you'll end up with very poor visual quality. You need at least something like double buffering for your data. To reduce data copying you could journal your changes to one buffer and apply the same changes to the second buffer before applying the changes for the next frame.
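A single-threaded sketch of the journaling idea, assuming the frame data is a flat `float[]` and that `swap()` is only called at a point where the renderer is not reading. `JournaledBuffers` and its methods are hypothetical names, and thread coordination is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class JournaledBuffers {
    /** One recorded write: which slot changed and its new value. */
    private static final class Change {
        final int index;
        final float value;

        Change(int index, float value) {
            this.index = index;
            this.value = value;
        }
    }

    private float[] front; // read by the renderer
    private float[] back;  // written by the logic
    private final List<Change> journal = new ArrayList<>();

    JournaledBuffers(int size) {
        front = new float[size];
        back = new float[size];
    }

    /** Logic side: write to the back buffer and remember the change. */
    void write(int index, float value) {
        back[index] = value;
        journal.add(new Change(index, value));
    }

    /** Render side: read from the front buffer. */
    float read(int index) {
        return front[index];
    }

    /** Swaps the buffers, then replays the journal so the new back buffer catches up. */
    void swap() {
        float[] tmp = front;
        front = back;
        back = tmp;
        for (Change c : journal) {
            back[c.index] = c.value; // only changed slots are touched, not the whole buffer
        }
        journal.clear();
    }
}
```

The cost per frame is proportional to the number of changes, not to the size of the buffer, which is the point of journaling over a full copy.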

Mathias - I Know What [you] Did Last Summer!
Offline Roquen
« Reply #12 - Posted 2013-10-18 15:03:06 »

And locks et al. hate you and will make you hate yourself.  My 2-cents is to avoid them like the plague.
Offline EgonOlsen
« Reply #13 - Posted 2013-10-18 18:43:36 »

One frame is 16 milliseconds at 60 fps. That's really a small amount. So the tearing can remain for at most 16 ms. It's almost impossible to see.
And how do you handle for example rotation matrices? It's not all about simple translations. If the render thread uses a matrix that the update thread is working on, the result will be an invalid rotation matrix and depending on the change in the matrix, this can look really strange.

Offline Roquen
« Reply #14 - Posted 2013-10-18 19:28:32 »

Locks are a PITA and usually the worst solution.  Lock-free, wait-free,
Offline cylab

JGO Knight

Medals: 34

« Reply #15 - Posted 2013-10-18 21:30:31 »

One frame is 16 milliseconds at 60 fps. That's really a small amount. So the tearing can remain for at most 16 ms. It's almost impossible to see.
I see such things and am really annoyed by them. Other than that, it is only difficult to spot if it happens once in a while, but as soon as the update thread and the render thread each take more than 50 percent of the frame time, it's probable that it happens every frame!

Also, if you use locks to solve latency problems, you'll effectively synchronize your engine into single-threaded performance, and you'll hate debugging it. But as you said, time will tell :P

Mathias - I Know What [you] Did Last Summer!
Offline pitbuller
« Reply #16 - Posted 2013-10-18 21:49:45 »

Rendering using undefined matrices will be horrible. What if a matrix scales an object to a really huge size and you suddenly get extreme overdraw? Then GPU time goes through the roof and everything is a PITA.
Offline xsvenson
« Reply #17 - Posted 2013-10-19 13:01:23 »

“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
Offline lcass
« Reply #18 - Posted 2013-10-22 22:06:52 »

So I've been hearing about all these issues such as the 1 core CPU etc. I have a nice idea: you include it in your game but have it off by default. When someone turns it on they get a warning popup, and then it switches to using two threads.
Offline theagentd
« Reply #19 - Posted 2013-10-23 13:29:54 »

So I've been hearing about all these issues such as the 1 core CPU etc. I have a nice idea: you include it in your game but have it off by default. When someone turns it on they get a warning popup, and then it switches to using two threads.
Or you fix your code instead of putting down a big "Use at your own risk" sign...

Offline Roquen
« Reply #20 - Posted 2013-10-23 13:52:00 »

The thing about multithreading is that it doesn't need to be hard.  The big issues are up front design centered around isolation of tasks, minimal to no sharing of "state" data and not writing concurrency based data-structures and algorithms (let someone else do that for you).  Ideally thread communication should be single producer & single consumer..oh and have I mentioned avoid lock-like structures like the plague.  I actually find single-threaded design much harder because you (typically if you have moderate to higher complexity) have to worry about worst case timings of everything to prevent things from getting foo-bared (well unless you have a soft/hard realtime dedicated OS that allows writing of interrupt handlers).

As an example: there's no specific reason to insist that at some timestamp X a rendering thread and a simulation thread must "see" the world state as being the same.  Doing so makes everything so much harder for no useful reason.  The rendering thread simply needs to have a plausible and internally consistent world view to show the given player at the current frame.  Design by composition and data segregation are very handy skills to develop.

Sadly, so much of the literature, teaching, etc. is based around architectures which haven't been around for decades, and given the raw numbers of people learning CS there's a tendency to teach "every problem is a nail" scenarios.  Personally I'd like to see a lot more "write high level, think low level", and an acknowledgement that CS is applied mathematics and the only way to learn is to get off your ass and work problems (as opposed to learning which solution fits the problem).
Offline theagentd
« Reply #21 - Posted 2013-10-23 21:20:40 »

I don't agree with most of what you're saying.

Like you said, "big" problems can always be divided into a number of smaller problems, with part of the problem being how these parts should interact. Minimizing the areas where the problems overlap is the most important thing to do, since a small problem without complex interactions with other problems is a simple problem. Never try to solve a big problem, always divide and conquer.

Locks are not useful for <parallel> programming. Locks are a way of turning a parallel program into a serial program. For games, the focus is either on increasing performance through parallel computing, or on responsiveness (allowing the UI to continuously update while doing heavy computations in another thread). Therefore locks should be avoided as much as possible. There are almost always lock-less solutions to a problem. Have 4 threads writing to a shared list? Just give each thread its own list. If you for some reason need a single list, then just merge them all into one list when you're done, preferably while something else is being done on other threads. Bam, no more locking overhead keeping 4 threads busy 75% of the time. From my experience the only time I've not been able to get rid of my locks/synchronization is when coordinating which threads should do what. After that's sorted out you shouldn't need any synchronization of any kind.
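The "give each thread its own list, then merge" advice can be sketched as follows, assuming the work is submitted as tasks to an `ExecutorService`. `PerThreadLists` and `collect` are hypothetical names, and the integers stand in for whatever the threads actually produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadLists {
    /** Each task fills a private list; the caller merges them afterwards. */
    static List<Integer> collect(int threads, int itemsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<Integer>>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                tasks.add(() -> {
                    List<Integer> local = new ArrayList<>(); // never shared while being written
                    for (int i = 0; i < itemsPerThread; i++) {
                        local.add(id * itemsPerThread + i);
                    }
                    return local;
                });
            }
            List<Integer> merged = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<List<Integer>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The only coordination left is waiting for the tasks to finish; no thread ever contends for a shared list.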

I don't think most people here think that programming is especially painful. I'm sure some people like the "IT'S FINALLY DONE"-part more than mindless bean making, but the part I enjoy the most is coming up with a fancy algorithm for doing something and then implementing it. It's the challenge for me.

Avoiding libraries can be a good way to learn new things. For example, using OpenGL directly instead of using an engine built on top of OpenGL will usually be faster since you can do some pretty cool low-level optimizations that can be hard to pull off without a specifically tailored solution to the specific problem. But the same can be said about OpenGL itself. Why use OpenGL when you can interact directly with the GPU? You must know that this is a Java forum, so by simply being here you're using the Java garbage collectors and built-in libraries. Everyone draws the line somewhere. I think Java has lots of convenient features like garbage collection that lets me focus on what I find interesting instead of bothering with manually freeing pointers. The cost is a garbage collection pause every now and then, but through smart coding that can be kept to a minimum. Plus, I bet that the Java garbage collectors are better at their job than any memory manager I could possibly manage to write myself in C or C++ for example, but I digress. The point is that everyone draws their line somewhere depending on how close to the hardware they want to be. Every level of abstraction costs performance and flexibility, but reduces the time it takes to code and the complexity of the code.

On another hand, I think it's important to have a deeper understanding about the underlying libraries to be able to code effectively. If you don't know what the library/hardware you use is doing, you won't be able to optimize your code well. I guess this is more true for OpenGL than Java, but it can be applied to both. Unless you know how CPU caches work, you won't understand why LinkedList is so much slower than ArrayList. Unless you know how z-buffering works, you won't understand why transparency is so hard in 3D. If you don't know assembler, you won't know why
is not thread safe. I don't have to code my own Java VM or OpenGL driver to understand how to properly use Java and OpenGL, but I do need to know how they work to some extent.

Your last statement confused me a bit. Why would proper multithreading introduce a delay? The whole point should be to improve performance and/or responsiveness, and both of those should imply a reduced input delay. You've completely missed the point of multithreading in that case.

Offline Roquen
« Reply #22 - Posted 2013-10-23 22:15:44 »

On locks: SEE the search keywords I listed above...or look at the docs of most of the concurrency package...and those are designed to be general purpose.  Complex data structures which can be concurrently accessed by an arbitrary number of threads can be formed without ever needing, say, a mutex (or worse, a critical section).  You don't need locks.  On learning...I'm sticking to: don't write concurrent algorithms or data structures (at least ones you actually intend to use).  Java's a little easier because you don't have to worry about the ABA problem...but you need to mathematically "prove" your implementation is correct (Unit tests?  Ha!) for anything beyond dead easy.   Don't bother.  Let people that spend their lives on the topic do the heavy lifting for you.

I almost wept with joy when consumer hardware became advanced enough to use CAS & LL/SC (sigh).
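As an illustration of leaning on the concurrency package instead of writing your own structure, here is a sketch of single-producer, single-consumer messaging over the JDK's non-blocking `ConcurrentLinkedQueue`. `LogicToRenderQueue` and `sumOfMessages` are hypothetical names, and summing integers stands in for handling real messages.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

final class LogicToRenderQueue {
    /** One producer thread sends 1..count; the calling thread consumes and sums them. */
    static long sumOfMessages(int count) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                queue.offer(i); // never blocks, no lock taken by user code
            }
        });
        producer.start();

        long sum = 0;
        int received = 0;
        while (received < count) {
            Integer message = queue.poll(); // returns null when empty instead of waiting
            if (message == null) {
                Thread.yield(); // nothing yet; a game loop would just carry on with the frame
                continue;
            }
            sum += message;
            received++;
        }
        return sum;
    }
}
```

In a game loop the consumer would typically drain the queue once per frame, not spin as this test-style loop does.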