Java-Gaming.org    
  Any multithreading people out there? Multithreaded loading of a huge array.  (Read 1304 times)
Online Rayvolution

JGO Kernel


Medals: 197
Projects: 2
Exp: 1 year


Resident Crazyman


« Posted 2014-07-07 10:56:47 »

Hey all, so I've been looking for ways to accelerate my loading process. I came up with an idea to break up my massive int[1024][1024][8] array into smaller chunks to be stored on the hard disk. Each chunk would be an int[256][256][8] array, so there would be a total of 16 arrays. (Currently the entire int array is compressed and stored as one massive file.)

My game saving is already multithreaded: it makes a deep copy of my array on the main thread and sends it off to another thread to be serialized. For loading, though, I still do it in the main OpenGL context thread, because the game needs the map loaded to continue anyway. But I had an idea: what if I used 16 different threads to load the [256][256][8] parts all at once, and then, once all 16 are loaded, whatever thread finishes last reassembles the entire map and unlocks the OpenGL context thread?

I have to do a decent amount of refactoring just to test this, so while I play with the idea I was hoping to get some feedback on it from someone with more multithreading experience than I have.

Basically, it would work like this:
1. The OpenGL context thread flags the map to load, sets mapLoading = true, and locks itself with a loop: while (mapLoading) { sit at the loading screen, maybe animate something }.
2. 16 new threads are created, one for each 256x256 part of my map. From what I understand of multithreading, if there aren't enough cores the extra threads simply wait to be scheduled, so in theory I can create 16 threads regardless of whether the CPU is a single, dual, quad, 6 or 8 core processor; it'll just use what it can when it can.
3. Each thread loads its part of the map; when it's done, it sets mapLoaded[x] = true (where x is one of the 16 pieces).
4. Every thread then checks whether all 16 mapLoaded booleans are true, so only the last thread to finish should pass that check.
5. That last thread reassembles the entire 1024x1024 map from the 16 pieces and flags mapLoading = false, unlocking the OpenGL context thread and allowing the game to continue.

Visual illustration: (image attachment omitted)
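For the curious, here's roughly how I imagine steps 1-5 in code, except using a CountDownLatch in place of polling a mapLoaded boolean array (a sketch only; loadChunk and the row-copy layout are made-up stand-ins for my real deserialization):

```java
import java.util.concurrent.CountDownLatch;

public class ChunkedMapLoader {
    static final int PIECES = 16;

    // Hypothetical per-chunk loader: would read one 256x256x8 piece from disk.
    static int[][][] loadChunk(int index) {
        return new int[256][256][8]; // stand-in for real deserialization
    }

    public static int[][][] loadMap() throws InterruptedException {
        int[][][] map = new int[1024][1024][8];
        CountDownLatch done = new CountDownLatch(PIECES);

        for (int i = 0; i < PIECES; i++) {
            final int piece = i;
            new Thread(() -> {
                int[][][] chunk = loadChunk(piece);
                // Place this piece into its 256x256 cell of the 4x4 grid.
                int baseX = (piece % 4) * 256, baseY = (piece / 4) * 256;
                for (int x = 0; x < 256; x++)
                    for (int y = 0; y < 256; y++)
                        map[baseX + x][baseY + y] = chunk[x][y];
                done.countDown(); // replaces mapLoaded[x] = true
            }).start();
        }

        done.await(); // replaces the while (mapLoading) spin loop
        return map;
    }
}
```

The latch also gives the happens-before guarantee that plain boolean flags wouldn't, so the assembling thread is sure to see every chunk's writes.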

- Raymond "Rayvolution" Doerr.
Retro-Pixel Castles - Survival Sim/Builder/Roguelike!
LIVE-STREAMING DEVELOPMENT: http://www.twitch.tv/SG_Rayvolution
Offline BurntPizza
« Reply #1 - Posted 2014-07-07 11:04:15 »

Loading files on multiple threads is a no-no; that's the fastest way to trash your performance. The disk is a serial device, best used for long continuous reads/writes. Skipping around on the disk is like skipping around in memory and nullifying your cache, except thousands of times worse Sad

Best bet is the high-level optimization: do you really need that much data? 1024 * 1024 * 8 * 4 bytes = 32 MB. Perhaps a byte[] would suffice (or utilize all 32 bits of each int if you aren't already).

Next opt after that would be data compression; something fast like Snappy or LZO can actually be faster to read + decompress than an uncompressed read, disks are that slow, especially if you multithread the decompression.
Online Rayvolution


« Reply #2 - Posted 2014-07-07 11:13:36 »

Loading files on multiple threads is a no-no; that's the fastest way to trash your performance. The disk is a serial device, best used for long continuous reads/writes. Skipping around on the disk is like skipping around in memory and nullifying your cache, except thousands of times worse Sad

Best bet is the high-level optimization: do you really need that much data? 1024 * 1024 * 8 * 4 bytes = 32 MB. Perhaps a byte[] would suffice (or utilize all 32 bits of each int if you aren't already).

Next opt after that would be data compression; something fast like Snappy or LZO can actually be faster to read + decompress than an uncompressed read, disks are that slow, especially if you multithread the decompression.

Blast! You're correct, I totally forgot about concurrent access on a hard drive. I wasn't even factoring that in when I had the idea. Well, damn! Wink

Sadly, though, I can't use a byte[][][] array, because I need int IDs up in the millions (quite literally). Part of the way I designed the engine was to assign each tileset a collection of tiles based on what type of tile it is, so I am already using ints as large as 2,000,000 to represent global tile IDs in the tile loader.

(Note that I don't actually have 2 million tile IDs; there are purposely gaps in the tile ID system so I can inject global tile IDs in between others later. IDs below 1 million are terrains, 1-2 million are deco objects and walls, and 2 million+ are buildings and functional objects. Each primary tileset has a 10,000 tile gap, with 400 tile gaps between each tileset subtype.)

Really, the load time itself isn't all that awful. My master plan was to take it from 3-5 seconds to nearly instant (perceptively), but I completely forgot to factor in that the hard drive can't read concurrently anyway, so loading multiple files at once is actually slower than one at a time.

I guess I may have to go back to my old idea: break the map into 16 files anyway, load them one by one in a single thread, and progress a loading bar/animation every time a piece is loaded. I'll also look into Snappy and LZO; currently I'm just using GZIPInputStream/GZIPOutputStream.
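For reference, my current GZIP save/load is roughly this shape (a simplified sketch; the method names are made up, and a DataOutputStream of raw ints stands in for my real serialization):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MapIO {
    // Writes one w x h layer of tile IDs as raw ints through a GZIP stream.
    static void saveLayer(int[][] layer, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(file))))) {
            for (int[] row : layer)
                for (int v : row)
                    out.writeInt(v);
        }
    }

    // Reads the layer back; caller supplies the known dimensions.
    static int[][] loadLayer(File file, int w, int h) throws IOException {
        int[][] layer = new int[w][h];
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))))) {
            for (int x = 0; x < w; x++)
                for (int y = 0; y < h; y++)
                    layer[x][y] = in.readInt();
        }
        return layer;
    }
}
```

Writing raw ints rather than serializing the array object keeps the stream free of Java serialization overhead, which also makes it easy to swap the GZIP streams for a Snappy or LZO stream later.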

Offline BurntPizza
« Reply #3 - Posted 2014-07-07 11:17:44 »

Depending on how the data is structured, it might be worthwhile to try a simple run-length encode, a dictionary pass, or even a Huffman pass over it instead of gzip; it might actually wind up faster. Profile and see whether the IO waits on the gzip decompression or the CPU is actually starved.
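A run-length encode over tile data can be as simple as emitting (count, value) pairs. A sketch (the class and method names are made up; it only pays off if tile IDs repeat in long runs, which map terrain usually does):

```java
import java.util.ArrayList;
import java.util.List;

public class RunLength {
    // Encodes values as (count, value) pairs.
    static int[] encode(int[] data) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int value = data[i], count = 1;
            while (i + count < data.length && data[i + count] == value) count++;
            out.add(count);
            out.add(value);
            i += count;
        }
        int[] result = new int[out.size()];
        for (int j = 0; j < result.length; j++) result[j] = out.get(j);
        return result;
    }

    // Expands (count, value) pairs back into the original array.
    static int[] decode(int[] encoded, int originalLength) {
        int[] out = new int[originalLength];
        int pos = 0;
        for (int j = 0; j < encoded.length; j += 2) {
            int count = encoded[j], value = encoded[j + 1];
            for (int k = 0; k < count; k++) out[pos++] = value;
        }
        return out;
    }
}
```

Decoding is a straight memory fill, so it is trivially cheap on the CPU compared to gzip inflate, which is the point of the suggestion above.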
Offline richierich

Junior Member


Medals: 5
Exp: 1 year



« Reply #4 - Posted 2014-07-07 11:23:15 »

No Java multithreading experience but I used to do a lot of work with databases on disk. Often it turned out that the disk head movement was the bottleneck, so multithreading small reads didn't help. Up to a point what did help was reading really large chunks (i.e. many sectors at a time), because heads can sometimes grab a few in one rotation. Also keeping stuff together in one file is usually better because the head can stay still rather than dodging back and forth between several file locations.

Also as BurntPizza says, compressing the data is worth trying. Maybe even actually use a database. I'm fairly new to Java so I don't know which databases are simple and available, but they often have highly tuned and effective compression options built in.
Online Rayvolution


« Reply #5 - Posted 2014-07-07 11:30:21 »

Maybe even actually use a database. I'm fairly new to Java so I don't know which databases are simple and available, but they often have highly tuned and effective compression options built in.

I actually thought about doing that, but I'd need to code a bunch of overhead just to test it (mainly recoding the saving/loading functions). A friend of mine is a database programmer, so I was going to bug him about some of his database solutions if I go that route. I know there are several low-overhead SQL databases out there for Java, but I don't need all the fluff SQL has; all I'm storing is a whole mess of ints. I also know there are some really fast "NoSQL" solutions, I just can't recall their names at 6:30 AM  Roll Eyes

Online princec

JGO Kernel


Medals: 378
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #6 - Posted 2014-07-07 12:26:13 »

Does nobody get taught anything in college these days?  Emo

Loading multiple files in multiple threads is exactly the use case of threads... load each file in a separate thread! Do it!

Cas Smiley

Offline BurntPizza
« Reply #7 - Posted 2014-07-07 12:42:38 »

Does nobody get taught anything in college these days?  Emo

Loading multiple files in multiple threads is exactly the use case of threads... load each file in a separate thread! Do it!

Cas Smiley

Start college in August Tongue


Although, correct me if I'm wrong: while loading files asynchronously is a good idea, since the CPU can do other work while waiting for the disk, performing simultaneous IO to different parts of the disk (non-contiguous operations) is counter-productive, is it not?

EDIT: I guess, really, as always, an actual test is necessary before optimizations can be assumed!
Online Rayvolution


« Reply #8 - Posted 2014-07-07 12:52:19 »

Does nobody get taught anything in college these days?  Emo

Loading multiple files in multiple threads is exactly the use case of threads... load each file in a separate thread! Do it!

Cas Smiley

Orly? Guess I'll have to try it and see what happens. Wink

Online princec


« Reply #9 - Posted 2014-07-07 12:58:38 »

Does nobody get taught anything in college these days?  Emo

Loading multiple files in multiple threads is exactly the use case of threads... load each file in a separate thread! Do it!

Cas Smiley

Start college in August Tongue


Although, correct me if I'm wrong: while loading files asynchronously is a good idea, since the CPU can do other work while waiting for the disk, performing simultaneous IO to different parts of the disk (non-contiguous operations) is counter-productive, is it not?

EDIT: I guess, really, as always, an actual test is necessary before optimizations can be assumed!
A thread is generally doing one of two things*: waiting on I/O, or running. I/O arrives in little dribs and drabs, and when you get a bit you do a little bit of work with it, then go back to waiting for more I/O to occur. So imagine you've got a pool of, say, 8 threads doing your I/O reading... chances are one of them will be reading a bit from the disk, and the other 7 will either be doing a little bit of work with the last bit of data they received or sitting idle. The end result is that your disk is flat out and you're using as much CPU as you can... which is the goal.

Cas Smiley

* unless it's otherwise waiting on synchronisation or whatever
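A pooled reader along those lines might look like this (a sketch under the assumptions above: a fixed pool, one task per file, each task blocking on its own read):

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class PooledReader {
    // Submits one read task per file to a small pool. Threads blocked on I/O
    // consume no CPU, so readers and any per-file work interleave naturally.
    static List<byte[]> readAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (Path p : files)
                futures.add(pool.submit(() -> Files.readAllBytes(p)));
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures)
                results.add(f.get()); // blocks until that file is loaded
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The Future.get() calls preserve the input order regardless of which file finishes first, so the caller can reassemble results deterministically.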

Offline BurntPizza
« Reply #10 - Posted 2014-07-07 13:04:21 »

I get that, but if multiple threads are "requesting" reads on the disk, then I would think the disk wastes time seeking between somewhat random locations instead of reading big blocks at a time in sequential order, as you would with a single thread. I guess I'm saying you should be able to get the disk flat out on one thread; adding threads keeps it maxed out but behaving less efficiently, similar to how walking a large array in random order is way slower than a linear scan, even if the reason for the slowdown is different (cache misses vs. mechanical seek latency).

Waiting on a bench, searching for some evidence....
Offline BurntPizza
« Reply #11 - Posted 2014-07-07 13:12:59 »

http://www.drdobbs.com/parallel/multithreaded-file-io/220300055

The third test case, the single-drive laptop, is probably the most applicable of the three and the most representative of a video game's audience. On the second page you can see that in the first test, "Sequential Read" of files, performance drops sharply with even just 2 threads and continues to decline as more are added; I believe that test matches the situation Ray is in.

EDIT: however you can also see that if you can detect certain RAID configurations, you can reap a nice speedup via multithreading!
Online princec


« Reply #12 - Posted 2014-07-07 13:29:24 »

Look more at the use case you're representing: lots of relatively small reads from many small files, rather than multiple large simultaneous sequential reads. You can see the laptop actually performs best with 8 threads in this situation.

Cas Smiley

Offline BurntPizza
« Reply #13 - Posted 2014-07-07 13:48:18 »

Hmm... it might be superior with certain tuning (file size vs. read size vs. thread count, etc.), although I feel it would be quite easy to lose all the improvement if the balance isn't quite right for a particular machine/device setup. For now I'm going to maintain my position that it's easier to optimize for a single-threaded read than to juggle many threads and risk massacring performance, though more thorough testing is in order to determine how risky it might actually be. Either way, this is not the lowest fruit; I suspect gains can still be made more easily elsewhere.
Offline PandaMoniumHUN

JGO Coder


Medals: 31
Exp: 3 years


White-bearded OGL wizard


« Reply #14 - Posted 2014-07-07 15:21:58 »

Setting up a non-blocking file loading system (like Node.JS does) with callbacks is beneficial because the game loop thread won't have to wait for the IO operation to finish: you just supply a callback and it gets fired after the file has been loaded. However, for file loading, the one-thread-per-file approach will only give you performance benefits if your files are huge. If the files are small and there are a lot of them, multiple threads might end up being slower than a single thread because of the context switching. But who knows, give it a try and you'll see what's best for your game. Pointing
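A minimal callback-based loader of the kind described above might look like this (a sketch; a single dedicated IO thread, with the callback fired on that thread rather than the game loop, and the class name is made up):

```java
import java.nio.file.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

public class AsyncLoader {
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    // Non-blocking load: returns immediately and fires the callback on the
    // IO thread once the bytes are in. The game loop never waits on the disk.
    public void load(Path path, Consumer<byte[]> callback) {
        io.submit(() -> {
            try {
                callback.accept(Files.readAllBytes(path));
            } catch (Exception e) {
                e.printStackTrace(); // real code would route this to an error callback
            }
        });
    }

    public void shutdown() { io.shutdown(); }
}
```

Note the callback runs on the IO thread, so anything it touches that the game loop also touches (e.g. handing the bytes over) still needs a thread-safe handoff such as a ConcurrentLinkedQueue drained once per frame.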

My Blog | Jumpbutton Studio - INOP Programmer
Can't stress enough: Don't start game development until you've got the basics of programming down! Pointing
Offline BurntPizza
« Reply #15 - Posted 2014-07-07 15:32:05 »

However for file loading the one thread/file approach will only give you performance benefits if your file sizes are huge. If the files are small and there is a lot of them multiple threads might end up being slower than a single thread because of the context switching.

It seems the opposite is true (I'm guessing it's related to the disk cache); either way, the context switching is not the bottleneck at all, it's always going to be the disk.
Online princec


« Reply #16 - Posted 2014-07-07 15:32:46 »

Indeed. I hope you've actually identified this as an actual problem that needs solving, rather than just tinkering away making things faster that no one knows about, or cares about, or even notices in the first place. Watch that talk given by Jon Blow, if you can find it.

Cas Smiley

Online princec


« Reply #17 - Posted 2014-07-07 15:34:11 »

Hint: it's almost certainly not something you should be coding... probably ever.

Cas Smiley

Offline theagentd
« Reply #18 - Posted 2014-07-07 18:19:47 »

Loading files on multiple threads is a no-no; that's the fastest way to trash your performance. The disk is a serial device, best used for long continuous reads/writes. Skipping around on the disk is like skipping around in memory and nullifying your cache, except thousands of times worse Sad

This is completely false in the first place. Both HDDs and SSDs perform better when you read multiple things from them using multiple threads. For HDDs, this is because the drive can optimize the seeking, basically reordering the reads so that the read head takes a more efficient "path". It can also queue up and pipeline the requests better, similar to how OpenGL works.

CPU: request file-------------------------------file received, request next--------------
HD:  -----------start reading~~~~~~~~~~~~~~~done----------------start reading~~~~~~~~~~~~

As you can see, a small stall occurs once the HD has finished reading a file. If multiple threads have requests queued, it immediately knows what to do next.

For an SSD, using multiple threads is even more ideal. SSDs have the amazing ability to write and read from multiple parts of itself in parallel, so if the files are in different places on the SSD, you can basically read several times faster from it by having multiple threads request files.

Look up any hard drive or SSD benchmark and you'll see they test at different "queue depths", i.e. how many requests are outstanding at once (roughly, how many threads are reading or writing).

Myomyomyo.
Offline Roquen
« Reply #19 - Posted 2014-07-07 18:47:33 »

Compression is your friend.
Offline BurntPizza
« Reply #20 - Posted 2014-07-07 19:39:06 »

@Agent

Reading this, I will concede (I had already seen the evidence earlier in the thread) that reading many small chunks can be sped up with multithreading, since it gives the drive controller more information about the requests. However, looking at some benchmarks, I'm seeing that sequential operation is about two orders of magnitude faster than random 4K operation, plural queue depth or not. Assuming I read the numbers correctly, that makes sense to me: sequential and contiguous is already optimal, and optimizing an even semi-random request queue can only approach being sequential and contiguous, except in the edge case where the random spread covers the whole section without overlap and the queue is perfectly optimized. This also assumes the consumer-grade drive has a sufficiently advanced controller, which is not guaranteed (although I expect it would be common by now).

I've also never stated that SSDs have this "vulnerability," I hope I didn't come across as implying as such.


If you're still reading this Ray, just reconsider if any of this is worthwhile, and if so go for compression first. IIRC Snappy and LZO have Java implementations. I expect largest and easiest gains are to be had there if you can't reduce the bottom-line data quantity.
Online Rayvolution


« Reply #21 - Posted 2014-07-07 21:09:53 »

@Agent

Reading this, I will concede (I had already seen the evidence earlier in the thread) that reading many small chunks can be sped up with multithreading, since it gives the drive controller more information about the requests. However, looking at some benchmarks, I'm seeing that sequential operation is about two orders of magnitude faster than random 4K operation, plural queue depth or not. Assuming I read the numbers correctly, that makes sense to me: sequential and contiguous is already optimal, and optimizing an even semi-random request queue can only approach being sequential and contiguous, except in the edge case where the random spread covers the whole section without overlap and the queue is perfectly optimized. This also assumes the consumer-grade drive has a sufficiently advanced controller, which is not guaranteed (although I expect it would be common by now).

I've also never stated that SSDs have this "vulnerability," I hope I didn't come across as implying as such.


If you're still reading this Ray, just reconsider if any of this is worthwhile, and if so go for compression first. IIRC Snappy and LZO have Java implementations. I expect largest and easiest gains are to be had there if you can't reduce the bottom-line data quantity.

Yep, still here. Sucking in all the various information in every direction. Tongue

I plan on trying many of the options and figuring out what works best. I did discover something odd: I broke my maps down into 8 files, one for each layer, so instead of saving one int[1024][1024][8] array I'm saving 8 int[1024][1024] arrays. An odd thing happened when I did this: the total file size went from 3.2 MB to 1.7 MB for the exact same data compressed the exact same way, just in 8 files instead of 1. Due to that, the load time decreased.

I attempted to multithread loading the 8 files, with mixed success, mainly due to trying to have multiple Runnable instances in the same class. It was choppy code, and I know how to fix it; I just haven't quite yet. But when it *did* load correctly, it loaded much faster.

Either way, I've dropped map loading down to ~1200ms on my desktop (from ~1800ms) thanks to the decreased file size. I couldn't do any reasonable timed tests with the hacky multithreading code, but from the few times it managed to load correctly by dumb luck, I suspect it'll load roughly twice as fast.

I'm probably going to try Snappy or LZO anyway; even though my total load time is under 3 seconds on my desktop, it's still a good 10-15 seconds on my laptop... and that seems a bit high.

Offline Riven
« League of Dukes »

JGO Overlord


Medals: 793
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #22 - Posted 2014-07-08 06:58:06 »

http://www.tomshardware.com/reviews/laptop-ultrathin-hdd-500gb-review,3526-3.html

http://www.tomshardware.com/reviews/laptop-ultrathin-hdd-500gb-review,3526-4.html



Backing up BurntPizza's reply: for spinning disks (and to a lesser extent SSDs), pack your files (think *.tar), regardless of whether you throw compression into the mix. Read the entire file into memory and extract your resources from there. You'll be reading in your data at about 80 MB/s on laptop HDDs, which will blow away any performance gain you get from multithreading (the worst case scenario of) random access on spinning hardware, which might bump I/O bandwidth from 0.5 to 2 MB/s.

Once the data is in memory, you can unleash as many threads as you want on it to potentially boost performance; whether that is significant depends on the nature of the data. If you use compression, aim for compression at the resource level, as opposed to the file level. That way you can multithread the decompression by loading different resources on multiple threads. It might increase the file size by a few percent, but that's a proper tradeoff.
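A toy pack format along those lines (a sketch, not any real engine's format: a count, a length table, then the payloads back to back, read with one sequential gulp and extracted from memory):

```java
import java.io.*;
import java.util.*;

public class Pack {
    // Layout: [count][len_0]...[len_n-1][payload_0][payload_1]...
    static void write(File file, List<byte[]> resources) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(resources.size());
            for (byte[] r : resources) out.writeInt(r.length);
            for (byte[] r : resources) out.write(r);
        }
    }

    static List<byte[]> read(File file) throws IOException {
        // One sequential read of the whole pack: the disk head never seeks.
        byte[] whole = new byte[(int) file.length()];
        try (DataInputStream fileIn = new DataInputStream(new FileInputStream(file))) {
            fileIn.readFully(whole);
        }
        // Extraction happens purely in memory from here on.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(whole));
        int count = in.readInt();
        int[] lens = new int[count];
        for (int i = 0; i < count; i++) lens[i] = in.readInt();
        List<byte[]> out = new ArrayList<>();
        for (int len : lens) {
            byte[] r = new byte[len];
            in.readFully(r);
            out.add(r);
        }
        return out;
    }
}
```

If each payload were individually compressed (resource-level compression, as above), the extraction loop is exactly where you could hand each byte[] to a separate decompression thread.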

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings