How to read a single file efficiently using NIO and multiple threads?
Offline gouessej
« Posted 2010-10-08 15:02:32 »

Hi

I would like to read big files with several threads without blocking, so that each thread can work on a region of the file without locking the whole file. I have looked for solutions for days... A FileChannel has no configureBlocking method, selectors can only be used with socket channels, and FileChannel.map() seems to lock the whole file too. How can I map a region of a file in memory and create such mappings from several threads?

Offline Matzon

« Reply #1 - Posted 2010-10-08 15:20:14 »

I would think that it makes the most sense to read the whole file into memory - using a single thread (since the disk will be the bottleneck here). Once it's all in memory, slice it into N partitions where N = number of CPU threads.

I don't think there would be any advantage in having N threads doing random access to a big file, since this would probably cause needless seeking on the disk.

Offline dbotha

« Reply #2 - Posted 2010-10-08 15:41:06 »

Alternatively, if the file is really big and you don't want to read it entirely into memory, FileChannels are thread-safe. According to the JavaDoc, if you use the read methods that take an absolute position, the operations will be performed concurrently (provided the underlying implementation supports this). Thus, if you want to go the concurrent route, perhaps you could just have each thread read its region using explicit positions. Performance-wise this is probably going to be worse than simply reading each region sequentially and passing it off to another thread for processing. As Matzon says, concurrent reading is probably going to generate a lot of seeking and slow the whole process down.
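
For what it's worth, a minimal sketch of that positional-read idea (the file name, chunk size and helper method are made up for illustration): every thread calls the read(ByteBuffer, long) overload on one shared FileChannel, so no thread touches a shared file pointer.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PositionalReadDemo {
    public static void main(String[] args) throws Exception {
        final FileChannel chan = new RandomAccessFile("huge.dat", "r").getChannel();
        final long half = chan.size() / 2;

        // One thread per half of the file; each one only reads its own region.
        Thread a = new Thread(regionReader(chan, 0, half));
        Thread b = new Thread(regionReader(chan, half, chan.size() - half));
        a.start(); b.start();
        a.join(); b.join();
        chan.close();
    }

    static Runnable regionReader(final FileChannel chan, final long start, final long length) {
        return new Runnable() {
            public void run() {
                ByteBuffer buf = ByteBuffer.allocate(8 * 1024 * 1024); // 8 MB per read
                long pos = start, end = start + length;
                try {
                    while (pos < end) {
                        buf.clear();
                        if (end - pos < buf.capacity()) buf.limit((int) (end - pos));
                        int n = chan.read(buf, pos); // absolute-position read, no shared file pointer
                        if (n < 0) break;
                        pos += n;
                        buf.flip();
                        // ... process the chunk in buf here ...
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };
    }
}

As noted above, whether this actually runs concurrently depends on the underlying implementation, and the extra seeking may well make it slower than one sequential read.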
Offline lhkbob

« Reply #3 - Posted 2010-10-08 17:03:16 »

Try looking into memory mapped files.

Offline Riven
« Reply #4 - Posted 2010-10-08 17:08:49 »

Quote from gouessej: FileChannel.map() seems to be blocking the whole file too.

No.

FileChannel.map(FileChannel.MapMode mode, long position, long size)


As lhkbob said, it is exactly what you need. The OS will manage loading and storing for you.
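
To make that concrete, here is a minimal sketch (the file name, region size and READ_ONLY mode are assumptions for the example) in which each thread maps and processes only its own slice of the file:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class MapRegionsDemo {
    public static void main(String[] args) throws Exception {
        FileChannel chan = new RandomAccessFile("huge.dat", "r").getChannel();
        long regionSize = 256L * 1024 * 1024; // 256 MB per worker, for example
        long fileSize = chan.size();

        for (long off = 0; off < fileSize; off += regionSize) {
            long size = Math.min(regionSize, fileSize - off);
            // Each worker gets its own mapping of one region; no locking is involved.
            final MappedByteBuffer region = chan.map(MapMode.READ_ONLY, off, size);
            new Thread(new Runnable() {
                public void run() {
                    while (region.hasRemaining()) {
                        region.get(); // ... replace with real per-region processing ...
                    }
                }
            }).start();
        }
        chan.close(); // an established mapping does not depend on the channel staying open
    }
}

The OS pages each region in on demand, so the map() calls themselves return almost immediately.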

Offline gouessej
« Reply #5 - Posted 2010-10-10 14:14:33 »

Quote from Matzon:
I would think that it makes the most sense to read the whole file into memory - using a single thread (since the disk will be the bottleneck here). Once it's all in memory, slice it into N partitions where N = number of CPU threads.
I don't think there would be any advantage in having N threads doing random access to a big file, since this would probably cause needless seeking on the disk.

But the files are too big to be stored in memory :( I tried to use a MappedByteBuffer and of course it did not work. Maybe trying to map smaller regions of the files could work.

Quote from Riven:
No.
FileChannel.map(FileChannel.MapMode mode, long position, long size)
As lhkbob said, it is exactly what you need. The OS will manage loading and storing for you.

Windows does not seem to handle this as I expected: the second thread waits for the first thread to finish reading :(

Offline Riven
« Reply #6 - Posted 2010-10-10 14:17:18 »

Quote from gouessej: I tried to use a MappedByteBuffer and of course it did not work.

If you map truly huge files on 32-bit OSes, yes, it won't work, as you run out of virtual memory. On 64-bit OSes you can map any file into memory, even terabytes big, if your filesystem supports it.

Offline gouessej
« Reply #7 - Posted 2010-10-10 14:26:16 »

I looked at the source code of map:
http://www.docjar.com/html/api/sun/nio/ch/FileChannelImpl.java.html

There is no "synchronized" block on the same lock, unlike the read() method. I need a huge amount of virtual memory to handle files of several GB, don't I?

Offline Riven
« Reply #8 - Posted 2010-10-10 14:29:34 »

Well, it's not like you'll start to swap if you map a 500GB file. If you have a 64-bit OS, you can map that within a millisecond or so.

And why would you need synchronisation if each thread has its own mapped byte buffer?

Offline lhkbob

« Reply #9 - Posted 2010-10-10 19:28:08 »

I also have to ask what you hope to get out of multi-threading disk access? Pretty much any disk is going to give you serialized input (unless I missed something...). It might fake it and provide bits and pieces of multiple files interleaved (like how old CPUs let you run more than one program). Why not re-work the problem so that the tasks can operate on chunks of the data in a parallel fashion? You have a single thread reading through the file, and when a unit of work is ready to process, it sends it off to another worker thread.

Offline gouessej
« Reply #10 - Posted 2010-10-11 13:13:08 »

Quote from lhkbob:
I also have to ask what you hope to get out of multi-threading disk access? Pretty much any disk is going to give you serialized input (unless I missed something...). It might fake it and provide bits and pieces of multiple files interleaved (like how old CPUs let you run more than one program). Why not re-work the problem so that the tasks can operate on chunks of the data in a parallel fashion? You have a single thread reading through the file, and when a unit of work is ready to process, it sends it off to another worker thread.

Actually I have to read pieces of data from a BIG file, convert them and write this data into another file. I wanted to use one thread per region of the file. I thought it would be possible to read a single file using several threads.

Quote from Riven: And why would you need synchronisation if each thread has its own mapped byte buffer?

I don't need synchronization and I don't want it; when I found the "synchronized" keyword in the read method, I was disappointed.

Offline cylab

« Reply #11 - Posted 2010-10-11 13:31:06 »

Quote from gouessej: Actually I have to read pieces of data from a BIG file, convert them and write this data into another file. I wanted to use one thread per region of the file. I thought it would be possible to read a single file using several threads.

As others pointed out, it would probably slow down the whole process a lot. Just do it sequentially.

Offline Orangy Tang

« Reply #12 - Posted 2010-10-11 13:41:27 »

Quote from gouessej:
Actually I have to read pieces of data from a BIG file, convert them and write this data into another file. I wanted to use one thread per region of the file. I thought it would be possible to read a single file using several threads.
I don't need synchronization and I don't want it; when I found the "synchronized" keyword in the read method, I was disappointed.

Trying to multithread the file IO (especially when you have one big file to chew through and not multiple files) is fundamentally wrong-headed IMHO.

The most efficient solution is probably to have one file input thread (a producer thread) reading in chunks of data (say, a few MB each) and pushing them onto a queue. Then have a pool of consumer/worker threads taking file chunks off the queue and processing them in parallel, before handing the output chunks to another file output thread via another queue. Your thread pool would have numCores - 2 threads (so on an 8-core machine you'd have one input thread, one output thread and six worker threads).

Check out the java.util.concurrent stuff, it makes this kind of setup easy. :D
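
A rough sketch of that pipeline with java.util.concurrent (the queue capacity, chunk size and poison-pill shutdown are assumptions, not a finished design):

import java.io.FileInputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    static final byte[] POISON = new byte[0]; // sentinel that tells a worker to stop

    public static void main(String[] args) throws Exception {
        final BlockingQueue<byte[]> chunks = new ArrayBlockingQueue<byte[]>(16);
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 2);

        // Worker pool: take chunks off the queue and process them in parallel.
        for (int i = 0; i < workers; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (byte[] chunk; (chunk = chunks.take()) != POISON;) {
                            // ... convert the chunk and hand the result to an output queue ...
                        }
                    } catch (InterruptedException ignored) {
                    }
                }
            }).start();
        }

        // Single producer: sequential read of the big file, a few MB at a time.
        FileInputStream in = new FileInputStream("huge.dat");
        byte[] buf = new byte[4 * 1024 * 1024];
        for (int n; (n = in.read(buf)) != -1;) {
            byte[] chunk = new byte[n];
            System.arraycopy(buf, 0, chunk, 0, n);
            chunks.put(chunk); // blocks if the workers fall behind
        }
        in.close();
        for (int i = 0; i < workers; i++) chunks.put(POISON); // shut the pool down
    }
}

The bounded queue gives you back-pressure for free: if the workers can't keep up, the reader simply blocks instead of filling the heap.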

Offline princec

« Reply #13 - Posted 2010-10-11 13:50:07 »

Also, don't bother with NIO, just use random access files.

Cas :)
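
In case it helps, a minimal sketch of the plain RandomAccessFile route suggested above (the file name and the fixed record size are invented for the example): seek to the offset you want and read, no NIO involved.

import java.io.RandomAccessFile;

public class RandomAccessDemo {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("huge.dat", "r");

        int recordSize = 128;                        // assumed fixed-size records
        long recordCount = raf.length() / recordSize;
        byte[] record = new byte[recordSize];

        // Jump straight to the last record and read it.
        raf.seek((recordCount - 1) * recordSize);
        raf.readFully(record);
        // ... convert the record and write the result to the output file ...

        raf.close();
    }
}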

Offline Riven
« Reply #14 - Posted 2010-10-11 14:22:33 »

Nobody appreciates the fantastic work the OS does on memory mapped files? No way a 'regular programmer' can come up with something more efficient for random access.

The only disadvantages of MappedByteBuffer are that you get a region with a maximum length of Integer.MAX_VALUE, so you need an array of MappedByteBuffers to map files bigger than 2GB, and that there is no unmap(), which leaves you at the mercy of the GC to close the file handle (or messing with sun.misc.Unsafe to force it).
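
A small sketch of how one might address an absolute file position through such a list of mappings (the 1 GB chunk size is an assumption chosen to keep the index arithmetic simple; the listing further down uses Integer.MAX_VALUE-sized chunks instead):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.ArrayList;
import java.util.List;

public class BigMappedFile {
    private static final long CHUNK = 1L << 30; // 1 GB per mapping
    private final List<MappedByteBuffer> maps = new ArrayList<MappedByteBuffer>();

    public BigMappedFile(String path) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        FileChannel chan = raf.getChannel();
        for (long off = 0; off < chan.size(); off += CHUNK) {
            long size = Math.min(CHUNK, chan.size() - off);
            maps.add(chan.map(MapMode.READ_ONLY, off, size));
        }
        raf.close();
    }

    // Read one byte at an absolute file position by picking the right mapping.
    public byte get(long pos) {
        return maps.get((int) (pos / CHUNK)).get((int) (pos % CHUNK));
    }
}

The absolute get(int) used here never moves a buffer's position; threads that prefer the relative get/put methods should each work on their own duplicate() of a mapping.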

Offline princec

« Reply #15 - Posted 2010-10-11 14:23:47 »

It's just complex, finicky, etc. The normal random access file IO stuff will work absolutely splendidly for this work load.

Cas :)

Offline Riven
« Reply #16 - Posted 2010-10-11 15:03:28 »

Quote from princec: It's just complex, finicky, etc. The normal random access file IO stuff will work absolutely splendidly for this work load.

True, but performance has rarely been associated with elegance.



This is how you map a 200GB file into memory:
   // 200GB
   long len = 200L * 1024 * 1024 * 1024;
   File file = new File("C:\\huge.dat");

   RandomAccessFile raf = new RandomAccessFile(file, "rw");
   raf.setLength(len);
   FileChannel chan = raf.getChannel();

   long t0 = System.currentTimeMillis();

   List<MappedByteBuffer> maps = new ArrayList<MappedByteBuffer>();

   // A single MappedByteBuffer covers at most Integer.MAX_VALUE bytes,
   // so the file is mapped as a list of ~2GB regions.
   long off = 0;
   while (off < len)
   {
      long chunk = Math.min(len - off, Integer.MAX_VALUE);
      MappedByteBuffer map = chan.map(MapMode.READ_WRITE, off, chunk);
      off += map.capacity();
      maps.add(map);
   }
   raf.close();

   long t1 = System.currentTimeMillis();

   System.out.println("took: " + (t1 - t0) + "ms");


On my mediocre system it takes ~250ms.

Offline Orangy Tang

« Reply #17 - Posted 2010-10-11 15:17:08 »

Quote from Riven: Nobody appreciates the fantastic work the OS does on memory mapped files? No way a 'regular programmer' can come up with something more efficient for random access.

Memory mapping is fantastic, but given that gouessej's requirements are to linearly process a huge input file and produce an output file, I'd rather go for the threaded approach and let a single thread chew through the file while the processing is distributed over the spare cores.

Of course, gouessej has been horribly vague about exactly what he's trying to do; if he genuinely does need random access from multiple threads, then I agree that memory mapping is the way to go.

Offline princec

« Reply #18 - Posted 2010-10-11 16:56:42 »

It would then be a particularly unusual problem though, wouldn't it? I bet your suggestion of 1 reader, n processors and 1 writer, using ordinary file access, will be the most efficient and simple solution here.

Cas :)

Offline gouessej
« Reply #19 - Posted 2010-10-12 19:38:59 »

Quote from Riven: True, but performance has rarely been associated with elegance. This is how you map a 200GB file into memory: (see the code listing above) On my mediocre system it takes ~250ms.
Excellent suggestion :D thank you very much. I think we were trying to create MappedByteBuffer instances that were too big; that is why it was not working.

Offline Riven
« Reply #20 - Posted 2010-10-12 19:47:11 »

Quote from gouessej: Excellent suggestion :D thank you very much. I think we were trying to create MappedByteBuffer instances that were too big; that is why it was not working.

Yeah.... well... you could have read the javadoc to figure this out. Anyway, be careful what you wish for. Expect to never get back that mapped memory.

Offline gouessej
« Reply #21 - Posted 2010-10-12 19:48:58 »

Quote from Riven: Yeah.... well... you could have read the javadoc to figure this out. Anyway, be careful what you wish for. Expect to never get back that mapped memory.
You're right, I read the javadoc but my colleague didn't...
