Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (515)
Games in Android Showcase (122)
games submitted by our members
Games in WIP (577)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Extremely poor performance with NIO...  (Read 4878 times)
0 Members and 1 Guest are viewing this topic.
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Posted 2003-07-16 10:05:18 »

From the thread: http://www.JavaGaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=Networking;action=display;num=1052749856;start=45#45

....

Quote

I am still seeing slower-than-expected results with the sample code and would like more feedback if any.


Arrgh. I think I've just encountered the same problem in a different server. I've got a semi-production server (which for a month has been running a game OK) which has been fine with three Selectors. Performance has been good.

I just added a new processing module, and suddenly the performance on the Write Selector has gone through the floor and is busily tunnelling it's way to Australia.

First response takes 3 seconds (!) to send (it's only 63k, on a LAN!). If the request is repeated, the response takes 60 seconds (!) sending almost exactly the same data. If you try a third time, linux crashes - or sometimes, you get a lot of disk thrasing, linux kills the JVM process, doesn't crash [phew!] and you are left with 12 zombie JVM processes sharing 200 Mb of RAM - EACH process has to be force killed (signal 9) individually; note: normally in linux, killing one JVM process kills the lot, because the other processes are just the different threads within the JVM.

Normally, typical responses of the order of a few kb are transferred at LAN speeds (multiple Mbits, i.e. you'd expect perhaps 10 milliseconds for the transfer, not multiple seconds).

Profiling shows nothing - no CPU hogs, no memory hogs. This seems therfore to be a problem with the JDK 1.4.2 implementation of Selector's and/or ByteBuffer's (this is the only way I can think of that RAM usage and/or CPU usage are taking place but are invisible within the JVM?).

Interestingly, none of my base code has changed - all my processing is abstracted away from the Selector's (I just add a new module that returns String's to the lower-layer which sends them etc), so the only thing that is different between when I had no problems and now is the length of the response and the actual bytes involved.

Interesting points:
  1. My SO_SENDBUF is currently approx 25k. Previous responses have typically been 10k-20k (although I have NOT measured this for all of them); it is possible that this is the first time I'm going over the size of the default send buffer.
  2. I've been making excessive use of the convenience method in nio.charset:
1  
ByteBuffer bb = ((Charset) ch).encode( String );

...unfortunately, the nio team didn't document what kind of BB that code generates (direct, indirect, etc?) and any performance considerations. This is the one of the few "unknown quantity"'s in my code, so I'm treating it as potentially suspicious.

Could you try experimenting with limiting your requests to slightly less than the SO_SENDBUF? (have a look at Socket.getSendBufferSize() ). You can also change the size of SO_SENDBUF, but that may have further unpredictable effects, so I would suggest that you first of all just limit the amount of data you're sending...

If that has no effect, then are you making similar usage of Charset's?

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #1 - Posted 2003-07-16 11:33:17 »

Quote

First response takes 3 seconds (!) to send (it's only 63k, on a LAN!). If the request is repeated, the response takes 60 seconds (!) sending almost exactly the same data. If you try a third time, linux crashes
...
Normally, typical responses of the order of a few kb are transferred at LAN speeds (multiple Mbits, i.e. you'd expect perhaps 10 milliseconds for the transfer, not multiple seconds).


Using almost precisely the same ByteBuffer's (generated by the same algorithm filled with 63k data, etc):
  1. I altered the position() of each buffer from the tail forwards so that the gathering-write would only process the first 20k of data (well within the 25k of my send buffer)
  2. First request STILL takes 3 seconds. The amount of data has reduced by a factor of 3, the only thing that is constant is the number of BB's and their contents
  3. Successive requests now also take 3 seconds (variation of +- 20%)
  4. Re-enabling the full 63k of data, I noted that the individual BB's in the gathering write are delivering their content VERY fast (too fast for a human to see), but there is an inter-BB delay of up to 5 seconds (demonstrated by opening a telnet client, and watching the data appear in chunks exactly correlating to the contents of successive BB's).

The implications of this might be one or more of:

1. GatheringByteChannel.write( ) is extremely inefficient for large numbers of BB's in the array - I'm using approximately 60 BB's. This should be documented if this is true and deliberate, but there is no mention of it in the JDK docs at the moment. However, this could be a bug due to the following point:

2. Sun's implementation of GatheringByteChannel in SocketChannel is very poor (for JDK 1.4.2). GatheringByteChannel states that:

"Unless otherwise specified, a write operation will return only after writing all of the r requested bytes. Some types of channels, depending upon their state, may write only some of the bytes or possibly none at all. A socket channel in non-blocking mode, for example, cannot write any more bytes than are free in the socket's output buffer."

In practice, Sun's implementation appears to only write one BB's worth at a time. This can be very very much less that what the socket's output buffer could hold (in my current example, approximately 2% of the size!).

This may indirectly be the cause of the performance problems (perhaps the overhead of doing 60 select() operations in quick succession is causing the massive delays?).

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #2 - Posted 2003-07-16 20:19:28 »

I wrote a simple test case that makes it easy to experiment with these behaviours.

Download it from:

http://www.grexengine.com/sections/people/adam/NioTester.class

...or the source, if you want it:

http://www.grexengine.com/sections/people/adam/NioTester.java

Just try running it (it's only one class file) and it will print instructions.

I'd be interested to hear if other people's experience differs from my own...particularly on non-linux platforms.

I find that with 1 or very few buffers (less than 5) performance is what I'd expect given I'm going from localhost to localhost. With 100 buffers, the performance drops to around 50kb/s, which is pathetic even for a 10Mbit LAN, let alone local machine.

I'm still hoping I've made a stupid mistake here, but I wrote this test case from scratch and I'm still seeing this bad behaviour.

malloc will be first against the wall when the revolution comes...
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #3 - Posted 2003-07-16 20:25:19 »

Quote

In practice, Sun's implementation appears to only write one BB's worth at a time. This can be very very much less that what the socket's output buffer could hold (in my current example, approximately 2% of the size!).


With the test case, I always get 16 buffers processed per WRITE operation, whether I use 100 buffers or 200. 16 is suspiciously power-of-2, so perhaps this is some "preferred" number of buffers to send at once.

With a large number of buffers (e.g. 100) it still sends 16 buffers at once, even though that's more data than my send-buffer can handle (approximately 32k for a 25k buffer)

malloc will be first against the wall when the revolution comes...
Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #4 - Posted 2003-07-16 20:56:29 »

I tried your program on Mac OS X 10.2.6 with Java 1.4.1 DP 102

10 buffers = 5.9 MB/s to 17.6 MB/s
100 buffers = 1654 B/s to 39.8 kB/s
200 buffers = anywhere from 170 B/s  to 10kB/s


In general the first few parts would take the most time (up to 200 ms each) and the remaining parts were much faster (under 10 ms each)

Offline GKW

Senior Duke




Revenge is mine!


« Reply #5 - Posted 2003-07-16 23:51:27 »

win2k

10 buffers       1.1            MB/s
100 buffers     0.031        MB/s
1000 buffers   0.000904  MB/s
Offline AndersDahlberg

Junior Duke





« Reply #6 - Posted 2003-07-17 08:12:42 »

redhat 9:

10: 8.79 mb/s
100: 20.2 kb/s
200: 5.29 kb/s
Offline Abuse

JGO Knight


Medals: 13


falling into the abyss of reality


« Reply #7 - Posted 2003-07-17 09:15:27 »

hmmmm, maybe I did something wrong  Roll Eyes

1 - 0ms div/0
10 - 0ms div/0
100 - IOException: Vector write failed
1000 - IOException: Vector write failed

or maybe its cos I was using Win98  Wink

just did 2 more tests :p

16 - 0ms div/0
17 - IOException: Vector write failed

magic 16 again  Shocked

Make Elite IV:Dangerous happen! Pledge your backing at KICKSTARTER here! https://dl.dropbox.com/u/54785909/EliteIVsmaller.png
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #8 - Posted 2003-07-17 18:44:43 »

Quote
hmmmm, maybe I did something wrong  Roll Eyes

1 - 0ms div/0
10 - 0ms div/0
100 - IOException: Vector write failed
1000 - IOException: Vector write failed


Yeah, sorry about the div/0. It was only a quick-n-dirty test case Smiley.

It would be nice to know what that IOE means; it doesn't seem to be matched by google (are you summarising a longer description?).

I've also now got more results from WinXP systems:
1, 10 = 0ms
100, 200 = approx 12Mb/s (which would be max LAN speed; 100Mbit LAN)
1000 = approx 40 MB/s

Those results are the same whether the client telnet is from the local machine or another one over the LAN. It is clear from the telnet clients that WinXP is NOT slowing down throughout the transfer (unlike linux/OS X and win2k). I can't remember whether win98 even has real OS-level asynch I/O or not (windows completion ports) ? I'm surprised that win2k is doing badly here.

Unfortunately, telnet clients are ultimately a poor choice for this test because they can't reliably say much about the actual *transfer* speed (note: the testcase only measures the "apparent write-to-network-card speed" or similar).

My original problem is two-fold: both the apparent-write-speed and the actual-transfer speed are extremely low. I want to check that WinXP (and 98?) aren't just experiencing the same problem but reporting it differently Smiley. So, here's a (very simple) test client with source code, that does NOT output what it gets back to screen (System.out.println being notoriously bottlenecking...).

NOTE: Server HAS changed slightly, so you need to download both class files (or source files, if you prefer)

http://www.grexengine.com/sections/people/adam/NioTester.class
http://www.grexengine.com/sections/people/adam/NioTesterClient.class

http://www.grexengine.com/sections/people/adam/NioTester.java
http://www.grexengine.com/sections/people/adam/NioTesterClient.java

As before, both programs come with their own instructions.

malloc will be first against the wall when the revolution comes...
Offline GKW

Senior Duke




Revenge is mine!


« Reply #9 - Posted 2003-07-17 21:47:35 »

10        17.8MB/s
100        0.025MB/s
1000      0.000936MB/s

The thousand buffer test generated, according to the task manager, 250MB of garbage.  So take that with a grain of salt since about 3/4 of the way through the test my cache started thrashing like mad.
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Abuse

JGO Knight


Medals: 13


falling into the abyss of reality


« Reply #10 - Posted 2003-07-17 22:16:55 »

Quote


Yeah, sorry about the div/0. It was only a quick-n-dirty test case Smiley.

It would be nice to know what that IOE means; it doesn't seem to be matched by google (are you summarising a longer description?).


heres the exact error msg, it looks like you are catching an excpetion, then rethrowing it.

Quote

IOE: e = java.io.IOException: Vector write failed


Trying the new version now....

Make Elite IV:Dangerous happen! Pledge your backing at KICKSTARTER here! https://dl.dropbox.com/u/54785909/EliteIVsmaller.png
Offline Abuse

JGO Knight


Medals: 13


falling into the abyss of reality


« Reply #11 - Posted 2003-07-17 22:31:25 »

Got pretty much the same.

With 1 buffer :-
Quote

WARNING: time taken was less than 1 ms; speed estimate will be very inaccurate
OVERALL TIME FOR WRITING ONE RESPONSE: 1 ms.
OVERALL SPEED (estimated): 55000 bytes per second.


With 10 buffers :-
Quote

WARNING: time taken was less than 1 ms; speed estimate will be very inaccurate
OVERALL TIME FOR WRITING ONE RESPONSE: 1 ms.
OVERALL SPEED (estimated): 17632000 bytes per second.


With 16 buffers :-
Quote

WARNING: time taken was less than 1 ms; speed estimate will be very inaccurate
OVERALL TIME FOR WRITING ONE RESPONSE: 1 ms.
OVERALL SPEED (estimated): 29340000 bytes per second.


With 17+ buffers :-

Quote

IOE: e = java.io.IOException: Vector write failed


The latency was variable, normally 50 or 60ms, occasionally 100-200, and once or twice it took 1000-2000.

Make Elite IV:Dangerous happen! Pledge your backing at KICKSTARTER here! https://dl.dropbox.com/u/54785909/EliteIVsmaller.png
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #12 - Posted 2003-07-18 08:29:03 »

Quote
Got pretty much the same.


Apparently completion ports (windows only decent implementation of Asynch I/O) is only available on NT, 2k, and XP; this may be 50% of your specific problem.

The message I'm printing out is merely Exception.toString(), so that's the error message that your JVM is deciding to create. ISTR that windows has very low hard-coded limits on various aspects of it's networking - I'm guessing that win98 is hard-coded to 16 buffers. There was a bug in JDK 1.4.0 IIRC whereby no windows implementation could do more than 64 sockets at once, or something, because one of these hard-limits? But I'm almost totally ignorant of the implementation details of windows I/O, so don't assume I'm right Smiley.

ISTR also that these values can be changed, something like in the registry or via system calls to reconfigure the limit (just like the good old days of having to specify at boot-time the maximum number of file-handles in your autoexec.bat / config.sys Wink ).

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #13 - Posted 2003-07-18 08:52:06 »

Quote
10        17.8MB/s
100        0.025MB/s
1000      0.000936MB/s

The thousand buffer test generated, according to the task manager, 250MB of garbage.  So take that with a grain of salt since about 3/4 of the way through the test my cache started thrashing like mad.


Thanks for that detail. That mirrors the worst of my experiences with linux, and it's useful to know it happened on win2k too, although I've not yet managed to cause such excessive RAM usage with the simple test case. 1000 buffers use 125Mb according to linux. There must be a bug in my test case somewhere, because either way that sounds an insane amount of RAM usage. There have been some bugs in Sun's JVM's for years where it fails to GC I/O object-refs that have gone out of scope, so I've changed the server to explicitly null most references just before they go out of scope. This may or may not help (makes no difference in linux).

I'm afraid I've also spotted a bug in the server - it was artificially reporting much lower speeds than it should (for the speed calculation, I was using the most-recent number of bytes written, rather than total number).

I've fixed this, and added a line to summarise the total number of bytes actually sent, so you can easily verify the speed yourself Smiley.

Re-testing on my machines, the speeds are somewhat better - but still 100's of times slower than they should be.

Does anyone have any explanation for the behaviour we're seeing here, other than the possibility of a bug in NIO?

malloc will be first against the wall when the revolution comes...
Offline Abuse

JGO Knight


Medals: 13


falling into the abyss of reality


« Reply #14 - Posted 2003-07-18 10:55:33 »

Quote

There have been some bugs in Sun's JVM's for years where it fails to GC I/O object-refs that have gone out of scope, so I've changed the server to explicitly null most references just before they go out of scope. This may or may not help (makes no difference in linux).


I know nothing about NIO, but I have encountered input/output streams being the source of memory leaks b4 as well.
It wasn't in J2SE though, it was J2ME MIDP on the Siemens mobiles. If you lost your reference to an input/output stream before you called close() on it, it never got gc()'ed :S

Unlikely to be the problem your encountering, but it was related.... kinda Wink [ok, maybe not, but still something worth knowing if u ever branch into J2ME Cheesy]

Make Elite IV:Dangerous happen! Pledge your backing at KICKSTARTER here! https://dl.dropbox.com/u/54785909/EliteIVsmaller.png
Offline leknor

Junior Duke




ROCK!!!


« Reply #15 - Posted 2003-07-18 11:27:23 »

Quote
Does anyone have any explanation for the behaviour we're seeing here, other than the possibility of a bug in NIO?

Why don't you ask the guy who "wrote the book"? The 3rd section of his site is "Gotta Question" and he was friendly enough to exchange a series of 4 emails with me as I explained and described my problem and two of those emails were before I had bought the book.
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #16 - Posted 2003-07-18 12:28:09 »

Quote


If you lost your reference to an input/output stream before you called close() on it, it never got gc()'ed :S


Yep, I had the same with J2SE 1.3.x. I only noticed because I had an application that ran for weeks at a time, reading and writing to thousands of files an hour (checkpointing).

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #17 - Posted 2003-07-19 09:18:43 »

Quote

Why don't you ask the guy who "wrote the book"? The 3rd section of his site is "Gotta Question" and he was friendly enough to exchange a series of 4 emails with me as I explained and described my problem and two of those emails were before I had bought the book.


Thanks, leknor, that was a good idea. Ron sensibly suggested doing a direct comparison with a manual implementation of iterating over the buffers and reading from each of them.

I've tried this, and now it seems that you can achieve almost 100% speed from doing this, instead of using the (supposedly faster!) gathering write! I've updated the server (NioTester) again, and it now takes an extra argument: either 1 which works exactly as before, or 2 which does:

1  
2  
3  
4  
5  
6  
7  
8  
9  
      long lastWrite = -1l;
      for( int k=0; k<responseBuffers.length; k++)
      {
            lastWrite = ((SocketChannel) key.channel() ).write( responseBuffers[k] );
            //System.out.println ( "    MODE2: Wrote "+lastWrite+" bytes to buffer[ "+k+"]" );
            if( lastWrite < 0 )
                  break;
            bytesWritten = bytesWritten + lastWrite;
      }


This is not perfect: I'm afraid it may distort the readiness notification, and the interleaving of writes to different clients (because instead of doing one write to all buffers, it does consecutive writes to each buffer).

If you care to try this version, and compare the two different modes, I think you'll find that mode 2 works quite nicely:

(for 1000 buffers, mode 2)
--------
BYTES TRANSFERRED: 1,956,986
SPEED (estimated): 2,942,836 bytes per second.
--------
BYTES TRANSFERRED: 1,953,989
SPEED (estimated): 2,594,938 bytes per second.
--------

(for 1000 buffers, mode 1)
--------
BYTES TRANSFERRED: 1,956,986
SPEED (estimated): 89,392 bytes per second.
--------
BYTES TRANSFERRED: 1,952,990
SPEED (estimated): 114,430 bytes per second.
--------

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #18 - Posted 2003-07-19 09:51:57 »

I've submitted a bug-report, and included a pointer to this forum/topic. From current experience (I just got a bug confirmed yesterday), it's taking many weeks to get bugs processed, so it could easily be a month before it gets turned into an "official" bug.

In the meantime, I'm going to use the workaround in my production server, and see if it holds up when loaded.

malloc will be first against the wall when the revolution comes...
Offline bt_dan

Senior Newbie





« Reply #19 - Posted 2003-07-28 15:42:03 »

blah blah,

how has the workaround held up so far.  

I am thinking of putting a bug report in for the problem i was seeing with multiple clients, and so revisited your test.

What java version were you running?  I saw a VERY significant difference when running the exact same numbers between java 1.4.1_01 and java 1.4.2b.

Just wondering if something may have changed between the version.  It runs much faster with 1.4.1_01
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #20 - Posted 2003-07-29 17:28:05 »

Quote

how has the workaround held up so far.  


So far, so good. I haven't subjected it to heavy load, but it's been handling website-tasks (HTTP) on the net for a week nonstop with no crashes. I've recently added some simple load and performance-measures which give figures representing "typical" and "recent" performance (e.g. using rolling-means/ranges and mean-with-feedback/hysteresis).

We're going to external alpha test v.soon, so I'll hopefully have some stats to verify things going OK. However, I wish SNMP was integrated into java in a much easier fashion (PS thanks to leknor for pointer to JMX). It appears right now that the best you can get is to use a J2EE server which supports JMX (it seems that only j2ee vendors currnetly have JMX solutions?), and even JMX is a bitch to get started with. And there's no way I'm touching J2EE for this project - even if Sun bribed me *hint, hint* (nah, not even then Smiley).

Quote

I am thinking of putting a bug report in for the problem i was seeing with multiple clients, and so revisited your test.

What java version were you running?


1.4.2 - I waited until I could test it on the gold release before submitting bug etc.

The current downloads pump out (on the client side only, oops) the java version etc, so people could just cut and paste them along with figures.

malloc will be first against the wall when the revolution comes...
Offline AndersDahlberg

Junior Duke





« Reply #21 - Posted 2003-07-29 17:40:37 »

Quote

It appears right now that the best you can get is to use a J2EE server which supports JMX (it seems that only j2ee vendors currnetly have JMX solutions?), and even JMX is a bitch to get started with. And there's no way I'm touching J2EE for this project - even if Sun bribed me *hint, hint* (nah, not even then ).


If you only want to use jmx maybe you should give jboss a try?
They have a modular server which supposedly gives you the ability to strip out all services you don't want (ejb, jndi, webserver, etc etc ) and only keep the jmx bit, this is done by editing a xml configuration file.

Not sure if this is what you want, but I thought I could as well let you know (if you didn't already Wink
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #22 - Posted 2003-08-02 13:30:19 »

Quote

If you only want to use jmx maybe you should give jboss a try?
They have a modular server which supposedly gives you the ability to strip out all services you don't want (ejb, jndi, webserver, etc etc ) and only keep the jmx bit, this is done by editing a xml configuration file.


Ah, yes, their move to a micro-kernel. I'm afraid I still haven't got out of the habit of thinking "JBoss == opensource J2EE". You're correct that I ought to try a stripped-down version and see how good it is, although I still have nightmares about having tried to do that with things like linux before (where it is architecturally and theoretically easy, but in practice often just pain, pain, and more pain, depending on what you are trying to do).

The biggest question is going to be how easy it is to integrate with an existing server - whilst maintaining security policies and architectural constraints.

malloc will be first against the wall when the revolution comes...
Pages: [1]
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

TehJavaDev (31 views)
2014-10-27 03:28:38

TehJavaDev (26 views)
2014-10-27 03:27:51

DarkCart (40 views)
2014-10-26 19:37:11

Luminem (21 views)
2014-10-26 10:17:50

Luminem (25 views)
2014-10-26 10:14:04

theagentd (31 views)
2014-10-25 15:46:29

Longarmx (61 views)
2014-10-17 03:59:02

Norakomi (57 views)
2014-10-16 15:22:06

Norakomi (46 views)
2014-10-16 15:20:20

lcass (43 views)
2014-10-15 16:18:58
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!