Java-Gaming.org    
  Should you risk using NIO for hard-core networking  (Read 8390 times)
Offline leknor

Junior Member




ROCK!!!


« Reply #30 - Posted 2003-05-20 18:34:13 »

Quote
Just a quick glance through indicates to me that this is the kind of stuff we REALLY need to be able to post on here, as part of the "WEBSITE" and not the "FORUMS."
Ok, this will be the one time I say this for this thread. Give me a Wiki. Then we are self-publishing and it's easy to evolve documents. It's not perfect but it's a good fit.
Offline GergisKhan

Junior Member




"C8 H10 N4 O2"


« Reply #31 - Posted 2003-05-20 19:54:52 »

Leknor, is there anything that prevents us from making our own Wiki?  If not, why don't we just start one?

gK

"Go.  Teach them not to mess with us."
          -- Cao Cao, Dynasty Warriors 3
Offline rreyelts

Junior Member




There is nothing Nu under the sun


« Reply #32 - Posted 2003-05-20 20:05:05 »

is there anything that prevents us from making our own Wiki?

I won't put words in Leknor's mouth, but Wiki software is pretty much freely available, just like forum software. The only thing preventing somebody from hosting one is the physical server and bandwidth. Do you have a T1 you'd like to donate? :)

God bless,
-Toby Reyelts


About me: http://jroller.com/page/rreyelts
Jace - Easier JNI: http://jace.reyelts.com/jace
Retroweaver - Compile on JDK1.5, and deploy on 1.4: http://retroweaver.sf.net.
Offline GergisKhan

Junior Member




"C8 H10 N4 O2"


« Reply #33 - Posted 2003-05-20 20:11:05 »

T1? No.  Server?  Quite possibly.  I'll start researching.  If this happens I will be hosting it on www.equinoxesolutions.com which is my corporate site, but doesn't see THAT much traffic.  It's currently sitting on a DS3 for now, with a redundant DS3 going in in July, from what my provider tells me.

gK

"Go.  Teach them not to mess with us."
          -- Cao Cao, Dynasty Warriors 3
Offline leknor

Junior Member




ROCK!!!


« Reply #34 - Posted 2003-05-20 20:42:01 »

Quote
Leknor, is there anything that prevents us from making our own Wiki?  If not, why don't we just start one?
Nope, I just don't want anyone to feel like I'm trying to hijack a community or spam one about another one. I like JGO and want to add to it, not spread it out.

If people want, I'll create jgo.leknor.com and put TWiki on it. It's Perl-based like YaBB, which should help future JGO integration, and it is good enough for IntelliJ. There are a lot of wikis out there and it's hard to know which ones are worth a damn.
Offline GergisKhan

Junior Member




"C8 H10 N4 O2"


« Reply #35 - Posted 2003-05-20 20:52:15 »

RE-EDIT:

I'm installing TikiWiki, a PHP-based Wiki, since I prefer PHP to Perl any day of the week.  It's quite powerful, GNU LGPL, and looks to serve our needs.

More info in a different topic more appropriate to this soon.


gK

"Go.  Teach them not to mess with us."
          -- Cao Cao, Dynasty Warriors 3
Offline GergisKhan

Junior Member




"C8 H10 N4 O2"


« Reply #36 - Posted 2003-05-20 21:54:27 »

Further Wiki discussion in the "General Announcements" category.

gK

"Go.  Teach them not to mess with us."
          -- Cao Cao, Dynasty Warriors 3
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #37 - Posted 2003-05-21 09:57:19 »

Quote

Yes, I think we need to somehow get together and start seriously thinking about getting up GOOD articles on how to program games, including all aspects such as networking.


This sounds good; but wikis might not be good as the mechanism.

I've used wikis a fair bit before, and they are a great tool for lots of situations, but they have significant problems when it comes to writing coherent and/or authoritative literature. The biggest problem is that any document/webpage that seeks to help people by informing them and comparing and contrasting different approaches MUST be moderated, and this pushes it away from what wikis are best at.

I've participated in using wikis to compose documents before, where one author (or maybe a couple of them) then created a new document based on everything in the wiki. This works well.

However, with something like this, where accuracy is critical, and it's really easy to make mistakes, I think it's important that we have everything confirmed/verified by someone else before telling people about it.

...how about using a wiki for a "suggested architecture/pattern/etc" area, with each submission including a test case? Then, as each test case + idea etc gets verified, they can be transferred into a moderated document?

So, if you need accurate info, you read the latest version of the doc. If you want to see what new ideas people are kicking about, and feedback into them, and/or fix bugs in their source/testcase, you go to the "submissions" area.

Shrug. Just throwing out ideas....

malloc will be first against the wall when the revolution comes...
Offline bt_dan

Senior Newbie





« Reply #38 - Posted 2003-06-04 16:22:01 »

Why did  this thread die?

I essentially pared my sample code down so that there is only one thread with one selector for all three operations: OP_CONNECT, OP_READ, OP_WRITE.  I know this is not the best approach, but I wanted to remove any chance of having MT issues.  The following code was adapted from a Sun developer example and modified to keep track of how long each operation takes.

The general flow is the following:
1) The server starts up and listens for connecting sockets.
2) Once the predetermined number of connections have connected to the server, the server sends 1 byte to all the connections telling them to send their payload.  This is the initialization phase.  After sending the byte, each connection's selection key interest ops is set to read.
3) The clients, upon reading this one byte from the server, switch their interest ops to write and send their payload to the server.  This is the clients' write phase.  The clients then switch interest ops back to read to be ready for the response from the server.
4) The server then reads the payload from all the clients.  This is the server's read phase.
5) When all bytes for a client connection have been read, the server immediately switches that connection's selection key interest ops to write and writes back all the bytes received (simple echo).  This is the server's write phase.
6) The client then reads all the bytes sent back from the server.  This is the clients' read phase.
7) On the server, once all bytes have been read for all client connections, the server repeats the initialization phase (#2 above) all over again, repeating the process up to the predetermined number of trips.

Each phase keeps track of how long it takes to perform its operation, based on System.currentTimeMillis() taken before and after each operation.  Then, after each trip is completed, a summary of the results is printed.

The Server Trip results contain:

Total time: The time between the server's first read operation and the server's last write operation for this trip.
Init time: The time spent actually writing the 1 initialization byte to the sockets.
Read time: The time spent actually reading the payload from the socket, not counting time between select() operations.
Write time: The time spent actually writing the payload to the socket, not counting time between select() operations.
Read Sel Time: The time between the first read operation on a key and the last read operation on a key.  This DOES include time spent inside the select() method as well as time spent inside the read().
Write Sel Time: The time between the first write operation on a key and the last write operation on a key.  This DOES include time spent inside the select() method as well as time spent inside the write().
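
To make the difference between the "actual" times and the "Sel" times concrete, here is a minimal sketch. `PhaseTimer` and its method names are hypothetical and are not taken from the posted test code; the sketch only assumes that timestamps come from System.currentTimeMillis() as described above.

```java
/** Hypothetical helper, not taken from the posted test code. */
final class PhaseTimer {
    private boolean started;   // true once the first operation has been recorded
    private long firstStart;   // timestamp at which the first operation began
    private long lastEnd;      // timestamp at which the latest operation ended
    private long busyMillis;   // time spent inside the operations themselves

    /** Record one read() or write() that ran from startMillis to endMillis. */
    void record(long startMillis, long endMillis) {
        if (!started) {
            started = true;
            firstStart = startMillis;
        }
        lastEnd = endMillis;
        busyMillis += endMillis - startMillis;
    }

    /** Corresponds to "Read time" / "Write time": excludes time between operations. */
    long busyMillis() {
        return busyMillis;
    }

    /** Corresponds to "Read Sel Time" / "Write Sel Time": first start to last end. */
    long spanMillis() {
        return started ? lastEnd - firstStart : 0;
    }

    public static void main(String[] args) {
        PhaseTimer reads = new PhaseTimer();
        long t0 = System.currentTimeMillis();
        // ... a channel.read(buffer) call would go here ...
        reads.record(t0, System.currentTimeMillis());
        System.out.println("busy=" + reads.busyMillis() + " ms, span=" + reads.spanMillis() + " ms");
    }
}
```

A large gap between `spanMillis()` and `busyMillis()` would point at time spent waiting between operations (for example in select()) rather than in the reads and writes themselves.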


The Client process is a group of connections and has the following trip summary:

Read Time: This is the sum of all clients' times to finish reading in the payload.  This is misleading, as the clients' times can overlap.
Write Time: This is the sum of all clients' times to finish writing the payload to the socket.  This is misleading, as the clients' times can overlap.
Round Trip Time (RTT): This is the sum of all clients' times between their final write and their first read.  This is essentially the time taken to go across the network, be processed by the server, then get back to the client.  This is misleading, as the clients' times can overlap.
Total Time: This is the sum of all clients' times between their first write and their final read.  This is essentially the time taken to write all bytes, go across the network, be processed by the server, get back to the client, and have all bytes read in.  This is misleading, as the clients' times can overlap.

The averages are then printed out.  These are easier to interpret: each one is the corresponding total from the previous line (explained above) divided by the number of clients.

The next line gives more insight:
Just Reading: The time spent actually reading the payload from the socket, not counting time between select() operations.
Just Writing: The time spent actually writing the payload to the socket, not counting time between select() operations.
Total Read time: The time between the first read operation on a key and the last read operation on a key.  This DOES include time spent inside the select() method as well as time spent inside the read().
Total Write Time: The time between the first write operation on a key and the last write operation on a key.  This DOES include time spent inside the select() method as well as time spent inside the write().

REASON FOR THIS POST:
Now, after explaining all this, the purpose of my post is to get some insight into the numbers I am getting from running this test.  I am using the sample code below to do some stress testing on how many connections an advanced gaming server built on NIO can handle.  The tests I have run have been rather disappointing, which leads me to think there is something going on that I can't see.

For example:  When running the test with 2000 clients, sending 100 bytes each, for 100 trips, I see the server taking anywhere from 5 to 20 seconds to read in all the bytes and write them all back to the clients.  It appears that a lot of time is spent inside the select() method waiting to be notified that an operation can take place.  The actual reads and writes aren't taking that much time.  Because there is only one thread, that rules out MT deadlocks.

I have run this test on Windows, Linux (Red Hat), and Solaris, all with similar results.  All on my company's 100 Mbps (I think??) network.  I have had my network admin sniff the network to see if there are any lags, and he assures me there is nothing slow or fishy going on.
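
As a rough sanity check on those figures, here is a back-of-the-envelope sketch. It assumes the link really is 100 Mbit/s (which is not confirmed above) and ignores TCP/IP header overhead and latency; `WireTimeEstimate` is a hypothetical name used only for illustration.

```java
/** Hypothetical back-of-the-envelope estimate; ignores protocol overhead and latency. */
final class WireTimeEstimate {

    /** Milliseconds needed to push every client's payload through a link of the given speed. */
    static double wireMillis(int clients, int bytesPerClient, double linkMegabitsPerSecond) {
        double bits = (double) clients * bytesPerClient * 8;
        double bitsPerMilli = linkMegabitsPerSecond * 1_000_000 / 1000.0;
        return bits / bitsPerMilli;
    }

    public static void main(String[] args) {
        // 2000 clients x 100 bytes, on an assumed 100 Mbit/s network
        System.out.println(wireMillis(2000, 100, 100.0) + " ms per direction");
    }
}
```

Under those assumptions the payload for one trip is 200,000 bytes (1.6 Mbit), which is on the order of 16 ms of raw transfer time per direction, so raw bandwidth alone seems unlikely to account for trips that take seconds.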

Any insight or improvements on the code are welcome, as is any other code to test how many connections NIO can handle.  My objective is to see how many players can send up their scores to be processed and ranked within a 10 second window.

Thanks for any insight.  Sorry for the long post.

The code will be in the following post.





Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #39 - Posted 2003-06-04 16:48:47 »

Quote
Why did  this thread die?

I essentially cleaned down my sample code I was working with so that there is only 1 thread with one selector for all 3 operations: OP_CONNECT, OP_READ, OP_WRITE.  I know this is not the best approach, but wanted to remove any chance of having MT issues.  The following code was adapted from a Sun developer example and modified to keep track of how long it takes for each operation.  



If I understand correctly, to summarise (and slightly over-simplify):

 1: lots of clients get connected to the server, and go into a wait() situation.
 2: once all are waiting, server does the equivalent of a notifyall, to get ALL of them *simultaneously* to do their transfers.
 3: ...server waits for all to get back into the wait, then starts again

Questions:

 1. Are your clients and server separate machines, with fully switched connection (as opposed to hubs)?
 2. What happens if you double the number of physical client machines, halving the number of client-apps running on each?
 3. What's the LAN saturation like?
 4. Any collision-storms going on? (sounds like you have an admin who would have net-management tools to give you this info).
 5. Have you tried having two selectors on the server, one for read, one for write? Changing the interestOps for a key is a trivial method call hiding a non-trivial implementation (note that it has at least one blocking point just to change the interestOps!)...it could be that there is some delay added by the switching back and forth of all those interest sets.
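
For readers unfamiliar with the interest-set switching mentioned in question 5, here is a minimal single-selector sketch over a loopback connection. The class and method names are hypothetical and this is not the posted test code; it only shows a key registered for OP_READ being switched to OP_WRITE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical demo of switching a key's interest set on a single selector. */
final class InterestOpsSketch {

    /** Returns true if the key is idle under OP_READ and ready once switched to OP_WRITE. */
    static boolean switchToWrite() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free loopback port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                SelectionKey key = accepted.register(selector, SelectionKey.OP_READ);

                // The client has sent nothing, so the key is not readable.
                int readyForRead = selector.selectNow();

                // Switch the same key to write interest, as the test does between phases.
                key.interestOps(SelectionKey.OP_WRITE);
                int readyForWrite = selector.select(1000);

                return readyForRead == 0 && readyForWrite == 1 && key.isWritable();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("switched to write and became ready: " + switchToWrite());
    }
}
```

The two-selector alternative would register each channel with both selectors once, each with a fixed interest set, instead of calling interestOps() repeatedly.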

However, I suspect the problem is something completely different and hope that someone else will reply with a "nah, this is the problem:" post instead ;).

PS the thread isn't dead, just sleeping ;)...personally, I've suddenly received a couple of new deadlines - and haven't had time to come back to this thread :(.

malloc will be first against the wall when the revolution comes...
Offline bt_dan

Senior Newbie





« Reply #40 - Posted 2003-06-04 16:48:55 »

Oops.  Too much code to post.  I did a quick ref for it at:

http://users.adelphia.net/~dfellars1/NIOSelectorCode.html

Sorry about that.  Please check it out there.  

Thanks again for any help.
Offline bt_dan

Senior Newbie





« Reply #41 - Posted 2003-06-04 17:01:36 »

Quote


If I understand correctly, to summarise (and slightly over-simplify):

 1: lots of clients get connected to the server, and go into a wait() situation.
 2: once all are waiting, server does the equivalent of a notifyall, to get ALL of them *simultaneously* to do their transfers.
 3: ...server waits for all to get back into the wait, then starts again


That is correct.

Quote

 1. Are your clients and server separate machines, with fully switched connection (as opposed to hubs)?


Yes, I have run them on separate machines on different networks, on the same network, and on the same machine.

Quote


 2. What happens if you double the number of physical client machines, halving the number of client-apps running on each?



I have tried this with somewhat similar results.
Quote


 3. What's the LAN saturation like?
 4. Any collision-storms going on? (sounds like you have an admin who would have net-management tools to give you this info).


I will have to get together again with my admin.  They won't allow me to run any network monitoring myself, so I have to bug them to get them to do it.

Quote


 5. Have you tried having two selectors on the server, one for read, one for write? Changing the interestOps for a key is a trivial method call hiding a non-trivial implementation (note that it has at least one blocking point just to change the interestOps!)...it could be that there is some delay added by the switching back and forth of all those interest sets.


I originally had it designed this way, but was worried that my issue was MT, so I went to a single-selector approach.  Since going to a single selector, I actually saw it speed up, implying there may have been an MT issue, although I still can't see where.


Quote


However, I suspect the problem is something completely different and hope that someone else will reply with a "nah, this is the problem:" post instead ;).


That is what I am hoping for as well.  

Thanks

Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #42 - Posted 2003-06-04 17:41:25 »

Quote


I will have to get together again with my admin.  They wont allow me to run any network monitoring myself so I have to bug them to get them to do it.

I originally had it designed this way. but was worried that my issue was MT, so went to a single selector approach.  Since going to a single selector, I actually saw it speed up, thus implying there may have been a MT issue.  although I still cant see where.


Well, two more things. Firstly, I have seen something like this before, but I can't remember what the problem was, other than that someone had made a stupid (and subtle) mistake in the implementation...will see if I can find a changelog for the app where I think it happened. IIRC, it was a buffer that was being abused (not being cleared properly, or similar) or in fact that data was not fully being read from buffers. It was surprising that the thing worked at all, once you saw the mistake - but it did, just at 15% the expected speed, and (only at very high load) dropping requests (which should have been impossible).

Having compiled your code, I get the following behaviour:

Start with 1 client and 1 server on the same machine
2000 instances per client, 10 trips each, sending 1000 bytes.

The observed behaviour:
 - the sequential numbers (which I assume are the clients connecting; I haven't read the source yet) go up, but pause at about 150, 250, 450, 500, for up to 10 seconds each. (the points at which it pauses are clearly random)
 - once everything is connected, it all goes a bit slowly, and then I hit the "too many open sockets" limit (on linux; don't have root access on that machine).

But the first time I changed ONLY the "2000 client instances" down to 500, ALL the pauses that had occurred prior to 500 evaporated. This suggested there is at least one problem caused by having too many client instances running on one machine...

Sadly, repeated runs (without changing ANYTHING) show pauses at about 100, 300, etc.

One final thought - on some systems, java select might be implemented using the inefficient OS primitive which iterates across the ENTIRE table of socket descriptors, rather than just those that have altered state. However, given that you had the same behaviour on linux and windows, this is probably irrelevant; I assume you are using win2k or xp, both of which use reasonably good OS IO.

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #43 - Posted 2003-06-04 17:49:18 »

Quote
Why did  this thread die?

I essentially cleaned down my sample code I was working with so that there is only 1 thread with one selector for all 3 operations: OP_CONNECT, OP_READ, OP_WRITE.  I know this is not the best approach, but wanted to remove any chance of having MT issues.  The following code was adapted from a Sun developer example and modified to keep track of how long it takes for each operation.  

The general flow is the following:
1) The Server starts up and listens for connecting sockets.
2) Once the predetermined number of connections have connected to the server, the server then sends 1 byte to all the connections telling them to send their payload.  This is the initialization phase.  After sending the bytes, the connections selection key interest ops is set to read.
3) The clients upon reading this one byte from the server, then switch their interest ops to write, and send their payload to the server.  This is the clients write phase.  The clients then switch interest ops back to read to be ready for the response from server.
4) The server then reads the payload for all the clients.  This is the servers read phase.  
5) When all bytes for a client connection have been read, the server immediately switches that connections selection key interest ops to write and writes back all the bytes received(simple echo).  This is the servers write phase.
6) The client then reads all the bytes sent back from the server.  This is the clients read phase.
7)  On the server, once all bytes have been read for all client connections, the server repeats the initialization phase (#2 above ) all over again to repeat the process up to the number of trips pre-determined.


Ahem. We can also add:

1.5, 4.5: Server sleeps for 10 milliseconds if the number of keys returned from select == 0.

nNumKeys = sel.selectNow(  );

// nothing was ready, so the loop backs off for 10 ms before polling again
if( nNumKeys <= 0 )
{
    Thread.sleep( 10 );
}


That's a pretty likely culprit! I added a System.out.print at that point, and found that the sleep is being called hundreds if not THOUSANDS of times.

Bug found? :) Perhaps...
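
For comparison, here is a minimal sketch of the blocking alternative. With selectNow() followed by a 10 ms sleep, every empty poll can delay the loop by up to 10 ms, so a thousand such sleeps already add up to about 10 seconds. A blocking select() instead returns as soon as a key is ready. The sketch uses a java.nio.channels.Pipe as a stand-in for a socket so that it is self-contained; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical demo: wait for readiness with select() instead of selectNow() plus sleep. */
final class BlockingSelectSketch {

    /** Writes one byte into a pipe and returns how many keys select() reports as ready. */
    static int readyKeysAfterWrite() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                // Make the source readable.
                sink.write(ByteBuffer.wrap(new byte[] {1}));

                // Blocks until at least one key is ready; the 5 s timeout is only a safety net.
                return selector.select(5000);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys: " + readyKeysAfterWrite());
    }
}
```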

malloc will be first against the wall when the revolution comes...
Offline bt_dan

Senior Newbie





« Reply #44 - Posted 2003-06-04 17:56:32 »

Please forgive me for my stupidity.  This is one of those "I could have sworn I changed that" moments.  I don't know if you remember one of my previous posts, where I mentioned that performing a sleep after a selectNow call sped things up, which made you, blahblahblahh, suggest an MT issue.  That is why I made it all one thread, and I *thought* I had put it back to using the select() method by passing true to that method.  But I obviously didn't.  Sorry.  I have updated the code on my webserver and will re-run with the new code to see if it speeds things up.

Thanks, and again I am sorry for that.
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #45 - Posted 2003-06-04 18:15:52 »

Quote
Please forgive me for my stupidity.


No worries; one of the big problems with NBIO server development seems to be that it's really easy to make subtly stupid mistakes that are never quite show-stoppers, and so they're hard to discover. I've made similar mistakes in NBIO code a couple of times ;).

EDIT: I only say this because problems in NBIO code are particularly difficult to spot, so using encapsulation etc. can be much more helpful than normal - and I expect you'll continue to run into hard-to-trace problems as you continue to modify your test app.

However, your monolithic structure (e.g. 150+ lines of code in your server run method!) leaves a lot to be desired; it would be much easier to understand and scan your code for possible problems if you split it up more. I notice there are lots of methods, but often for only a few lines of code each; I'd suggest a method each for handling acceptable, readable, and writable keys - which would also help if you later decide to split those functions out.
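
A minimal sketch of that kind of split, assuming a simple echo server with one small handler per kind of ready key. All names here are hypothetical and this is not a rewrite of the posted test code; `echoOnce` exists only to exercise the loop over loopback.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical echo loop with one small handler per kind of ready key. */
final class DispatchSketch {
    private final Selector selector;

    DispatchSketch(Selector selector) {
        this.selector = selector;
    }

    /** One pass: wait up to timeoutMillis, then hand each ready key to its handler. */
    void dispatchOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid()) continue;
            if (key.isAcceptable()) handleAccept(key);
            else if (key.isReadable()) handleRead(key);
            else if (key.isWritable()) handleWrite(key);
        }
    }

    private void handleAccept(SelectionKey key) throws IOException {
        SocketChannel ch = ((ServerSocketChannel) key.channel()).accept();
        if (ch == null) return;                        // nothing pending after all
        ch.configureBlocking(false);
        ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
    }

    private void handleRead(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        int n = ch.read(buf);
        if (n < 0) { ch.close(); return; }             // peer closed; closing cancels the key
        if (n > 0) { buf.flip(); key.interestOps(SelectionKey.OP_WRITE); }
    }

    private void handleWrite(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        ch.write(buf);
        if (!buf.hasRemaining()) { buf.clear(); key.interestOps(SelectionKey.OP_READ); }
    }

    /** Demo: sends a short message (under 1024 bytes) over loopback and returns the echo. */
    static String echoOnce(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            DispatchSketch loop = new DispatchSketch(selector);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                byte[] out = message.getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(out));    // client is still blocking here
                client.configureBlocking(false);
                ByteBuffer in = ByteBuffer.allocate(out.length);
                for (int i = 0; i < 50 && in.hasRemaining(); i++) {
                    loop.dispatchOnce(100);            // accept, then read, then write
                    client.read(in);
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            } finally {
                // Close every channel still registered, including the accepted one.
                for (SelectionKey k : selector.keys()) k.channel().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the handlers separated like this, each one can be read and tested on its own, and timing code can be wrapped around a single handler without touching the others.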

A split into separate classes would also help.

Only because you're having strange problems, I'd also suggest commenting out all the fancy, exotic stuff, e.g. setting socket receive buffer sizes, tcpNoDelay, etc. The NIO APIs are currently under-tested by Sun, and it's been quite easy in 1.4.0 and even 1.4.1 to break them by doing anything exotic (<rant>for the 1.4.x series, Sun appeared to lack anyone in the NIO team who understood unit and system testing - basic stuff that should have been automatically tested in unit tests went unfixed for both .0 and .1 releases</rant>).

malloc will be first against the wall when the revolution comes...
Offline bt_dan

Senior Newbie





« Reply #46 - Posted 2003-06-27 17:03:10 »

I'm back,

I cleaned up the code to make it more legible and it now uses the select() method instead of sleeping.  I am still seeing slower-than-expected results with the sample code and would like more feedback, if any.

The code is found at: http://users.adelphia.net/~dfellars1/NIOSelectorCode.html


Also, on a different NIO-related topic, I have 2 servers communicating with each other over a socket that remains open the entire time the servers are up.  Up until recently I was running the servers on the same network, but I recently moved one of them to a different network and am getting the following situation:

The connection is getting dropped for whatever reason, which is to be expected.  However, the client that writes to the dropped connection does not throw any exception when writing.  It writes all the bytes to the channel and returns as if everything is OK.  Then maybe a minute later I will receive a read of -1 indicating the connection was dropped, but the written message was never received on the server.

What I would like to be able to do is determine that the connection is dropped before writing to it, so that I can reconnect and then write to the channel.  Is there an easy way to detect a dropped connection?  I am already using isOpen() and key.isValid(), which are returning true each time.  I have also tried socket.setKeepAlive(true) with no success.

Should SocketChannel.isConnected() tell me whether the connection is available to write to, or does it just state that the connection has connected to the server?

I have implemented a pinging system to keep traffic going across the connection, but don't want to have to rely on this to keep the connection alive.

Thanks for any insight.

Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #47 - Posted 2003-07-16 12:14:06 »

I may be seeing the same problem w.r.t. poor performance, I've started a separate thread for it:

http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=Networking;action=display;num=1058357118;start=0

In the meantime...
Quote

The connection is getting dropped for whatever reason, which is to be expected.  However, the client that writes to the dropped connection does not throw any exception when writing.  It writes all the bytes to the channel and returns as if everything is ok.  Then maybe a minute later I will receive a read of -1 specifying the connection was dropped, but the writen message was never received on the server.  

What I would like to be able to do is determine that the connection is dropped before writing to it, so that I can reconnect and then write to the channel.  Is there an easy way to detect a dropped connection?  I am already using isOpen() and key.isValid() which are returning true each time.  I have also tried socket.setKeepAlive(true) with no success.

Should the SocketChannel.isConnected() tell whether the connection is available to write to, or does that just state that the connection has connected to the server?


There have been MANY bugs in this particular part of the API, although AFAIAA most have been fixed now (but not necessarily all). Most of them were platform specific.

You might want to try a non-blocking read and check whether it returns -1 (it would return 0 if there were simply no bytes to read). My own code is a hodge-podge of different techniques to detect dead connections. If that doesn't work, let me know and I'll try digging out the different things I've been using.
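
A minimal sketch of that check over a loopback connection (hypothetical class and method names): a non-blocking read returns 0 while the peer is open but silent, and -1 once the peer has closed its end.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical demo of telling "no data yet" (0) apart from "peer closed" (-1). */
final class DeadConnectionSketch {

    /** Returns {read result while the peer is open and silent, read result after it closed}. */
    static int[] probe() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel peer = SocketChannel.open(server.getLocalAddress());
                 SocketChannel local = server.accept()) {
                local.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(16);

                int whileOpen = local.read(buf);        // nothing has been sent yet

                peer.close();                           // orderly close by the other side
                local.register(selector, SelectionKey.OP_READ);
                selector.select(5000);                  // wait until the close is visible

                int afterClose = local.read(buf);       // end of stream
                return new int[] {whileOpen, afterClose};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = probe();
        System.out.println("while open: " + r[0] + ", after close: " + r[1]);
    }
}
```

Note that this only covers the case where the close actually reaches the local machine. If the path between the two hosts simply goes dead, nothing arrives to report, which is presumably why an application-level ping is still commonly used.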

In a production server, I've got it working on the net without any problems with hanging connections - and this is despite the fact that people have been attacking it (and doing bad things, like breaking the protocol, disconnecting uncleanly, etc).

Also take a look at:

http://www.grexengine.com/sections/people/adam/adamsguidetonio.html

which I updated recently. It has some limited coverage of a few more bugs to do with this, e.g.: "some versions of Java 1.4.x actually require you to register OP_ACCEPT as well as OP_READ or OP_WRITE, instead of READ or WRITE on its own"... although there's still a lot of stuff I haven't covered there yet (if you think of anything specific, jog me and I'll try and dig out notes and fill it in ;)).

It's also worth looking at the "known bugs list" for 1.4.2. There is at least one NIO-related bug.

malloc will be first against the wall when the revolution comes...