Java-Gaming.org    
  SocketChannels die, I'm not invited to the funeral  (Read 3485 times)
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Posted 2003-08-10 15:14:10 »

Is this a design flaw in the NIO classes? I can't find any way in the API to guarantee being informed when a SocketChannel gets disconnected... the best I can do is to discard the whole point of NBIO and periodically read from every SC, ignoring the Selector!?!

There's no "OP_DISCONNECT" to listen for, and it seems that most of my SC's just drop silently, without triggering an OP_READ event (which I would have expected).

I thought this was a bug from 1.4.0 / .1 that had been fixed by now, but I'm getting this with 1.4.2

The funny thing is that you won't realise it's happening, and it's never been a problem for me before (because I didn't care). But now I have an app where I have OTHER code that is dependent on being notified when an SC disappears. I store info for each SC as it is accepted, which I then need to supplement with data like "total time connected" - which I can't find out unless I get close/disconnect notification.
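
To make the bookkeeping concrete, here is a minimal sketch of the kind of per-connection record described above. `ConnectionInfo` and its fields are hypothetical names, not taken from the poster's code; the idea is that the record is attached to the `SelectionKey` when the channel is accepted and completed when (if) a disconnect is noticed.

```java
import java.nio.channels.SelectionKey;

/** Hypothetical per-connection record, attached to the SelectionKey at accept time. */
class ConnectionInfo {
    final long acceptedAtMillis;
    private long closedAtMillis = -1; // -1 while the connection is still considered open

    ConnectionInfo(long acceptedAtMillis) {
        this.acceptedAtMillis = acceptedAtMillis;
    }

    /** Records the close time; later calls are ignored so the first report wins. */
    void markClosed(long nowMillis) {
        if (closedAtMillis < 0) {
            closedAtMillis = nowMillis;
        }
    }

    /** Only meaningful once a disconnect has actually been reported. */
    long totalTimeConnectedMillis() {
        if (closedAtMillis < 0) {
            throw new IllegalStateException("disconnect not reported yet");
        }
        return closedAtMillis - acceptedAtMillis;
    }

    /** Convenience for the close path: pulls the record back off the key. */
    static ConnectionInfo of(SelectionKey key) {
        return (ConnectionInfo) key.attachment();
    }

    public static void main(String[] args) {
        ConnectionInfo info = new ConnectionInfo(System.currentTimeMillis());
        info.markClosed(System.currentTimeMillis());
        System.out.println("connected for " + info.totalTimeConnectedMillis() + " ms");
    }
}
```

Attaching it would look like `channel.register(selector, SelectionKey.OP_READ, new ConnectionInfo(System.currentTimeMillis()))`. The catch, as the post says, is that `markClosed` only ever runs if something reports the disconnect.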

malloc will be first against the wall when the revolution comes...
Offline kevglass

JGO Kernel


Medals: 85
Projects: 25


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #1 - Posted 2003-08-10 15:18:17 »

I'm currently seeing behavior synonymous (sp?) with C libraries I've worked with before. The selector reports an FD (channel) as being ready to read (OP_READ) when it's been closed. Reading from that channel fails.

I've tried stuff on XP and Linux with the same results. On what platform are you seeing these problems?

Kev

PS. I'm working on 1.4.1_02-b06
PPS. Great topic title.. we need more like this :)

Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #2 - Posted 2003-08-10 15:56:32 »

Quote
I'm currently seeing behavior synonymous (sp?) with C libraries I've worked with before. The selector reports an FD (channel) as being ready to read (OP_READ) when it's been closed. Reading from that channel fails.

I've tried stuff on XP and Linux with the same results. On what platform are you seeing these problems?

PS. I'm working on 1.4.1_02-b06


Yeah, my thoughts exactly - I'd coded in the expectation an OP_READ would come through on SC death. However, IIRC there has been at least one bug for this not happening as expected.

Unfortunately, the dreadful state of the NIO docs being as they are, there is no "official" answer on this. Hard to tell if it's a bug or a feature ;).

I'm running on Linux, and using 1.4.2 (there were some other bugs in NIO that got fixed for 1.4.2 that were causing me problems, so I upgraded ASAP).

I'll see if I can find a 1.4.1 install to try this on, it could certainly be a regression (as I said, I've never needed to monitor this before :( ). I'll also see if I can reproduce under XP. Hmm. Thanks for the thoughts.

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #3 - Posted 2003-08-13 15:48:09 »

Quote
I'm currently seeing behavior synonymous (sp?) with C libraries I've worked with before. The selector reports an FD (channel) as being ready to read (OP_READ) when it's been closed. Reading from that channel fails.

On what platform are you seeing these problems?


Ha! Confounded again by the idiot who (un-) documented NIO. It's reporting the dropped connection on OP_WRITE for me!

Yay. Unfortunately, that means that my ACCEPT and READ Selectors get no notification at all.

I'm really pleased that between 1.4.1 and 1.4.2 they apparently changed the Linux implementation to go from READ to WRITE. That's gonna hurt a lot of current code :). Obviously, it's perfectly fair, because they never explicitly documented which way around it was going to be, so it's OUR fault for trying to use it </joyful sarcasm>

Actually, I suspect it's dependent upon what underlying NBIO you have available on your OS - and Linux can have any of about 5 different NBIO libraries at the moment, depending on what you installed.

I would guess that this is more "kernel dependent" than it is "JDK version dependent"...I'm running 2.4.18. You?

malloc will be first against the wall when the revolution comes...
Offline darcone

Junior Member




Size matters


« Reply #4 - Posted 2003-08-13 15:53:15 »

Isn't it enough to just check whether read(buffer) returns -1, which would mean the end of stream?
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #5 - Posted 2003-08-13 16:03:23 »

Quote
Isn't it enough to just check whether read(buffer) returns -1, which would mean the end of stream?


Ahem. How, exactly, would I do that? (think about it: you couldn't write source code that would actually be executed in an NBIO situation).

...


Bear in mind that the way NIO works (with Selectors) is that it's event-driven. If no event fires (to let you know you can perform a read), you cannot perform a read (well, you can call the read() method, but see below...).

You could use a select( long ), and "read and be damned", but in legitimate situations, you would:

A. Lose the performance boost you got from using event-semantics

B. Potentially block permanently on every read that WASN'T a dropped connection, thereby making your server useless. (this is under-specified by the NIO API - it is not even guaranteed that an NB channel will not block even when the SelectionKey says it won't. However, the wording leads me to believe that the only reason it is not guaranteed to be accurate is because you (the programmer) may invalidate it yourself, by doing the read - and NIO doesn't *automatically* update the status if you do.)
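
For what it's worth, on the blocking worry in B: the `ReadableByteChannel.read` documentation notes that a socket channel in non-blocking mode cannot read more bytes than are immediately available, so a read with nothing pending is expected to return 0 rather than block. A small loopback check of that, as a sketch only. The class and method names are made up for illustration, and it uses the Java 7+ `bind` API, which did not exist in 1.4.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NonBlockingReadDemo {

    /** Connects over loopback, sends nothing, and reads once in non-blocking mode. */
    static int readWithNothingPending() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                // The peer is open and has written nothing, so this returns immediately.
                return accepted.read(ByteBuffer.allocate(64));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() returned " + readWithNothingPending());
    }
}
```

This only covers the "does it block" half of the argument. A read that returns 0 still tells you nothing about whether the connection has silently gone away, which is the real complaint in this thread.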


malloc will be first against the wall when the revolution comes...
Offline darcone

Junior Member




Size matters


« Reply #6 - Posted 2003-08-13 16:41:32 »

Well, in my nice higher-level API I use read() to determine the end of stream on a SocketChannel and it works every time. Obviously the channel is reported as readable once again after it has delivered the last message to the buffer, so it is able to deliver the -1 that means end of stream. I tried this by connecting and disconnecting several times with different combinations (writing to the server and disconnecting as fast as possible etc) and it worked every time.
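
The approach described here can be sketched as follows: treat a `read()` result of -1 (or an `IOException`) on a readable key as the disconnect notification. This is an illustration, not the OverConn source; `EofOnReadDemo` and its methods are invented names, and the loopback experiment uses Java 7+ APIs. It exercises a clean close on one machine and says nothing about the silent drops reported elsewhere in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class EofOnReadDemo {

    /** Reads once from a ready key; on end-of-stream or error, cancels and closes. Returns true if the peer is gone. */
    static boolean readOrClose(SelectionKey key, ByteBuffer buf) {
        SocketChannel ch = (SocketChannel) key.channel();
        try {
            buf.clear();
            if (ch.read(buf) >= 0) {
                return false; // 0 = nothing pending, >0 = real data to process
            }
        } catch (IOException e) {
            // Connection reset and similar errors: treated as a disconnect too.
        }
        key.cancel();
        try {
            ch.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close itself fails.
        }
        return true;
    }

    /** Loopback experiment: the client closes cleanly; returns true if the server side is told. */
    static boolean peerCloseIsReported() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel accepted = server.accept();
            accepted.configureBlocking(false);
            accepted.register(selector, SelectionKey.OP_READ);
            client.close(); // orderly shutdown from the other end

            ByteBuffer buf = ByteBuffer.allocate(256);
            long deadline = System.currentTimeMillis() + 5000;
            while (System.currentTimeMillis() < deadline) {
                selector.select(500);
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isValid() && key.isReadable() && readOrClose(key, buf)) {
                        return true;
                    }
                }
                selector.selectedKeys().clear();
            }
            accepted.close();
            return false; // no OP_READ with -1 arrived within the deadline
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("disconnect reported: " + peerCloseIsReported());
    }
}
```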
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #7 - Posted 2003-08-13 17:29:02 »

Quote
Well, in my nice higher-level API I use read() to determine the end of stream on a SocketChannel and it works every time. Obviously the channel is reported as readable once again after it has delivered the last message to the buffer, so it is able to deliver the -1 that means end of stream. I tried this by connecting and disconnecting several times with different combinations (writing to the server and disconnecting as fast as possible etc) and it worked every time.


I'm confused here. That sounds exactly like kevglass described - he gets an OP_READ event when the connection is dropped. To recap, my problem is that I was *hoping* to get one, but I do not. I think that basically what you're saying is that it works for you the same way it works for kevglass.

The problem here is that without an event from the select() to indicate that the connection has been dropped, there is no way of detecting it has been dropped!

My followup post was to say that on a Selector with OP_WRITE registered, I did actually get an OP_WRITE notification when the channel was dropped. If this is consistent (note: NONE of this is documented / officially specified in the API), then I could try registering for OP_WRITE on my read-only Selector.

In fact, this sounds just like the bug report I remember from Sun - if you do NOT register OP_WRITE, you do NOT get the OP_READ notification on a closed connection, but if you ONLY register OP_WRITE, you get an OP_WRITE notification instead! (but I *thought* I read that as one of the "fixed" bugs for 1.4.x. Possibly this has regressed with 1.4.2?)

malloc will be first against the wall when the revolution comes...
Offline darcone

Junior Member




Size matters


« Reply #8 - Posted 2003-08-13 17:44:02 »

Heh, if you want I will upload my source and you can take a look.
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #9 - Posted 2003-08-13 18:15:52 »

Quote
Heh, if you want I will upload my source and you can take a look.


OK, but first it'd be easier if you explain what event on the Selector you are reacting to when you do your read? And if you're not reacting to any event at all, how are you doing non-blocking I/O without Selectors? Otherwise, I'm not going to have a clue what your code is doing :(.

malloc will be first against the wall when the revolution comes...
Offline darcone

Junior Member




Size matters


« Reply #10 - Posted 2003-08-13 18:56:44 »

I am checking isReadable() on the key, and then reading into a buffer... As I said, this has worked every time so far :)

Anyhow, here is a page with my api and the javadoc:

SOURCE: http://www.naturalgamer.com/OverConn/OverConn.zip

JavaDoc: http://www.naturalgamer.com/OverConn/Javadoc/
Offline kevglass

JGO Kernel


Medals: 85
Projects: 25


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #11 - Posted 2003-08-13 19:22:26 »

So you're registering for OP_READ and OP_WRITE for every channel.

An interesting (and slightly odd and worrying) point is that you have a selector for every connection/channel pair. From what I understand (not sure) the point of selectors is partly to reduce the need for a thread per connection doing blocking communication. By having a selector (which you block on) for every connection you're not getting a true benefit... that being said, you're apparently receiving a -1 every time a channel closes.

For what it's worth, I'm using one selector for all my channels, meaning I only need one thread. I only get a -1 on the channel when a TCP socket closes if the other end is killed (as opposed to closed cleanly).

BlahBlahEtc - I did some more looking into it, and I don't reliably get -1 on the TCP channel when it closes. Only if the other end is killed. I've added a keep alive to my TCP channel which means I get a close within 3 seconds, but it's not really good enough.

Kev

PS. Thanks for the source

Offline darcone

Junior Member




Size matters


« Reply #12 - Posted 2003-08-13 19:25:43 »

No no... if you look at the OverConn class, it is made to be the ONLY Thread, and when you need a connection, you call .connect() which returns an OverChannel and registers it with the selector. Thus, you only use one selector for all the channels, as long as you only have one OverConn Thread running.

Some simple app that needs to connect to 5 irc servers at once would look like this...

------------------------------------------------------------

OverConn overConn = new OverConn(50, 5000);
overConn.start(); // Started Thread

OverChannel client1 = overConn.connect(OverChannel.TCP, "wineasy.se.quakenet.org", 6667, "ISO-8859-1");

OverChannel client2 = overConn.connect(OverChannel.TCP, "wineasy.se.quakenet.org", 6667, "ISO-8859-1");

OverChannel client3 = overConn.connect(OverChannel.TCP, "wineasy.se.quakenet.org", 6667, "ISO-8859-1");

OverChannel client4 = overConn.connect(OverChannel.TCP, "wineasy.se.quakenet.org", 6667, "ISO-8859-1");

OverChannel client5 = overConn.connect(OverChannel.TCP, "wineasy.se.quakenet.org", 6667, "ISO-8859-1");

----------------------------------------------------------------

That would leave 5 connections running in that one OverConn thread... Following the multiplexing paradigm.

Offline kevglass

JGO Kernel


Medals: 85
Projects: 25


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #13 - Posted 2003-08-13 19:28:46 »

Ah, sorry, only had a quick look over, must have missed that. I think the naming confused me a bit. I thought OverConn would be a connection object, one created for each connection.. fair play, didn't get it. My apologies.

Kev

Offline darcone

Junior Member




Size matters


« Reply #14 - Posted 2003-08-13 19:31:19 »

np, heh.. cool.. while you were writing that, I edited my message to include the code, so while I saved the modification, your message appeared =P

The naming is chosen so that it is clear it's only an abstraction of the "real" java.nio channels.

Also, something I need to fix is that the channels get registered for OP_WRITE only when they have something to send.
Offline darcone

Junior Member




Size matters


« Reply #15 - Posted 2003-08-13 19:39:02 »

This also raises another question: is it wise to un-register a channel from a selector in order to register it again with new options? This would be nice for channels that are idle for long times, but channels that need to send a lot would be re-registered a lot, and this could impact performance and perhaps introduce bugs?
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #16 - Posted 2003-09-01 18:09:48 »

Quote
This also raises another question: is it wise to un-register a channel from a selector in order to register it again with new options? This would be nice for channels that are idle for long times, but channels that need to send a lot would be re-registered a lot, and this could impact performance and perhaps introduce bugs?


C/C++ non-blocking I/O libraries vary wildly in the answer to your question. It's entirely implementation dependent. So, for Java, it needs to either have an API call which does something akin to "getCapabilities" (or something that indicates what the Selector is good at etc), or for Sun to mandate how any given implementation should work.

Either way, it ought to be documented.

I would suggest you submit a bug report to Sun on this. If they get enough people telling them they need to document the NIO APIs, they might actually do it.

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #17 - Posted 2003-09-01 18:16:02 »

Quote

For what it's worth, I'm using one selector for all my channels, meaning I only need one thread. I only get a -1 on the channel when a TCP socket closes if the other end is killed (as opposed to closed cleanly).

BlahBlahEtc - I did some more looking into it, and I don't reliably get -1 on the TCP channel when it closes. Only if the other end is killed. I've added a keep alive to my TCP channel which means I get a close within 3 seconds, but it's not really good enough.


I'm pretty desperate here. I've found several scenarios in which disconnect is NEVER reported, no matter what you do. I also fear that a production server where we're running Sun's 1.4.2 on Linux is doing some form of processing that's O(n) or worse in the number of things registered with the Selector. After a few days it's maxing out the CPU whilst doing nothing but ordinary selects, and I'm afraid that it's all those dead connections that are causing the problem. The only way we can get our application to work at the moment is for someone to log in each morning, quit java, and restart the server. This is beyond ridiculous - this is pure farce.

How have you implemented your keep-alive, and does it work for you in all situations? On some ports, I can change the protocol and force heartbeat/keepalive, but other ports have to be HTTP - which AFAICS makes keepalive impossible ???

Argghhh.

malloc will be first against the wall when the revolution comes...
Offline kevglass

JGO Kernel


Medals: 85
Projects: 25


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #18 - Posted 2003-09-01 18:24:48 »

I've got no HTTP connections to worry about, but for what it's worth, assuming you are just downloading a bunch of data down the HTTP connection you could just force a disconnect if you don't receive data for a while. Hideous but I suppose it might work.

Kev
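
A rough sketch of that "no data for a while, so drop it" idea, assuming a single selector thread that calls `select(timeout)` and sweeps afterwards. `IdleTracker` is a hypothetical helper, not something from either poster's code; it is generic so the key type can be a `SelectionKey` in a server, and time is passed in explicitly to keep it easy to test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Tracks the last activity time per connection and reports the ones that went quiet. */
class IdleTracker<K> {
    private final long timeoutMillis;
    private final Map<K, Long> lastSeen = new HashMap<>();

    IdleTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call on accept and whenever data arrives for the connection. */
    void touch(K conn, long nowMillis) {
        lastSeen.put(conn, nowMillis);
    }

    /** Call when a connection is closed through the normal path. */
    void forget(K conn) {
        lastSeen.remove(conn);
    }

    /** Removes and returns every connection idle for longer than the timeout. */
    List<K> reap(long nowMillis) {
        List<K> dead = new ArrayList<>();
        Iterator<Map.Entry<K, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
                it.remove();
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        IdleTracker<String> tracker = new IdleTracker<>(1000);
        tracker.touch("conn-1", 0);
        System.out.println("idle connections: " + tracker.reap(5000));
    }
}
```

In a server loop the sweep would run after each `selector.select(timeoutMillis)` pass, cancelling the key and closing the channel for everything `reap(System.currentTimeMillis())` returns. It is not single-threaded-only by design, but as written it has no locking, so it should stay on the selector thread.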

Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #19 - Posted 2003-09-01 18:48:05 »

Quote
I've got no HTTP connections to worry about, but for what it's worth, assuming you are just downloading a bunch of data down the HTTP connection you could just force a disconnect if you don't receive data for a while. Hideous but I suppose it might work.


Chuckle. I'm desperate enough to give it a try :). Whilst reading this it's also occurred to me that the CPU problems always occur on the server/app that is also running an SSLServerSocket. A light bulb above my head is starting to glimmer... I'd assumed that the SSLServerSocket was a mature implementation that wouldn't, after a matter of hours, spontaneously start hammering the CPU for no apparent reason. Especially when there is ZERO activity on any of the ports.

I would still wager that the problem is from nio, but it gives me another avenue to try. I've got five or six things now. Looks like it's going to be a loooong night  :-/

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #20 - Posted 2003-09-01 19:06:16 »

Quote


Chuckle. I'm desperate enough to give it a try :). Whilst reading this it's also occurred to me that the CPU problems always occur on the server/app that is also running an SSLServerSocket.


...so I went over our classes which integrate old io with nio with a fine-tooth comb (note: Sun's SSL hasn't yet been upgraded to support SelectableChannels). And I found a point where a particular malformed HTTP request could cause a busy-wait on an individual Socket - but only if that Socket had been created via old io, not via nio. This could, of course, explain cpu-hogging. I fixed that loophole and put in an alert so that if a request triggers it, we'll get a note in the log, and I'll know that that problem at least was being exercised.

I've also put in Kev's brute-force "if you take too long to request data, I kill you" approach :). We'll just have to wait and see which one triggers first...

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #21 - Posted 2003-09-11 19:26:16 »

Quote


...so I went over our classes which integrate old io with nio with a fine-tooth comb (note: Sun's SSL hasn't yet been upgraded to support SelectableChannels). And I found a point where a particular malformed HTTP request could cause a busy-wait on an individual Socket - but only if that Socket had been created via old io, not via nio.
...
I've also put in Kev's brute-force "if you take too long to request data, I kill you" approach :). We'll just have to wait and see which one triggers first...


In the long-run, there were quite a lot of malformed HTTP requests, so it looks like that was causing the problem...pity that you have to code, maintain and support everything twice at the moment, if you want to use SSL (our nio-related classes were always robust...).

malloc will be first against the wall when the revolution comes...
Offline blahblahblahh

JGO Coder


Medals: 1


http://t-machine.org


« Reply #22 - Posted 2003-09-11 19:28:32 »

Quote
This also raises another question: is it wise to un-register a channel from a selector in order to register it again with new options? This would be nice for channels that are idle for long times, but channels that need to send a lot would be re-registered a lot, and this could impact performance and perhaps introduce bugs?


...I realised my previous answer was a bit too vague. The situation normally (in C) is that your Asynch library is working one of two ways - either it iterates over ALL Channels (or the equivalent for your asynch lib), or it iterates only over those which have changed state.

Obviously, in the former case it's a REALLY good idea to de-register if you have large numbers of channels (e.g. several thousand connected clients). OTOH, in the latter case it's a bad idea - because you're adding considerable extra overhead with no gain in performance.

Hope that makes it a little clearer...
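
On the Java side specifically, `SelectionKey.interestOps(int)` changes what an already registered channel is selected for without cancelling the key, so toggling OP_WRITE does not require an un-register/re-register cycle. How cheap that call is remains implementation dependent, as described above. A minimal sketch, with `InterestOpsDemo` and its method names invented for illustration and a Java 7+ loopback setup used only to obtain a real key:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class InterestOpsDemo {

    /** Turns OP_WRITE interest on or off for an already registered key. */
    static void setWriteInterest(SelectionKey key, boolean wanted) {
        int ops = key.interestOps();
        key.interestOps(wanted ? (ops | SelectionKey.OP_WRITE) : (ops & ~SelectionKey.OP_WRITE));
    }

    /** Registers a loopback channel for OP_READ, then toggles OP_WRITE on and off. */
    static int[] toggleOnLoopback() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                SelectionKey key = accepted.register(selector, SelectionKey.OP_READ);
                int before = key.interestOps();
                setWriteInterest(key, true);  // data queued: ask to be told when writable
                int during = key.interestOps();
                setWriteInterest(key, false); // queue drained: stop asking
                int after = key.interestOps();
                return new int[] { before, during, after };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ops = toggleOnLoopback();
        System.out.println("interest ops: " + ops[0] + " -> " + ops[1] + " -> " + ops[2]);
    }
}
```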

malloc will be first against the wall when the revolution comes...