Re: tcp close problems under heavy load on 2.6.18



Jean-Francois Smigielski <smig@xxxxxxxxxxxxxx> wrote:
I ran into a problem with TCP connections on Linux 2.6.18 (both
clients and servers).

I have an echo service that can be represented by 1, 2, 3 or 4
processes that listen on the same ip/port. This service accepts tens
of thousands of simultaneous connections. Each client process starts
thousands of connections to the service, writes some data, reads the
answer, waits, closes, and then opens, writes, ...

Both client and server sockets are non-blocking and use the SO_LINGER
option to avoid leaving a lot of sockets in the TIME_WAIT state. I
started with a linger time-out of 0.
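
For readers unfamiliar with the setup being described, here is a minimal sketch of what SO_LINGER with a zero time-out does, assuming a Linux host and the loopback interface. The helper names `abortive_close` and `demo` are made up for illustration and are not from the original poster's code. With `l_onoff=1` and `l_linger=0`, close() skips the FIN handshake and TIME_WAIT and resets the connection instead, so the peer should see a reset rather than a clean end-of-file.

```python
import socket
import struct


def abortive_close(sock):
    """Close with SO_LINGER on and a linger time of 0 (hypothetical helper)."""
    # struct linger { int l_onoff; int l_linger; } -> on, zero seconds.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def demo():
    """Return what the accepting side observes after an abortive close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # do not hang forever if nothing arrives
    abortive_close(client)
    try:
        data = server_side.recv(1024)
        result = "eof" if data == b"" else "data"
    except ConnectionResetError:
        result = "reset"
    finally:
        server_side.close()
        listener.close()
    return result


if __name__ == "__main__":
    print(demo())
```

On loopback the reset cannot get lost, which is exactly what differs from the situation discussed in this thread.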

I thought it was generally agreed that deliberately causing abortive
closes that way was a "bad thing" - for example, RSTs are not
retransmitted, so a lost RST could leave the remote in ESTABLISHED etc.
for a very long time... And TIME_WAIT is there for a reason - to protect
against the accidental acceptance of old segments from an earlier TCP
connection of the same name (the same address/port four-tuple).

If I kill the client processes on a host, thereby killing thousands of
connections at once, I should observe many RST-flagged TCP
packets, at least one for every socket. But only part of those
packets are sent, for only half of the original number of
sockets. This happens with more than 4,000 client sockets.

Are you certain that your packet sniffer actually saw all the packets?
Sometimes even pcap reporting zero drops doesn't necessarily mean it
did see all the traffic.

If you were tracing on the server, then back on the client a sudden
spike of 4000 RSTs going out at once might have filled the driver/NIC's
transmit queues, so some of them may have been dropped, never to be
seen again... If you were tracing on the client, it is possible
that those drops happened before the promiscuous tap (I'm not certain
of that, just speculating).
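
One way to cross-check a packet trace is to compare it against the kernel's own TCP counters. The sketch below parses the two `Tcp:` lines of /proc/net/snmp (a line of names followed by a line of values) and reports how much `OutRsts` grew between two snapshots. The function names are made up for illustration, and it is an assumption to verify that a given kernel counts resets from abortive closes in `OutRsts`. The counter also only reflects what the stack tried to send, not what made it onto the wire.

```python
def tcp_counters(snmp_text):
    """Map counter names to values from the 'Tcp:' lines of /proc/net/snmp."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) != 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    names, values = rows
    return dict(zip(names, (int(v) for v in values)))


def out_rsts_delta(before_text, after_text):
    """How many resets the stack reports sending between two snapshots."""
    return tcp_counters(after_text)["OutRsts"] - tcp_counters(before_text)["OutRsts"]


def read_snmp(path="/proc/net/snmp"):
    """Take one snapshot of the counters file (Linux only)."""
    with open(path) as f:
        return f.read()
```

Usage would be along the lines of: take `before = read_snmp()`, kill the client processes, take `after = read_snmp()`, and compare `out_rsts_delta(before, after)` with the number of RSTs in the trace.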

The observed effect on the server is obvious: all the badly closed
sockets remain in the ESTABLISHED state, since the server only responds
to received data...

Ah, so you do see then firsthand one of the reasons an abortive close
of a TCP connection is considered a Bad Thing :)
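
If the server has to cope with peers that vanish this way, one common mitigation (not something proposed in the original exchange) is TCP keepalive on the accepted sockets. A keepalive probe sent to a client that no longer knows the connection should draw an RST, and probes that go unanswered eventually time the connection out, so either way the stale ESTABLISHED socket gets cleaned up. A minimal sketch, assuming Linux, where the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options exist; the helper name and the default timings are arbitrary illustrations:

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on keepalive probing for one socket (hypothetical helper).

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    probes:   unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three options below are Linux-specific.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

The server would call this on each socket returned by accept(). An application-level idle time-out would serve the same purpose.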

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...