pptpd stops responding -- connections stuck in SYN_RECV state.
From: Ray Van Dolson (rayvd_at_digitalpath.net)
Date: 12/30/04
Date: 30 Dec 2004 08:35:11 -0800
I've also posted this on the poptop-server mailing list -- but I'm
guessing I won't get a response there, so I'm at least hoping someone
here can give me some suggestions. :)
I have just *one* server where this happens. It's triggered by a network
issue that is currently out of my hands, but it seems pptpd shouldn't
react this way...
Randomly throughout the day on a busy server (800 users), approximately
half the users are disconnected at once and immediately try to
reconnect.
I'm limiting the inbound connections per second with an iptables rule,
but it seems like even so, the massive number of disconnects may be
causing pptpd to stop responding to requests on port 1723. The only way
to revive it is to stop the process and restart it (at which point I have
to kill all the connections so we don't get duplicate IP assignments).
Rather messy :)
I'm using ulimit -n 65535 and ulimit -u 71680 with my pptpd process,
which seems like it should be plenty high.
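One way to confirm that limits like these actually reached the running daemon (and were not just set in some other shell) is to have the process log what it inherited. A minimal sketch, assuming Linux with /proc mounted; soft_limit_at_least(), count_open_fds() and report_limits() are hypothetical helpers, not part of pptpd:

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft limit for `resource` is at
 * least `wanted` (or unlimited), 0 if it is lower, -1 if getrlimit() fails. */
int soft_limit_at_least(int resource, rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur >= wanted;
}

/* Hypothetical helper: counts this process's open descriptors by listing
 * /proc/self/fd (Linux-specific). The directory stream's own descriptor
 * is included in the count. Returns -1 if /proc is unavailable. */
long count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(dir);
    return n;
}

/* Print the limits the process actually inherited, e.g. once at startup. */
void report_limits(FILE *out)
{
    struct rlimit nofile, nproc;

    if (getrlimit(RLIMIT_NOFILE, &nofile) == 0)
        fprintf(out, "RLIMIT_NOFILE soft=%llu hard=%llu\n",
                (unsigned long long)nofile.rlim_cur,
                (unsigned long long)nofile.rlim_max);
    if (getrlimit(RLIMIT_NPROC, &nproc) == 0)
        fprintf(out, "RLIMIT_NPROC soft=%llu hard=%llu\n",
                (unsigned long long)nproc.rlim_cur,
                (unsigned long long)nproc.rlim_max);
    fprintf(out, "open fds: %ld\n", count_open_fds());
}
```

Child processes inherit these limits from the parent, so checking them in the long-running listener covers the processes it forks as well.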
If I run lsof on pptpd while it's in its "frozen" state, I don't see
anything out of the ordinary:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
pptpd 5431 root cwd DIR 3,2 4096 2 /
pptpd 5431 root rtd DIR 3,2 4096 2 /
pptpd 5431 root txt REG 3,2 27418 4308995 /usr/sbin/pptpd
pptpd 5431 root mem REG 3,2 106916 4112442 /lib/ld-2.3.3.so
pptpd 5431 root mem REG 3,2 1459344 4554816 /lib/tls/libc-2.3.3.so
pptpd 5431 root mem REG 3,2 16708 4112459 /lib/libdl-2.3.3.so
pptpd 5431 root mem REG 3,2 76652 4112485 /lib/libresolv-2.3.3.so
pptpd 5431 root mem REG 3,2 7100 4112453 /lib/libcom_err.so.2.1
pptpd 5431 root mem REG 3,2 22172 4112471 /lib/libnss_dns-2.3.3.so
pptpd 5431 root mem REG 3,2 2830972 4112474 /lib/libnss_ldap-2.3.3.so
pptpd 5431 root mem REG 3,2 50944 4112472 /lib/libnss_files-2.3.3.so
pptpd 5431 root 0u CHR 1,3 132595 /dev/null
pptpd 5431 root 1u CHR 1,3 132595 /dev/null
pptpd 5431 root 2u CHR 1,3 132595 /dev/null
pptpd 5431 root 3u unix 0xd1c3ac80 4153618 socket
pptpd 5431 root 4u IPv4 4153619 TCP 192.168.1.51:1723 (LISTEN)
pptpd 5431 root 5u IPv4 4153628 TCP <private>:53284-><private>:ldap (CLOSE_WAIT)
However, when I do a netstat -tupan, I observe the following:
tcp 0 0 192.168.1.51:1723 0.0.0.0:* LISTEN 5431/pptpd
tcp 0 0 192.168.1.51:1723 10.14.77.243:2971 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.253:2505 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.250:4657 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.10.243:2753 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.253:2659 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.82.255:3880 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.246:2685 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.247:3110 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.4.249:2635 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.252:4388 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.61.245:4802 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.247:3107 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.10.243:2777 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.10.243:2784 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.74.253:2458 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.250:4695 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.5.250:2241 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.245:4099 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.62.254:4733 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.255:4904 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.74.255:4103 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.248:4820 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.67.246:4216 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.62.255:4803 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.255:2588 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.60.246:4807 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.248:3034 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.12.245:4521 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.255:4885 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.250:2245 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.247:3269 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.247:3600 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.6.242:4337 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.73.255:3515 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.77.251:3974 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.67.249:3950 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.76.250:4676 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.245:4098 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.248:4875 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.78.253:3366 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.82.253:3311 SYN_RECV -
tcp 0 0 192.168.1.51:1723 10.14.82.253:3303 SYN_RECV -
(Probably 100 of these.) So I see that pptpd is "listening", but these
connections stuck in SYN_RECV state seem to be hogging the connection
queue. When things are running normally there are *no* connections in
SYN_RECV state.
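The same information netstat prints can be read directly from /proc/net/tcp, which makes it easy to count SYN_RECV entries from a script or monitoring loop. In that file addresses and ports are hexadecimal (port 1723 appears as 06BB) and the st column holds the kernel's state number: 03 is SYN_RECV and 0A is LISTEN. A minimal sketch, assuming Linux and IPv4 only (IPv6 sockets are listed in /proc/net/tcp6, which this does not parse); the function names are hypothetical:

```c
#include <stdio.h>

/* Linux state number for SYN_RECV as printed in the "st" column. */
#define TCP_STATE_SYN_RECV 0x03

/* Returns 1 if a /proc/net/tcp line describes a socket in SYN_RECV whose
 * local port is `port`, 0 otherwise (including the header line). */
int is_syn_recv_on_port(const char *line, unsigned int port)
{
    unsigned int local_port, state;

    /* Line layout: "sl: local_ip:local_port rem_ip:rem_port st ..."
     * where every address, port and state field is hexadecimal. */
    if (sscanf(line, " %*u: %*x:%x %*x:%*x %x", &local_port, &state) != 2)
        return 0;
    return state == TCP_STATE_SYN_RECV && local_port == port;
}

/* Counts SYN_RECV sockets on `port` in an open /proc/net/tcp-style stream. */
long count_syn_recv(FILE *f, unsigned int port)
{
    char line[512];
    long count = 0;

    while (fgets(line, sizeof line, f) != NULL)
        count += is_syn_recv_on_port(line, port);
    return count;
}

/* Reads the live IPv4 table; returns -1 if it cannot be opened. */
long count_syn_recv_live(unsigned int port)
{
    FILE *f = fopen("/proc/net/tcp", "r");
    long n;

    if (f == NULL)
        return -1;
    n = count_syn_recv(f, port);
    fclose(f);
    return n;
}
```

Going by the observation above, count_syn_recv_live(1723) would be expected to stay at 0 on a healthy server, so a value that climbs and stays high is an early sign of the stall.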
If I recall my TCP correctly, a connection is in this state after the
server has received the client's SYN and sent a SYN,ACK back? So are we
stuck waiting for the client's final ACK? Or perhaps we've received it,
but pptpd has gone funky and can't finish the handshake, so these
"pile up".
I guess I could limit outbound SYN/ACK responses from the server to see
if this makes a difference.
Could something in pptpd fail to handle all of its child processes dying
(disconnecting) at once because of an LCP timeout, and cause it to hang?
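One well-known pitfall in this area, offered as a general example and not as a diagnosis of pptpd: ordinary UNIX signals do not queue, so when many child processes exit at nearly the same moment, the parent may receive fewer SIGCHLD signals than there were exits. A handler that calls wait() only once per signal then leaves zombies behind. The usual remedy is to loop on waitpid() with WNOHANG until no exited children remain. A minimal sketch; install_sigchld_reaper() and children_reaped are hypothetical names:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Incremented only inside the handler; safe to read from normal code. */
static volatile sig_atomic_t children_reaped = 0;

/* SIGCHLD handler: one delivered signal may stand for several exited
 * children, so keep reaping until waitpid() reports none left. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;   /* waitpid() may overwrite errno */

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Hypothetical helper: install the reaper. Returns 0 on success. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted system calls; ignore children that merely stop. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

With the loop in place, twenty children exiting back to back are all reaped even if only a handful of SIGCHLD deliveries reach the parent.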
Anyway, any thoughts on whether this could be an issue with poptop itself
or something funky with my kernel config/resource settings?
I've read that a program using select()/accept() may run into this issue
if it isn't calling accept() often enough. However, I'm not a programmer,
so I'm hoping someone out there can tell me whether this is true. The
code for the listener is here:
http://www.digitalpath.net/~rayvd/concentrators/pptpd/pptpmanager.c
The entire source is here:
http://www.digitalpath.net/~rayvd/concentrators/pptpd/
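Some background on the select()/accept() question: on Linux, a connection whose handshake has completed waits in the listening socket's accept queue until the program calls accept(). That queue is bounded by the backlog passed to listen(), silently capped at net.core.somaxconn (historically 128). When it is full, the kernel by default drops the client's final ACK, so the connection stays in SYN_RECV while the server retransmits its SYN,ACK. The symptom above is therefore consistent with a listener that is not emptying its queue, though that alone does not prove it. A common pattern is to make the listening socket non-blocking and, each time select() reports it readable, call accept() in a loop until it fails with EAGAIN. This is a minimal sketch of that pattern, not the code in pptpmanager.c; make_listener() and drain_accept_queue() are hypothetical names, and closing each accepted descriptor stands in for whatever a real server would do with it:

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Create a non-blocking TCP listener on 127.0.0.1:port (0 = any free
 * port). The kernel caps `backlog` at net.core.somaxconn. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1, flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one); /* best effort */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        (flags = fcntl(fd, F_GETFL)) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for the listener to become readable, then accept
 * every connection already queued. Returns the number accepted (0 on
 * timeout or signal), or -1 on error. */
int drain_accept_queue(int lfd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready, accepted = 0;

    FD_ZERO(&rfds);
    FD_SET(lfd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(lfd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return (ready == 0 || errno == EINTR) ? 0 : -1;

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);

        if (cfd < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;               /* retry; more may be queued */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                  /* queue is empty */
            return -1;
        }
        close(cfd);   /* placeholder: a real server would hand cfd off */
        accepted++;
    }
    return accepted;
}
```

Calling accept() only once per wakeup still makes progress against a full queue; draining simply clears a burst of reconnects in fewer wakeups. A listener that stops returning to select() altogether would leave the queue full either way.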
I've tried enabling tcp_syncookies (a shot in the dark) and increasing my
tcp_max_syn_backlog to 4096, but the problem still occurs.
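For comparison, tcp_max_syn_backlog limits the queue of half-open (SYN_RECV) connections, whereas the queue of completed connections waiting for accept() is bounded by the backlog the program passes to listen(), itself capped by net.core.somaxconn. tcp_abort_on_overflow decides whether the kernel resets connections instead of dropping the final ACK when that accept queue overflows. A minimal sketch for printing these settings side by side, assuming Linux; read_sysctl_long() and print_listen_queue_settings() are hypothetical helpers:

```c
#include <stdio.h>

/* Hypothetical helper: reads a single integer from a /proc/sys file (or
 * any file containing one). Returns -1 if it cannot be read or parsed. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value;
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%ld", &value) == 1;
    fclose(f);
    return ok ? value : -1;
}

/* Print the kernel settings that bound the SYN and accept queues. */
void print_listen_queue_settings(FILE *out)
{
    static const char *const paths[] = {
        "/proc/sys/net/ipv4/tcp_syncookies",
        "/proc/sys/net/ipv4/tcp_max_syn_backlog",
        "/proc/sys/net/ipv4/tcp_abort_on_overflow",
        "/proc/sys/net/core/somaxconn",
    };
    size_t i;

    for (i = 0; i < sizeof paths / sizeof paths[0]; i++)
        fprintf(out, "%s = %ld\n", paths[i], read_sysctl_long(paths[i]));
}
```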
Thanks,
Ray