Stack sends SYN/ACKs even though accept queue is full

From: Jan Olderdissen (jan_at_ixiacom.com)
Date: 04/29/04

    To: "'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>
    Date:	Thu, 29 Apr 2004 14:53:36 -0700
    
    

    Synopsis:

    When the accept queue of a listening socket is full, the stack still
    SYN/ACKs additional SYNs at a rate of 0.5 Hz and puts them on the syn queue.
    Those connections behave in bandwidth-wasting ways if the accept queue
    remains full. In particular, the server resends the SYN/ACK multiple times
    while the client, thinking it has a valid connection, attempts to
    communicate. The client retransmits its data packets and eventually gives
    up.

    Packet trace:
     
    Time Info
    7.381621 1277 > 5555 [SYN] Seq=1955472727 Ack=0 Win=2920 Len=0
    7.381712 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    7.381776 1277 > 5555 [ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=0
    7.400029 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    7.609149 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    8.029061 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    8.868911 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    10.378517 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    10.379162 1277 > 5555 [ACK] Seq=1955472818 Ack=1971042014 Win=2920 Len=0
    10.548510 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    13.907815 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    16.377300 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    16.378034 1277 > 5555 [ACK] Seq=1955472818 Ack=1971042014 Win=2920 Len=0
    20.626501 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    28.374887 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    28.375489 1277 > 5555 [ACK] Seq=1955472818 Ack=1971042014 Win=2920 Len=0
    34.063710 1277 > 5555 [PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    52.569956 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    52.570129 1277 > 5555 [ACK] Seq=1955472818 Ack=1971042014 Win=2920 Len=0
    57.380254 1277 > 5555 [FIN, ACK] Seq=1955472818 Ack=1971042014 Win=2920 Len=0
    60.938358 1277 > 5555 [FIN, PSH, ACK] Seq=1955472728 Ack=1971042014 Win=2920 Len=90
    100.760213 5555 > 1277 [SYN, ACK] Seq=1971042013 Ack=1955472728 Win=2896 Len=0
    100.760347 1277 > 5555 [ACK] Seq=1955472819 Ack=1971042014 Win=2920 Len=0

    Other TCP connections, IP addresses and various other superfluous
    information removed.
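
    The SYN/ACK retransmissions in the trace arrive roughly 3, 9, 21, 45 and
    93 seconds after the first SYN/ACK, which is consistent with a timeout
    that starts at three seconds and doubles on every retry. The sketch below
    only reproduces that arithmetic. The helper name
    synack_retransmit_offsets() is hypothetical and is not a kernel function;
    the doubling rule is an assumption read off the trace.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): fill out[0..n-1] with the time, in
 * seconds after the first SYN/ACK, at which each retransmitted SYN/ACK would
 * be sent, assuming the timeout doubles after every retransmission.
 */
static void synack_retransmit_offsets(double initial_timeout_s, size_t n,
                                      double *out)
{
    assert(initial_timeout_s > 0.0);
    assert(out != NULL || n == 0);

    double timeout = initial_timeout_s; /* wait before the next retransmit */
    double elapsed = 0.0;               /* time since the first SYN/ACK */

    for (size_t i = 0; i < n; i++) {
        elapsed += timeout;
        out[i] = elapsed;
        timeout *= 2.0;                 /* exponential backoff */
    }
}
```

    Called with an initial timeout of 3.0 and n = 5, this yields the offsets
    3, 9, 21, 45 and 93, matching the SYN/ACK timestamps above to within a
    fraction of a second.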

    Code analysis:

    tcp_v4_conn_request() in tcp_ipv4.c contains the following code:

        /* Accept backlog is full. If we have already queued enough
         * of warm entries in syn queue, drop request. It is better than
         * clogging syn queue with openreqs with exponentially increasing
         * timeout.
         */
        if (tcp_acceptq_is_full(sk) && tcp_synq_young(sk) > 1)
            goto drop;

    A synq entry is considered young when it hasn't timed out yet, as the
    following comment in tcp_timer.c indicates:

        /* Normally all the openreqs are young and become mature
         * (i.e. converted to established socket) for first timeout.
         * If synack was not acknowledged for 3 seconds, it means
         * one of the following things: synack was lost, ack was lost,
         * rtt is high or nobody planned to ack (i.e. synflood).
         * When server is a bit loaded, queue is populated with old
         * open requests, reducing effective size of queue.
         * When server is well loaded, queue size reduces to zero
         * after several minutes of work. It is not synflood,
         * it is normal operation. The solution is pruning
         * too old entries overriding normal timeout, when
         * situation becomes dangerous.
         *
         * Essentially, we reserve half of room for young
         * embrions; and abort old ones without pity, if old
         * ones are about to clog our table.
         */

    Unfortunately, when a server is really busy and the acceptq remains full,
    the connections held on the synq will drop incoming ACK (and other) packets
    without compunction as the code from tcp_v4_syn_recv_sock() in tcp_ipv4.c
    shows:

        if (tcp_acceptq_is_full(sk))
            goto exit_overflow;

    This is what produces the strange packet trace shown above.

    Because newly SYN/ACKed connections are considered 'young', two such
    connections on the synq cause additional SYNs to be dropped until the
    young connections age, at which point two more connections are SYN/ACKed,
    and so on. Since the initial TCP timeout is three seconds, you would expect
    two additional connections to be accepted every three seconds. However,
    experiments with 2.4.25 show that number to be two connections every four
    seconds, for reasons that are unclear.
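
    The expected rate can be checked with a small simulation. This is a
    sketch under stated assumptions: the accept queue stays full for the
    whole run, SYNs arrive faster than they can be admitted, time advances in
    whole seconds, and an entry stops being young exactly young_lifetime_s
    seconds after it was admitted. The function name is hypothetical.

```c
#include <assert.h>

#define MAX_YOUNG 2 /* the "> 1" rule never lets more than two young entries coexist */

/* Count the SYNs that get SYN/ACKed while the accept queue remains full. */
static unsigned simulate_admitted_syns(unsigned duration_s,
                                       unsigned young_lifetime_s,
                                       unsigned syns_per_s)
{
    unsigned expiry[MAX_YOUNG]; /* second at which each young entry turns old */
    unsigned young = 0;
    unsigned admitted = 0;

    assert(young_lifetime_s > 0);

    for (unsigned t = 0; t < duration_s; t++) {
        /* Entries whose first timeout has fired are no longer young. */
        unsigned kept = 0;
        for (unsigned i = 0; i < young; i++)
            if (expiry[i] > t)
                expiry[kept++] = expiry[i];
        young = kept;

        for (unsigned s = 0; s < syns_per_s; s++) {
            if (young > 1)
                continue;       /* accept queue full and two young entries: drop */
            expiry[young++] = t + young_lifetime_s;
            admitted++;
        }
    }
    return admitted;
}
```

    simulate_admitted_syns(12, 3, 10) returns 8, i.e. two admissions every
    three seconds. Passing 4 as the young lifetime gives 6 over the same
    twelve seconds, which matches the observed rate but says nothing about
    why the real stack behaves that way.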

    In addition to the 2.4.21 sources we mainly work with in our embedded
    systems, I checked the 2.4.26 and 2.6.5 sources. They don't appear to differ
    in the sections discussed. The packet trace is from 2.4.25.

    Conclusion:

    Unless I'm missing something material, the stack has no business accepting
    connections for which it has no room in the accept queue. If the
    application intends to have a large number of pending connections, it
    should ask for a long accept queue in the first place. Accepting two
    additional connections every four seconds does not materially improve the
    performance of highly loaded servers.
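
    For reference, an application asks for a longer accept queue through the
    backlog argument of listen(). The sketch below is a minimal illustration;
    open_listener() is a hypothetical helper, it binds to the loopback address
    only, and on Linux the requested backlog is capped by the
    net.core.somaxconn sysctl.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP listening socket on 127.0.0.1:port with
 * the requested accept queue length. Returns the descriptor, or -1 on error.
 * Passing port 0 lets the kernel choose a free port.
 */
static int open_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {  /* backlog sizes the accept queue */
        close(fd);
        return -1;
    }
    return fd;
}
```

    A server expecting bursts of connections would call something like
    open_listener(5555, 1024) rather than relying on a small default backlog.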

    Perhaps someone on the mailing list can enlighten me as to the point of the
    'tcp_synq_young(sk) > 1' condition. The intent seems to be to keep a modicum
    of 'warm' connections in the air at all times in case the app eventually
    gets around to accepting all the pending connections and an oscillation
    effect might ensue. However, the implementation doesn't scale and the effect
    of suggesting to the remote app that it has someone to talk to and then just
    ignoring all packets it sends appears to be somewhat counterproductive.

    Thanks go to Thomas Ameling who was instrumental in tracking down this
    issue.

    Jan Olderdissen

