RE: UDP recvmsg blocks after select(), 2.6 bug?

From: David Schwartz (davids_at_webmaster.com)
Date: 10/07/04

    To: <linux-kernel@vger.kernel.org>
    Date:	Thu, 7 Oct 2004 12:31:38 -0700
    
    

    > I have a problem where the sequence of events is as follows:
    > - application does select() on a UDP socket descriptor
    > - select returns success with descriptor ready for reading
    > - application does recvfrom() on this descriptor and this recvfrom()
    > blocks forever

            POSIX does not require the kernel to predict the future. The only guarantee
    against having a socket operation block is found in non-blocking sockets.

    > My understanding of POSIX is limited, but it seems to me that a read call
    > must never block after select just said that it's ok to read from the
    > descriptor. So any such behaviour would be a kernel bug.

            Suppose hypothetically that we add a new network protocol that permits the
    sender to 'invalidate' data after it's received by the remote network stack
    and before it's accepted by the remote application. Would you argue that
    'select'ing must be considered a read in this case? Even though an
    application might 'select' on a socket with no intention to follow up with a
    read? Remember, the 'select' operation is supposed to be protocol neutral.

    > From a brief look at the kernel UDP code, I suspect a problem in
    > net/ipv4/udp.c, udp_recvmsg(): it reads the first available datagram
    > from the queue, then checks the UDP checksum. If the UDP checksum fails at
    > this point, the datagram is discarded and the process blocks until the
    > next datagram arrives.

            You should understand a hit on 'select' to mean that something happened,
    and that it would therefore behoove your application to try the operation it
    wants to perform again. The 'select' operation is not fine-grained enough to
    know what operation you planned, and whether that particular operation would
    block.

            Suppose, for example, that instead of using 'read' you used 'recvmsg', and
    we add an option to 'recvmsg' to allow you to read datagrams with bad
    checksums. What should 'select' do if a datagram is received with a bad
    checksum? It has no idea what flavor of 'recvmsg' you're going to call, so
    it can't know if your operation is going to block or not.

    > Could someone please help me track this problem?
    > Am I correct in my reasoning that the select() -> recvmsg() sequence must
    > never block?

            No, you are incorrect. Consider, again, a 'recvmsg' flag to allow you to
    receive messages even if they have bad checksums versus one that blocks
    until a message with a valid checksum is received. The 'select' function
    just isn't smart enough.

            Consider a 'select' for write on a TCP socket. How does 'select' know how
    many bytes you're going to write? Again, a 'select' hit just indicates that
    something relevant has happened. It *cannot* guarantee that a future
    operation won't block, both because 'select' has no idea what operation is
    going to take place in the future and because things can change between now
    and then.
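
The write side can be sketched the same way. The helper name send_available is hypothetical; it uses the MSG_DONTWAIT flag (available on Linux and the BSDs) to make each send() call non-blocking, and it simply reports how many bytes the socket accepted, since a "writable" hit says nothing about how much can be written.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hand the socket as much of buf as it will take right now, without
 * ever blocking. Returns the number of bytes accepted (possibly 0 or
 * fewer than len), or -1 on a real error (errno is set). */
ssize_t send_available(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, MSG_DONTWAIT);
        if (n > 0) {
            sent += (size_t)n;      /* partial writes are normal */
            continue;
        }
        if (n == 0)
            break;                  /* nothing accepted: stop for now */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                  /* buffer full: stop for now */
        return -1;
    }
    return (ssize_t)sent;
}
```

The caller keeps the unsent remainder and tries again after the next 'select' hit for write.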

    > If yes, is it possible that this problem is triggered by a failed UDP
    > checksum in the udp_recvmsg() function?
    > If yes, can we do something to fix this?

            The bug is in your application. The kernel behavior might be considered
    undesirable, but it's your application that is failing to tell the kernel
    that it must not block.

            DS


