Anyone know why the linux select() function is broken?

baumann.Pan_at_gmail.com
Date: 03/29/05


Date: 28 Mar 2005 17:53:12 -0800

My problem is that once select() has timed out, it times out again on
the next call.

1. I send a packet to a LAN server A (192.168.0.10:3233); select() works
   fine.
2. I send a packet to a nonexistent LAN server B (no host has this IP
   on the LAN; port 2323); select() times out, which is correct in this
   case.
3. I send a packet to server A again. EtherPeek captures the response
   packet, but select() still times out.

Why? How can I resolve this? Thanks for any advice.

I also searched the newsgroups and found that someone had asked the
same question, but I found no solution to the problem there.

The thread is below:

Anyone know why the linux select() function is broken?

Newsgroups: comp.os.linux.development.apps
From: stev...@Radix.Net (Steve McWilliams)
Date: 2000/01/08
Subject: Anyone know why the linux select() function is broken?

I posted this the other day in the thread discussing select timeout
problems but got no response. Since that thread has since degenerated
into absurdity, I'd like to try again.

The problem I have isolated is that if a udp socket is opened, and a
packet is sent for which there is no receiver, a subsequent select call
on the socket erroneously times out immediately. If there is a receiver
however, the select call times out correctly. This bug only manifests
itself under linux, not solaris or nt.

Below is the test code to provoke the problem. Thanks in advance for
any ideas.

Steve

--
/*
 * file: main.c
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#define REM_PORT     11430
#define REM_IP_ADDR  "192.1.1.80"
int fd;
struct sockaddr_in loc_addr;
struct sockaddr_in rem_addr;
int socket_open(void);
int socket_send(void);
int socket_recv(void);
int socket_select(void);
int main(int argc, char *argv[])
{
    printf("configured for address %s, port %d\n", REM_IP_ADDR, REM_PORT);
    if (socket_open() < 0)
        return 1;
    if (socket_send() < 0)
        return 1;
    if (socket_select() < 0)
        return 1;
    if (socket_recv() < 0)
        return 1;
    return 0;
}
int socket_open(void)
{
    printf("opening socket ...\n");
    if ((fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0)
    {
        perror("unable to get local socket");
        return -1;
    }
    memset(&loc_addr, 0, sizeof(loc_addr));
    loc_addr.sin_family = AF_INET;
    loc_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&loc_addr, sizeof(loc_addr)) < 0)
    {
        perror("cannot bind local socket address");
        return -1;
    }
    memset(&rem_addr, 0, sizeof(rem_addr));
    rem_addr.sin_family = AF_INET;
    rem_addr.sin_addr.s_addr = inet_addr(REM_IP_ADDR);
    rem_addr.sin_port = htons(REM_PORT);
    return fd;
}
int socket_send(void)
{
    printf("sending message ...\n");
    if (sendto(fd, "hello", 6, 0, (struct sockaddr *)&rem_addr,
        sizeof(rem_addr)) != 6)
    {
        perror("send failed");
        return -1;
    }
    printf("sent message\n");
    return 0;
}
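int socket_recv(void)
{
    /* len is a value-result argument: it must hold the size of the
     * address buffer before recvfrom() is called. */
    socklen_t len = sizeof(rem_addr);
    char buffer[256];
    ssize_t n;
    printf("receiving message ...\n");
    /* Leave room for a terminating NUL so buffer can be printed with %s. */
    if ((n = recvfrom(fd, buffer, sizeof(buffer) - 1, 0,
        (struct sockaddr *)&rem_addr, &len)) <= 0)
    {
        perror("recv failed");
        return -1;
    }
    buffer[n] = '\0';
    printf("received %s\n", buffer);
    return 0;
}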
int socket_select(void)
{
    int ret;
    fd_set read_set;
    struct timeval timeout = { 10, 0 };
    printf("selecting socket (%ld second timeout) ...\n",
        (long)timeout.tv_sec);
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);
    if ((ret = select(fd + 1, &read_set, NULL, NULL, &timeout)) < 0)
    {
        perror("select failed");
        return -1;
    }
    else if (ret == 0)
    {
        fprintf(stderr, "select timeout\n");
        return -1;
    }
    printf("select returned %d, bit is %d\n", ret,
        FD_ISSET(fd, &read_set));
    return 0;
}

Newsgroups: comp.os.linux.development.apps
From: David Schwartz <dav...@webmaster.com>
Date: 2000/01/08
Subject: Re: Anyone know why the linux select() function is broken?

Steve McWilliams wrote:
> I posted this the other day in the thread discussing select timeout
> problems but got no response.  Since that thread has since degenerated
> into absurdity, I'd like to try again.
>
> The problem I have isolated is that if a udp socket is opened, and a
> packet is sent for which there is no receiver, a subsequent select call
> on the socket erroneously times out immediately.  If there is a receiver
> however, the select call times out correctly.  This bug only manifests
> itself under linux, not solaris or nt.
>
> Below is the test code to provoke the problem.  Thanks in advance for
> any ideas.

        The behavior you are seeing is perfectly logical. A socket should
select for read when there is an error on it. This is consistent with
TCP behavior.

        DS
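
A minimal C sketch of what this implies for a caller: once select() reports the socket readable, the SO_ERROR socket option can be queried to tell a pending error apart from a real datagram. The names wait_for_reply and enum wait_result are hypothetical and are not part of the test program posted above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical result codes for this sketch. */
enum wait_result {
    WAIT_FAILED = -1,       /* select() or getsockopt() itself failed */
    WAIT_TIMEOUT = 0,       /* nothing happened before the timeout */
    WAIT_DATA = 1,          /* readable and no error pending */
    WAIT_SOCKET_ERROR = 2   /* readable because an error was pending */
};

enum wait_result wait_for_reply(int fd, long timeout_ms, int *pending_err)
{
    fd_set read_set;
    struct timeval tv;
    int ret;
    int err = 0;
    socklen_t len = sizeof(err);

    assert(pending_err != NULL);
    *pending_err = 0;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    ret = select(fd + 1, &read_set, NULL, NULL, &tv);
    if (ret < 0)
        return WAIT_FAILED;
    if (ret == 0)
        return WAIT_TIMEOUT;

    /* SO_ERROR fetches the pending socket error and clears it. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return WAIT_FAILED;
    if (err != 0) {
        *pending_err = err;
        return WAIT_SOCKET_ERROR;
    }
    return WAIT_DATA;
}
```

Called in place of socket_select() in the posted program, a kernel that queues the error as described in this thread would be expected to yield WAIT_SOCKET_ERROR rather than WAIT_DATA; that outcome has not been verified here.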

Newsgroups: comp.os.linux.development.apps
From: f91-...@nada.kth.se (Mattias Engdegård)
Date: 2000/01/09
Subject: Re: Anyone know why the linux select() function is broken?

In <8590c1$ra...@saltmine.radix.net> stev...@Radix.Net (Steve McWilliams) writes:
>The problem I have isolated is that if a udp socket is opened, and a
>packet is sent for which there is no receiver, a subsequent select call
>on the socket erroneously times out immediately.  If there is a receiver
>however, the select call times out correctly.  This bug only manifests
>itself under linux, not solaris or nt.

According to the linux udp(7) man page:

       All fatal errors will be passed to the user as an error
       return even when the socket is not connected. This
       behaviour differs from many other BSD socket implementations
       which don't pass any errors unless the socket is
       connected. Linux's behaviour is mandated by RFC1122.

       For compatibility with legacy code it is possible to set
       the SO_BSDCOMPAT SOL_SOCKET option to receive remote
       errors only when the socket has been connected (except for
       EPROTO and EMSGSIZE). It is better to fix the code to
       handle errors properly than to enable this option.
       Locally generated errors are always passed.
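
For illustration only, setting the option named in the man page excerpt might look like the C sketch below. The function name enable_bsd_compat is hypothetical. As far as I know, socket(7) describes SO_BSDCOMPAT as used by Linux 2.0 and 2.2 and phased out afterwards, so on a current kernel the call is expected to succeed without changing anything; treat that as an assumption to check against your own system's documentation.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: asks for BSD-style error reporting on a UDP
 * socket. Returns 0 on success, -1 with errno set on failure. */
int enable_bsd_compat(int fd)
{
    assert(fd >= 0);
#ifdef SO_BSDCOMPAT
    {
        int on = 1;
        return setsockopt(fd, SOL_SOCKET, SO_BSDCOMPAT, &on, sizeof(on));
    }
#else
    /* The option is Linux-specific; report it as unsupported elsewhere. */
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

The man page's own advice still applies: handling the error in the application is preferable to relying on this option.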

Newsgroups: comp.os.linux.development.apps
From: stev...@Radix.Net (Steve McWilliams)
Date: 2000/01/09
Subject: Re: Anyone know why the linux select() function is broken?

f91-...@nada.kth.se (Mattias Engdegård) writes:
>In <8590c1$ra...@saltmine.radix.net> stev...@Radix.Net (Steve McWilliams) writes:
>>The problem I have isolated is that if a udp socket is opened, and a
>>packet is sent for which there is no receiver, a subsequent select call
>>on the socket erroneously times out immediately.  If there is a receiver
>>however, the select call times out correctly.  This bug only manifests
>>itself under linux, not solaris or nt.
>According to the linux udp(7) man page:
>       All fatal errors will be passed to the user as an error
>       return even when the socket is not connected. This
>       behaviour differs from many other BSD socket implementations
>       which don't pass any errors unless the socket is
>       connected. Linux's behaviour is mandated by RFC1122.
>       For compatibility with legacy code it is possible to set
>       the SO_BSDCOMPAT SOL_SOCKET option to receive remote
>       errors only when the socket has been connected (except for
>       EPROTO and EMSGSIZE). It is better to fix the code to
>       handle errors properly than to enable this option.
>       Locally generated errors are always passed.

Hmm.  That's certainly a new one on me.  I assumed that since it's a
connectionless protocol, sending to an address where there may not be a
receiver present was not considered an error.
Thanks,
Steve

Newsgroups: comp.os.linux.development.apps
From: a...@news-server.san.rr.com ()
Date: 2000/01/09
Subject: Re: Anyone know why the linux select() function is broken?

On 9 Jan 2000 10:55:44 -0500, Steve McWilliams <stev...@Radix.Net> wrote:
>f91-...@nada.kth.se (Mattias Engdegård) writes:
>>In <8590c1$ra...@saltmine.radix.net> stev...@Radix.Net (Steve McWilliams) writes:
>>>The problem I have isolated is that if a udp socket is opened, and a
>>>packet is sent for which there is no receiver, a subsequent select call
>>>on the socket erroneously times out immediately.  If there is a receiver
>>>however, the select call times out correctly.  This bug only manifests
>>>itself under linux, not solaris or nt.
>>According to the linux udp(7) man page:
>>       All fatal errors will be passed to the user as an error
>>       return even when the socket is not connected. This
>>       behaviour differs from many other BSD socket implementations
>>       which don't pass any errors unless the socket is
>>       connected. Linux's behaviour is mandated by RFC1122.
>>       For compatibility with legacy code it is possible to set
>>       the SO_BSDCOMPAT SOL_SOCKET option to receive remote
>>       errors only when the socket has been connected (except for
>>       EPROTO and EMSGSIZE). It is better to fix the code to
>>       handle errors properly than to enable this option.
>>       Locally generated errors are always passed.
>Hmm.  That's certainly a new one on me.  I assumed that since it's a
>connectionless protocol, sending to an address where there may not be a
>receiver present was not considered an error.
>Thanks,
>Steve

So how do you tell there is no receiver?
My first guess would be a failed ARP lookup, if it is a local address,
or any ICMP message rejecting the packet.
But there are still plenty of cases where you can't tell whether there
is a receiver or not.

Newsgroups: comp.os.linux.development.apps
From: David Schwartz <dav...@webmaster.com>
Date: 2000/01/09
Subject: Re: Anyone know why the linux select() function is broken?

a...@news-server.san.rr.com wrote:
> But there are still plenty of cases where you can't tell if there
> is a receiver or not.

        Exactly. The absence of an error is not proof of reception.
However, if there is an error, the operating system will tell the
application about it.

        I strongly recommend that such errors be ignored, however.
Honoring them makes it too easy for spoofed error packets to break
'connections' using your protocol layered over UDP.

        DS
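
One way to follow this advice, sketched in C under stated assumptions: keep waiting until an overall deadline, and when the socket becomes readable only because of a reported remote error, let recvfrom() consume that error and go back to waiting. The function name recv_ignoring_remote_errors, the helper names, and the particular set of errno values treated as "remote" are choices made for this sketch, not something taken from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Milliseconds on a monotonic clock, so wall-clock changes do not matter. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Assumption: these errno values only say the peer looked unreachable. */
static int is_remote_error(int err)
{
    return err == ECONNREFUSED || err == EHOSTUNREACH || err == ENETUNREACH;
}

/* Returns the datagram length, or -1 with errno set. A timeout is
 * reported as -1 with errno == ETIMEDOUT, because a zero-length
 * datagram is a valid result. */
ssize_t recv_ignoring_remote_errors(int fd, void *buf, size_t len,
                                    long timeout_ms)
{
    long long deadline;

    assert(buf != NULL && timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    for (;;) {
        long long left = deadline - now_ms();
        fd_set read_set;
        struct timeval tv;
        ssize_t n;
        int ret;

        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        tv.tv_sec = (time_t)(left / 1000);
        tv.tv_usec = (long)((left % 1000) * 1000);
        FD_ZERO(&read_set);
        FD_SET(fd, &read_set);

        ret = select(fd + 1, &read_set, NULL, NULL, &tv);
        if (ret < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: wait again */
            return -1;
        }
        if (ret == 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        n = recvfrom(fd, buf, len, 0, NULL, NULL);
        if (n >= 0)
            return n;
        if (is_remote_error(errno) || errno == EINTR)
            continue;           /* error consumed: keep waiting for data */
        return -1;
    }
}
```

Because the failed recvfrom() call is what consumes the reported error, the next select() call in the loop waits for real data instead of returning immediately again.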

Newsgroups: comp.os.linux.development.apps
From: e...@ftel.net (Rick Ellis)
Date: 2000/01/12
Subject: Re: Anyone know why the linux select() function is broken?

In article <8590c1$ra...@saltmine.radix.net>,
Steve McWilliams <stev...@Radix.Net> wrote:
>The problem I have isolated is that if a udp socket is opened, and a
>packet is sent for which there is no receiver, a subsequent select call
>on the socket erroneously times out immediately.  If there is a receiver
>however, the select call times out correctly.  This bug only manifests
>itself under linux, not solaris or nt.

The select "times out" because there is an error to be reported.  The
error is from the previous send being rejected.
--
http://www.fnet.net/~ellis/photo/linux.html