When did I lose packets?



Hello everyone,

I have written small bits of code to test high-rate packet handling.
(Approximately 10,000 packets per second.)

I send UDP packets at a constant rate from one computer:

while ( 1 )
{
    /* the sequence number is the whole payload */
    sendto(sock, &seqno, sizeof seqno, 0,
           (struct sockaddr *)&addr, sizeof addr);
    ++seqno;
    busy_loop(100);   /* spin for 100 us before the next packet */
}

The only payload in the UDP packet is a 64-bit sequence number.
(Please ignore endianness issues.)

I use busy_loop(int us) to do nothing for 'us' microseconds.
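
For reference, busy_loop() is not shown here. A minimal sketch of what such a function might look like, assuming a POSIX system with a monotonic clock (now_ns() is a hypothetical helper, not part of the test program):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Assumed implementation: spin, without sleeping, until at least
 * 'us' microseconds have elapsed. */
static void busy_loop(int us)
{
    uint64_t deadline = now_ns() + (uint64_t)us * 1000ull;
    while (now_ns() < deadline)
        ;   /* burn CPU until the deadline passes */
}
```

With this kind of pacing, the period between packets is the 100 us spin plus the time spent in sendto(), so the actual rate ends up slightly below 10,000 packets per second.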

This code is run by root, on an otherwise idle system, under the default scheduling policy, with nice -n -10.



I receive the packets on a different computer:

while ( 1 )
{
    recvfrom(sock, &R_seqno, sizeof R_seqno, 0, NULL, NULL);
    /* every sequence number skipped before the received one counts as lost */
    while ( E_seqno != R_seqno )
    {
        ++lost; ++E_seqno;
    }
    ++E_seqno;
}

R_seqno is the _received_ sequence number.
E_seqno is the _expected_ sequence number.
lost tracks the number of packets missed.
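
One caveat with the loop above: if a packet ever arrives with R_seqno smaller than E_seqno (a duplicate or a reordered packet), the inner while has to wrap around the whole 64-bit range before the two values match, which in practice looks like a hang. A minimal sketch of the same accounting that tolerates this case follows. The struct and function names are hypothetical, and a late packet that was already counted as lost stays counted:

```c
#include <stdint.h>

/* Hypothetical bookkeeping for the receiver. */
struct seq_stats {
    uint64_t expected;  /* next sequence number expected (E_seqno) */
    uint64_t lost;      /* sequence numbers skipped so far */
    uint64_t late;      /* packets older than expected (reordered or duplicated) */
};

/* Update the counters for one received sequence number. */
static void seq_update(struct seq_stats *s, uint64_t received)
{
    if (received < s->expected) {
        /* Assumption: old packets are only counted, never "un-lost". */
        s->late++;
        return;
    }
    s->lost += received - s->expected;  /* size of the gap, 0 if none */
    s->expected = received + 1;
}
```

On a direct link between two machines reordering is unlikely, so this should give the same numbers as the original loop; it only removes the pathological case.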

This code is run by root, on an otherwise idle system, as a SCHED_FIFO process, with priority 80. (Why 80? I don't know.)

param.sched_priority = 80;
if ( sched_setscheduler(0, SCHED_FIFO, &param) < 0 )
{
    perror("sched_setscheduler");
}

I registered a signal handler to print statistics:

static void catch(int sig)
{
    printf("RECEIVED=%llu LOST=%llu\n", R_seqno, lost);
}

signal(SIGQUIT, catch);

(AFAIU I'm not supposed to call printf() inside a signal handler?
However, I don't think it would explain why I drop packets. But I
could be wrong!)
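
printf() is indeed not on the list of async-signal-safe functions. A common way around it is to only set a flag in the handler and print from the main loop. A minimal sketch, assuming POSIX sigaction(); the function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t want_stats = 0;

static void on_sigquit(int sig)
{
    (void)sig;
    want_stats = 1;   /* only async-signal-safe work here */
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_stats_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: a blocked recvfrom() fails with EINTR */
    return sigaction(SIGQUIT, &sa, NULL);
}

/* Call from the receive loop, outside signal context. */
static void print_stats_if_requested(unsigned long long received,
                                     unsigned long long lost)
{
    if (want_stats) {
        want_stats = 0;
        printf("RECEIVED=%llu LOST=%llu\n", received, lost);
    }
}
```

The receive loop would then call print_stats_if_requested() after every recvfrom(), treating a return of -1 with errno == EINTR as "no packet, just check the flag".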

I ran the setup overnight (1000 minutes) and here are my results:

According to top, the receive process ate 16.5 minutes of CPU time.
(i.e. 1.65% CPU occupancy on average.)
The system stays very responsive despite the SCHED_FIFO process.

RECEIVED=577.5 million packets
LOST=3225 packets

I don't understand why I lose ANY packet...

I forgot to mention: I increased the size of the socket buffer.
(That was my intention, at least.)

$ /sbin/sysctl net | grep rmem_
net.core.rmem_default = 1064960
net.core.rmem_max = 1064960
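
To check that the larger buffer really applies to the receiving socket, the size can also be requested per socket and read back. A minimal sketch, assuming Linux behaviour as described in socket(7): the kernel doubles the requested value and limits the request to rmem_max. set_rcvbuf() is a hypothetical helper name:

```c
#include <sys/socket.h>

/* Request 'bytes' of receive buffer on 'sock' and return the size the
 * kernel reports afterwards, or -1 on error. On Linux the reported value
 * is normally twice the (possibly capped) request. */
static int set_rcvbuf(int sock, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

Printing the returned value once at startup, right after creating the socket, would show whether the sysctl settings took effect.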

The link layer does not report any problem.
(errors:0 dropped:0 overruns:0 frame:0. Can someone explain exactly
what these counters mean?)

$ /sbin/ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:13:20:0D:1F:47
          inet addr:10.10.10.208  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::213:20ff:fe0d:1f47/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:661166427 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20981 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1036860947 (988.8 Mb)  TX bytes:1686030 (1.6 Mb)
          Interrupt:9

I noticed that I lose packets in bursts of 30-100 packets, and these loss bursts are quite rare (~1 every 10-40 minutes). Someone told me another high-priority process (another SCHED_FIFO??) might be running.
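
To line these bursts up with other events on the machine, each burst can be logged with a wall-clock timestamp. A minimal sketch, assuming local time is what the system logs use; format_burst() is a hypothetical helper, not part of the test program:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format one loss burst as
 * "YYYY-MM-DD HH:MM:SS lost=<count> first=<seqno>" in local time.
 * Returns the number of characters written, or 0 if buf is too small. */
static size_t format_burst(char *buf, size_t size, time_t when,
                           uint64_t first_missing, uint64_t count)
{
    struct tm tm;
    size_t n;
    int m;

    if (localtime_r(&when, &tm) == NULL)
        return 0;
    n = strftime(buf, size, "%Y-%m-%d %H:%M:%S", &tm);
    if (n == 0)
        return 0;
    m = snprintf(buf + n, size - n, " lost=%llu first=%llu",
                 (unsigned long long)count,
                 (unsigned long long)first_missing);
    if (m < 0 || (size_t)m >= size - n)
        return 0;
    return n + (size_t)m;
}
```

In the receive loop this would be called with time(NULL) whenever R_seqno is ahead of E_seqno, with first_missing = E_seqno and count = R_seqno - E_seqno. Since bursts are rare, the extra formatting and output should not matter much for the measurement.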

I checked /var/log/messages and saw:
# cat /var/log/messages
Apr 27 04:40:01 venus syslogd 1.4.1: restart.
Apr 27 05:04:33 venus -- MARK --
Apr 27 05:24:34 venus -- MARK --
Apr 27 05:44:34 venus -- MARK --
Apr 27 06:04:34 venus -- MARK --
Apr 27 06:24:34 venus -- MARK --
Apr 27 06:44:35 venus -- MARK --
Apr 27 07:04:35 venus -- MARK --
Apr 27 07:24:35 venus -- MARK --
Apr 27 07:44:35 venus -- MARK --
Apr 27 08:04:36 venus -- MARK --
Apr 27 08:24:36 venus -- MARK --
Apr 27 08:44:36 venus -- MARK --
Apr 27 09:04:36 venus -- MARK --
Apr 27 09:24:37 venus -- MARK --
Apr 27 09:44:37 venus -- MARK --
Apr 27 10:04:37 venus -- MARK --
Apr 27 10:24:37 venus -- MARK --

1 every 20 minutes... What do these log entries refer to?
Is it a high-priority process? Perhaps even a kernel thread?
Is it CPU-intensive? Could it explain why I drop packets?

If you've read this far, THANKS! :-)

Regards,

Spoon