Linux TCP - unexpected retransmissions



This may not be the proper newsgroup but any help would be greatly
appreciated.

We are working on an embedded system that has a number of PowerQUICC
processors running Linux. During normal operation, the processors exchange
small messages (< 100 bytes) using TCP. We have a response time
requirement of about 100 milliseconds, and we have observed that messages
sometimes take a long time (e.g., > 200 milliseconds across an
Ethernet link) to travel between nodes of the system, so the
response time exceeds our requirement. This latency occurs randomly
at different places and on different interface types. We set the
TCP_NODELAY socket option and tried different settings (the ipv4
options in the proc filesystem) and test programs to isolate the root
cause of the latency, with no success.
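
For reference, this is roughly how the option is set and checked on each socket. It is a minimal Python sketch rather than our actual code, and the helper names `make_nodelay_socket` and `nodelay_enabled` are hypothetical. The option itself is `TCP_NODELAY` at the `IPPROTO_TCP` level.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm, so small writes are not held
    # back waiting for outstanding data to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    # getsockopt returns a non-zero integer when the option is enabled.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Note that this option only affects how the sender coalesces small writes; it does not change the retransmission timer.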

We can reproduce the latency using a small application in which two
PowerQUICC cards randomly send each other bursts of messages across an
Ethernet link. For this test, we are using the 2.6.16 kernel. Using a
sniffer to capture the traffic on the Ethernet link, we found that
sometimes, when both TCPs send each other messages at about the same
time (segments 5 and 6 below), the second TCP, for unknown reasons, does
not ack the message from the first TCP and a retransmission occurs
(segment 8). We also observed that retransmissions sometimes occur
when one TCP is busy transmitting many messages (segment 38 contains
many application messages) while a message is being sent to it. Again,
for unknown reasons, that TCP does not ack the message, thus forcing a
retransmission (segment 40).

netstat reports TCP segments being retransmitted but no errors at the
interface level. We have no reason to believe that segments are
dropped at the physical layer. We suspect that segments are dropped at
the TCP layer, but we don't know why or where. Any ideas?
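
The retransmission counter can also be read directly from `/proc/net/snmp`, which makes it easy to take a reading before and after a test run. The following is a minimal sketch under that assumption; the function names are hypothetical. The `Tcp:` header line and the `RetransSegs` field are part of the standard layout of that file.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    # The file holds pairs of lines per protocol: a header line with the
    # field names, then a line with the values, both prefixed by "Tcp:".
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no Tcp header/value pair found")
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def retrans_delta(before_text, after_text):
    """Number of segments retransmitted between two snapshots."""
    before = parse_tcp_counters(before_text)
    after = parse_tcp_counters(after_text)
    return after["RetransSegs"] - before["RetransSegs"]


def read_snapshot(path="/proc/net/snmp"):
    """Read the current counters file (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling `read_snapshot()` before and after a burst and passing both results to `retrans_delta` gives the number of retransmitted segments for that burst, which can then be matched against the sniffer capture.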

Thanks
Francois

Here is the trace, with relative sequence numbers, in which we captured
three instances of a retransmission.
1 0.000000 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=0 Ack=0 Win=9902 Len=84 TSV=15025917 TSER=16502810
2 0.039817 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [ACK] Seq=0 Ack=84 Win=2896 Len=0 TSV=16502926 TSER=15025917
3 0.080062 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=0 Ack=84 Win=2896 Len=8 TSV=16502936 TSER=15025917
4 0.080103 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=84 Ack=8 Win=9902 Len=0 TSV=15025937 TSER=16502936
5 0.583935 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=84 Ack=8 Win=9902 Len=8 TSV=15026063 TSER=16502936
6 0.583940 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=8 Ack=84 Win=2896 Len=8 TSV=16503062 TSER=15025937
7 0.583985 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=92 Ack=16 Win=9902 Len=0 TSV=15026063 TSER=16503062
8 0.795861 172.118.100.102 172.118.100.101 TCP [TCP
Retransmission] 4124 > 9000 [PSH, ACK] Seq=84 Ack=16 Win=9902 Len=8
TSV=15026116 TSER=16503062
9 0.796059 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [ACK] Seq=16 Ack=92 Win=2896 Len=0 TSV=16503115 TSER=15026116
10 0.797151 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=16 Ack=92 Win=2896 Len=8 TSV=16503115
TSER=15026116
11 0.797194 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=92 Ack=24 Win=9902 Len=0 TSV=15026116 TSER=16503115
12 1.088260 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=92 Ack=24 Win=9902 Len=8 TSV=15026189
TSER=16503115

16 6.127280 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=324 Ack=2656 Win=9902 Len=8 TSV=15027449
TSER=16504322
17 6.127289 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=2656 Ack=324 Win=2896 Len=8 TSV=16504448
TSER=15027323
18 6.127334 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=332 Ack=2664 Win=9902 Len=0 TSV=15027449 TSER=16504448
19 6.127865 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=2664 Ack=332 Win=2896 Len=8 TSV=16504448
TSER=15027449
20 6.127907 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=332 Ack=2672 Win=9902 Len=0 TSV=15027449 TSER=16504448
21 6.631221 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=332 Ack=2672 Win=9902 Len=8 TSV=15027575
TSER=16504448
22 6.631226 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=2672 Ack=332 Win=2896 Len=8 TSV=16504574
TSER=15027449
23 6.631260 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=340 Ack=2680 Win=9902 Len=0 TSV=15027575 TSER=16504574
24 6.839618 172.118.100.102 172.118.100.101 TCP [TCP
Retransmission] 4124 > 9000 [PSH, ACK] Seq=332 Ack=2680 Win=9902 Len=8
TSV=15027627 TSER=16504574
25 6.840379 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=2680 Ack=340 Win=2896 Len=8 TSV=16504626
TSER=15027627
26 6.840433 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=340 Ack=2688 Win=9902 Len=0 TSV=15027627 TSER=16504626
27 7.136158 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=340 Ack=2688 Win=9902 Len=8 TSV=15027701
TSER=16504626
28 7.136163 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=2688 Ack=348 Win=2896 Len=8 TSV=16504700
TSER=15027701
29 7.136164 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=348 Ack=2696 Win=9902 Len=0 TSV=15027701 TSER=16504700

31 1106.230079 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=470416 Ack=58388 Win=2896 Len=84 TSV=16779507
TSER=15302381
32 1106.230121 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58388 Ack=470500 Win=14942 Len=0 TSV=15302506
TSER=16779507
33 1106.230402 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=470500 Ack=58388 Win=2896 Len=84 TSV=16779507
TSER=15302381
34 1106.230445 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58388 Ack=470584 Win=14942 Len=0 TSV=15302506
TSER=16779507
35 1106.230716 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=470584 Ack=58388 Win=2896 Len=84 TSV=16779507
TSER=15302381
36 1106.230759 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58388 Ack=470668 Win=14942 Len=0 TSV=15302506
TSER=16779507
37 1106.232746 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=58388 Ack=470668 Win=14942 Len=8 TSV=15302507
TSER=16779507
38 1106.232809 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=470668 Ack=58388 Win=2896 Len=588 TSV=16779507
TSER=15302506
39 1106.272712 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58396 Ack=471256 Win=14942 Len=0 TSV=15302517
TSER=16779507
40 1106.440704 172.118.100.102 172.118.100.101 TCP [TCP
Retransmission] 4124 > 9000 [PSH, ACK] Seq=58388 Ack=471256 Win=14942
Len=8 TSV=15302559 TSER=16779507
41 1106.443387 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=471256 Ack=58396 Win=2896 Len=8 TSV=16779560
TSER=15302559
42 1106.443391 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58396 Ack=471264 Win=14942 Len=0 TSV=15302559
TSER=16779560
43 1106.736707 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [PSH, ACK] Seq=58396 Ack=471264 Win=14942 Len=8 TSV=15302633
TSER=16779560
44 1106.737143 172.118.100.101 172.118.100.102 TCP 9000 >
4124 [PSH, ACK] Seq=471264 Ack=58404 Win=2896 Len=8 TSV=16779633
TSER=15302633
45 1106.737196 172.118.100.102 172.118.100.101 TCP 4124 >
9000 [ACK] Seq=58404 Ack=471272 Win=14942 Len=0 TSV=15302633
TSER=16779633
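
To quantify the delay, the gap between each original segment and its retransmission can be computed from the capture timestamps above. This is a small sketch; the frame numbers and times are copied from the trace, and the names `RETRANSMISSIONS` and `retransmission_gaps_ms` are only illustrative.

```python
# (original frame, retransmitted frame, original time [s], retransmission time [s])
# Values copied from the trace: frames 5/8, 21/24 and 37/40.
RETRANSMISSIONS = [
    (5, 8, 0.583935, 0.795861),
    (21, 24, 6.631221, 6.839618),
    (37, 40, 1106.232746, 1106.440704),
]


def retransmission_gaps_ms(events):
    """Return (original frame, retransmitted frame, gap in ms) for each event."""
    return [(orig, retx, (t_retx - t_orig) * 1000.0)
            for orig, retx, t_orig, t_retx in events]


def report(events):
    """Print one line per retransmission with the delay in milliseconds."""
    for orig, retx, gap in retransmission_gaps_ms(events):
        print(f"frame {orig} -> frame {retx}: {gap:.1f} ms")
```

All three gaps come out at roughly 208 to 212 ms. That would be consistent with the retransmission timer firing at the Linux minimum RTO of 200 ms, although this is only a guess and does not explain why the original segment goes unacknowledged in the first place.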
