possible tunnel driver bug?
- From: "als@xxxxxxxxxxxx" <als@xxxxxxxxxxxx>
- Date: 5 Jan 2006 03:18:50 -0800
I have a small program that rewrites TCP packets for a system that I am
working on. The program has two threads, largely for historical
reasons (it is cut down from a larger, earlier program). One thread
reads packets from a tunnel interface (using the tun driver) and queues
them; the other writes them back to the kernel later.
Some of the packets appear to be corrupted after being written back to
the kernel. The latest kernel this has been tested on is 2.6.9 (RedHat
ES4). Here is an example trace for a packet that I wrote to the
kernel. The bytes are dumped just before the writev() system call. (The
first line is a timestamp.)
21,020215 -> k:
TCP SYN ACK sport = 1202 dport = 59346 seq = 620090535 ack = 615496366
window = 5840 doff = 8 csum = 32434
IP total length = 52 ip checksum = 9918 fragoff = 16384 saddr = 10.2.0.2 daddr=10.2.0.1
00 00 08 00 45 00 00 34 00 00 40 00 40 06 26 be ....E..4..@.@.&.
0a 02 00 02 0a 02 00 01 04 b2 e7 d2 24 f5 d4 a7 ............$...
24 af ba ae 80 12 16 d0 7e b2 00 00 02 04 05 b4 $.......~.......
01 01 04 02 01 03 03 00 ........
At the same time tcpdump showed:
17:33:41.027397 10.2.0.2.1202 > 10.2.0.1.59346: S [bad tcp cksum 40fc!]
620090535:620090535(0) ack 615496366 win 5840 <eol> (DF) (ttl 64, id 0, len 52)
0x0000 4500 0034 0000 4000 4006 26be 0a02 0002 E..4..@.@.&.....
0x0010 0a02 0001 04b2 e7d2 24f5 d4a7 24af baae ........$...$...
0x0020 8012 16d0 7eb2 0000 0064 0001 0203 0405 ....~....d......
0x0030 0607 0809 ....
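For anyone who wants to check the two dumps by hand, here is a minimal sketch that recomputes the TCP checksum over each of them. It is not taken from the program described in this post. The names csum_add, csum_fold, tcp_csum_residual, sent_ip and captured_ip are made up for illustration, and the code assumes a well-formed, unfragmented IPv4 packet. The byte arrays are copied from the dumps above, with the leading 00 00 08 00 (the tun packet-information header) stripped from the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IP packet as dumped just before writev(), minus the 4-byte tun header. */
const uint8_t sent_ip[52] = {
    0x45,0x00,0x00,0x34,0x00,0x00,0x40,0x00,0x40,0x06,0x26,0xbe,
    0x0a,0x02,0x00,0x02,0x0a,0x02,0x00,0x01,0x04,0xb2,0xe7,0xd2,
    0x24,0xf5,0xd4,0xa7,0x24,0xaf,0xba,0xae,0x80,0x12,0x16,0xd0,
    0x7e,0xb2,0x00,0x00,0x02,0x04,0x05,0xb4,0x01,0x01,0x04,0x02,
    0x01,0x03,0x03,0x00
};

/* The same packet as shown by tcpdump. */
const uint8_t captured_ip[52] = {
    0x45,0x00,0x00,0x34,0x00,0x00,0x40,0x00,0x40,0x06,0x26,0xbe,
    0x0a,0x02,0x00,0x02,0x0a,0x02,0x00,0x01,0x04,0xb2,0xe7,0xd2,
    0x24,0xf5,0xd4,0xa7,0x24,0xaf,0xba,0xae,0x80,0x12,0x16,0xd0,
    0x7e,0xb2,0x00,0x00,0x00,0x64,0x00,0x01,0x02,0x03,0x04,0x05,
    0x06,0x07,0x08,0x09
};

/* Add len bytes to a running sum of big-endian 16-bit words (RFC 1071). */
uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                          /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back in and return the one's complement. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 if the TCP checksum stored in the packet verifies. */
uint16_t tcp_csum_residual(const uint8_t *ip, size_t len)
{
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;      /* IP header length */
    assert(ihl >= 20 && len >= ihl);
    size_t tcp_len = len - ihl;
    /* Second half of the pseudo-header: zero, protocol, TCP length. */
    uint8_t pseudo[4] = { 0, ip[9], (uint8_t)(tcp_len >> 8), (uint8_t)tcp_len };
    uint32_t sum = 0;

    sum = csum_add(sum, ip + 12, 8);              /* source + destination */
    sum = csum_add(sum, pseudo, sizeof pseudo);
    sum = csum_add(sum, ip + ihl, tcp_len);       /* TCP header + options */
    return csum_fold(sum);
}
```

Under these assumptions, tcp_csum_residual(sent_ip, sizeof sent_ip) comes out as 0, while the captured bytes leave a non-zero residual. That is consistent with tcpdump's complaint and suggests the checksum field was written correctly and only the bytes after it changed.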
The packets differ starting at byte 0x28 (in the tcpdump display). This
is the start of the TCP options section. It is also an iovec boundary in
the writev() call. The bytes that show up there instead are actually
application data from a recent data packet. It looks as if the kernel
failed to copy in the whole packet and left stale data in the socket
buffer exposed.
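The comparison can be sketched as two small helpers: one finds the first byte at which two buffers differ, the other checks whether a given offset in the gathered write falls exactly between two iovec entries. This is an illustration only. The names first_diff and is_iov_boundary are hypothetical, and the actual iovec layout used by the program is not shown in this post.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Index of the first byte where a and b differ, or len if they match. */
size_t first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t i = 0;
    while (i < len && a[i] == b[i])
        i++;
    return i;
}

/*
 * Returns 1 if `offset` bytes into the gathered buffer is exactly the
 * point where one iovec entry ends and the next begins, 0 otherwise.
 * The end of the last entry is not counted as a boundary.
 */
int is_iov_boundary(const struct iovec *iov, int iovcnt, size_t offset)
{
    size_t end = 0;
    for (int i = 0; i + 1 < iovcnt; i++) {
        end += iov[i].iov_len;
        if (end == offset)
            return 1;
    }
    return 0;
}
```

Applied to the two dumps (with the 4-byte tun header stripped from the first), first_diff gives 40, which is the 0x28 mentioned above. If one assumes, purely for illustration, an iovec layout of 4 bytes (tun header), 40 bytes (IP and fixed TCP header) and 12 bytes (TCP options), then offset 4 + 40 = 44 in the written buffer is an iovec boundary.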
This bug is intermittent and timing-related. If I slow the program down
enough then it all works perfectly. My best guess is that the problem
is triggered when both threads in the program are trying to do I/O at
the same time, one reading and one writing. I've since rewritten the
program to use a single thread and it all works perfectly.
Has anyone seen this sort of thing before?