2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage



On Sat, 22 Dec 2007 23:30:56 PST, Andrew Morton said:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/

I've bisected it down this far:

kvm-ist-kaput.patch GOOD
git-lblnet.patch
git-lblnet-fixup.patch
git-leds.patch
git-libata-all.patch
git-libata-all-fix-pata_winbond-borkage.patch
git-libata-all-wtf.patch BAD

and somehow, I doubt the leds or libata trees horked up networking. ;)

Symptoms: semi-sporadic failures in making network connections. The test
case that tripped it up was 'make test' from Tcl 8.5: several of its test
cases create a listening socket and then try to connect to it.
Under 2.6.24-rc5-mm1, it works just fine, but I'm seeing hangs under -rc6-mm1.
Doing a 'netstat -n -a -A inet -p' while it's hung shows me this:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:34118         0.0.0.0:*               LISTEN      2236/tcltest
tcp        0      1 127.0.0.1:59460         127.0.0.1:34118         SYN_SENT    2236/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:47842         0.0.0.0:*               LISTEN      2352/tcltest
tcp        0      1 127.0.0.1:46510         127.0.0.1:47842         SYN_SENT    2352/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:47842         0.0.0.0:*               LISTEN      2352/tcltest
tcp        0      1 127.0.0.1:46510         127.0.0.1:47842         SYN_SENT    2352/tcltest

Pretty consistent failure mode: a socket is in LISTEN, and the connection
to it gets hung in SYN_SENT. There are three outputs listed: the first is from
one run of the test case, and the other two were taken some 20 seconds apart
during a second run. It's pretty obvious that if you can't complete a 3-packet
handshake to loopback in 20 seconds, something is hosed. However, it's
apparently some sort of race/timing issue, as many *other* test cases in the
Tcl test tree do in fact work OK.
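
For anyone who wants to poke at this without building Tcl, here is a minimal
sketch of the same pattern in Python: listen on an ephemeral loopback port,
then connect to it from the same process. This is not the Tcl test itself;
the function names and the timeout value are made up for illustration, and
I'm assuming a connect() that times out corresponds to the SYN_SENT hang.

```python
import socket


def loopback_connect_ok(timeout=5.0):
    """Listen on an ephemeral 127.0.0.1 port and connect to it.

    Returns True if the TCP handshake completes within `timeout` seconds,
    False if connect() times out (hypothetical stand-in for the hang).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a port
        srv.listen(1)
        port = srv.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
            cli.settimeout(timeout)
            try:
                # The kernel completes the handshake against the listen
                # backlog, so no accept() is needed for connect() to return.
                cli.connect(("127.0.0.1", port))
            except socket.timeout:
                return False
            return True


def count_failures(attempts=200, timeout=5.0):
    """Repeat the check, since the failure looks timing-dependent."""
    return sum(not loopback_connect_ok(timeout) for _ in range(attempts))
```

On a healthy kernel count_failures() should return 0; anything else would
suggest the same sort of stall, though it may take many attempts to trip.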

I already checked: it's not a slam-dunk to just 'patch -R', as there are 3 or 4
conflicts where later patches need massaging/reverting as well.

It's a problem with both 'classic RCU' and 'preempt RCU' (RCU was my *first*
guess as to the cause).

Any clues/hints/advice/patches?
