Re: 24 lost ticks with 2.6.20.10 kernel



Kok, Auke wrote:
> Michel Lespinasse wrote:
>> (I've added the E1000 maintainers to the thread as I found the issue
>> seems to go away after I compile out that driver. For reference, I was
>> trying to figure out why I lose exactly 24 ticks about every two
>> seconds, as shown with report_lost_ticks. This is with a DQ965GF
>> motherboard with onboard E1000).
>
> that's perfectly likely. The main issue is that we read the hardware
> stats every two seconds and that can consume quite some time. It's
> strange that you are losing that many ticks IMHO, but losing one or two
> might very well be.
>
> We've been playing with all sorts of solutions to this problem and
> haven't come up with a way to reduce the load of the system reading HW
> stats, and it remains the most likely culprit, although I don't rule
> out clean routines just yet. This could very well be exaggerated at
> 100mbit speeds as well, I never looked at that.
>
> I've had good results with 2.6.21.1 (even running tickless :)) on these
> NICs. Have you tried that yet?
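To put "24 lost ticks every two seconds" into time units: if something delays the timer interrupt, the gap between two interrupts grows from one tick period to several, and the extra periods are what gets reported as lost. The sketch below is only an illustration of that arithmetic, not the kernel's actual lost-tick accounting; the helper names are hypothetical, and the HZ values are examples because the thread does not say which HZ the kernel was built with.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative only, not kernel code.
 * Length of one timer tick in microseconds for a given HZ. */
long tick_us(long hz)
{
    assert(hz > 0);
    return 1000000L / hz;
}

/* Hypothetical helper: given the measured gap between two consecutive
 * timer interrupts (in microseconds), count the missed ticks. One tick
 * is expected per gap; partial ticks are rounded down. */
long lost_ticks(long gap_us, long hz)
{
    long ticks = gap_us / tick_us(hz);
    return ticks > 1 ? ticks - 1 : 0;
}

/* Hypothetical inverse: approximate extra delay (in ms) implied by
 * `lost` missed ticks at a given HZ. */
long delay_ms_for_lost(long lost, long hz)
{
    return lost * tick_us(hz) / 1000;
}

/* Print the delay that 24 lost ticks implies at a few common HZ values. */
void print_24_tick_table(void)
{
    const long hz_values[] = { 100, 250, 1000 };
    for (size_t i = 0; i < sizeof hz_values / sizeof hz_values[0]; i++)
        printf("HZ=%ld: 24 lost ticks ~ %ld ms\n",
               hz_values[i], delay_ms_for_lost(24, hz_values[i]));
}
```

For example, lost_ticks(25000, 1000) returns 24: a 25 ms gap at HZ=1000 where 1 ms was expected. Read the other way, 24 lost ticks correspond to roughly 24 ms of delay at HZ=1000, 96 ms at HZ=250 and 240 ms at HZ=100, which helps explain why losing that many ticks to a periodic statistics read is described as strange.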

Maybe this could fix it in 2.6.20? (went into 2.6.21)

--------------------------------------------------------------------------

Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=46fcc86dd71d70211e965102fb69414c90381880
Commit: 46fcc86dd71d70211e965102fb69414c90381880
Parent: 2b858bd02ffca71391161f5709588fc70da79531
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>
AuthorDate: Thu Apr 19 18:21:01 2007 -0700
Committer: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>
CommitDate: Thu Apr 19 18:21:01 2007 -0700

Revert "e1000: fix NAPI performance on 4-port adapters"

This reverts commit 60cba200f11b6f90f35634c5cd608773ae3721b7. It's been
linked to lockups of the e1000 hardware, see for example

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229603

but it's likely that the commit itself is not really introducing the
bug, but just allowing an unrelated problem to rear its ugly head (ie
one current working theory is that the code exposes us to a hardware
race condition by decreasing the amount of time we spend in each NAPI
poll cycle).

We'll revert it until root cause is known. Intel has a repeatable
reproduction on two different machines and bus traces of the hardware
doing something bad.
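For context on "the amount of time we spend in each NAPI poll cycle": under NAPI the driver's poll routine is handed a packet budget and processes at most that many packets before returning, so anything that makes a poll return sooner shortens each cycle and increases how often the driver goes back to the hardware. The sketch below is a user-space simulation of that budget mechanism only; the ring type, function names and budget values are hypothetical, and it does not reproduce e1000_clean or the reverted commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NIC receive ring: just a count of
 * packets waiting to be processed. */
struct fake_ring {
    int pending;
};

/* One poll cycle: process at most `budget` packets and return how many
 * were handled. *done becomes true once the ring is empty, which is the
 * point where a real NAPI driver would stop polling and re-enable
 * interrupts. */
int poll_once(struct fake_ring *ring, int budget, bool *done)
{
    int work = 0;
    while (work < budget && ring->pending > 0) {
        ring->pending--;   /* "process" one packet */
        work++;
    }
    *done = (ring->pending == 0);
    return work;
}

/* Poll until the ring drains and return the number of cycles needed.
 * A smaller budget means shorter cycles, but more of them. */
int cycles_to_drain(struct fake_ring *ring, int budget)
{
    int cycles = 0;
    bool done = false;
    assert(budget > 0);   /* a zero budget would never drain the ring */
    while (!done) {
        poll_once(ring, budget, &done);
        cycles++;
    }
    return cycles;
}
```

With 100 pending packets, a budget of 64 drains the ring in 2 cycles, while a budget of 16 needs 7 shorter ones.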


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
