Re: TSC cannot be used as a timesource -> SOLVED

From: Tim Schmielau (tim_at_physik3.uni-rostock.de)
Date: 01/06/04

    Date:	Tue, 6 Jan 2004 15:09:10 +0100 (CET)
    To: Bauke Jan Douma <bjdouma@xs4all.nl>
    
    

    > First
    > -----
    > Until today I had my Linux system running on an old 1.7Gb /dev/hda and a
    > newer 20Gb /dev/hdb. Because the 1.7Gb was extremely unreliable with
    > DMA (established a long time ago), I always had it off for that disk;
    > the 20Gb disk had DMA on (udma4).
    >
    > Kernel in recent months has always been the current 2.6.0-testNN and
    > since its release, kernel 2.6.0
    >
    > With the 2.6 kernels (and possibly with late 2.5 versions too?) I would
    > always get the mystifying:
    >
    > Losing too many ticks!
    > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
    > Falling back to a sane timesource.
    >
    > during the boot-up, right at the point it seems where a standard e2fsck
    > is run.
    >

    As the message says, your system is losing timer interrupts (clock ticks)
    relative to the CPU-internal Time Stamp Counter (TSC). This is probably
    because the old hard drive, running without DMA, keeps interrupts
    disabled for too long at a time.
    Since the tick-based wall clock and the TSC disagree, the kernel has to
    decide which time source to trust. Current kernels choose the timer
    interrupt (this might change in the future), which in your case is just
    the wrong decision.
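
    To illustrate the idea of "losing ticks" relative to the TSC, here is a
    minimal sketch. It is not the kernel's code: the function name, the
    sample readings, and the figure of 1,000,000 cycles per tick (a 1 GHz
    CPU with an assumed HZ of 1000) are all made up for illustration. It
    takes the TSC value recorded at each handled timer interrupt and counts
    how many ticks must have gone missing in between.

```python
def count_lost_ticks(tsc_at_tick, cycles_per_tick):
    """Estimate missed timer ticks from TSC readings (hypothetical helper).

    tsc_at_tick: TSC values sampled at each timer interrupt that was
                 actually handled, in increasing order.
    cycles_per_tick: CPU cycles expected between two consecutive ticks.
    """
    lost = 0
    for prev, cur in zip(tsc_at_tick, tsc_at_tick[1:]):
        delta = cur - prev
        # Round to the nearest whole number of tick periods.
        elapsed_ticks = (delta + cycles_per_tick // 2) // cycles_per_tick
        if elapsed_ticks > 1:
            # One tick was handled; the rest never reached the handler.
            lost += elapsed_ticks - 1
    return lost


# Assumed values: 1 GHz CPU, HZ = 1000, so 1,000,000 cycles per tick.
CYCLES_PER_TICK = 1_000_000
readings = [0, 1_000_000, 2_000_000, 5_000_000, 6_000_000]
lost = count_lost_ticks(readings, CYCLES_PER_TICK)
print(lost)
```

    In this made-up trace the gap from 2,000,000 to 5,000,000 cycles spans
    three tick periods but only one interrupt was handled, so two ticks
    were lost.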

    > Today I installed a new 80Gb harddisk, making the 20Gb /dev/hda and
    > the 80Gb /dev/hdb, and junking the 1.7Gb. Both have DMA enabled (udma5).
    >
    > It seems the TSC problem has vanished, no more such messages -- knock on
    > wood.

    Yep, because the culprit that kept interrupts turned off for long
    intervals has been removed.

    >
    [detailed system description skipped]
    >
    > Second
    > ------
    > In order to use my new 80Gb harddisk, I first installed it alongside
    > the two 'old' disks, as /dev/hdc, so as to be able to move ca. 16Gb of
    > data from the 20Gb to the 80Gb. So I temporarily had:
    > hda: 1.7Gb / no dma
    > hdb: 20Gb / udma4
    > hdc: 80Gb / udma5
    >
    > Now here's what happened: during the one fell swoop 'cp -axvp *' from
    > the 20Gb HD to the 80Gb HD, at two times the copying process seemed
    > to 'hang' for ca. 10-15 minutes (at least there were two times I noticed),
    > the copy being in a 'D' state (uninterruptible sleep).
    >
    > By the time the cp was finally finished (could be ~1.5 hr later wall clock
    > time!), the system clock was running behind ca. 45 minutes!

    This is again explained by your system losing clock ticks. It just shows
    how bad the situation was: a clock 45 minutes behind after roughly 1.5
    hours means that, on average, it lost about every other clock tick.
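
    The arithmetic behind that estimate, as a small sketch (the function
    name is hypothetical, and the 90 and 45 minute figures are the
    approximate values from the report above):

```python
def lost_tick_fraction(wall_minutes, clock_behind_minutes):
    """Fraction of timer ticks lost, assuming the system clock advances
    only on handled ticks (hypothetical helper for illustration)."""
    return clock_behind_minutes / wall_minutes


# Roughly 1.5 hours of wall-clock time, system clock about 45 minutes behind.
fraction = lost_tick_fraction(wall_minutes=90, clock_behind_minutes=45)
print(fraction)
```

    A fraction of 0.5 corresponds to losing every other tick.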
    What happens if you turn interrupt unmasking on for your old drive?
    (hdparm -u1 /dev/hda)
    BEWARE: there is a rare chance of data corruption, so better mount the
    drive read-only before doing so.

    This should give your system the chance to catch many more timer
    interrupts.
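
    One way to check whether more timer interrupts are being caught is to
    sample the timer line of /proc/interrupts twice and compare the rate
    with the expected tick rate. The sketch below is only an illustration:
    it assumes the timer shows up as IRQ 0 (as on typical x86 machines of
    this era), the helper names are made up, and the sample text is
    invented rather than taken from the reporter's machine.

```python
def timer_interrupt_count(proc_interrupts_text):
    """Return the total count for IRQ 0 from /proc/interrupts-style text,
    or None if no such line exists (hypothetical helper)."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "0:":
            total = 0
            # Per-CPU counters follow the IRQ label; stop at the first
            # non-numeric field (the interrupt controller name).
            for field in fields[1:]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None


def ticks_per_second(count_before, count_after, seconds):
    """Observed timer interrupt rate between two samples."""
    return (count_after - count_before) / seconds


# Invented sample in the style of /proc/interrupts on a single-CPU box.
SAMPLE = """\
           CPU0
  0:    1234567          XT-PIC  timer
  1:        892          XT-PIC  i8042
 14:      45678          XT-PIC  ide0
"""

count = timer_interrupt_count(SAMPLE)
print(count)
```

    On a live system the text would come from reading /proc/interrupts
    before and after a heavy copy; a rate well below the kernel's
    configured tick rate would suggest ticks are still being lost.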

    Easiest solution, of course, is to just retire the old drive.

    Hope this explains your findings,
    Tim

