Time Drift Compensation on Linux Clusters

From: Dror Cohen (dror.xiv_at_gmail.com)
Date: 03/03/05

  • Next message: Paul Jackson: "Re: RFD: Kernel release numbering"
    Date:	Thu, 3 Mar 2005 16:50:52 +0200
    To: linux-kernel@vger.kernel.org
    
    

    Hi all,
    While working on a Linux cluster with kernel version 2.4.27 we've
    encountered a consistent clock drift problem. We have devised a fix
    for this problem which is based on the Pentium's TSC clock. We'd
    appreciate any comments on the validity of the proposed solution and
    on whether it might cause unexpected (and undesired) problems in other
    parts of the kernel.

    Following is a detailed description of the problem and the fix,

    Thanks everyone,
    Dror
    XIV ltd.

    ------------------------------------------------------------------------------------------------------------

    Time Drift Compensation on Linux Clusters

    1. Background

    During the operation of the 2.4.27 kernel under intense IO conditions a clock
    drift was detected. The system time, as received by gettimeofday() began lagging
    behind the wall clock.

    All PCs have a battery driven hardware clock. This clock is used at boot time to
    initialize the system time. Linux then uses the Programmable Interval
    Timer (PIT)
    to keep an internal accounting of the flow of time. The PIT is set to raise an
    interrupt every fixed interval. The number of interrupt per second is defined by
    the kernel constant HZ, which typically equals 100 (10 on Alpha machines). The
    interrupt handler kernel/timer.c:do_timer() increments a kernel wide
    variable named
    jiffies and invokes the bottom half to take care of timers and scheduling.

    2. Analysis

    In IO intensive environments a many CPU cycles are spent handling interrupts.
    This is done by the device driver code for the different IO devices.
    Typically, while
    dealing with the hardware, device drivers block other IRQs, including the PIT's
    (timer) IRQ.

    When debugging drivers it is not uncommon to use printk() inside
    driver code while
    IRQs are still blocked. This method is by nature blocking and can only
    be run once
    even on SMP machines. This is because messages printed by printk() must reach
    the console before any other operation might cause the kernel to panic or block.

    In cluster environments a serial console can be used to access
    machines remotely.
    One drawback of using such a console is its longer response time which
    significantly
    lengthens the blocking time of printk().

    Putting all this together results in an interrupt rich environments in
    which many
    interrupt handlers block the IRQs for long periods of time. This
    matters when 'long'
    becomes longer than 1 second / HZ, which means that the CPU misses timer
    interrupts.

    Each time a time interrupt is missed the system looses one jiffy.
    These lost jiffies
    accumulate into a consisting clock drift.

    3. Fix

    The fix proposed here relies on a quality of the Pentium processor and thus is
    irrelevant to other architectures or to older (80386, 80486) CPUs.

    The Pentium has an internal cycles counter called the Time Stamp Counter (TSC).
    This counter is a 64 bit register which is incremented internally
    every CPU cycle and
    can be read using a special assembly instruction. On a 1GHz machine
    this register
    wraps to zero once every 584 years.

    The suggested fix adds code to the timer interrupt handler which
    tracks the value
    of this counter. When an interrupt occurs, the jiffies counter is
    increased not just
    by one but by the number of 1 second / HZ intervals that actually
    passed according
    to the TSC value change.

    This involves modifying two kernel files:

    kernel/timer.c:
            modify the interrupt handler do_timer()

            698a699,703
    > #ifdef X86_TSC_DRIFT_COMPENSATION
    > __u64 xtdc_step;
    > __u64 xtdc_last = 0;
    > #endif
    >
            700a706,712
    > #ifdef X86_TSC_DRIFT_COMPENSATION
    > __u64 cycles = get_cycles();
    > while (xtdc_last < cycles) {
    > (*(unsigned long *)&jiffies)++;
    > xtdc_last += xtdc_step;
    > }
    > #else
            701a714,715
    > #endif
    >

    arch/i386/kernel/time.c

            modify calibrate_tsc() to get the initial TSC count and to calculate the number
            of TSC cycles per 1 second / HZ interval

            768a769,777
    > #ifdef X86_TSC_DRIFT_COMPENSATION
    >
    > extern __u64 xtdc_step;
    > extern __u64 xtdc_last;
    > #define CALIBRATE_LATCH (4 * LATCH)
    > #define CALIBRATE_TIME (4 * 1000020/HZ)
    >
    > #else
    >
            771a781,782
    > #endif
    >
            805a817,820
    > #ifdef X86_TSC_DRIFT_COMPENSATION
    > xtdc_last = (((__u64) endhigh) << 32) + (__u64) endlow;
    > #endif
    >
            811a827,830
    >
    > #ifdef X86_TSC_DRIFT_COMPENSATION
    > xtdc_step = ((((__u64) endhigh) << 32) + (__u64) endlow) >> 2;
    > #endif
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Paul Jackson: "Re: RFD: Kernel release numbering"

    Relevant Pages

    • RE: [PATCH] perf_counter: Fix a race on perf_counter_ctx
      ... Subject: perf_counter: Fix a race on perf_counter_ctx ... On my system, the kernel hangs. ... interrupt too slow", and I get a soft lockup bug. ...
      (Linux-Kernel)
    • Re: 4.8 "Alternate system clock has died" error
      ... interrupts after interrupt generation has been re-enabled, ... lot of knowledge and experience in kernel hacking. ... There's another clock device, which 'hz' is derived of, for example. ... Perhaps one can call it the CPU clock. ...
      (freebsd-hackers)
    • Re: Preffered OS for a GPS based stratum 1
      ... so it seems to be an acceptable ref clock for my purposes. ... My understanding is Linux 2.6 PPS support is rough at best. ... the kernel is no place to be doing those sanity checks. ... be slightly faster) but you could do an interrupt driven system where the ...
      (comp.protocols.time.ntp)
    • Re: athlon 64 dual core tsc out of sync
      ... the possible fix being the use of the pmtmr. ... if there is support builtin to the kernel for more than ... As Andi has recounted many times already, pmtmr is now the ... My clock was going about 1.414 times as fast as it ought to. ...
      (Linux-Kernel)
    • Re: pcm0 freezes computer with recent -STABLE
      ... Ian's fix for the USB problem also fixed the pcm0 hang ... CVSupping should fix it (I rebuilt my kernel with just this fix ... Save and restore the interrupt mask in fork1instead of using ... This should fix the boot-time USB hangs that have been reported. ...
      (freebsd-stable)