Re: summary (Re: [patch] prefer TSC over PM Timer)

From: George Anzinger (george_at_mvista.com)
Date: 11/17/04

    Date:	Wed, 17 Nov 2004 14:30:23 -0800
    To: dean gaudet <dean-list-linux-kernel@arctic.org>
    
    

    dean gaudet wrote:
    > ok thanks everyone... i've been educated, and attempted to summarize the
    > situation.
    >
    > if timer_pm is fixed to read the PM timer only once on non-broken systems
    > then it is generally the best choice. it is only at a ~3x disadvantage
    > compared to tsc/lapic in that case.
    >
    > until/unless C3 and deeper resync tsc then it's best not to default to tsc
    > even on transmeta. it would require some co-ordination between timer_tsc
    > and ACPI code to know if C3/etc. are enabled, i don't see that
    > co-ordination there now. so it really does seem like adding "clock=tsc"
    > to boot is best left to installers/users/not-the-kernel for now.
    >
    > here's my device summary:
    >
    > PIT:
    > - many slow i/o accesses to read
    > - works everywhere
    >
    > PM:
    > - minimum one slow i/o access to read
    > - measurements on a handful of systems show one PM timer read
    > costs ~3x a TSC read.
    > - kernel presently uses 3 reads as a bug workaround, but can be
    > reduced to one read.
    > - works on ~all hardware less than a few years old

    Both the PIT and PM use the same 14.3181818MHz "rock" which is chosen for time
    keeping. As such the PIT & PM should be considered the "GOLD" standard for time
    keeping.
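
    To make the shared "rock" concrete, here is a minimal sketch (not kernel
    code) of how both rates fall out of the same crystal and how a PM timer
    delta might be turned into nanoseconds. The divisors (12 for the PIT, 4 for
    the PM timer) are the usual PC values; the 24-bit counter width and the
    helper names are assumptions made for illustration, since some chipsets
    provide a 32-bit PM timer.

```c
#include <stdint.h>

/* One crystal feeds both timers (values in Hz, integer-truncated). */
#define BASE_HZ     14318182ULL        /* 14.3181818 MHz "rock" */
#define PIT_HZ      (BASE_HZ / 12)     /* about 1.193 MHz */
#define PMTMR_HZ    (BASE_HZ / 4)      /* about 3.580 MHz */
#define PMTMR_MASK  0x00FFFFFFu        /* assumption: 24-bit counter */

/* Hypothetical helper: ticks between two PM timer reads.
 * Masking makes the subtraction survive a single counter wrap. */
static uint32_t pmtmr_delta(uint32_t prev, uint32_t now)
{
    return (now - prev) & PMTMR_MASK;
}

/* Hypothetical helper: convert PM timer ticks to nanoseconds. */
static uint64_t pmtmr_ticks_to_ns(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000000ULL) / PMTMR_HZ;
}
```

    A 24-bit counter at this rate wraps roughly every 4.7 seconds, so the
    delta is only valid if the timer is sampled more often than that.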
    >
    > TSC:
    > - fast read
    > - on most systems this varies with power mgmt -- and some power mgmt
    > occurs "behind-the-scenes" without kernel awareness
    > - cpufreq is better and better at tracking the changes (but not on SMP?)
    > - 2.6.10-rc2 disables even more behind-the-scenes power mgmt
    > - stops counting in C3 (solved? with PIT/PM/RTC read coming out of C3)
    > - drift possible across nodes in NUMA

    The TSC frequency is unknown. During boot an attempt is made to calibrate it by
    comparing it with the PIT. This attempt is flawed by the I/O delays in
    accessing the PIT, and so will be off by 5 or more counts per tick (measured on
    an 800 MHz box, after changing the calibration time to the max PIT count,
    ~50ms, and pairing the beginning and ending I/O instructions so as to negate
    the I/O delays as much as possible). The TSC is also not driven by a time
    keeping "rock", and its clock may be varied to lower EMI radiation (isn't time
    keeping interesting?).
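
    The arithmetic behind that calibration error can be sketched as follows.
    This is an illustration only, not the kernel's calibration routine: the
    function names are hypothetical, and skew_cycles stands for whatever
    unbalanced I/O latency is left over between the start and end PIT reads.

```c
#include <stdint.h>

#define PIT_HZ 1193182ULL   /* nominal PIT input clock in Hz */

/* Estimate the TSC frequency from the TSC delta observed while the
 * PIT counted pit_ticks input clocks. */
static uint64_t tsc_hz_estimate(uint64_t tsc_delta, uint32_t pit_ticks)
{
    return tsc_delta * PIT_HZ / pit_ticks;
}

/* If the measured TSC delta is off by skew_cycles over the whole
 * calibration window, this is the resulting error in TSC cycles per
 * 1/hz timer tick. A longer window spreads the same skew over more
 * ticks, which is why the window is stretched to the maximum PIT count. */
static double cycles_error_per_tick(double skew_cycles,
                                    uint32_t pit_ticks, unsigned hz)
{
    double window_s = (double)pit_ticks / (double)PIT_HZ;
    return skew_cycles / (window_s * hz);
}
```

    With the full 16-bit PIT count (65536 clocks, about 55 ms) and HZ=1000,
    the window covers about 55 ticks, so a few hundred cycles of unbalanced
    I/O delay is already enough to show up as several cycles per tick.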
    >
    > local APIC:
    > - fast read (approx same as TSC)
    > - enabling lapic causes some dell laptops to crash
    > - stops counting in C3 (solvable with PIT/PM/RTC read coming out of C3)
    > - shared with scheduler -- easy to manage today
    > - can't be shared with scheduler if we add variable scheduler ticks
    > (can't read CCR and write ICR atomically -- potential to drift)
    > - local apic timer ticks are the best choice for scheduling on SMP
    > because it allows all the CPU schedulers to be skewed and avoid
    > lock conflicts.
    Actually, doing this is problematic, as it skews the timer expiry times. With
    the per-cpu timer lists in 2.6 there is very little lock contention. I think we
    can safely dismiss the lock issue.
    > - drift possible across nodes in NUMA?

    The APIC timer is again on a different "rock" which is not designed for time
    keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.

    IMHO, the best time keeping we can get in an x86 box is to:

    a) set up the PIT to do the 1/HZ ticks (once set up we do not need to touch
    it again so the I/O access issues become moot),

    b) select either the TSC (if we think it is stable) or the pm_timer to do the
    short term, between-tick interpolation and also to detect and correct for PIT
    interrupt overrun (as when we miss a tick or two). We should prefer the TSC
    here because of its speed and because it is read on every gettimeofday() call.

    c) Use the PIT interrupt (followed by an IPI from the PIT interrupt handler for
    SMP systems) to do the scheduler and timer list servicing. (We really do want
    the timer list to be serviced as close to the jiffies++ as possible.)

    d) Use the APIC timer for both finer (as in High Resolution Timers, HRT) and
    coarser timing (as in variable scheduler ticks, VST).
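
    Steps a) and b) can be sketched roughly as below. This is a simplified
    illustration under stated assumptions, not the HRT patch: the struct, the
    function names and the round-to-nearest policy are all hypothetical, and
    the TSC value is passed in rather than read from hardware.

```c
#include <stdint.h>

/* Hypothetical timekeeping state. */
struct timekeeper {
    uint64_t jiffies;          /* ticks counted so far */
    uint64_t tsc_at_tick;      /* TSC value sampled at the last tick */
    uint64_t cycles_per_tick;  /* calibrated TSC cycles per 1/HZ tick */
    uint64_t ns_per_tick;      /* 1000000000 / HZ */
};

/* Called from the PIT interrupt. The PIT decides that a tick happened;
 * the TSC is only used to notice that more than one tick period has
 * gone by since the last interrupt. Returns the number of lost ticks. */
static uint64_t on_tick(struct timekeeper *tk, uint64_t tsc_now)
{
    uint64_t elapsed = tsc_now - tk->tsc_at_tick;
    /* Round to the nearest whole tick so small jitter does not count. */
    uint64_t ticks = (elapsed + tk->cycles_per_tick / 2) / tk->cycles_per_tick;

    if (ticks < 1)
        ticks = 1;             /* an interrupt is always at least one tick */
    tk->jiffies += ticks;
    tk->tsc_at_tick = tsc_now;
    return ticks - 1;
}

/* gettimeofday()-style read: whole ticks from the PIT, the fraction
 * since the last tick interpolated from the TSC. */
static uint64_t now_ns(const struct timekeeper *tk, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - tk->tsc_at_tick;
    return tk->jiffies * tk->ns_per_tick
         + delta * tk->ns_per_tick / tk->cycles_per_tick;
}
```

    The sketch leaves out locking, interrupt latency, and the guard a real
    implementation needs to keep the interpolated time from stepping backwards
    when a tick arrives late.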

    The current HRT patch (see signature) does a, b, and c. I am currently working
    on d.
    >
    > HPET:
    > - at the moment i know nothing about it (none of my systems have it)

    Well, we do know that it is in I/O space and all that that implies...
    >
    > let me know if i've missed anything.
    >

    -- 
    George Anzinger   george@mvista.com
    High-res-timers:  http://sourceforge.net/projects/high-res-timers/
