Re: Fix time going backward with clock=pit [1/2]



On Fri, 19 May 2006 02:09:29 +0200 (CEST), Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:
Hi,

On Thu, 18 May 2006, Tim Mann wrote:

I think the whole think should look
something like this:

if (jiffies_t == jiffies_p) {
if (count > count_p) {
underflow or crappy timer;

What should the code do in this case?

Basically the current do_timer_overflow().

Oh, I see, you were also assuming there's a way to fix that.

}
} else {
jiffies_p = jiffies_t;
if (count > LATCH/2 && underflow)
count -= LATCH;

I think I see what you're aiming at here: in the case where we read
count, then the counter wraps, then we read jiffies, you want to detect
that and fix it. Actually if you could detect that, the right way to
fix it would be to set count = LATCH, since the old count is, well, old:
the current time is right after the jiffy.

It's really "-= LATCH". :)

Yeah. :-)

Tangentially, let me point out another thing: letting get_offset_pit
return more than a jiffy is dangerous if we can ever lose a tick (due to
interrupts being disabled for too long, etc.). If that happens,
get_offset_pit might advance time past the lost tick, but when the next
non-lost tick comes in, jiffies is incremented by only 1 and count
recycles again, so time effectively snaps backward to the point where
the lost tick occurred. Losing ticks is bad in itself, of course, but
having that make time actually go backward by about a jiffy (rather than
just stop for one jiffy) seems a bit worse.

However, we don't really have a way to detect that this case happened --
the "&& underflow" in your code is a handwave.

Ok, I'm not that familiar with Intel hardware (it must be crappier than I
thought). Is there really no way to detect the pending interrupt (e.g.
what do_timer_overflow() does)? Without that information one can really
only guess the time.

It's not that important if it's not completely correct for SMP systems,
they usually have other sources, but for the few systems there this is the
only time source, we should at least make an effort to avoid the read
error.

Hmm. If you don't care about SMP systems, that makes the problem
tractable. In that case get_offset_pit can assume that acknowledging
the interrupt and incrementing jiffies happen atomically (since that's
done at interrupt level), so checking whether there's an unacknowledged
interrupt is a sound approach. I'm definitely not expert enough to be
sure how/if you can do that correctly, though. The current code in
do_timer_overflow may be correct for systems using PIC interrupt
routing, but it doesn't seem to work in the APIC systems I've tried it
on, and I don't have a suggestion for how to fix that case. Maybe
someone else does...?

It also would be preferable to fix the SMP case so that at least time
doesn't go backward there, in case someone tries to use the pit
clocksource there. It's quite easy to hit the window where one CPU
calls gettimeofday while another one has ack'd a timer interrupt but
hasn't incremented jiffies yet. Or I suppose we could disable the pit
clocksource for SMP systems, but that seems a bit draconian.

--
Tim Mann work: mann@xxxxxxxxxx home: tim@xxxxxxxxxxxx
http://www.vmware.com http://tim-mann.org
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: Fix time going backward with clock=pit [1/2]
    ... return more than a jiffy is dangerous if we can ever lose a tick (due to ... This can only happen if someone disables the interrupts for too long and ... Is there really no way to detect the pending interrupt (e.g. ... It's not that important if it's not completely correct for SMP systems, ...
    (Linux-Kernel)
  • Re: [PATCH 1/6] new timeofday core subsystem for -mm (v.B3)
    ... tick compensation. ... or interrupt starvation. ... almost all of the arch generic time code. ... I have come to believe the current arch generic tick based timekeeping ...
    (Linux-Kernel)
  • [GIT]: Networking
    ... ATH9K interrupt storm fix. ... Three bluetooth btusb driver fixes to deal with some regression list ...
    (Linux-Kernel)
  • RE: [PATCH] perf_counter: Fix a race on perf_counter_ctx
    ... Subject: perf_counter: Fix a race on perf_counter_ctx ... On my system, the kernel hangs. ... interrupt too slow", and I get a soft lockup bug. ...
    (Linux-Kernel)
  • Re: [RFC][PATCH] new timeofday core subsystem (v.A0)
    ... and this is the rate at which jiffies must tick. ... > What do you do if one clock lacks resolution, and the other clock lacks digits? ... You'll just need an interrupt before roll-over. ...
    (Linux-Kernel)