Re: [patch 15/23] high-res timers: core
- From: Andrew Morton <akpm@xxxxxxxx>
- Date: Sat, 30 Sep 2006 01:43:56 -0700
On Fri, 29 Sep 2006 23:58:34 -0000
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
add the core bits of high-res timers support.
the design makes use of the existing hrtimers subsystem which manages a
per-CPU and per-clock tree of timers, and the clockevents framework, which
provides a standard API to request programmable clock events from. The
core code does not have to know about the clock details - it makes use
of clockevents_set_next_event().
the code also provides dyntick functionality: it is implemented via a
per-cpu sched_tick hrtimer that is set to HZ frequency, but which is
reprogrammed to a longer timeout before going idle, and reprogrammed to
HZ again once the CPU goes busy again. (If a non-timer IRQ hits the
idle task then it will process jiffies before calling the IRQ code.)
the impact to non-high-res architectures is intended to be minimal.
...
@@ -108,17 +134,53 @@ struct hrtimer_cpu_base {
spinlock_t lock;
struct lock_class_key lock_key;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+#ifdef CONFIG_HIGH_RES_TIMERS
+ ktime_t expires_next;
+ int hres_active;
+ unsigned long check_clocks;
+ struct list_head cb_pending;
+ struct hrtimer sched_timer;
+ struct pt_regs *sched_regs;
+ unsigned long events;
+#endif
You forgot to update the kerneldoc for this struct.
Does `events' need to be long?
<looks>
oh, it's a scalar this time ;)
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+extern void hrtimer_clock_notify(void);
+extern void clock_was_set(void);
+extern void hrtimer_interrupt(struct pt_regs *regs);
+
+# define hrtimer_cb_get_time(t) (t)->base->get_time()
+# define hrtimer_hres_active (__get_cpu_var(hrtimer_bases).hres_active)
These two could be inline functions?
That might cause include file ordering problems I guess.
+/*
+ * The resolution of the clocks. The resolution value is returned in
+ * the clock_getres() system call to give application programmers an
+ * idea of the (in)accuracy of timers. Timer values are rounded up to
+ * this resolution values.
+ */
+# define KTIME_HIGH_RES (ktime_t) { .tv64 = CONFIG_HIGH_RES_RESOLUTION }
+# define KTIME_MONOTONIC_RES KTIME_HIGH_RES
+
+#else
+
+# define KTIME_MONOTONIC_RES KTIME_LOW_RES
+
/*
* clock_was_set() is a NOP for non- high-resolution systems. The
* time-sorted order guarantees that a timer does not expire early and
* is expired in the next softirq when the clock was advanced.
*/
-#define clock_was_set() do { } while (0)
-#define hrtimer_clock_notify() do { } while (0)
-extern ktime_t ktime_get(void);
-extern ktime_t ktime_get_real(void);
+# define clock_was_set() do { } while (0)
+# define hrtimer_clock_notify() do { } while (0)
these could be inlines.
+# define hrtimer_cb_get_time(t) (t)->base->softirq_time
Does this need parenthesisation? Probably it's OK.. An inline function
would be nicer.
+# define hrtimer_hres_active 0
Perhaps this would be better if it was presented as a function.
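A minimal userspace sketch of what the inline-function variants of the !CONFIG_HIGH_RES_TIMERS stubs might look like. The struct definitions are cut-down stand-ins carrying only the fields used here, not the real kernel layouts, and callers of hrtimer_hres_active would have to gain parentheses.

```c
#include <stdint.h>

/* Stand-in types for illustration; only the fields this sketch touches. */
typedef struct { int64_t tv64; } ktime_t;

struct hrtimer_clock_base {
	ktime_t softirq_time;
};

struct hrtimer {
	struct hrtimer_clock_base *base;
};

/* Replaces: # define clock_was_set() do { } while (0) */
static inline void clock_was_set(void) { }

/* Replaces: # define hrtimer_clock_notify() do { } while (0) */
static inline void hrtimer_clock_notify(void) { }

/*
 * Replaces: # define hrtimer_cb_get_time(t) (t)->base->softirq_time
 * The argument is type-checked and the parenthesisation question goes away.
 */
static inline ktime_t hrtimer_cb_get_time(const struct hrtimer *timer)
{
	return timer->base->softirq_time;
}

/* Replaces: # define hrtimer_hres_active 0 */
static inline int hrtimer_hres_active(void)
{
	return 0;
}
```

Whether this is usable in the header depends on the include ordering concern raised above: the inline needs the full struct definitions in scope, the macro does not.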
+/* High resolution timer related functions */
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+static ktime_t last_jiffies_update;
What's this do?
+/*
+ * Reprogramm the event source with checking both queues for the
"Reprogramme" ;)
+ * next event
+ * Called with interrupts disabled and base->lock held
+ */
+static void hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base)
+{
+ int i;
+ struct hrtimer_clock_base *base = cpu_base->clock_base;
+ ktime_t expires;
+
+ cpu_base->expires_next.tv64 = KTIME_MAX;
+
+ for (i = HRTIMER_MAX_CLOCK_BASES; i ; i--, base++) {
Downcounting loops hurt my brain. Does it actually generate better code?
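For comparison, a userspace sketch of the same scan written as an upcounting loop. struct clock_base_sketch and earliest_expiry() are hypothetical simplifications: each base just records whether it has a first timer and that timer's expiry with the base offset already subtracted. The value of HRTIMER_MAX_CLOCK_BASES is assumed. It makes no claim about which form generates better code.

```c
#include <stdint.h>

#define HRTIMER_MAX_CLOCK_BASES 2	/* assumed value for this sketch */
#define KTIME_MAX INT64_MAX

/* Hypothetical simplification of struct hrtimer_clock_base. */
struct clock_base_sketch {
	int has_first;		/* non-zero if a timer is queued */
	int64_t first_expires;	/* expiry of that timer, offset removed */
};

/* Return the earliest expiry over all bases, or KTIME_MAX if none is queued. */
static int64_t earliest_expiry(const struct clock_base_sketch *bases)
{
	int64_t expires_next = KTIME_MAX;
	int i;

	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
		const struct clock_base_sketch *base = &bases[i];

		if (!base->has_first)
			continue;
		if (base->first_expires < expires_next)
			expires_next = base->first_expires;
	}
	return expires_next;
}
```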
+ struct hrtimer *timer;
+
+ if (!base->first)
+ continue;
+ timer = rb_entry(base->first, struct hrtimer, node);
+ expires = ktime_sub(timer->expires, base->offset);
+ if (expires.tv64 < cpu_base->expires_next.tv64)
+ cpu_base->expires_next = expires;
+ }
+
+ if (cpu_base->expires_next.tv64 != KTIME_MAX)
+ clockevents_set_next_event(cpu_base->expires_next, 1);
+}
+
+/*
+ * Shared reprogramming for clock_realtime and clock_monotonic
+ *
+ * When a new expires first timer is enqueued, we have
That sentence might need work.
+/*
+ * Retrigger next event is called after clock was set
+ */
+static void retrigger_next_event(void *arg)
+{
+ struct hrtimer_cpu_base *base;
+ struct timespec realtime_offset;
+ unsigned long flags, seq;
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ set_normalized_timespec(&realtime_offset,
+ -wall_to_monotonic.tv_sec,
+ -wall_to_monotonic.tv_nsec);
+ } while (read_seqretry(&xtime_lock, seq));
+
+ base = &per_cpu(hrtimer_bases, smp_processor_id());
+
+ /* Adjust CLOCK_REALTIME offset */
+ spin_lock_irqsave(&base->lock, flags);
+ base->clock_base[CLOCK_REALTIME].offset =
+ timespec_to_ktime(realtime_offset);
+
+ hrtimer_force_reprogram(base);
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+/*
+ * Clock realtime was set
+ *
+ * Change the offset of the realtime clock vs. the monotonic
+ * clock.
+ *
+ * We might have to reprogram the high resolution timer interrupt. On
+ * SMP we call the architecture specific code to retrigger _all_ high
+ * resolution timer interrupts. On UP we just disable interrupts and
+ * call the high resolution interrupt code.
+ */
+void clock_was_set(void)
+{
+ preempt_disable();
+ if (hrtimer_hres_active) {
+ retrigger_next_event(NULL);
+
+ if (smp_call_function(retrigger_next_event, NULL, 1, 1))
+ BUG();
+ }
+ preempt_enable();
+}
If you use on_each_cpu() here you know that retrigger_next_event() will be
called under local_irq_disable(). The preempt_disable() goes away and the
spin_lock_irqsave() in retrigger_next_event() becomes a spin_lock() and
everything becomes simpler.
+/**
+ * hrtimer_clock_notify - A clock source or a clock event has been installed
+ *
+ * Notify the per cpu softirqs to recheck the clock sources and events
+ */
+void hrtimer_clock_notify(void)
+{
+ int i;
+
+ for (i = 0; i < NR_CPUS; i++)
+ set_bit(0, &per_cpu(hrtimer_bases, i).check_clocks);
+}
This will go splat if/when the arch chooses to not implement per-cpu
storage for not-possible CPUs. Use for_each_possible_cpu().
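A userspace sketch of the difference. The for_each_possible_cpu() macro below is a simplified mock driven by a hypothetical bitmask (possible_mask_sketch), and check_clocks[] stands in for the per-cpu field; the kernel's own macro walks the possible-CPU map instead.

```c
#define NR_CPUS 8	/* assumed for this sketch */

/* Hypothetical possible-CPU mask: CPUs 0, 1 and 3 are possible. */
static unsigned long possible_mask_sketch = 0x0bUL;

/* Simplified mock of the kernel iterator; skips CPUs that are not possible. */
#define for_each_possible_cpu(cpu)				\
	for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)		\
		if (possible_mask_sketch & (1UL << (cpu)))

/* Stand-in for per_cpu(hrtimer_bases, cpu).check_clocks */
static unsigned long check_clocks[NR_CPUS];

static void hrtimer_clock_notify(void)
{
	int cpu;

	/* Only touch per-cpu data that is guaranteed to exist. */
	for_each_possible_cpu(cpu)
		check_clocks[cpu] |= 1UL;	/* set_bit(0, ...) in the kernel */
}
```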
+
+static const ktime_t nsec_per_hz = { .tv64 = NSEC_PER_SEC / HZ };
+
This could use the same trick as KTIME_HIGH_RES and friends. But perhaps
the compiler will generate the same code..
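A sketch of that trick applied to the tick length, in userspace C. KTIME_NSEC_PER_HZ and tick_elapsed() are hypothetical names, ktime_t is reduced to its tv64 member, and HZ=250 is assumed purely for the example.

```c
#include <stdint.h>

typedef struct { int64_t tv64; } ktime_t;	/* reduced stand-in */

#define NSEC_PER_SEC	1000000000L
#define HZ		250			/* assumed for this sketch */

/* Same compound-literal form as KTIME_HIGH_RES; no static object needed. */
#define KTIME_NSEC_PER_HZ ((ktime_t) { .tv64 = NSEC_PER_SEC / HZ })

/* Hypothetical helper: has at least one tick period elapsed? */
static int tick_elapsed(ktime_t delta)
{
	return delta.tv64 >= KTIME_NSEC_PER_HZ.tv64;
}
```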
+/*
+ * We switched off the global tick source when switching to high resolution
+ * mode. Update jiffies64.
+ *
+ * Must be called with interrupts disabled !
+ *
+ * FIXME: We need a mechanism to assign the update to a CPU. In principle this
+ * is not hard, but when dynamic ticks come into play it starts to be. We don't
+ * want to wake up a complete idle cpu just to update jiffies, so we need
+ * something more intellegent than a mere "do this only on CPUx".
+ */
+static void update_jiffies64(ktime_t now)
+{
+ ktime_t delta;
+
+ write_seqlock(&xtime_lock);
+
+ delta = ktime_sub(now, last_jiffies_update);
+ if (delta.tv64 >= nsec_per_hz.tv64) {
+
stray blank line.
+ unsigned long orun = 1;
"orun"?
+
+ delta = ktime_sub(delta, nsec_per_hz);
+ last_jiffies_update = ktime_add(last_jiffies_update,
+ nsec_per_hz);
+
+ /* Slow path for long timeouts */
+ if (unlikely(delta.tv64 >= nsec_per_hz.tv64)) {
+ s64 incr = ktime_to_ns(nsec_per_hz);
+ orun = ktime_divns(delta, incr);
+
+ last_jiffies_update = ktime_add_ns(last_jiffies_update,
+ incr * orun);
+ jiffies_64 += orun;
+ orun++;
+ }
That's a bit of a hack isn't it? do_timer() owns the modification of
jiffies_64, so why is this code modifying it as well?
+ do_timer(orun);
twice?
I suspect a bug.
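To make the concern concrete, a userspace model of the arithmetic. It assumes that do_timer(ticks) itself adds ticks to jiffies_64; if do_timer() in this series behaves differently, the conclusion does not hold. TICK_NS (HZ=250), the plain int64_t time values and both function names are simplifications for the sketch.

```c
#include <stdint.h>

#define TICK_NS 4000000LL	/* assumed: NSEC_PER_SEC / HZ with HZ=250 */

static uint64_t jiffies_64;
static int64_t last_jiffies_update;

/* ASSUMPTION: do_timer() advances jiffies_64 by the tick count it is given. */
static void do_timer(unsigned long ticks)
{
	jiffies_64 += ticks;
}

/* Mirrors the control flow of the posted update_jiffies64(), minus locking. */
static void update_jiffies64_posted(int64_t now)
{
	int64_t delta = now - last_jiffies_update;

	if (delta >= TICK_NS) {
		int64_t orun = 1;

		delta -= TICK_NS;
		last_jiffies_update += TICK_NS;

		/* Slow path for long timeouts */
		if (delta >= TICK_NS) {
			orun = delta / TICK_NS;
			last_jiffies_update += TICK_NS * orun;
			jiffies_64 += (uint64_t)orun;	/* first increment */
			orun++;
		}
		do_timer((unsigned long)orun);		/* second increment */
	}
}

/* Variant that leaves every jiffies_64 update to do_timer(). */
static void update_jiffies64_single(int64_t now)
{
	int64_t delta = now - last_jiffies_update;

	if (delta >= TICK_NS) {
		int64_t orun = delta / TICK_NS;

		last_jiffies_update += TICK_NS * orun;
		do_timer((unsigned long)orun);
	}
}
```

Under that assumption, five elapsed tick periods advance jiffies_64 by nine in the posted form (4 from the slow path plus 5 from do_timer()) and by five in the single-owner form.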
+ }
+ write_sequnlock(&xtime_lock);
+}
+
+/*
+ * We rearm the timer until we get disabled by the idle code
+ */
+static int hrtimer_sched_tick(struct hrtimer *timer)
+{
+ unsigned long flags;
+ struct hrtimer_cpu_base *cpu_base =
+ container_of(timer, struct hrtimer_cpu_base, sched_timer);
+
+ local_irq_save(flags);
+ /*
+ * Do not call, when we are not in irq context and have
+ * no valid regs pointer
+ */
+ if (cpu_base->sched_regs) {
+ update_process_times(user_mode(cpu_base->sched_regs));
+ profile_tick(CPU_PROFILING, cpu_base->sched_regs);
+ }
+
+ hrtimer_forward(timer, hrtimer_cb_get_time(timer), nsec_per_hz);
+ local_irq_restore(flags);
+
+ return HRTIMER_RESTART;
bah. HRTIMER_RESTART is an `enum hrtimer_restart', not an integer.
+ printk(KERN_INFO "hrtimers: Switched to high resolution mode CPU %d\n",
+ smp_processor_id());
"on CPU"
+
+static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
+ struct hrtimer_clock_base *base)
+{
+ /*
+ * When High resolution timers are active try to reprogram. Note, that
+ * in case the state has HRTIMER_CALLBACK set, no reprogramming and no
+ * expiry check happens. The timer gets enqueued into the rbtree and
+ * the reprogramming / expiry check is done in the hrtimer_interrupt or
+ * in the softirq.
+ */
This (useful) comment should be above the function, not inside it.
+ if (hrtimer_hres_active && hrtimer_reprogram(timer, base)) {
+
+ /* Timer is expired, act upon the callback mode */
+ switch(timer->cb_mode) {
+ case HRTIMER_CB_IRQSAFE_NO_RESTART:
+ /*
+ * We can call the callback from here. No restart
+ * happens, so no danger of recursion
+ */
+ BUG_ON(timer->function(timer) != HRTIMER_NORESTART);
Doing assert(thing-which-has-side-effects) is poor form.
I doubt if the kernel will work if someone goes and disables BUG_ON, but
it's a laudable objective.
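A small userspace illustration of why the side effect matters. BUG_ON_COMPILED_OUT is a hypothetical macro modelling a configuration in which the check expands to nothing; sample_callback() stands in for timer->function(timer).

```c
enum hrtimer_restart { HRTIMER_NORESTART, HRTIMER_RESTART };

/* Hypothetical: models a build in which BUG_ON() discards its argument. */
#define BUG_ON_COMPILED_OUT(cond) do { } while (0)

static int callback_runs;

/* Stand-in for timer->function(timer). */
static enum hrtimer_restart sample_callback(void)
{
	callback_runs++;
	return HRTIMER_NORESTART;
}

/* Side effect inside the assertion: the callback vanishes with the check. */
static void expire_fragile(void)
{
	BUG_ON_COMPILED_OUT(sample_callback() != HRTIMER_NORESTART);
}

/* Call first, assert on the stored result: the callback always runs. */
static void expire_robust(void)
{
	enum hrtimer_restart ret = sample_callback();

	BUG_ON_COMPILED_OUT(ret != HRTIMER_NORESTART);
	(void)ret;	/* silence unused warning when the check is compiled out */
}
```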
+ return 1;
+ case HRTIMER_CB_IRQSAFE_NO_SOFTIRQ:
+ /*
+ * This is solely for the sched tick emulation with
+ * dynamic tick support to ensure that we do not
+ * restart the tick right on the edge and end up with
+ * the tick timer in the softirq ! The calling site
+ * takes care of this.
+ */
+ return 1;
+ case HRTIMER_CB_IRQSAFE:
+ case HRTIMER_CB_SOFTIRQ:
+ /*
+ * Move everything else into the softirq pending list !
+ */
+ hrtimer_add_cb_pending(timer, base);
+ raise_softirq(HRTIMER_SOFTIRQ);
+ return 1;
+ default:
+ BUG();
+ }
+ }
+ return 0;
+}
+
+static inline void hrtimer_resume_jiffie_update(void)
hrtimer_resume_jiffy_update
+{
+ unsigned long flags;
+ ktime_t now = ktime_get();
+
+ write_seqlock_irqsave(&xtime_lock, flags);
+ last_jiffies_update = now;
+ write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+#else
+
+# define hrtimer_hres_active 0
+# define hrtimer_check_clocks() do { } while (0)
+# define hrtimer_enqueue_reprogram(t,b) 0
+# define hrtimer_force_reprogram(b) do { } while (0)
+# define hrtimer_cb_pending(t) 0
+# define hrtimer_remove_cb_pending(t) do { } while (0)
+# define hrtimer_init_hres(c) do { } while (0)
+# define hrtimer_init_timer_hres(t) do { } while (0)
+# define hrtimer_resume_jiffie_update() do { } while (0)
+
+#endif /* CONFIG_HIGH_RES_TIMERS */
+
/*
* Timekeeping resumed notification
resume
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+/*
+ * High resolution timer interrupt
+ * Called with interrupts disabled
+ */
+void hrtimer_interrupt(struct pt_regs *regs)
+{
+ struct hrtimer_clock_base *base;
+ ktime_t expires_next, now;
+ int i, raise = 0, cpu = smp_processor_id();
+ struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+ BUG_ON(!cpu_base->hres_active);
+
+ /* Store the regs for an possible sched_timer callback */
+ cpu_base->sched_regs = regs;
+ cpu_base->events++;
+
+ retry:
+ now = ktime_get();
+
+ /* Check, if the jiffies need an update */
+ update_jiffies64(now);
+
+ expires_next.tv64 = KTIME_MAX;
+
+ base = cpu_base->clock_base;
+
+ for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
+ ktime_t basenow;
+ struct rb_node *node;
+
+ spin_lock(&cpu_base->lock);
+
+ basenow = ktime_add(now, base->offset);
Would it be better to take the lock outside the loop, rather than hammering
on it like this?
+ while ((node = base->first)) {
+ struct hrtimer *timer;
+
+ timer = rb_entry(node, struct hrtimer, node);
+
+ if (basenow.tv64 < timer->expires.tv64) {
+ ktime_t expires;
+
+ expires = ktime_sub(timer->expires,
+ base->offset);
+ if (expires.tv64 < expires_next.tv64)
+ expires_next = expires;
+ break;
+ }
+
+ /* Move softirq callbacks to the pending list */
+ if (timer->cb_mode == HRTIMER_CB_SOFTIRQ) {
+ __remove_hrtimer(timer, base, HRTIMER_PENDING, 0);
+ hrtimer_add_cb_pending(timer, base);
+ raise = 1;
+ continue;
+ }
+
+ __remove_hrtimer(timer, base, HRTIMER_CALLBACK, 0);
+
+ if (timer->function(timer) != HRTIMER_NORESTART) {
+ BUG_ON(timer->state != HRTIMER_CALLBACK);
+ /*
+ * state == HRTIMER_CALLBACK prevents
+ * reprogramming. We do this when we break out
+ * of the loop !
+ */
+ enqueue_hrtimer(timer, base);
+ }
+ timer->state &= ~HRTIMER_CALLBACK;
+ }
+ spin_unlock(&cpu_base->lock);
+ base++;
+ }
+
+ cpu_base->expires_next = expires_next;
+
+ /* Reprogramming necessary ? */
+ if (expires_next.tv64 != KTIME_MAX) {
+ if (clockevents_set_next_event(expires_next, 0))
+ goto retry;
+ }
+
+ /* Invalidate regs */
+ cpu_base->sched_regs = NULL;
+
+ /* Raise softirq ? */
+ if (raise)
+ raise_softirq(HRTIMER_SOFTIRQ);
+}
+
...
static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
@@ -701,7 +1226,8 @@ static int __sched do_nanosleep(struct h
set_current_state(TASK_INTERRUPTIBLE);
hrtimer_start(&t->timer, t->timer.expires, mode);
- schedule();
+ if (likely(t->task))
+ schedule();
Why? Needs a comment.
@@ -0,0 +1,22 @@
+#
+# Timer subsystem related configuration options
+#
+config HIGH_RES_TIMERS
+ bool "High Resolution Timer Support"
+ depends on GENERIC_TIME
+ help
+ This option enables high resolution timer support. If your
+ hardware is not capable then this option only increases
+ the size of the kernel image.
+
+config HIGH_RES_RESOLUTION
+ int "High Resolution Timer resolution (nanoseconds)"
+ depends on HIGH_RES_TIMERS
+ default 1000
+ help
+ This sets the resolution in nanoseconds of the high resolution
+ timers. Too fine a resolution (small a number) will usually
+ not be observable due to normal system latencies. For an
+ 800 MHz processor about 10,000 (10 microseconds) is recommended as a
+ finest resolution. If you don't need that sort of resolution,
+ larger values may generate less overhead.
In that case the default is far too low.
What value are you suggesting that users and vendors set it to?