Re: PATCH: Race in 2.6.0-test2 timer code

From: Andrea Arcangeli (andrea_at_suse.de)
Date: 07/30/03

  • Next message: Mikael Pettersson: "Re: APIC error on CPU0: 02(02)"
    Date:	Wed, 30 Jul 2003 13:16:39 +0200
    To: Ingo Molnar <mingo@elte.hu>
    
    

    On Wed, Jul 30, 2003 at 12:31:23PM +0200, Ingo Molnar wrote:
    >
    > On Wed, 30 Jul 2003, Andrea Arcangeli wrote:
    >
    > >
    > > cpu0 cpu1
    > > ------------ --------------------
    > >
    > > do_setitimer
    > > it_real_fn
    > > del_timer_sync add_timer -> crash
    >
    > would you mind to elaborate the precise race? I cannot see how the above
    > sequence could crash on current 2.6:
    >
    > del_timer_sync() takes the base pointer in timer->base and locks it, then
    > checks whether the timer has this base or not and repeats the sequence if
    > not. add_timer() locks the current CPU's base, then installs the timer and
    > sets timer->base. Where's the race?

    not sure why you mention timer->base, you know the base lock means
    nothing if it's in two different cpus (if it meant anything, it would
    mean we'd still suffer the cacheline bouncing with timer ops running in
    different cpus). To make it more clear I edit timer.c deleting all
    spin_lock(&base->lock) so the base->lock won't make it look like it can
    serialize anything. I delete the local_irqsave too since it doesn't
    matter too for this example (also assume premption turned off so I don't
    need to add the preempt disable in place of the local irq save).

    so for this example we have this:

    int del_timer(struct timer_list *timer)
    {
            tvec_base_t *base;

    repeat:
             base = timer->base;
            if (!base)
                    return 0;
            if (base != timer->base) {
                    goto repeat;
            }
            list_del(&timer->entry);
            timer->base = NULL;

            return 1;
    }

    against this:

    void add_timer(struct timer_list *timer)
    {
            tvec_base_t *base = &get_cpu_var(tvec_bases);
      
              BUG_ON(timer_pending(timer) || !timer->function);

            internal_add_timer(base, timer);
            timer->base = base;
            put_cpu_var(tvec_bases);
    }

    in del_timer, list_del can be reordered after the timer->base = NULL,
    the C compiler can do that. so list_del will run at the same time of
    internal_add_timer(base, timer) that does the list_add_tail.

    list_del and list_add_tail running at the same time will crash.

    Andrea
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Mikael Pettersson: "Re: APIC error on CPU0: 02(02)"

    Relevant Pages

    • Re: 2.6.9-rc4-bk2 bug report
      ... > serial-console to grab the oops. ... Something appears to have wrecked the system-wide timer queue. ... needed to boot and see if the crash goes away. ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: PATCH: Race in 2.6.0-test2 timer code
      ... would you mind to elaborate the precise race? ... sequence could crash on current 2.6: ... checks whether the timer has this base or not and repeats the sequence if ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: [PATCH] gdth: bugfix for the at-exit problems
      ... and would not sync with the timer function. ... So del_timer_sync the timer before we delete the cards. ... also the reboot notifier function would crash. ... at reboot notifier, nor would you want to I guess. ...
      (Linux-Kernel)
    • [PATCH ver2] gdth: bugfix for the at-exit problems
      ... and would not sync with the timer function. ... So del_timer_sync the timer before we delete the cards. ... And would crash. ...
      (Linux-Kernel)
    • Re: [PATCH] gdth: bugfix for the at-exit problems
      ... and would not sync with the timer function. ... So del_timer_sync the timer before we delete the cards. ... also the reboot notifier function would crash. ...
      (Linux-Kernel)