Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)



On Wed, 07 Jun 2006 13:52:58 +1000
Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:


work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
kernel-kernel-cpuc-to-mutexes.patch

ug. We cannot convert the cpu.c semaphore into a mutex until we work out
why power4 goes titsup if you enable local interrupts during boot.

What is the exact problem? Is some mutex forcing local irqs enabled
before init_IRQ()? (More precisely, before the normal enabling of IRQs
done by init/main.c just after init_IRQ()?)

Any code which does mutex_lock() will have interrupts reenabled if the
mutex code was compiled in debug mode.

This is bad for any architecture. Basically, at this point, the
interrupt controller can be in _any_ state, with possible pending
interrupts for whatever sources, etc...

As we discussed before, that problem should really be fixed in the mutex
code by not hard-enabling.

There is an incredible amount of crap that could be cleaned up, for
example by reordering the init code a bit and making things like slab
available before init_IRQ/time_init etc... but all of those cleanups
will break because of that.

In addition, even without that re-ordering, I'm pretty sure we are
already hitting semaphores/mutexes early, before init_IRQ(): if not
in generic code, then in arch code somewhere down the call stacks.

I don't think that whole pile of problems lurking around the corner is
worth the couple of cycles saved by hard-enabling irq in the mutex
instead of doing a save/restore.
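
To make the difference concrete, here is a rough userspace sketch, not the
actual mutex debugging code. It assumes a simulated interrupt flag, and every
sim_* name is hypothetical. It only mirrors the general shape of the kernel's
local_irq_disable()/local_irq_enable() pair versus
local_irq_save()/local_irq_restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (hypothetical stand-in, not kernel code).
 * It starts out false, as during early boot before interrupts are turned on. */
static bool sim_irqs_enabled = false;

static void sim_irq_disable(void) { sim_irqs_enabled = false; }
static void sim_irq_enable(void)  { sim_irqs_enabled = true; }

/* Save the current state, then disable; the caller restores it later. */
static bool sim_irq_save(void)
{
    bool was_enabled = sim_irqs_enabled;
    sim_irqs_enabled = false;
    return was_enabled;
}

static void sim_irq_restore(bool was_enabled)
{
    sim_irqs_enabled = was_enabled;
}

/* Lock path that hard-enables on the way out, whatever the state was before. */
static void sim_lock_hard_enable(void)
{
    sim_irq_disable();
    /* ... debug bookkeeping would go here ... */
    sim_irq_enable();
}

/* Lock path that puts back whatever state the caller had. */
static void sim_lock_save_restore(void)
{
    bool flags = sim_irq_save();
    /* ... debug bookkeeping would go here ... */
    sim_irq_restore(flags);
}

/* Run a lock path from a given starting state and report the final state. */
static bool sim_irqs_after(void (*lock_path)(void), bool initially_enabled)
{
    sim_irqs_enabled = initially_enabled;
    lock_path();
    return sim_irqs_enabled;
}

/* Self-check of the two behaviours described above. */
static void sim_self_check(void)
{
    /* Early boot (irqs off): hard-enable turns them on behind our back. */
    assert(sim_irqs_after(sim_lock_hard_enable, false) == true);
    /* Early boot (irqs off): save/restore leaves them off. */
    assert(sim_irqs_after(sim_lock_save_restore, false) == false);
    /* Normal operation (irqs on): both variants end with irqs on. */
    assert(sim_irqs_after(sim_lock_hard_enable, true) == true);
    assert(sim_irqs_after(sim_lock_save_restore, true) == true);
}
```

The only case where the two variants differ is the one at issue here: a
caller that enters with interrupts off. The price of the save/restore
variant is the extra read and write-back of the flag state.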

A couple of cycles repeated a zillion times per second for the entire
uptime, just because we cannot get our act together in the first few
seconds of booting. How much does that suck?

And how much does it suck that we require that an attempt to take a
sleeping lock must keep local interrupts disabled if the lock wasn't
contended?

Fortunately, it only happens (or at least, is only _known_ to happen) when
mutex debugging is enabled, so the performance loss is moot.

I do not know where the offending mutex_lock()s are occurring (although it
would be super-simple to find out).

By far the best solution to this would be to remove this requirement that
local interrupts remain disabled for impractical amounts of time during boot.
Either whack the PIC in setup_arch() or reorganise start_kernel() in some
appropriate manner.

But I'll be merging
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
so we'll just continue to suck I guess.
