Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)



On Wed, 07 Jun 2006 15:04:07 +1000
Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:

Either whack the PIC in setup_arch() or reorganise start_kernel() in some
appropriate manner.

Neither would be satisfactory. Whacking the PIC means accessing
hardware, which for a lot of architectures means having page tables up,
some kind of ioremap, etc... Hence the bunch of workarounds done by
various archs like having their PTE allocation function do horrors like
if (mem_init_done) kmalloc() else alloc_bootmem().

Why on earth does the PIC come up pulling an interrupt when it hasn't been
spoken to yet?

It would make so much more sense to have the init code do something
like:

setup_arch();
init_basic_kernel_services(); <--- that's the blob you spotted with mem init, slab init, ...
init_arch(); <--- new arch hook

and later on, as part of the various inits, you get init_IRQ() and so
on...

In my example, init_arch() would be where the arch code moves the bits
currently in setup_arch() that do things like ioremap system devices or
that may want to use the slab etc., thus leaving setup_arch() with only
very basic initialisations.

Not being able to do all of that because we have this
hyper-optimized-mutex-blah thing that hard-enables interrupts all over
the place seems stupid to me. In fact, as you mentioned, it only
affects a debug code path, which could therefore perfectly well take
the performance hit.

Nonsense. mutex_lock() can sleep. Sleeping will enable interrupts.
Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
that's it.

But now, because some broken piece of hardware is coming out of
reset/firmware asserting an interrupt we need to change the rules to be
"mutex_lock() must preserve local interrupts if the lock is uncontended".
Ditto down(), down_read() and down_write().

And why does this bizarre restriction upon the implementation of our
locking primitives exist? Because of your broken PIC and because of our
inability to sort out the early boot code. And because the early boot code
has this implicit knowledge that the locks will be uncontended, else we're
toast.

We're doing mutex_lock(), down(), down_read() and down_write() with local
interrupts disabled, which is a bug. We have explicit code in there to
*disable* our runtime debugging checks because we know about this bug but
don't know how to fix it.

I call that sucky.

But I'll be merging
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
so we'll just continue to suck I guess.

How so? Can you tell me how making the mutex debug code path do
something sane makes it 'suck'? Don't argue about the benefit of a couple
of cycles; as you mentioned yourself, it's a debug code path.


Would you prefer "wildly idiotic"?
