Re: 2.6.18-rc6-mm1: GPF loop on early boot



On Sunday 10 September 2006 13:57, Ingo Molnar wrote:
> * Andi Kleen <ak@xxxxxxx> wrote:
> > > This kernel won't boot here: it starts a GPF loop on
> > > early boot. I attached a screenshot of the first GPF
> > > (pause_on_oops=120 helped).
> >
> > It's lockdep's fault. This patch should fix it:
>
> Well, it's also x86_64's fault: why does it call into a generic C
> function (x86_64_start_kernel()) without having a full CPU state up and
> running? i686 doesn't do it, never did.

Actually i686 does now (since Jeremy's patches). The patch was for i386.
x86_64 doesn't need it.

But enough blame game. The "fault" in my original mail wasn't completely
serious anyways ...

> We had frequent breakages due to this property of the x86_64 arch code
> (many more than this single incident with lockdep): tracing and all
> sorts of other instrumentation (including earlier versions of lockdep)
> were hit by it again and again.

Well, just getting the new unwinder to work in lockdep was a nightmare
too (most of the problems I had with that were on i386, not x86-64).
Just look at the number of patches that were needed for it.

> Basically, non-atomic setup of basic architecture state _is_ going to be
> a nightmare, lockdep or not, especially if it uses common infrastructure
> like 'current', spin_lock() or even something as simple as C functions.
> (for example the stack-footprint tracer was once hit by this weakness of
> the x86_64 code)

I disagree with that. The nightmare is putting stuff that needs so much
infrastructure into the most basic operations.

> > Hackish patch to fix lockdep with PDA current
>
> hm, this is ugly beyond words.

Agreed it is :). But it was the quickest way to fix it.

> Do you have a config i could try which
> exhibits this problem? I'm sure there is a better solution.

You need the latest x86_64 patch (or latest mm) and enable lockdep
on i386. Possibly with some more interdependencies, but I hope not.

I guess a cleaner way would be to set the global variable that turns
the tracing off during boot and then enable it when it starts making
sense. Not 100% sure where that point is.
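
Roughly what I mean, as a standalone userspace sketch and not the actual
lockdep code. All names here (tracing_enabled, trace_hook(), the two
setup functions) are made up for illustration; where exactly the enable
point belongs in the real boot path is still the open question.

```c
#include <stdio.h>

/*
 * Hypothetical global switch. It starts out 0, so nothing is traced
 * until the basic CPU state (e.g. whatever 'current' relies on) exists.
 */
int tracing_enabled;

/* Counts hook invocations that were actually traced. */
unsigned long traced_events;

/* Instrumentation hook: returns early while tracing is still off. */
void trace_hook(const char *what)
{
	if (!tracing_enabled)
		return;		/* too early in boot, do nothing */
	traced_events++;
	printf("trace: %s\n", what);
}

/* Stand-in for early boot code that runs before the state is ready. */
void early_arch_setup(void)
{
	trace_hook("early_arch_setup");	/* ignored, switch is off */
}

/* Stand-in for the point where tracing "starts making sense". */
void late_setup(void)
{
	tracing_enabled = 1;
	trace_hook("late_setup");	/* traced */
}

/* Runs the two phases in order and returns the number of traced events. */
unsigned long demo_boot_sequence(void)
{
	tracing_enabled = 0;
	traced_events = 0;
	early_arch_setup();
	late_setup();
	return traced_events;
}
```

Calling demo_boot_sequence() from a main() traces only the late call;
the early one is dropped because the switch is still off.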

-Andi