RE: [Bug #10117] 2.6.25-current-git sometimes hangs on boot - dual-core Sony Vaio





-----Original Message-----
From: Rafael J. Wysocki [mailto:rjw@xxxxxxx]
Sent: Tuesday, April 15, 2008 2:04 PM
To: Adrian Bunk
Cc: Carlos R. Mafra; Linux Kernel Mailing List; Soeren Sonnenburg; Pallipadi, Venkatesh
Subject: Re: [Bug #10117] 2.6.25-current-git sometimes hangs on boot - dual-core Sony Vaio

On Tuesday, 15 of April 2008, Adrian Bunk wrote:
On Tue, Apr 15, 2008 at 10:33:38PM +0200, Rafael J. Wysocki wrote:
On Tuesday, 15 of April 2008, Carlos R. Mafra wrote:
On Sun 13.Apr'08 at 17:25:45 -0300, Carlos R. Mafra wrote:
On Sun 13.Apr'08 at 20:56:41 +0200, Rafael J. Wysocki wrote:
This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.24. Please verify if it still should be listed.


Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=10117
Subject    : 2.6.25-current-git sometimes hangs on boot - dual-core Sony Vaio
Submitter  : Soeren Sonnenburg <kernel@xxxxxx>
Date       : 2008-02-23 18:55 (51 days old)
References : http://lkml.org/lkml/2008/2/23/263
             http://lkml.org/lkml/2008/4/4/41
             http://lkml.org/lkml/2008/4/9/69

Soeren said it no longer happens to him in
http://lkml.org/lkml/2008/4/9/53
but unfortunately it still happens with me using -rc9. So I kidnapped his
bugzilla report :-)

In the bugzilla entry I said earlier today that "hpet=disable" apparently
makes the problem go away (42 boots OK so far, whereas without this
boot option it hangs ~90% using vga=6 and ~10% using vga=0x0364)

I tried to bisect it, but sometimes in pre 2.6.25-rc1 kernels it takes
30 boots before the first hang occurs. So bisection is not reliable...
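
To put rough numbers on why bisection is unreliable with an intermittent
hang, here is a minimal sketch. It assumes every boot is an independent
trial with a fixed hang probability, and it reads "30 boots before the first
hang" as a hang rate of roughly 1 in 30. Both are assumptions, not
measurements from this report. The 5% error threshold and the function names
are illustrative only.

```python
import math


def clean_run_probability(hang_rate: float, boots: int) -> float:
    """Chance that `boots` consecutive boots all succeed on a kernel whose
    per-boot hang probability is `hang_rate` (boots assumed independent)."""
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate: float, max_false_good: float = 0.05) -> int:
    """Smallest number of consecutive clean boots after which the chance of
    wrongly marking a bad kernel as "good" is at most `max_false_good`."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    # Hang rates taken from the figures quoted in the thread; the 1/30 value
    # is an assumed reading of "30 boots before the first hang".
    for label, rate in [("vga=6", 0.90), ("vga=0x0364", 0.10), ("pre-rc1", 1 / 30)]:
        print(f"{label}: {boots_needed(rate)} clean boots needed per bisection step")

    # How likely 42 clean boots would be if hpet=disable changed nothing.
    print(f"42 clean boots at a 10% hang rate: {clean_run_probability(0.10, 42):.4f}")
```

Under these assumptions a 10% hang rate needs 29 clean boots before a kernel
can be marked good with 5% risk of error, and a 1-in-30 rate needs 89, for
every bisection step. By the same reasoning, 42 clean boots in a row would
happen only about 1% of the time by chance at a 10% hang rate.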

If someone proposes a patch I will be glad to test it!

PS: The similar bug in bugzilla 10377 also appears to be "fixed"
by using hpet=disable, see comment #17 in that bug.


From what Mark Lord said in his comments #33 to #35 in
http://bugzilla.kernel.org/show_bug.cgi?id=10117
it appears that this is a much older regression, from April 2007.

So this is a regression, but not from 2.6.24 (although somehow
it never hit me before). I don't know about the policy of closing
regressions that come from way before the previous kernel version,
if there is any. Then I will let you manage the bugzilla #10117
as you see fit (but I will be "there" to hopefully test any
proposed patches).

I dropped the bug from the list of recent regressions, so it doesn't block
bug #9832 any more. However, this still is a bug and regression, so the
bugzilla entry remains open.

Soeren's original report was a 2.6.25 regression.

And #10377 that was closed as a duplicate of #10117 was also reported as
a 2.6.25 regression.

#10117 seems to suffer from the common disease of people hijacking an
existing bug, but Soeren's issue, which is what was originally tracked in
#10117, is (or was) a 2.6.25 regression.

Well, I'm really not 100% sure it was a regression from 2.6.24 and I'm not
sure bug #10377 should have been marked as a duplicate.

I made bug #10117 block bug #9832 again, but it would be nice
to sort this out.

Why do we think that the cause of bugs #10117 and #10377 is the same?

Rafael


Both of them probabilistically hang early in the boot.
On both, !CPUIDLE and hpet=disable seem to work around the problem.
Both are Core 2 Duo based with 64 bit kernel.

One difference I saw was that #10377 fails on battery. That may be
because, when on battery, the CPUs may be running at a lower frequency
during boot, which probably makes the hang easier to hit in terms of timing.

Thanks,
Venki


