Re: 2.6.17 x86_64 regression - reboot fails due to deadlock



"Mr. Berkley Shands" <bshands@xxxxxxxxx> wrote:

With a SuperMicro H8DC8 (nvidia chipset), Dual Opteron 285's, 16GB,
Centos 4.3 -

Under 2.6.16 both the tyan 2895 and the supermicro H8DC8 both will
reboot corectly,
in kernel/sys.c machine_restart() gets called. But with the changes to
sys.c under 2.6.17,
a new path is introduced, calling void kernel_restart_prepare(char *cmd)
which calls blocking_notifier_call_chain(&reboot_notifier_list,
SYS_RESTART, cmd); (line 588)
Which looks at the first element of the notifier list, and blocks
forever. But ONLY on the supermicro.
The tyan, a very similar motherboard does not deadlock. It returns and
still calls machine_restart().
So neither reboot nor "shutdown -fh now" actually get to the bios calls.

on the supermicro, (linux-2.6.17/kernel/sys.c)

static int __kprobes notifier_call_chain(struct notifier_block **nl,
unsigned long val, void *v)
{
int ret = NOTIFY_DONE;
struct notifier_block *nb;

nb = rcu_dereference(*nl);
while (nb) {
ret = nb->notifier_call(nb, val, v); /* this is
the deadlock for the first entry */
if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
break;
nb = rcu_dereference(nb->next);
}
return ret;
}

I see that 2.6.18 reworks this code further.

If I want to hurt myself really, really badly, disabling the call to
blocking_notifier_call_chain(&reboot_notifier_list,...
restores the reboot/power off functions.

In kdb, the system sits idle awaiting something to schedule, but nothing
will schedule since there is
a deadlock on the supermicro. Any clues as to how to find which notifier
is deadlocked?


Are you able to do sysrq-T when it's stuck?

Something like this...
diff -puN kernel/sys.c~a kernel/sys.c

--- a/kernel/sys.c~a
+++ a/kernel/sys.c
@@ -70,6 +70,8 @@
int overflowuid = DEFAULT_OVERFLOWUID;
int overflowgid = DEFAULT_OVERFLOWGID;

+static int foo;
+
#ifdef CONFIG_UID16
EXPORT_SYMBOL(overflowuid);
EXPORT_SYMBOL(overflowgid);
@@ -141,6 +143,9 @@ static int __kprobes notifier_call_chain
nb = rcu_dereference(*nl);
while (nb) {
next_nb = rcu_dereference(nb->next);
+ if (foo)
+ print_symbol("calling %s()\n",
+ (unsigned long)nb->notifier_call);
ret = nb->notifier_call(nb, val, v);
if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
break;
@@ -590,6 +595,7 @@ EXPORT_SYMBOL_GPL(emergency_restart);

static void kernel_restart_prepare(char *cmd)
{
+ foo = 1;
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
device_shutdown();
_


Be aware that there's a known lock_cpu_hotplug()-vs-cpufreq deadlock, but
afaik it's only been reported during suspend. Disabling CONFIG_HOTPLUG_CPU
might make a difference.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [PATCH 1/2] reboot: Comment and factor the main reboot functions
    ... In the lead up to 2.6.13 I fixed a large number of reboot ... */ +extern void kernel_restart_prepare; ... +extern void kernel_power_off_prepare; + extern void kernel_restart(char *cmd); ...
    (Linux-Kernel)
  • RE: Scheduled Reboot
    ... 00" in a .CMD file and set task scheduler to run the .CMD ... Please attach the event log to me. ... >| Subject: Scheduled Reboot ...
    (microsoft.public.windowsxp.security_admin)
  • Re: Kernel bug
    ... > Hi, I receive this error and must reboot my server, ... I'm not very comfortable dealing with Fedora Core kernels: ... The patch has the side-effect of letting the ... void page_add_file_rmap ...
    (Linux-Kernel)
  • [RFC/PATCH 2.6.11.2 1/1] x86 reboot: Add reboot fixup for gx1/cs5530a
    ... This patch incorporates the suggestions from the previous thread. ... I ran into a problem getting reboot working with 2.6.11 on an embedded ... specific fixups, I put it in another file, reboot_fixups.c. ... +void cs5530a_warm_reset ...
    (Linux-Kernel)
  • Re: Oops on 2.6.14 (debian -1-smp pck)
    ... I've updated my usual debug patch to 2.6.14, ... allow you to continue working, gathering more messages, without reboot). ... void page_add_file_rmap ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)