Re: 2.6.29 boot hang



Rusty Russell wrote:
> On Wednesday 01 April 2009 15:21:32 Randy Dunlap wrote:
>> Rusty Russell wrote:
>>> On Wednesday 01 April 2009 07:15:35 Randy Dunlap wrote:
>>>> On a 4-proc x86_64 (HP BladeCenter, AMD CPUs) system, booting 2.6.29
>>>> (or earlier, back to 2.6.28-6921-g873392c) hangs during boot.
>>>>
>>>> git bisect says:
>>>> 873392ca514f87eae39f53b6944caf85b1a047cb is first bad commit
>>>> commit 873392ca514f87eae39f53b6944caf85b1a047cb
>>>> Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
>>>> Date: Wed Dec 31 23:54:56 2008 +1030
>>>>
>>>> PCI: work_on_cpu: use in drivers/pci/pci-driver.c
>>>> ...
>>>>
>>>> If I change CONFIG_MICROCODE_AMD=y to CONFIG_MICROCODE_AMD=n & rebuild,
>>>> the kernel boots successfully.
>>> How very very odd. My first thought was a deadlock with keventd used
>>> by work_on_cpu (changed in latest Linus tree), but the microcode code at
>>> that version doesn't use work_on_cpu.
>> Yep, I thought it a bit odd also.
>>
>>> So I don't think that's it, but this patch should canonically eliminate it:
>>>
>>> Subject: work_on_cpu(): rewrite it to create a kernel thread on demand
>>> From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> This patch doesn't apply to 2.6.29-final, but it does apply to 2.6.29-git8,
>
> Err, it has 14 line offset. But here's an adjusted one.

Sorry, I had a patch hunk failure for some (odd) reason.

2.6.29 + this patch still hangs during boot for me. Same symptoms.

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1f0c509..08bd795 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -971,20 +971,20 @@ undo:
 }
 
 #ifdef CONFIG_SMP
-static struct workqueue_struct *work_on_cpu_wq __read_mostly;
 
 struct work_for_cpu {
-	struct work_struct work;
+	struct completion completion;
 	long (*fn)(void *);
 	void *arg;
 	long ret;
 };
 
-static void do_work_for_cpu(struct work_struct *w)
+static int do_work_for_cpu(void *_wfc)
 {
-	struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
-
+	struct work_for_cpu *wfc = _wfc;
 	wfc->ret = wfc->fn(wfc->arg);
+	complete(&wfc->completion);
+	return 0;
 }
 
 /**
@@ -995,17 +995,23 @@ static void do_work_for_cpu(struct work_struct *w)
  *
  * This will return the value @fn returns.
  * It is up to the caller to ensure that the cpu doesn't go offline.
+ * The caller must not hold any locks which would prevent @fn from completing.
  */
 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
 {
-	struct work_for_cpu wfc;
-
-	INIT_WORK(&wfc.work, do_work_for_cpu);
-	wfc.fn = fn;
-	wfc.arg = arg;
-	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
-	flush_work(&wfc.work);
-
+	struct task_struct *sub_thread;
+	struct work_for_cpu wfc = {
+		.completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion),
+		.fn = fn,
+		.arg = arg,
+	};
+
+	sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu");
+	if (IS_ERR(sub_thread))
+		return PTR_ERR(sub_thread);
+	kthread_bind(sub_thread, cpu);
+	wake_up_process(sub_thread);
+	wait_for_completion(&wfc.completion);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
@@ -1021,8 +1027,4 @@ void __init init_workqueues(void)
 	hotcpu_notifier(workqueue_cpu_callback, 0);
 	keventd_wq = create_workqueue("events");
 	BUG_ON(!keventd_wq);
-#ifdef CONFIG_SMP
-	work_on_cpu_wq = create_workqueue("work_on_cpu");
-	BUG_ON(!work_on_cpu_wq);
-#endif
 }
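
For anyone who wants to poke at the shape of the patch above outside the
kernel, here is a rough userspace analogue. It is only a sketch under
assumptions, not the kernel code: pthreads stand in for kthreads,
pthread_attr_setaffinity_np() stands in for kthread_bind(), and
pthread_join() stands in for the completion. The names work_on_cpu_sketch()
and current_cpu() are made up for illustration and need Linux/glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Userspace stand-in for the patch's struct work_for_cpu (no completion;
 * pthread_join() plays that role here). */
struct work_for_cpu {
	long (*fn)(void *);
	void *arg;
	long ret;
};

/* Thread body: run the caller's function and stash its return value. */
static void *do_work_for_cpu(void *_wfc)
{
	struct work_for_cpu *wfc = _wfc;

	wfc->ret = wfc->fn(wfc->arg);
	return NULL;
}

/*
 * Hypothetical analogue of work_on_cpu(): create a thread on demand,
 * pin it to @cpu before it starts running, and wait for it to finish.
 * Returns what @fn returned, or a negative errno if setup failed.
 */
long work_on_cpu_sketch(unsigned int cpu, long (*fn)(void *), void *arg)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
	pthread_attr_t attr;
	pthread_t thread;
	cpu_set_t set;
	int err;

	assert(fn != NULL);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	err = pthread_attr_init(&attr);
	if (err)
		return -(long)err;
	/* Affinity is set in the attributes, so the thread never runs elsewhere. */
	err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (!err)
		err = pthread_create(&thread, &attr, do_work_for_cpu, &wfc);
	pthread_attr_destroy(&attr);
	if (err)
		return -(long)err;

	/* Blocks until do_work_for_cpu() has returned. */
	pthread_join(thread, NULL);
	return wfc.ret;
}

/* Example payload: report which CPU the function is running on. */
long current_cpu(void *unused)
{
	(void)unused;
	return sched_getcpu();
}
```

A call such as work_on_cpu_sketch(0, current_cpu, NULL) would run
current_cpu() in a fresh thread pinned to CPU 0, provided CPU 0 is in the
caller's allowed set. As with the comment added by the patch, the caller
must not hold anything the function needs in order to finish, since the
caller blocks until the thread exits.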


--
~Randy