[bug] hung bootup in various drivers, was: "2.6.21-rc5: known regressions"




i just found a new category of driver regressions in 2.6.21 while doing
allyesconfig bzImage bootup tests: the init methods of various drivers
hang in driver_unregister().

It is caused by the following problem: the semantics of driver_unregister()
[also called implicitly by pci_unregister_driver()] have apparently changed
recently. If a driver does:

	pci_register_driver(&my_driver);
	...
	if (some_failure) {
		pci_unregister_driver(&my_driver);
		...
	}

then bootup hangs in the following piece of code:

drivers/base/driver.c:

void driver_unregister(struct device_driver * drv)
{
	bus_remove_driver(drv);
	wait_for_completion(&drv->unloaded);
}

the completion is never signalled, because nobody removes the bus while
init is still running, obviously (and bootup is serialized anyway).

now, most drivers unregister the driver from their module-cleanup
function, so they are not affected by this problem. But if you apply the
debug patch attached below and do an allyesconfig bzImage bootup, there
are already 3 hits:

BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c1b5f7c7>] slgt_init+0x9b/0x1ca
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10

BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c0619505>] init_ipmi_si+0x70a/0x738
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10

BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c1b6d2d8>] tlan_probe+0x2dd/0x30e
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10

possibly more sites could trigger it. Each of these 3 places caused an
actual bootup hang on my testbox, so these are real regressions and need
to be fixed.

because a good number of drivers call pci_unregister_driver() from their
init function, and because i cannot see anything obviously wrong with an
unregister call after a failure, i think it's driver_unregister() that
needs to be fixed. Greg, what do you think?

Ingo

Index: linux/drivers/base/driver.c
===================================================================
--- linux.orig/drivers/base/driver.c
+++ linux/drivers/base/driver.c
@@ -183,7 +183,8 @@ int driver_register(struct device_driver
 void driver_unregister(struct device_driver * drv)
 {
 	bus_remove_driver(drv);
-	wait_for_completion(&drv->unloaded);
+	if (!drv->unloaded.done)
+		WARN_ON(1);
 }
 
 /**