Re: How should an exit routine wait for release() callbacks?



Tejun, it just occurred to me that you would be interested in this email
thread. Just to bring you up to speed, here's the original question:

I've got a module which registers a struct device. (It represents a
virtual device, not a real one, but that doesn't matter.) Obviously the
module's exit routine has to wait until the release() routine for that
device has been invoked -- if it returned too early then the release()
call would oops.

How should it wait?

The most straightforward approach is to use a struct completion, like
this:

static struct {
struct device dev;
...
} my_dev;

static DECLARE_COMPLETION(my_completion);

static void my_release(struct device *dev)
{
complete(&my_completion);
}

static void __exit my_exit(void)
{
device_unregister(&my_dev.dev);
wait_for_completion(&my_completion);
}

The problem is that there is no guarantee a context switch won't take
place after my_release() has called complete() and before my_release()
returns. If that happens and my_exit() finishes running, then the module
will be unloaded and the next context switch back to finish off
my_release() will oops.


On Fri, 13 Apr 2007, Cornelia Huck wrote:

In this case the race is not a user space vs. kernel object one (where
you can track users). Basically the problem is as follows:

- A module registers a device. The device's release function is defined
in the module.
- Since the device can now be looked up in the device tree, someone can
obtain a reference to it (e. g. by walking the tree).
- The module is unloaded. In its exit function, it deregisters the
device. The module has now given up any reference to the device it
held, however the someone from above still holds a reference. While no
new reference to the device can be obtained, the device still exists.
- After the module is unloaded, the device's release function goes away.
- The last reference to the device is given up. The driver core now
tries to call the device's release function, which was in the deleted
module. Oops.

The completion approach unfortunately still leaves a race window, as
Alan explained in his original mail.

After thinking about it some more, I realized the conventional answer
would be to give out a module reference. When the device is registered, a
reference to it and its release routine gets passed to the driver core.
Hence a module reference to the owner of the release routine must be
passed as well.

Unforunately that won't work very well in this case, because it would
create circular module references preventing the driver from ever being
unloaded at all! Here's what I mean:

my_device is registered with some core subsystem. The subsystem
acquires a module reference to my module and registers a child
of my_device. It can't drop the module reference until the
child is gone and it has finished using my_device. So my driver
can't be unloaded until it deregisters my_device or the subsystem
itself is unloaded (and unregisters the child).

But the module containing my driver depends on the subsystem,
because it calls the subsystem's registration routine. So the
subsystem can't be unloaded without unloading my driver first.

What's needed is a way to force my driver to unregister itself without
actually unloading it from memory. One way to accomplish this would be to
tell rmmod that it should call the module's exit routine but then wait
for the module's refcount to drop to 0 before unloading it. Like doing
rmmod -w.

However even this would have a problem. Let's say my_exit() unregisters
my_device. Eventually the child's release routine runs, and it does a
put_module() on my module. At that point my module can be unloaded. But
my_release() still hasn't been called! The subsystem's release routine
runs first, because children are released before their parents.

I have to admit, this is a puzzler. I'm beginning to think there should
be two types of module references: Those which (like module dependency)
will prevent rmmod from running, and those which (like the one here) would
automatically be dropped by deregistration. Then every kobject could have
an owner and could hold a reference of the second type to its owner until
its release routine returns.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: Illegal instruction - code c000001d (!!! second chance !!!)
    ... Maybe the outstanding IRP below completes before you get unloaded, ... the address of the routine now points ... the Build versions for this driver Checked and Release. ... outstanding before unloading." ...
    (microsoft.public.development.device.drivers)
  • Re: 2.6.11-rc2-mm1
    ... >> It is internal structure that has reference counter, not module itself, ... >> structure may be in use, when module is unloaded, thus unloading must ... It is only because those structures can live outside the driver. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: [linux-usb-devel] Re: [OOPS, usbcore, releaseintf] 2.6.0-test10-mm1
    ... and the simple fix has been not to clean up so aggressively. ... In an earlier message I wrote that the HC driver couldn't unload so long ... as the device usbfs was using held a reference to its bus. ... hcd_free routine is no longer present and you get an oops. ...
    (Linux-Kernel)
  • Re: Illegal instruction - code c000001d (!!! second chance !!!)
    ... the address of the routine now points ... the Build versions for this driver Checked and Release. ... outstanding before unloading." ... This means a trap occurred in kernel mode, and it's a trap of a kind ...
    (microsoft.public.development.device.drivers)
  • Re: Illegal instruction - code c000001d (!!! second chance !!!)
    ... what is the output of!wdflogdump <your driver>? ... > outstanding before unloading. ... >> It's possible that the routine that was passed to the WFD framework ... This means a trap occurred in kernel mode, and it's a trap of a kind ...
    (microsoft.public.development.device.drivers)