Re: device struct bloat



On Tue, 6 Nov 2007, Peter Zijlstra wrote:

On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:

I gather the idea is to convert dev->sem to a mutex. This idea had
occurred to me a long time ago but I didn't pursue it because of the
sheer number of places where dev->sem gets used

That should never stop us from doing the right thing :-), also dev->sem
isn't used nearly as much as for example work_struct which was changed.

My hat is off to David Howells. He's willing to carry out far-ranging
changes well beyond the point I can stomach.

You can't possibly solve the lockdep problems here with a simple-minded
approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
is arbitrarily deep & wide, and there is at least one routine that
acquires the semaphores for _all_ the devices in the tree.

*blink* someone needs to take all locks - why?

It's for system suspend. All the devices are locked to prevent driver
binding and unbinding during the suspend transition.

Even apart from that, there are places where the USB core needs to
acquire multiple layers of locks (more than just two). For example, if
somebody has hubs nested several deep and then unplugs the hub closest
to the computer, this will happen.

Shucks, any time a driver's probe routine tries to register a child
device you will run into problems. probe() is always called with the
device locked, and registration of a child will acquire the child's
lock in order to probe drivers for the child.

This fact
alone seems to preclude using lockdep for device locks. (If there was
a form of mutex_lock() that bypassed the lockdep checks, you could use
it and avoid these issues.)

I'm sceptical of this, since its a simple tree (as opposed to a balanced
tree) a simple lock-coupling approach should be enough to guarantee
consistency.

I don't follow your reasoning. Let's say a task wants to lock all the
direct children of a particular device. In what order should the locks
be acquired? There's no natural tree-related ordering to follow.

In the simple case where locks are acquired along a path in the tree,
you could solve the lockdep issues by passing mutex_lock_nested() a key
equal to the device's depth in the tree. But that won't work with more
complicated cases.

Deadlock is a serious consideration. For the most part, routines
locking devices do so along a single path in the tree. For this simple
case the rule is: Never acquire a parent's lock while holding the
child's lock.

Sure, but once you have a parent's lock, you could unlock your
grandparent, no? (it having a locked child, your parent, should be
enough to guarantee its continued existence)

Of course. So what? In many cases the code needs to keep all three
locks. The locks don't merely guarantee the device structs' continued
existence (a kref could do that) -- they serialize a number of related
operations.

The routine that locks all the devices acquires the locks in order of
device registration. The idea here is that children are always
registered _after_ their parents, so this should be compatible with the
previous rule. But there is a potential problem: device_move() can
move an older child under a younger parent!

Seems like a weird rule, a typical tree locking rule would be to lock
them top-down - such a rule can easily cope with moves: lock old parent,
lock child, lock new parent, move child, unlock all in reverse order.

Yep. But this isn't a typical tree-locking. System suspend sometimes
requires that devices be powered down in a specific order, and there
are constraints not captured by the positions in the tree. We get
around this by powering devices down in reverse order of detection,
which should always be safe. Although the locking need not necessarily
follow these same constraints, it is certainly the easiest approach.

(And by the way, your example rule "lock old parent, lock child, lock
new parent" is subject to deadlocks. What if another task tries to
move a different device from under the new parent to the old parent at
the same time?)

Like said, I think the tree locking model should be revisited. A
top-down locking model with lock-coupling should, from my ignorant
perspective, solve your problems.

I don't understand what you mean by "lock-coupling".

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Need help about lock hierarchy design
    ... Suppose we have two types of objects: Parent and Child. ... But if locks to be nested then a lock ... unlock; // Unlock parent... ...
    (microsoft.public.win32.programmer.kernel)
  • Re: lock used in thread and by event
    ... First class has a method which use lockand then it put itself ... public Parent() ... lock ... public void CalledByEvent() ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: form field auto numbering with unique value
    ... i.e if you're on the child record and it's number is 1-1 then if you pop-up ... The big issue isn't the parent becuase I can get that and make it unqiue, ... and not modifying an existing one I can't lock it either. ... >> child record also having the parent assigment value. ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: How do I know when a thread quits?
    ... parent.lock.acquire() do_something_with_vars_fromparent.lock.release# don't do anything with parents vars after releasing the lock ... taken away from it and given to the parent thread(note that the child thread hasn't quit yet). ... The parent thread which is waiting on the lock gets released and quits. ... Now the "still running" child thread tries to exit and based on my assumption attempts to call some cleanup func in some module which has been GC'ed due to the exit of the parent thread. ...
    (comp.lang.python)
  • Re: Collection Lock
    ... on a parent and it flows down to the child, you should not be able to change ... it on the child. ... it flows to all the child sites and shows up with a lock because ... >> the child site are lock. ...
    (microsoft.public.sms.misc)