Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
- Date: Mon, 20 Nov 2006 14:31:39 -0500 (EST)
On Sun, 19 Nov 2006, Stefan Richter wrote:
Knowing nothing at all about ieee1394, I get the feeling that the culprit
here is a strange subsystem design. In fact, I don't understand exactly
what's going wrong. Evidently the rmmod thread owns the locks for both
the host being removed and its parent, and it wants to stop knodemgrd,
which is waiting to acquire the host's parent's lock because it is
attempting to rescan the parent. Is that right?
That's right.
It doesn't make sense. If knodemgrd is rescanning the parent then the
parent must not have a driver. If it doesn't have a driver, how can it
have children?
Well, I don't fully understand the reasoning behind it. (But even less
do I grasp the driver core.) Short story: It may indeed be possible to
get rid of the parent relationship to the host device and of the
rescanning of the host device. Long story:
Wait a minute. Above you agreed that the problem was caused by knodemgrd
attempting to rescan the host's _parent_. So which is the focus of the
deadlock: the host or its parent?
ieee1394 has:
- struct hpsb_host (contains a struct device and a struct class_device)
ieee1394's nodemgr in addition has:
- struct node_entry (ditto)
- struct unit_directory (ditto)
nodemgr also provides different .release callbacks for each of these
devices and class_devices, an .uevent callback for unit_directory
class_devices and the struct bus_type for the three kinds of device.
All of the existing protocol drivers bind only to unit_directories, so
these are the most important ones as far as driver matching is
concerned. (There is only a little out-of-tree driver, mem1394 for
forensics, which binds differently because it doesn't require any actual
protocol on a remote node.)
A hpsb_host is merely access to controllers; each one controls a bus. On
a bus live a few node_entries, as many as there are nodes with an active
link layer controller. This includes the local host, therefore there is
always at least one node_entry per bus. A node may implement 0, 1 or
more units and presents them in a hierarchical manner. A unit talks a
protocol; and our protocol drivers access such a unit and, if the
protocol has this necessity, the unit's node. (While doing so, it also
accesses the hpsb_host in front of the node.)
If support for eth1394 is configured in, our drivers add a unit to each
of the local nodes which indicate IP over 1394 capability to external
nodes. This unit_directory also matches the eth1394 when the driver core
scans the ieee1394_bus_type.
So in short, I think we could actually live with a minimum sysfs
implementation which exposes only unit directories. (But I may be
missing something --- I'm not quite familiar with the sysfs interfacing
tools, that is udev and hald.) At least device file naming of SBP-2
units wouldn't suffer because udev takes the unique name of SBP-2 units
from a sysfs attribute of the SCSI device, not from the unit_directory's
device.
The /sys/class/net/eth?/device links of eth1394 class devices currently
point to the hpsb_host devices i.e. fw-host devices; maybe they could as
well point to respective unit directories.
So maybe a simple, flat sysfs representation without node_entry devices,
perhaps also without fw-host devices --- or at least a representation
(1) without parent relationship to the fw-host devices and (2) without
ieee1394_bus_type in fw-host devices --- is possible without breaking
userspace. It probably would eliminate the problem which we are
discussing here. This would mean that we loose the ability to bind
protocol drivers to a hpsb_host (and depending on how far the
respresentation is cut down, also the ability to bind protocol drivers
to a node_entry). But as I said I see no actual need for this ability.
If I understand all this, you've got an hpsb_host, directly below which
are one or more node_entry's, below each of which may be some
unit_directory's. Right?
But how is this relevant if the problem is caused by knodemgrd trying to
rescan the hpsb_host's parent?
Is the problem caused by the fact that some of these struct device's
aren't bound to a driver? Remember, bus_rescan_devices() will skip over
anything that already has a driver. Could you solve your problem by
adding a do_nothing driver that would bind to these otherwise unused
devices?
Alan Stern
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- From: Stefan Richter
- Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- References:
- Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- From: Stefan Richter
- Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- Prev by Date: Re: [PATCH 3/4] atl1: Main C file for Attansic L1 driver
- Next by Date: [PATCH 2.6.19-rc6] netfilter: fix panic on ip_conntrack_h323 with CONFIG_IP_NF_CT_ACCT
- Previous by thread: Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- Next by thread: Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
- Index(es):
Relevant Pages
|