Re: [2.6.30-rc2] usb reset during big file transfer and ext3 error



Hi, Alan.

Sorry for the late reply, but an HD of mine was giving me trouble. :-(
Of course, I have backups. :-)

On Apr 22 2009, Alan Stern wrote:
> On Wed, 22 Apr 2009, Rogério Brito wrote:
>> Is there any way of controlling the number of retries in the host
>> controller? Or, perhaps, of controlling the time between retries so
>> that the device can shape it up again?

> It's not all that simple. The host controller allows the OS to set the
> number of hardware retries to 1, 2, 3, or unlimited. Linux uses 3;
> those XactErr debugging messages in your log show that the driver was
> extending the number of retries in software.

Right. I didn't know that. Obviously, setting it to unlimited could
lead to undefined behavior on the computer.

> It's not possible to change the time interval between retries done by
> the hardware. While it is possible in theory to change the interval
> between retries done by the driver, it would be rather difficult and
> so ehci-hcd doesn't attempt it.

Oh, what a pity. It seems that the device at hand sort of gets back in
shape after some time: I have an automounter here, and the device nodes
reappear under /dev and it auto-mounts the device at the appropriate
mount point. Weird.

> The software retries were introduced to solve one particular problem:
> Many EHCI controllers will generate a transaction error if a data
> transfer is occurring on one port at the same time as a device is
> being unplugged on another port.

Right. I just got myself a (non-powered) USB hub and I noticed one thing
(unrelated to this problem): if I plug a USB disk into this hub and
then plug in a printer, very weird things happen, like the disk being
unmounted.

I can give you precise details of what happens here, if you're
interested.

OTOH, I think that I may be seeing some other problems with a pen drive
connected to a port of the machine I'm typing this message on. I will
try to compile a newer kernel, now that -rc4 is released, and I would
appreciate it if you could help me with the issues that I'm seeing.

> This is clearly a hardware bug, and the software retries were intended
> to work around it. In practice only a couple of software retries are
> needed; if the transfer hasn't succeeded by that point then it's never
> going to succeed. I set the upper limit to 32 retries just to be
> conservative.

OK. Thanks for the nice and clear explanation of the problem. I only
wonder why I am not seeing these errors on other machines while I *do*
see them on this one (an Intel ICH5).

> If transaction errors aren't caused by noise in the cable then they
> are almost always caused by bugs or failures in the device.

I will try again with a shorter and newer cable. Let's see how that
works. BTW, is there any way to check the quality of a cable? I have a
multimeter here and I would be willing to do some extensive tests.
Testing the USB enclosure is also pretty feasible.

> Once a device's firmware has crashed, it doesn't magically fix itself.

Oh, what a pity that it doesn't recover by itself with a watchdog-like
mechanism.


Thanks for all your help, Rogério.

--
Rogério Brito : rbrito@{mackenzie,ime.usp}.br : GPG key 1024D/7C2CAEB8
http://www.ime.usp.br/~rbrito : http://meusite.mackenzie.com.br/rbrito
Projects: algorithms.berlios.de : lame.sf.net : vrms.alioth.debian.org


