Bug 130326
Summary: | usb-storage fails under load with external USB HDD
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 2
Hardware: | All
OS: | Linux
Status: | CLOSED NEXTRELEASE
Severity: | high
Priority: | medium
Reporter: | Luke Hutchison <luke.hutch>
Assignee: | Pete Zaitcev <zaitcev>
CC: | bugzilla, byte, davej, greymane, jclcheng, rbremer, wtogami
Doc Type: | Bug Fix
Last Closed: | 2005-04-16 04:47:57 UTC

Description
Luke Hutchison
2004-08-19 04:24:44 UTC
Created attachment 102865
The relevant part of /var/log/messages
There is some junk in the log due to the fact that hald was not running, and
/dev/sda is a memorystick slot, which did not have media in it.

Actually the data was not corrupted; it seems that the files that it said had copied OK never ended up on the disk. (May be an ext3 journalling side effect; it looks like the disk was automatically checked the next time it was mounted.)

--- linux-2.6.8-rc4-mm1/drivers/usb/storage/usb.c 2004-08-16 12:13:06.000000000 -0700
+++ linux-2.6.8-rc4-mm1-ub/drivers/usb/storage/usb.c 2004-08-18 23:48:09.335107648 -0700
@@ -285,7 +285,7 @@ static int usb_stor_control_thread(void
  */
 daemonize("usb-storage");
- current->flags |= PF_NOFREEZE;
+ current->flags |= PF_NOFREEZE|PF_MEMALLOC;
 unlock_kernel();

Looks clear as day to me, but maybe I'm missing something.

OK, we're having a controversy among VM gurus. In theory (according to Hugh, anyway), the usb-storage thread uses GFP_NOIO for its allocations on 2.6, and that ought to be sufficient without the PF_MEMALLOC trick used in 2.4.

Was my assessment of the problem correct -- that this is indeed a problem with usb-storage? Could it be happening at a lower level than usb-storage, e.g. in the USB subsystem? How does the drive number/letter keep increasing when this happens? Why does the drive letter change survive a power cycle of both the computer and the drive?

There is such a problem with the ehci_hcd module (i.e. USB 2.0), but with uhci_hcd (USB 1.1) everything is OK.

Alexandr: you mean you've seen this problem yourself? I have an interesting situation in my setup:
Machine: USB-1.1 only (I have AMD760 + Via 82C686B)
Hub: USB-2.0
Drive: USB-2.0
So I'm guessing that the Machine<->Hub link was falling back to USB-1.1.

I have one of these "mobile disk" enclosures as well and am suffering problems as well. The chipset in this enclosure is the PL-3507 from Prolific Technologies Inc. Searching around on the web has found others with similar (although some only intermittent) problems. Turning on verbose USB debugging shows the problem is in usb-storage.
Any time a substantial block of data is moved to the drive ("substantial" seems to vary considerably though), usb-storage hangs until it eventually comes back with a status code (it varies, but I know I've seen status codes -71 and -104), then returns "unknown error", resets itself and tries to repeat the transfer. Similar problems are being reported by those using the firewire port, although I haven't tried that myself yet. If you would like to see an example of the errors, there is a posting at http://www.uwsg.iu.edu/hypermail/linux/kernel/0402.2/1203.html with specifics.

Is this fixed in the 2.6.9 errata kernel?

Well, if you applied Riel's I/O throttling patch, then it may be fixed. Otherwise, nope... The litmus test is:
grep throttle_vm_writeout mm/page-writeback.c
Give me the exact release of this "2.6.9 errata" and I'll do prep and grep.

I think I've observed this too, connecting to an Intel USB port (I'll check which ?hci driver later). Same PL-3507 USB/1394->ATA bridge chipset. Firmware updates *may* help (do so at your own risk, though - <http://member.newsguy.com/~siccos/PL3507%20Firmware.htm>). I can't flash my firmware as the flash memories in my caddies are dual-voltage and cannot be reprogrammed in-circuit.

Alex -- thanks for the info. I don't know if I will risk flashing the drive. (For now I'm just not using it.) I'm sure in the long run it would be better to find a workaround. Pete Z, how does the I/O throttling patch work? Is Riel's patch designed to fix a generic kernel problem, or is it specific to the problem experienced with the PL-3507 chipset? Your comment #11 seems to suggest it is generic, but Alex's comment #13 seems to indicate this is a common problem with a specific chipset.

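
The grep litmus test mentioned above can also be scripted. The following is only a minimal sketch: the function name `has_vm_writeout_throttle` is hypothetical, and it assumes you pass it the root of an already prepped kernel source tree. Like the grep, it only checks whether the symbol `throttle_vm_writeout` appears in `mm/page-writeback.c`; it says nothing about whether the throttling is effective.

```python
from pathlib import Path


def has_vm_writeout_throttle(kernel_tree: str) -> bool:
    """Return True if mm/page-writeback.c mentions throttle_vm_writeout.

    kernel_tree is assumed to be the root of an unpacked (prepped) kernel
    source tree. This mirrors `grep throttle_vm_writeout mm/page-writeback.c`.
    """
    source = Path(kernel_tree) / "mm" / "page-writeback.c"
    if not source.is_file():
        # Fail loudly rather than report "not patched" for a wrong path.
        raise FileNotFoundError(f"{source} not found; is this a kernel tree?")
    return "throttle_vm_writeout" in source.read_text(errors="replace")
```

Calling `has_vm_writeout_throttle(".")` from the top of the source tree would correspond to the grep returning at least one match.
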
As an aside, after reading about the problems with >128KB transfers over the 1394 interface of the PL-3507, and >64 (or >128) sector transfers with the Genesys bridges (see /drivers/usb/storage/scsiglue.c), I thought I would experiment to see if tweaking max_sectors via sysfs would help with the PL-3507 as well. I tweaked it from the default of 256 (for non-Genesys devices) down to 64 and did a badblocks read test, then set it to 128 and ran badblocks again, then 256 and badblocks again. I was expecting 128 and/or 64 to complete successfully and 256 to fail. Instead, all tests worked. Was my testing methodology legitimate?

Oh, I'm using the latest FC3 kernel (2.6.9-1.681), BTW, so this isn't fixed there. :-/

To the USB storage gurus: is there any way that the order of loading the ?hci_hcd modules could make any difference to behaviour? I ask because, since switching to loading ehci_hcd last instead of first, my problems seem to have gone away. Of course, this could just be coincidence. :-/ From /etc/modprobe.conf:
#alias usb-controller ehci-hcd
#alias usb-controller1 ohci-hcd
#alias usb-controller2 uhci-hcd
alias usb-controller ohci-hcd
alias usb-controller1 uhci-hcd
alias usb-controller2 ehci-hcd
To Luke: give this a try and see if it makes any difference for you...

Alex: I tried loading ehci_hcd last, but the problem still exists for me. I'm also using kernel-2.6.9-1.681_FC3.

Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.

Does anyone else on the list have a rig they can test this on, with either FC3 or FC4-development? My drive enclosure doesn't currently have a drive in it.
I suspect the problem is not yet fixed, since this is not an FC2-specific problem.

I am reasonably confident that the original problem, with writes not being throttled, is alleviated in FC4, if not fixed completely. However, people suffering from problems with entirely different causes have attached themselves to this bug. Just file new bugs if you hit problems. It's 100 times easier to dup bugs than to split them.

The bug's already closed, but I just wanted to add this comment: I have been using the same drive enclosure with recent 2.6.13 kernels in FC5-development, and can report that I haven't seen this problem this time around, so it does appear fixed. Thanks again.
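
For anyone who wants to repeat the max_sectors experiment described earlier in this thread, here is a minimal sketch of reading and changing the value from a script. The helper names are hypothetical, and the sysfs location `/sys/block/<device>/device/max_sectors` is an assumption about where usb-storage exposes the attribute on your kernel, so check that the file exists before relying on it. Writing to it needs root.

```python
from pathlib import Path


def max_sectors_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Build the assumed sysfs path of the max_sectors attribute."""
    return Path(sysfs_root) / "block" / device / "device" / "max_sectors"


def get_max_sectors(device: str, sysfs_root: str = "/sys") -> int:
    """Read the current max_sectors value for a block device, e.g. 'sdb'."""
    return int(max_sectors_path(device, sysfs_root).read_text().strip())


def set_max_sectors(device: str, value: int, sysfs_root: str = "/sys") -> int:
    """Write a new max_sectors value and return what the file reports back."""
    if value <= 0:
        raise ValueError("max_sectors must be a positive integer")
    max_sectors_path(device, sysfs_root).write_text(f"{value}\n")
    # Read back rather than trusting the write: the driver may adjust it.
    return get_max_sectors(device, sysfs_root)
```

A test loop in the spirit of the thread would call `set_max_sectors("sdb", n)` for n in 64, 128 and 256 and run the load test after each change (`sdb` is only an example device name). Note that the original report involved copying files to the drive, so a read-only badblocks pass may not exercise the same path.
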