From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131

Description of problem:
I am having kernel crashes when using USB DiskOnKey devices with an application that I use once a day, almost every day. The crashes happen only with Red Hat 9 (all updates applied); Red Hat 8 worked fine with the same machine and devices for months.

The application is a shell script that mounts the DiskOnKey and writes files to it. It then verifies the files with a sync running in the background, waits for the sync, and unmounts the DiskOnKey. The kernel crashes at this point.

The interesting thing is that I get this message in /var/log/messages:

    Apr 23 19:33:46 tornado kernel: usb.c: USB disconnect on device 00:07.2-2 address 3

(Needless to say, the device is plugged in.) Also, the data on the DiskOnKey is *not* modified at all, as if the mount at the beginning of the process never happened. (Not surprising perhaps -- the device is "disconnected". But then how is it possible that the script runs without any errors until the verify/sync?)

(Notice that this has some similarities with bugs 85821 and 85822: the disconnect is also there in bug 85821. I am reporting it as a new bug since bug 85821 seems to have been solved.)

I was able to make the kernel crash three times today by trampling usb-storage (nothing heavy -- plug/unplug and mount/unmount two DiskOnKeys plus a digital camera, one at a time, and run things like the Bonnie disk benchmark, sync, or df; then repeat until it crashes). The photographs (attached files) show the crash messages. These three seem to follow a pattern, but the call stack for the crashes in the shell script above seemed different -- I'll attach a photograph when I get one.

If you need more tests or data, just ask.

Version-Release number of selected component (if applicable):
kernel-2.4.20-9

How reproducible:
Sometimes

Steps to Reproduce:
See description above ("nothing heavy...").

Actual Results:
Kernel crashes.

Expected Results:
Kernel should have written the data correctly (to say the least).

Additional info:
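The reporter's script is not included in the report. For readers who want to picture the flow described above (mount, write, verify while a sync runs in the background, wait, unmount), here is a minimal sketch. Everything in it is an assumption: the function name `backup_to_stick`, the `MOUNT`/`UMOUNT` override variables, and the use of `cp` and `diff` for writing and verifying are hypothetical and are not taken from the actual script.

```shell
# Hypothetical reconstruction of the described flow; NOT the reporter's script.
# Usage: backup_to_stick SRC_DIR MOUNT_POINT
# MOUNT and UMOUNT may be overridden (for example with "true") so the flow
# can be tried on an ordinary directory without a real device.
backup_to_stick() {
    src=$1
    mnt=$2

    # Mount by mount point only; this relies on an /etc/fstab entry.
    ${MOUNT:-mount} "$mnt" || return 1

    status=0
    cp -R "$src"/. "$mnt"/ || status=1

    # Start flushing dirty buffers in the background.
    sync &
    sync_pid=$!

    # Verify the copy while the sync is running. Note that diff -r also
    # flags files that exist only on the target.
    diff -r "$src" "$mnt" >/dev/null || status=1

    # Wait for the background sync, then unmount.
    wait "$sync_pid" || status=1
    ${UMOUNT:-umount} "$mnt" || status=1

    return "$status"
}
```

According to the description, the crash occurs around the verify/sync stage, which in this sketch corresponds to the lines between `sync &` and `umount`.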
Created attachment 91330 [details] Photo of first crash
Created attachment 91331 [details] Photo of second crash
Created attachment 91332 [details] Photo of third crash
Please do this on a working RHL8 box:

    ls /proc/scsi/usb-storage*/*
    cat /proc/scsi/usb-storage*/*

If the stick uses CBI transport, it's a known problem. I am hunting for a test case to fix it.
Does it have to be RHL8? (None of these around now.) I ran the commands on the problematic RHL9 box, and:

    fred@tornado[21]% ls /proc/scsi/usb-storage*/*
    /proc/scsi/usb-storage-0/1
    fred@tornado[22]% cat /proc/scsi/usb-storage-0/1
    Host scsi1: usb-storage
    Vendor: M-Systems
    Product: DiskOnKey
    Serial Number: 021E811619002DC6
    Protocol: Transparent SCSI
    Transport: Bulk
    GUID: 08ec0010021e811619002dc6
    Attached: Yes

(Is this CBI?)

BTW, here is what usbview says about it:

    DiskOnKey
    Manufacturer: M-Systems
    Serial Number: 021E811619002DC6
    Speed: 12Mb/s (full)
    USB Version: 1.10
    Device Class: 00(>ifc )
    Device Subclass: 00
    Device Protocol: 00
    Maximum Default Endpoint Size: 64
    Number of Configurations: 1
    Vendor Id: 08ec
    Product Id: 0010
    Revision Number: 2.00

    Config Number: 1
      Number of Interfaces: 1
      Attributes: 80
      MaxPower Needed: 94mA

      Interface Number: 0
        Name: usb-storage
        Alternate Number: 0
        Class: 08(stor.)
        Sub Class: 6
        Protocol: 50
        Number of Endpoints: 2

          Endpoint Address: 81
          Direction: in
          Attribute: 2
          Type: Bulk
          Max Packet Size: 64
          Interval: 0ms

          Endpoint Address: 01
          Direction: out
          Attribute: 2
          Type: Bulk
          Max Packet Size: 64
          Interval: 0ms
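The transport question can be answered by reading the `Transport:` field of the /proc report. The helper below is a small sketch for pulling that field out; the function name `usb_storage_transport` is hypothetical, and the only assumption about the file format is that it contains a line of the form `Transport: <name>`, as in the output quoted above. In the USB mass storage class, interface protocol 0x50 is Bulk-Only, so `Transport: Bulk` together with `Protocol: 50` from usbview suggests this stick is not a CBI device.

```shell
# Hypothetical helper: read a /proc/scsi/usb-storage-*/N style report on
# stdin and print the value of its "Transport:" field (nothing if absent).
usb_storage_transport() {
    sed -n 's/^[[:space:]]*Transport:[[:space:]]*//p'
}

# Example on a live 2.4 system (path as quoted in this thread):
#   usb_storage_transport < /proc/scsi/usb-storage-0/1
```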
BTW, Pete, if you want me to test patches, or any kernel versions (say, Rawhide?), just tell me what to test.
I found a simple and reproducible way to crash the kernel:

(1) Boot the machine without the DiskOnKey attached
(2) Attach the DiskOnKey
(3) Run: "mount /mnt/diskonkey ; touch /mnt/diskonkey/xxx ; rm /mnt/diskonkey/xxx ; df /mnt/diskonkey ; umount /mnt/diskonkey"
(4) Detach the DiskOnKey
(5) Repeat step 2
(6) Repeat step 3

Things work correctly through step 5; then the command line (not the kernel) freezes at step 6 (df prints only the "Filesystem..." header and freezes). Other terminals still work at this point (unless you run "sync", which freezes). The kernel crashes only about a minute and a half after step 6 (again, not right after it). The crash message is similar to the one in the first attachment (id=91330).

If sync is used instead of df in steps 3 and 6, the kernel also crashes. However, everything works correctly if touch and rm are not done (i.e., if no data is written to the device?).

The problem happens with two different DiskOnKey devices (both work correctly under RHL 8.0 and 7.3). I wasn't able to make it crash with the digital camera (also a USB block device).

I'm *really* accepting suggestions of kernel versions or patches to test (or any other tests). Thanks in advance.
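For anyone repeating the test on several kernels, the command line from step 3 can be wrapped in a function. This is only a convenience sketch: the name `repro_cycle` and the `RUN` dry-run variable are hypothetical additions, while the commands and the /mnt/diskonkey mount point are the ones given in the steps. As in the original one-liner, each command runs regardless of whether the previous one failed.

```shell
# Hypothetical wrapper around the step-3 command line.
# Set RUN=echo to print the commands instead of executing them.
# Usage: repro_cycle [MOUNT_POINT]   (default: /mnt/diskonkey)
repro_cycle() {
    mnt=${1:-/mnt/diskonkey}
    ${RUN:-} mount "$mnt"        # relies on an /etc/fstab entry for $mnt
    ${RUN:-} touch "$mnt/xxx"    # write something to the device...
    ${RUN:-} rm "$mnt/xxx"       # ...and remove it again
    ${RUN:-} df "$mnt"           # the call reported to hang on the second pass
    ${RUN:-} umount "$mnt"
}
```

Run it once after attaching the device (step 3), detach and reattach the device by hand, then run it again (step 6). With `RUN=echo` it only prints the five commands, which is a safe way to check the sequence first.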
I did some tests. The command line above crashes kernel-2.4.20-8 (so it's not linux-2.4.20-usb-storage.patch). The same command line works on Rawhide's kernel-2.4.20-1.1988 (2.4.21-rc1-ac1 plus patches). On Wednesday I'll do some more tests (removing linux-2.4.20-usb.patch, trying 2.4.20 vanilla, 2.4.21-rc1 and 2.4.21-rc1-ac1).
Tested vanilla 2.4.20, and it crashes. So the problem is in 2.4.20, and apparently has been corrected as of 2.4.21-rc1-ac1. Which means I'll be using rawhide kernels for a while.
The problem was corrected in the kernel-2.4.20-13.9 update (quite probably by the "bugfixes from 2.4.21-rc1-ac3" in it). Thanks!!! Closing.
Yes, it's an upstream fix.