Bug 89782 - Kernel crashes when using USB storage devices
Summary: Kernel crashes when using USB storage devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pete Zaitcev
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-04-28 09:02 UTC by fred-m
Modified: 2007-04-18 16:53 UTC (History)
0 users

Fixed In Version: 2.4.20-13.9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-05-16 07:47:23 UTC
Embargoed:


Attachments
Photo of first crash (276.46 KB, image/jpeg)
2003-04-28 09:14 UTC, fred-m
Photo of second crash (325.44 KB, image/jpeg)
2003-04-28 09:15 UTC, fred-m
Photo of third crash (310.17 KB, image/jpeg)
2003-04-28 09:23 UTC, fred-m

Description fred-m 2003-04-28 09:02:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131

Description of problem:
I am having kernel crashes when using USB DiskOnKey devices, with an application
that I use once per day almost every day. The crashes happen only with Red Hat 9
(all updates applied); Red Hat 8 worked fine with the same machine and devices
for months.

The application is a shell script that mounts the DiskOnKey and writes files to
it. Then it verifies the files with a sync in the background, waits for the
sync, and unmounts the DiskOnKey. The kernel crashes at this point.
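
The workflow described above can be sketched roughly as follows. This is a
hypothetical reconstruction, not the reporter's actual script: the function
name copy_and_verify, the MOUNT/UMOUNT override variables, and the use of
diff -r for the verify step are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical reconstruction of the daily job; all names are illustrative.
# MOUNT and UMOUNT may be overridden (e.g. MOUNT=true) for a dry run.
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

# copy_and_verify SRC_DIR MOUNT_POINT
# Returns 0 when the copy on the device matches the source directory.
copy_and_verify() {
    src=$1
    mnt=$2

    "$MOUNT" "$mnt" || return 1           # assumes an /etc/fstab entry for $mnt
    cp -R "$src"/. "$mnt"/ || { "$UMOUNT" "$mnt"; return 2; }

    sync &                                 # flush dirty buffers in the background
    sync_pid=$!

    diff -r "$src" "$mnt" >/dev/null       # verify while the sync is running
    verify_status=$?

    wait "$sync_pid"                       # wait for the background sync
    "$UMOUNT" "$mnt" || return 3           # the reported crash occurs around here
    [ "$verify_status" -eq 0 ] || return 4
    return 0
}
```

A call would look like "copy_and_verify /home/fred/backup /mnt/diskonkey",
where both paths are placeholders.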

The interesting thing about it is that I get a message in /var/log/messages saying:

Apr 23 19:33:46 tornado kernel: usb.c: USB disconnect on device 00:07.2-2 address 3

(Needless to say, the device is plugged in.) Also, the data on the DiskOnKey is
*not* modified at all, as if the mount at the beginning of the process never
happened. (Not surprising perhaps -- the device is "disconnected". But then how
is it possible that the script runs without any errors until the
verify/sync?)
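
To check how often the spurious disconnect shows up, the syslog file can be
filtered for that message. A minimal sketch, assuming the message text is
exactly as quoted above; the helper name usb_disconnects is made up for this
example.

```shell
# usb_disconnects LOGFILE
# Print the kernel "USB disconnect" lines found in a syslog-style file.
# The pattern is taken from the single message quoted in this report.
usb_disconnects() {
    grep 'kernel: usb\.c: USB disconnect on device' "$1"
}
```

For example, "usb_disconnects /var/log/messages | tail" lists the most recent
occurrences, which can then be compared against the times the script ran.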

(Notice that this has some similarities with bugs 85821 and 85822: the
disconnect is also there in bug 85821. I am reporting it as a new bug since bug
85821 seems to have been solved).

I was able to make the kernel crash three times today by trampling usb-storage
(nothing heavy -- plug/unplug and mount/unmount two DiskOnKeys plus a digital
camera, one at a time, and run things like the Bonnie disk benchmark, sync, or
df. Then repeat until it crashes). The photographs (attached files) have the
crash messages. These three seem to follow a pattern, but the call stack for the
crashes on the shell script above seemed different -- I'll attach a photograph
when I get one.

If you need more tests or data, just ask.

Version-Release number of selected component (if applicable):
kernel-2.4.20-9

How reproducible:
Sometimes

Steps to Reproduce:
See description above ("nothing heavy...").

Actual Results:  Kernel crashes.

Expected Results:  Kernel should have written the data correctly (to say the least).

Additional info:

Comment 1 fred-m 2003-04-28 09:14:55 UTC
Created attachment 91330 [details]
Photo of first crash

Comment 2 fred-m 2003-04-28 09:15:49 UTC
Created attachment 91331 [details]
Photo of second crash

Comment 3 fred-m 2003-04-28 09:23:09 UTC
Created attachment 91332 [details]
Photo of third crash

Comment 4 Pete Zaitcev 2003-04-28 18:27:39 UTC
Please do this on a working RHL8 box:

ls /proc/scsi/usb-storage*/*
cat /proc/scsi/usb-storage*/*

If the stick uses CBI transport, it's a known problem.
I am hunting for a test case to fix it.


Comment 5 fred-m 2003-04-30 06:52:28 UTC
Does it have to be RHL8? (None of these around now). I ran the commands on the
problematic RHL9 box, and:

fred@tornado[21]% ls /proc/scsi/usb-storage*/*
/proc/scsi/usb-storage-0/1
fred@tornado[22]% cat /proc/scsi/usb-storage-0/1 
   Host scsi1: usb-storage
       Vendor: M-Systems
      Product: DiskOnKey
Serial Number: 021E811619002DC6
     Protocol: Transparent SCSI
    Transport: Bulk
         GUID: 08ec0010021e811619002dc6
     Attached: Yes

(Is this CBI?). BTW, here is what usbview says about it:

DiskOnKey
Manufacturer: M-Systems
Serial Number: 021E811619002DC6
Speed: 12Mb/s (full)
USB Version:  1.10
Device Class: 00(>ifc )
Device Subclass: 00
Device Protocol: 00
Maximum Default Endpoint Size: 64
Number of Configurations: 1
Vendor Id: 08ec
Product Id: 0010
Revision Number:  2.00

Config Number: 1
	Number of Interfaces: 1
	Attributes: 80
	MaxPower Needed:  94mA

	Interface Number: 0
		Name: usb-storage
		Alternate Number: 0
		Class: 08(stor.) 
		Sub Class: 6
		Protocol: 50
		Number of Endpoints: 2

			Endpoint Address: 81
			Direction: in
			Attribute: 2
			Type: Bulk
			Max Packet Size: 64
			Interval: 0ms

			Endpoint Address: 01
			Direction: out
			Attribute: 2
			Type: Bulk
			Max Packet Size: 64
			Interval: 0ms
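
On the "Is this CBI?" question: the proc file above reports "Transport: Bulk",
and interface protocol 50 in the usbview output is the USB mass-storage code
for bulk-only transport, so this stick does not appear to be a CBI device. The
check can be scripted as in the sketch below. The function names are
hypothetical, and it assumes the proc file always carries a "Transport:" line
in the format shown in this comment.

```shell
# storage_transport FILE
# Print the value of the "Transport:" line from a usb-storage proc file.
storage_transport() {
    sed -n '/^[[:space:]]*Transport:/{s/^[[:space:]]*Transport:[[:space:]]*//;s/[[:space:]]*$//;p;}' "$1"
}

# is_bulk_only FILE
# Succeed when the reported transport is exactly "Bulk" (assumed to mean
# bulk-only, i.e. not CBI).
is_bulk_only() {
    [ "$(storage_transport "$1")" = "Bulk" ]
}
```

Usage would be along the lines of "is_bulk_only /proc/scsi/usb-storage-0/1 &&
echo bulk-only".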

Comment 6 fred-m 2003-05-04 11:43:20 UTC
BTW, Pete, if you want me to test patches, or any kernel versions (say, 
Rawhide?), just tell me what to test.

Comment 7 fred-m 2003-05-07 07:04:46 UTC
I found a simple and reproducible way to crash the kernel:
(1) Boot the machine without the DiskOnKey attached
(2) Attach the DiskOnKey
(3) Run: "mount /mnt/diskonkey ; touch /mnt/diskonkey/xxx ; rm
/mnt/diskonkey/xxx ; df /mnt/diskonkey ; umount /mnt/diskonkey"
(4) Detach the DiskOnKey
(5) Repeat step 2
(6) Repeat step 3

Things work correctly until step 5, then the command line (not the kernel)
freezes after step 6 (df only prints the "Filesystem..." header and freezes).
Other terminals still work at this point (unless you run "sync", which freezes).
The kernel only crashes about a minute and a half after step 6 (again, not right
after it). The crash message is similar to the one in the first attachment
(id=91330).

If sync is used instead of df in steps 3 and 6, the kernel also crashes.
However, everything works correctly if touch and rm are not done (i.e., if no
data is written to the device?).
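
The steps above can be wrapped in a small script so that each pass runs the
same commands. This is only a sketch: the function names and the MOUNT/UMOUNT
override variables are invented for illustration, and attaching and detaching
the device still has to be done by hand at the prompts.

```shell
#!/bin/sh
# MOUNT and UMOUNT may be overridden (e.g. MOUNT=true) for a dry run.
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

# exercise_stick MOUNT_POINT
# Runs the command sequence from step 3: mount, write, delete, df, unmount.
# Unlike the original one-liner, it stops early if the mount fails.
exercise_stick() {
    mnt=$1
    "$MOUNT" "$mnt" || return 1
    touch "$mnt/xxx"
    rm "$mnt/xxx"
    df "$mnt"
    "$UMOUNT" "$mnt"
}

# reproduce MOUNT_POINT
# Two attach/exercise/detach passes; the hang is expected on the second one.
reproduce() {
    mnt=$1
    for pass in 1 2; do
        printf 'Attach the DiskOnKey, then press Enter: '
        read -r reply
        exercise_stick "$mnt" || return 1
        printf 'Pass %s done. Detach the DiskOnKey, then press Enter: ' "$pass"
        read -r reply
    done
}
```

Running "reproduce /mnt/diskonkey" on an affected kernel should hang in df
during the second pass, if the behavior matches the description above.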

The problem happens with two different DiskOnKey devices (both work correctly
under RHL 8.0 and 7.3). I wasn't able to make it crash with the digital camera
(also a USB block device).

I'm *really* accepting suggestions of kernel versions or patches to test (or any
other tests). Thanks in advance.

Comment 8 fred-m 2003-05-10 07:18:31 UTC
I did some tests. The command line above crashes kernel-2.4.20-8 (so it's not 
linux-2.4.20-usb-storage.patch). The same command line works on Rawhide's 
kernel-2.4.20-1.1988 (2.4.21-rc1-ac1 plus patches).

On Wednesday I'll do some more tests (removing linux-2.4.20-usb.patch, trying 
2.4.20 vanilla, 2.4.21-rc1 and 2.4.21-rc1-ac1).

Comment 9 fred-m 2003-05-15 04:13:09 UTC
Tested vanilla 2.4.20, and it crashes. So the problem is in 2.4.20, and
apparently has been corrected as of 2.4.21-rc1-ac1. Which means I'll be using
rawhide kernels for a while.

Comment 10 fred-m 2003-05-16 07:47:23 UTC
The problem was corrected in the kernel-2.4.20-13.9 update (quite probably by
the "bugfixes from 2.4.21-rc1-ac3" in it). Thanks!!!

Closing.

Comment 11 Pete Zaitcev 2003-05-16 09:08:16 UTC
Yes, it's an upstream fix.



