Bug 119585 - (USB SCSI)empty CF reader causes hang on boot (kudzu)
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i586 Linux
Priority: medium
Severity: high
Assigned To: Pete Zaitcev
Reported: 2004-03-31 12:21 EST by Dave Goldblatt
Modified: 2007-11-30 17:10 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-05-04 08:45:30 EDT
Attachments
sysrq-t trace (120.78 KB, text/plain)
2004-04-21 12:41 EDT, Dave Goldblatt
trace from runlevel 3 (51.00 KB, text/plain)
2004-04-22 19:26 EDT, Dave Goldblatt

Description Dave Goldblatt 2004-03-31 12:21:38 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312

Description of problem:
If a USB CompactFlash reader is attached during boot, the system
does not come up. Further investigation showed that I can reproduce this
(almost 100% of the time) manually by running kudzu.  Detaching the
USB device lets the boot process continue (or lets kudzu exit).

Version-Release number of selected component (if applicable):
kudzu-1.1.53-1

How reproducible:
Always

Steps to Reproduce:
Run kudzu with the empty CF reader attached.  strace of kudzu yields:

[...]
open("/proc/scsi/usb-storage-0", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
open("/etc/modprobe.conf", O_RDONLY)    = 8
stat64("/etc/modprobe.conf", {st_mode=S_IFREG|0644, st_size=270, ...}) = 0
read(8, "alias eth0 b44\n\ninclude /etc/mod"..., 270) = 270
close(8)                                = 0
access("/proc/scsi/scsi", R_OK)         = 0
open("/proc/scsi/scsi", O_RDONLY)       = 8
read(8,
[process hangs]

Unplugging the device lets the output continue:
 "Attached devices:\nHost: scsi3 Ch"..., 16384) = 176
brk(0)                                  = 0x95ab000
brk(0x95ce000)                          = 0x95ce000
read(8, "", 16384)                      = 0
close(8)                                = 0
stat64("/proc/scsi/usb-storage-0", 0xfef3310c) = -1 ENOENT (No such file or directory)
[...]

Additional info:

Reproduced with kernel 2.6.3-2.1.253 and 2.6.4-1.298.
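
The strace above shows kudzu blocking inside read() on /proc/scsi/scsi. A minimal sketch of how that read could be probed without hanging the calling shell, assuming a child process does the read and the parent gives up after a deadline. The function name `read_with_deadline` and the status strings are hypothetical, chosen only for this illustration; none of this comes from kudzu.

```python
import subprocess
import sys

# One-liner run in the child: read the whole file and echo it to stdout.
READER = "import sys; sys.stdout.write(open(sys.argv[1]).read())"


def read_with_deadline(path, deadline_s=10.0):
    """Read `path` in a child process and return (status, text).

    status is "ok", "error" (the child could not read the file) or
    "hung" (the child was still blocked when the deadline expired).
    """
    child = subprocess.Popen(
        [sys.executable, "-c", READER, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        out, _ = child.communicate(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # A reader stuck in uninterruptible sleep does not act on the
        # signal until the kernel releases it, so do not wait for it here.
        child.kill()
        child.stdout.close()
        return "hung", ""
    status = "ok" if child.returncode == 0 else "error"
    return status, out.decode(errors="replace")
```

With the empty reader attached, `read_with_deadline("/proc/scsi/scsi")` would be expected to report "hung" rather than block the caller, if the behavior matches the strace above.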
Comment 1 Dave Goldblatt 2004-03-31 12:26:14 EST
Verified boot is successful with a CF present in the reader.

Output from /proc/bus/usb/devices for the CF reader:
T:  Bus=04 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0b30 ProdID=0005 Rev= 0.70
S:  Manufacturer=CF Media-Shuttle
S:  Product=CF Media-Shuttle
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
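
For reference, the Cls/Sub/Prot values on the I: line can be decoded as in the sketch below. The code tables hold only the three values that appear in this listing (from the USB Mass Storage class codes); the function name `decode_interface` is hypothetical and anything not in the tables is reported as "unknown".

```python
import re

# Only the codes seen in the listing above; not a complete table.
CLASSES = {0x08: "mass storage"}
SUBCLASSES = {0x06: "SCSI transparent command set"}
PROTOCOLS = {0x50: "bulk-only transport"}


def decode_interface(line):
    """Decode the Cls=, Sub= and Prot= fields of an 'I:' line."""
    codes = {
        key: int(value, 16)
        for key, value in re.findall(r"\b(Cls|Sub|Prot)=([0-9a-fA-F]{2})", line)
    }
    return {
        "class": CLASSES.get(codes.get("Cls"), "unknown"),
        "subclass": SUBCLASSES.get(codes.get("Sub"), "unknown"),
        "protocol": PROTOCOLS.get(codes.get("Prot"), "unknown"),
    }


line = "I:  If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage"
print(decode_interface(line))
```

So the reader presents itself as a plain mass-storage device that takes SCSI commands over bulk endpoints.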
 
Comment 2 Pete Zaitcev 2004-04-14 19:13:20 EDT
It would be helpful to get the output of SysRq-T.
It's going to be big, please do not drop it into the
comments box, but attach it to the bug instead.

Also, I'd like to see /proc/scsi/usb-storage-*/*
(to get actual protocol and transport)
Comment 3 Pete Zaitcev 2004-04-20 00:53:33 EDT
So, about that SysRq-T output... ? Needinfo-ing.
Comment 4 Dave Goldblatt 2004-04-20 14:01:03 EDT
Sorry for the delay.  Can't do the SysRq-T, since magic SysRq isn't in
the stock kernels.  Still fails with .332.

As for the other:

$ cat /proc/scsi/usb-storage/0
   Host scsi0: usb-storage
       Vendor: CF Media-Shuttle
      Product: CF Media-Shuttle
Serial Number: None
     Protocol: Transparent SCSI
    Transport: Bulk
       Quirks:

Comment 5 Pete Zaitcev 2004-04-20 15:03:32 EDT
SysRq is always there, it just has to be enabled with
echo 1 > /proc/sys/kernel/sysrq
Comment 6 Dave Goldblatt 2004-04-20 21:34:28 EDT
Yeah, except this is on boot..

I guess I can stuff it in an rc file - any suggestions where?
Comment 7 Pete Zaitcev 2004-04-20 21:45:40 EDT
So, what about this:
"further investigation showed I can reproduce this
(almost 100% of the time) manually by running kudzu."

In any case, it is set in /etc/sysctl.conf.
A good alternative is to do "echo t > /proc/sysrq-trigger"
with the same trick you used to run strace on kudzu.
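
A minimal sketch of that capture step, assuming it runs as root from a second shell while kudzu is stuck. The two /proc paths are the ones named in this thread; the function name `request_task_dump` is hypothetical, and the paths are parameters only so the logic can be tried against ordinary files.

```python
def request_task_dump(enable_path="/proc/sys/kernel/sysrq",
                      trigger_path="/proc/sysrq-trigger"):
    """Enable magic SysRq, then ask the kernel for a task-state dump."""
    # Writing 1 enables SysRq functions.
    with open(enable_path, "w") as enable:
        enable.write("1\n")
    # Writing 't' requests the same dump as the SysRq-T key combination.
    with open(trigger_path, "w") as trigger:
        trigger.write("t\n")


if __name__ == "__main__":
    request_task_dump()
```

The dump goes to the kernel log, so it can be saved afterwards with something like `dmesg > sysrq-t.txt` and attached to the bug.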
Comment 8 Dave Goldblatt 2004-04-20 22:38:06 EDT
My bad - should have gone back to reread my original description.  I'll try to repro it
tomorrow.
Comment 9 Dave Goldblatt 2004-04-21 12:41:28 EDT
Created attachment 99606
sysrq-t trace
Comment 10 Pete Zaitcev 2004-04-21 19:37:26 EDT
Dave, thanks, that's good material. However, I suspect a key
process was lost (usb-storage or scsi_eh_X), and here is why.
The read from /proc/scsi/scsi hangs
on a semaphore (bus->subsys.rwsem), which is apparently held
by khubd, but what is holding khubd? It's USB storage.
Now, we do have one usb-storage thread, which is idle. The
corresponding SCSI helper, however, has #2. The question is
where are the other two?

It is possible to race something, or to forget a callback,
and cause khubd to hang, without having extra usb-storage
threads. But I cannot be sure what's going on.

Please retry the capture, but reduce the number of processes
to a minimum. Above all, do not do it under X. Ideally,
stop any extra services as well.
Comment 11 Pete Zaitcev 2004-04-21 20:01:03 EDT
Oh, and another thing. Please let it simmer for 100 seconds
before doing anything.

        do {
.......
        } while (spintime &&
                 time_after(spintime_value + 100 * HZ, jiffies));
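
To illustrate why 100 seconds is the number to wait for, here is a small model of the loop bound above. It is only a sketch: it emulates the kernel's wraparound-safe time_after() comparison on a 32-bit jiffies counter, assumes HZ = 1000, and assumes the device never reports ready so that spintime stays set. The helper names are hypothetical.

```python
BITS = 32                 # assumption: 32-bit unsigned long, as on i586
MASK = (1 << BITS) - 1
HZ = 1000                 # assumption: timer ticks per second


def to_signed(value):
    """Reinterpret a BITS-wide unsigned value as a signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def time_after(a, b):
    """True if tick count a is later than b, even across a wraparound."""
    return to_signed(b - a) < 0


def spin_seconds(spintime_value, limit_s=100):
    """Count the seconds the loop keeps retrying when nothing answers."""
    jiffies = spintime_value & MASK
    deadline = (spintime_value + limit_s * HZ) & MASK
    seconds = 0
    while time_after(deadline, jiffies):
        jiffies = (jiffies + HZ) & MASK   # pretend one second passes
        seconds += 1
    return seconds


print(spin_seconds(0))             # counter far from wrapping
print(spin_seconds(MASK - 5000))   # counter about to wrap
```

Both calls give 100, so under these assumptions a capture taken earlier than that may still be inside the spin-up loop.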
Comment 12 Dave Goldblatt 2004-04-22 19:25:09 EDT
For what it's worth:

% time kudzu
real    3m33.575s
user    0m0.372s
sys     0m0.044s
 
Comment 13 Dave Goldblatt 2004-04-22 19:26:53 EDT
Created attachment 99641
trace from runlevel 3
Comment 14 Pete Zaitcev 2004-08-18 00:17:17 EDT
OK. The cause of this is more or less clear, especially with
comment #13 we see how kudzu gets stuck trying to spin up the thing.
Now, the fix is not as clear... I really do not look forward to
touching the logic around START_STOP_UNIT.

BTW, is it all the same with 2.6.7-1.494.2.2 ?
Comment 15 Dave Goldblatt 2004-08-20 14:21:43 EDT
Still occurs with 2.6.8-1.524.
Comment 16 Dave Jones 2004-12-07 01:01:19 EST
and 2.6.9 ?
Comment 17 Dave Goldblatt 2004-12-20 13:06:30 EST
Still occurs with 2.6.9-1.681_FC3.
Comment 18 Dave Jones 2005-04-16 01:00:27 EDT
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
Comment 19 Dave Goldblatt 2005-04-19 23:18:49 EDT
As per my last update, it continued to occur with FC3.
Comment 20 Pete Zaitcev 2005-05-13 22:46:23 EDT
OK, I see that a generic solution is not likely to emerge here.
We have known since April of 2004 that START_STOP fails. We have to find some
sort of device-specific workaround.

Dave, how averse are you to building kernels from source? I'll need
instrumentation.
Comment 21 Dave Goldblatt 2005-05-14 00:13:23 EDT
No problem with testing kernels -- but hey, if it's easier, I can send you the
device in question.
Comment 22 Pete Zaitcev 2005-05-14 01:30:42 EDT
- working out of band with the requestor, will update.
Comment 23 Pete Zaitcev 2005-07-06 03:08:50 EDT
Back in May I asked Dave by e-mail to get CONFIG_USB_STORAGE_DEBUG,
and received no reply. Did it fall through the cracks?
Comment 24 Dave Jones 2005-07-15 15:03:29 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 25 Dave Goldblatt 2005-07-18 16:04:23 EDT
Still occurs with FC3 2.6.12-1.1372.

Will talk to Pete offline about the hardware.
Comment 26 Pete Zaitcev 2005-09-17 00:53:08 EDT
Dave, how about FC4? That one has SCSI scanning on a separate thread.
If nothing else, it would let the system boot without delay.
Comment 27 charles harris 2005-10-19 19:14:46 EDT
This sounds similar to the problem I am having with the usb ports on a Dell 2405
widescreen monitor. The monitor has two builtin usb ports and a cf flash card
reader. When I boot with the monitor usb connected the system hangs while
scanning hardware. Booting with the ports unconnected and then plugging in the
usb cable works fine. I nearly went bald figuring out how to get the machine to
boot with my shiny new monitor! Anyway, I can try some other tests if that would
be helpful.

PS. Fails for both fc3 and fc4, latest kernels.
Comment 28 Dave Jones 2006-01-16 17:29:00 EST
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.
Comment 29 Dave Jones 2006-02-03 01:11:21 EST
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.
Comment 30 John Thacker 2006-05-04 08:45:30 EDT
Still inactive, no response.  Closing per previous comment.
