Bug 37437 - USB failure requiring reboot
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Pete Zaitcev
QA Contact: Brock Organ
Reported: 2001-04-24 11:36 EDT by Derek Price
Modified: 2007-04-18 12:32 EDT (History)

Doc Type: Bug Fix
Last Closed: 2001-05-01 19:41:39 EDT
Description Derek Price 2001-04-24 11:36:32 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i686)


USB failed.  Unplugging the hub and plugging it back in failed to reacquire the
devices, and I was forced to reboot to use my keyboard again.  I found the
following messages in syslog:

Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame# 1204
Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame# 1216

and so on.  There were several hundred of these messages with steadily
increasing frame numbers.  They reset back to near zero at least once, from
2044.  Then the following appears:

Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 67
Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 68
Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame# 1228
Apr 24 00:59:50 empress kernel: keybdev.c: Removing keyboard: input0
Apr 24 00:59:50 empress kernel: keybdev.c: Removing keyboard: input1
Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 69
Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 71
Apr 24 00:59:50 empress kernel: hub.c: error resetting hub 66 - disconnecting
Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 66


Reproducible: Didn't try
Steps to Reproduce:
1.  Unknown.  Just left my system running this time.
2.
3.
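
To put a number on "several hundred" and to spot the frame counter resetting, the syslog text can be scanned with a short script. This is only a sketch: the function name `summarize_status_messages` is hypothetical, and the regular expression assumes the message format shown in the excerpt above, with the frame number either on the same line or wrapped onto the next one.

```python
import re

# Matches the usb-uhci interrupt messages quoted in the report. "\s*" lets
# the frame number sit on the same line or be wrapped onto the next line.
STATUS_RE = re.compile(r"usb-uhci\.c: interrupt, status (\d+), frame#\s*(\d+)")


def summarize_status_messages(log_text):
    """Count usb-uhci status messages and frame-counter resets in a log."""
    frames = [int(m.group(2)) for m in STATUS_RE.finditer(log_text)]
    # A reset is any message whose frame number is lower than the previous one.
    wraps = sum(1 for prev, cur in zip(frames, frames[1:]) if cur < prev)
    return {"count": len(frames), "wraps": wraps, "frames": frames}


sample = """\
Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame#
1204
Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame#
1216
Apr 24 00:59:50 empress kernel: usb.c: USB disconnect on device 67
Apr 24 00:59:50 empress kernel: usb-uhci.c: interrupt, status 3, frame#
1228
"""

summary = summarize_status_messages(sample)
print(summary["count"], summary["wraps"])
```

Run against the full /var/log/messages text, the "count" value shows how many of these messages were logged before the disconnects.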
Comment 1 Arjan van de Ven 2001-04-24 11:41:01 EDT
Pete: does this look familiar?
Comment 2 Pete Zaitcev 2001-04-24 12:21:11 EDT
The messages were saved which was a good first step.
It allows me to speculate a bit, if nothing else.

First, the message "too many bad statuses" is missing,
so there were perhaps fewer than 1000 "status 3" messages.

Second, I have observed a problem where pulling power
from a hub would make it unusable, on a stock 2.4.3 kernel.
The problem was discussed with USB maintainer and is fixed
(missing assignment to urb->dev for re-used URB in usb_hub_reset()).
However, I cannot say for sure that it is what we see here.
The test would be to insert a custom printout in usb_hub_reset().

One thing that would be incredibly helpful in such a case
is to have an ssh connection or a serial terminal ready,
then run "ps alx" and note the state of khubd. If khubd
gets stuck in "D" state, then we know what it is
(recursive lock in usb_hub_disconnect()), and more:
we know that it is fixed in errata.

In the end, I have a couple of candidate problems,
but I cannot diagnose the problem completely.
Comment 3 Derek Price 2001-04-25 12:08:14 EDT
This turns out to be reproducible and has happened three days in a row now.  I
will post the result of `ps alx |grep khubd' next time it happens.
Comment 4 Derek Price 2001-04-26 09:28:34 EDT
I think the answer is yes, khubd is stuck in the 'D' state:

[oberon@empress oberon]$ cat khubd.state 
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
040     0    60     1   9   0     0    0 down   DW   ?          0:00 [khubd]
000   500 25734 25638  11   0  1624  608 pipe_w S    pts/15     0:00 grep khubd
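
The same check can be scripted so the state is captured the moment USB fails. The sketch below is an illustration only: `process_states` is a hypothetical helper, and it assumes the `ps alx` column layout shown above, with every column before COMMAND non-empty.

```python
def process_states(ps_alx_output, name):
    """Return the STAT column of every `ps alx` row whose COMMAND is `name`."""
    lines = [ln for ln in ps_alx_output.splitlines() if ln.strip()]
    header = lines[0].split()
    stat_idx = header.index("STAT")
    cmd_idx = header.index("COMMAND")
    states = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue
        # Kernel threads are shown in brackets, e.g. "[khubd]".
        if fields[cmd_idx].strip().strip("[]") == name:
            states.append(fields[stat_idx])
    return states


sample_ps = """\
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
040     0    60     1   9   0     0    0 down   DW   ?          0:00 [khubd]
000   500 25734 25638  11   0  1624  608 pipe_w S    pts/15     0:00 grep khubd
"""

khubd_states = process_states(sample_ps, "khubd")
stuck = any(state.startswith("D") for state in khubd_states)
print(khubd_states, stuck)
```

Matching on the whole COMMAND column keeps the `grep khubd` row out of the result, so only the kernel thread itself is reported.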

Comment 5 Pete Zaitcev 2001-04-26 13:25:03 EDT
In case of khubd stuck, I think I know what it is.
The fix is in the errata CVS.

The problem was introduced when I fixed bug #29102. In the
following fix, order of up(&hub->khubd_sem) and usb_hub_disconnect()
is reversed (around line 728):
https://bugzilla.redhat.com/bugzilla/showattachment.cgi?attach_id=12162
This causes a lockup of khubd when it attempts a recursive lock.

USB maintainer resolved it differently, but for the
sake of stability we have code that just swaps calls
in 7.1-errata.

If an immediate resolution is required, the requestor would need
to download a 7.1 kernel SRPM, swap calls mentioned above, and
rebuild the kernel.
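
Why the order of the two calls matters can be shown with a toy user-space model. This is not the kernel code: the class and function names are hypothetical, a Python semaphore stands in for hub->khubd_sem, and the model simply assumes that the disconnect path takes the same semaphore. A timeout replaces "blocks forever" so the sketch terminates.

```python
import threading


class Hub:
    """Toy stand-in for the hub structure; only the semaphore matters here."""

    def __init__(self):
        # Non-recursive binary semaphore, playing the role of hub->khubd_sem.
        self.khubd_sem = threading.Semaphore(1)
        self.connected = True


def hub_disconnect(hub):
    """Disconnect path that needs the semaphore itself (model assumption)."""
    # In the kernel this would sleep forever; here we give up after 0.1 s.
    if not hub.khubd_sem.acquire(timeout=0.1):
        return False  # would have deadlocked
    try:
        hub.connected = False
        return True
    finally:
        hub.khubd_sem.release()


def reset_failed_buggy(hub):
    """Calls disconnect while still holding the semaphore."""
    hub.khubd_sem.acquire()
    ok = hub_disconnect(hub)  # second acquisition by the same thread fails
    hub.khubd_sem.release()
    return ok


def reset_failed_swapped(hub):
    """Releases the semaphore first, then disconnects."""
    hub.khubd_sem.acquire()
    hub.khubd_sem.release()
    return hub_disconnect(hub)


print(reset_failed_buggy(Hub()), reset_failed_swapped(Hub()))
```

In the buggy order the thread waits on a semaphore that only it can release, which matches khubd sleeping in `down` with state "D".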
Comment 6 Derek Price 2001-04-27 08:41:33 EDT
Well, I swapped the two lines and recompiled the kernel from SRPMS with no other
changes.  Now USB shuts off quicker and attempts to remount the devices after a
reconnect, but it still fails:

Apr 27 08:31:15 empress kernel: hub.c: USB new device connect on bus1/2, assigned device number 16
Apr 27 08:31:18 empress kernel: usb_control/bulk_msg: timeout
Apr 27 08:31:18 empress kernel: usb.c: USB device not accepting new address=16 (error=-110)
Apr 27 08:31:18 empress kernel: hub.c: USB new device connect on bus1/2, assigned device number 17
Apr 27 08:31:21 empress kernel: usb_control/bulk_msg: timeout
Apr 27 08:31:21 empress kernel: usb.c: USB device not accepting new address=17 (error=-110)

I'm going back to the stock kernel since crashes once or twice a day are
preferable to once or twice an hour (the USB shut off twice, I think - I may
have cycled the hub too quickly the first time - the second time it wouldn't come
back, with the above error messages).
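
For reference, error -110 is -ETIMEDOUT on Linux, which agrees with the "timeout" lines that precede each failure. The sketch below pulls the failed address assignments out of the log; the helper name and the one-entry errno table are assumptions made for this illustration, and the pattern tolerates the "(error=...)" part being wrapped onto the next line.

```python
import re

# Linux errno value seen in this log; 110 is ETIMEDOUT on i386 Linux.
LINUX_ERRNO = {110: "ETIMEDOUT"}

ADDR_RE = re.compile(r"not accepting new address=(\d+)\s*\(error=-(\d+)\)")


def failed_address_assignments(log_text):
    """Return (address, errno name) for each rejected address assignment."""
    results = []
    for addr, err in ADDR_RE.findall(log_text):
        name = LINUX_ERRNO.get(int(err), "errno " + err)
        results.append((int(addr), name))
    return results


sample_log = """\
Apr 27 08:31:18 empress kernel: usb_control/bulk_msg: timeout
Apr 27 08:31:18 empress kernel: usb.c: USB device not accepting new address=16
(error=-110)
Apr 27 08:31:21 empress kernel: usb_control/bulk_msg: timeout
Apr 27 08:31:21 empress kernel: usb.c: USB device not accepting new address=17
(error=-110)
"""

failures = failed_address_assignments(sample_log)
print(failures)
```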
Comment 7 Derek Price 2001-04-27 08:42:46 EDT
Forgot to mention, I checked and khubd isn't getting stuck in the 'D' state
anymore.
Comment 8 Pete Zaitcev 2001-04-30 14:05:38 EDT
Update: the requestor (his name is Derek) was willing to
test source level updates and we started with fixes that
were included into the errata.

khubd lockup was fixed with swapping of locking and disconnect,
but the hub does not come up yet.

We are testing the forgotten assignment.
--- linux-2.4.3/drivers/usb/hub.c       Sun Mar 25 18:14:21 2001
+++ linux-2.4.3-nfs/drivers/usb/hub.c   Thu Apr 12 19:37:38 2001
@@ -406,6 +414,7 @@
 	if (usb_reset_device(dev))
 		return -1;
 
+	hub->urb->dev = dev;
 	if (usb_submit_urb(hub->urb))
 		return -1;
 
Comment 9 Derek Price 2001-05-01 18:01:26 EDT
I applied the fix Pete gave me and USB has not shut off in three or four days. 
The machine has been running constantly for this time.
Comment 10 Pete Zaitcev 2001-05-01 19:41:34 EDT
I am happy to hear that our upcoming errata is going
to be good in this respect.

I'll keep the bug open until we know that it is fixed
in the field.

BTW, Account <oberon@umich.edu> produces bounces with
"User oberon OVER QUOTA". Perhaps Derek would want
to create a new Bugzilla account with working e-mail.
Comment 11 Pete Zaitcev 2001-06-25 16:42:56 EDT
Derek, errata 2.4.3-12 with fixes is out.
Reopen if it fails your KVM torture test.
ftp://ftp.redhat.com/pub/redhat/linux/updates/7.1/en/os/
