Bug 165453 - Panic after ENXIO with usb-uhci
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Pete Zaitcev
QA Contact: Brian Brock
Depends On:
Blocks: 168424
Reported: 2005-08-09 11:33 EDT by Bastien Nocera
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 11:22:21 EST

Attachments
oops-multitech1.txt (33.36 KB, text/plain)
2005-08-09 11:33 EDT, Bastien Nocera
oops-multitech2.txt (35.07 KB, text/plain)
2005-08-09 11:34 EDT, Bastien Nocera
Candidate #1 - backport from 2.6 (2.17 KB, patch)
2005-08-11 04:09 EDT, Pete Zaitcev

Description Bastien Nocera 2005-08-09 11:33:13 EDT
The kernel panics while using a Multitech MT5634ZBA V92 modem, after some
ENXIO errors appear in the log files.
Oops captures are attached below.
Comment 1 Bastien Nocera 2005-08-09 11:33:13 EDT
Created attachment 117576
Comment 2 Bastien Nocera 2005-08-09 11:34:13 EDT
Created attachment 117577
Comment 3 Pete Zaitcev 2005-08-10 20:57:38 EDT
Hmm. This is something that my fixes in 2.4.21-31.EL.usbserial.4 are not
likely to fix.

The ENXIO is a good clue. It happens upon disconnect, before the disconnect
method had a chance to run (either real disconnect, or just the device
giving up the ghost).

What did you actually do before getting the oops? I need to recreate this.
Comment 4 Pete Zaitcev 2005-08-10 21:47:01 EDT
I happen to have a Multitech, and it actually does have endpoint 0x86,
believe it or not:

T:  Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
D:  Ver= 1.00 Cls=02(comm.) Sub=00 Prot=00 MxPS= 8 #Cfgs=  2
P:  Vendor=06e0 ProdID=f107 Rev= 1.00
S:  Manufacturer=Multi-Tech Systems, Inc.
S:  Product=MultiModemUSB
C:  #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=400mA
I:  If#= 0 Alt= 0 #EPs= 0 Cls=ff(vend.) Sub=ff Prot=ff Driver=
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=
E:  Ad=02(O) Atr=02(Bulk) MxPS=  16 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  63 Ivl=2ms
C:* #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=400mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
E:  Ad=84(I) Atr=03(Int.) MxPS=  32 Ivl=128ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=86(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms

I was wrong about ENXIO, by the way. The printout is misleading.
It is trying to tell us that a URB was submitted for an endpoint
which already has a URB submitted.
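
For reference, the Ad= values in the listing above follow the standard USB
bEndpointAddress layout: bit 7 is the direction and the low four bits are the
endpoint number, so 0x86 is IN endpoint 6. A minimal sketch of that decoding,
with helper names that are illustrative only and not taken from the driver:

```c
#include <stdint.h>
#include <stdio.h>

/* Bit layout of bEndpointAddress as defined by the USB specification. */
#define EP_DIR_IN   0x80u  /* bit 7 set: device-to-host (IN) */
#define EP_NUM_MASK 0x0fu  /* bits 3..0: endpoint number */

/* Hypothetical helpers, named for this example only. */
static int ep_is_in(uint8_t addr)
{
    return (addr & EP_DIR_IN) != 0;
}

static unsigned ep_number(uint8_t addr)
{
    return addr & EP_NUM_MASK;
}

/* Print one address in a form similar to the Ad= column above. */
static void describe_endpoint(uint8_t addr)
{
    printf("Ad=%02x: endpoint %u, %s\n",
           (unsigned)addr, ep_number(addr), ep_is_in(addr) ? "IN" : "OUT");
}
```

Calling describe_endpoint() on 0x02, 0x84 and 0x86 covers the three addresses
used by configuration 2 of this modem.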
Comment 5 Pete Zaitcev 2005-08-11 03:11:34 EDT
I think I know what is happening here. We have open and close racing,
and as a result, open attempts to submit acm->ctrlurb and acm->readurb
which were not unlinked yet. The double submission of a bulk URB is
checked by usb-uhci and is refused with the "ENXIO" message. The double
submission of the control-interrupt URB "succeeds" quietly, and corrupts
something. Double termination results in oops (with urb->dev == NULL).
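
The double-submission check described above can be modelled in user space. This
is only a sketch under assumptions: the model_* names are hypothetical, the real
usb-uhci bookkeeping is more involved, and returning -ENXIO here simply mirrors
the error named in the log message.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a URB; not the kernel's struct urb. */
struct model_urb {
    int queued;
};

/* Toy endpoint that can hold at most one pending URB. */
struct model_endpoint {
    struct model_urb *pending;
};

/* Refuse a submission while another URB is still pending on the endpoint. */
static int model_submit(struct model_endpoint *ep, struct model_urb *urb)
{
    if (ep->pending != NULL)
        return -ENXIO;          /* endpoint busy: the double-submit case */
    ep->pending = urb;
    urb->queued = 1;
    return 0;
}

/* Unlinking frees the endpoint for the next submission. */
static void model_unlink(struct model_endpoint *ep)
{
    if (ep->pending != NULL) {
        ep->pending->queued = 0;
        ep->pending = NULL;
    }
}
```

In this model, an open that runs before close has unlinked the read URB gets
-ENXIO from model_submit(), which matches the symptom in the logs.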

Unfortunately, lock_kernel is not enough to keep opens and closes separated,
because some of the operations in the close path block. I expect we'll
need a semaphore here somewhere.
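
A rough user-space sketch of that idea, assuming a pthread mutex as a stand-in
for the kernel semaphore. All fake_* names are hypothetical and this is not the
actual acm driver code; it only shows open and close taking the same lock
around the submit and unlink steps, so an open cannot see URBs that a
concurrent close has not finished unlinking.

```c
#include <errno.h>
#include <pthread.h>

/* Toy URB: only tracks whether it is currently submitted. */
struct fake_urb {
    int in_flight;
};

struct fake_acm {
    pthread_mutex_t open_lock;   /* stands in for the kernel semaphore */
    int used;                    /* number of current openers */
    struct fake_urb ctrlurb;
    struct fake_urb readurb;
};

static int fake_submit(struct fake_urb *u)
{
    if (u->in_flight)
        return -ENXIO;           /* double submission */
    u->in_flight = 1;
    return 0;
}

static void fake_unlink(struct fake_urb *u)
{
    /* In a real driver this step may sleep, hence a sleeping lock. */
    u->in_flight = 0;
}

static int fake_acm_open(struct fake_acm *acm)
{
    int rc = 0;

    pthread_mutex_lock(&acm->open_lock);
    if (acm->used++ == 0) {      /* first opener submits both URBs */
        rc = fake_submit(&acm->ctrlurb);
        if (rc == 0) {
            rc = fake_submit(&acm->readurb);
            if (rc != 0)
                fake_unlink(&acm->ctrlurb);
        }
        if (rc != 0)
            acm->used--;         /* undo the count on failure */
    }
    pthread_mutex_unlock(&acm->open_lock);
    return rc;
}

static void fake_acm_close(struct fake_acm *acm)
{
    pthread_mutex_lock(&acm->open_lock);
    if (acm->used > 0 && --acm->used == 0) {   /* last closer unlinks */
        fake_unlink(&acm->ctrlurb);
        fake_unlink(&acm->readurb);
    }
    pthread_mutex_unlock(&acm->open_lock);
}
```

Because close holds the lock until both URBs are unlinked, a following open
always starts from a clean state, even if the unlink step sleeps.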
Comment 6 Pete Zaitcev 2005-08-11 04:09:56 EDT
Created attachment 117636
Candidate #1 - backport from 2.6

As it happens, Oliver already implemented the semaphore in 2.6.
Great minds think alike. Also, RHEL 4 is not affected.
I am using the same code conventions for similarity.
Comment 7 Pete Zaitcev 2005-08-11 04:12:27 EDT
I'm de-needinfoing this bug, but I still need a precise scenario for surety.
The fix is only based on analysis of oops captures.
Comment 8 Bastien Nocera 2005-08-11 04:47:14 EDT
From what I know the current usage is "normal" usage as a fax server, using
Hylafax. I'll see whether I can get something more precise.
Comment 9 Bastien Nocera 2005-08-12 05:23:45 EDT
When a fax can't be sent, the send is retried at a later time. Every now and
then, the retry will trigger the panic. When the panic occurs, the lock file
from Hylafax usually contains "LOCKWAIT".

Would you be able to provide a test kernel for testing purposes?
Comment 10 Pete Zaitcev 2005-08-31 05:10:33 EDT
Please find the kernel to test in ftp://people.redhat.com/zaitcev/165453/
Let me know how it went, and assuming success I'll post for acks.
Comment 12 Ernie Petrides 2005-09-15 00:17:30 EDT
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.2.EL).
Comment 15 Red Hat Bugzilla 2006-03-15 11:22:21 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.