Bug 165453
| Summary: | Panic after ENXIO with usb-uhci | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Bastien Nocera <bnocera> |
| Component: | kernel | Assignee: | Pete Zaitcev <zaitcev> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | petrides |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHSA-2006-0144 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-03-15 16:22:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 168424 | | |
Description
Bastien Nocera
2005-08-09 15:33:13 UTC
Created attachment 117576 [details]
oops-multitech1.txt
Created attachment 117577 [details]
oops-multitech2.txt
Hmm. This is something that my fixes in 2.4.21-31.EL.usbserial.4 are not likely to fix. The ENXIO is a good clue: it happens upon disconnect, before the disconnect method has had a chance to run (either a real disconnect, or just the device giving up the ghost).

What did you actually do before getting the oops? I need to recreate this situation. I happen to have a Multitech, and it actually does have endpoint 0x86, believe it or not:

```
T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 1.00 Cls=02(comm.) Sub=00 Prot=00 MxPS= 8 #Cfgs= 2
P: Vendor=06e0 ProdID=f107 Rev= 1.00
S: Manufacturer=Multi-Tech Systems, Inc.
S: Product=MultiModemUSB
C: #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=400mA
I: If#= 0 Alt= 0 #EPs= 0 Cls=ff(vend.) Sub=ff Prot=ff Driver=
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=
E: Ad=02(O) Atr=02(Bulk) MxPS= 16 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 63 Ivl=2ms
C:* #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=400mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
E: Ad=84(I) Atr=03(Int.) MxPS= 32 Ivl=128ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
```

I was wrong about ENXIO, by the way. The printout is misleading: it is trying to tell us that a URB was submitted for an endpoint which already has a URB submitted.

I think I know what is happening here. We have open and close racing, and as a result, open attempts to submit acm->ctrlurb and acm->readurb, which have not been unlinked yet. The double submission of a bulk URB is checked by usb-uhci and refused with the "ENXIO" message. The double submission of the control-interrupt URB "succeeds" quietly and corrupts something. The double termination then results in an oops (with urb->dev == NULL).

Unfortunately, lock_kernel is not enough to keep opens and closes separated, because some of the operations on the close path block. I expect we'll need a semaphore here somewhere.
Created attachment 117636 [details]
Candidate #1 - backport from 2.6
As it happens, Oliver already implemented the semaphore in 2.6.
Great minds think alike. Also, RHEL 4 is not affected.
I am using the same code conventions for similarity.
I'm de-needinfoing this bug, but I still need a precise scenario to be sure. The fix is based only on analysis of the oops captures.

From what I know, the current usage is "normal" usage as a fax server, using Hylafax. I'll see whether I can get something more precise. When a fax can't be sent, the send is retried at a later time. Every now and then, the retry triggers the panic. When the panic occurs, the Hylafax lock file usually contains "LOCKWAIT". Would you be able to provide a test kernel?

Please find the kernel to test in ftp://people.redhat.com/zaitcev/165453/

Let me know how it went, and assuming success I'll post for acks.

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.2.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html