Description of problem:
This problem is very similar to BZ231226; it was discovered while testing the fix for 231226. When an empty USB port on the rear of a system has an overcurrent (I use a paper clip to briefly connect the power pin to the chassis), the front ports on this system stop working. The front ports on this system are connected through an internal Cypress USB 2.0 hub.

The problem is that the overcurrent causes the EHCI controller to get an error when it tries to talk to the hub. hub_events() then sees hub->error and tries to reset the hub by calling hub_reset(). hub_reset() calls __usb_reset_device(), which refuses to reset a hub (see the FIXME in __usb_reset_device()), so hub_reset() fails and the hub is gone.

This was fixed some time ago upstream with this patch:
http://marc.info/?l=linux-usb-devel&m=109511190511780&w=2

It was trivial to port this patch to RHEL4.5 (2.6.9-55.EL). I'll attach the patch for RHEL4.5.

Version-Release number of selected component (if applicable):
2.6.9-55.EL

How reproducible:
Every time

Steps to Reproduce:
1. Get a system with an internal Cypress USB hub (many Dell servers have this).
2. Short the power pin to the chassis on an unused rear USB port.
3. Observe that the hub is no longer listed in "lsusb" output.

Actual results:
Hub disappears.

Expected results:
Hub should reappear after overcurrent error handling.

Additional info:
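The pre-patch failure path (hub_events() sees hub->error, calls hub_reset(), which fails because __usb_reset_device() refuses hubs) can be sketched as a small user-space model. This is a hypothetical illustration, not kernel code: model_dev, model_hub, and the model_* functions are invented stand-ins for the roles those kernel functions play in 2.6.9-55.EL, and the real functions take different arguments and do much more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL model of the pre-patch (2.6.9-55.EL) error path.
 * These are invented stand-ins, not kernel types or APIs. */
struct model_dev {
    const char *name;
    bool is_hub;
    bool present;            /* true while the device would appear in lsusb */
};

struct model_hub {
    struct model_dev *hdev;  /* the hub's own device */
    bool error;              /* set after EHCI fails to talk to the hub */
};

/* Plays the role of __usb_reset_device(), which refuses to reset a hub. */
static int model_usb_reset_device(const struct model_dev *dev)
{
    return dev->is_hub ? -1 : 0;
}

/* Plays the role of the old hub_reset(): if the reset fails,
 * the hub is disconnected and never comes back. */
static int model_hub_reset_old(struct model_hub *hub)
{
    if (model_usb_reset_device(hub->hdev) < 0) {
        hub->hdev->present = false;
        return -1;
    }
    return 0;
}

/* Plays the role of the hub->error branch in hub_events(). */
void model_hub_events_old(struct model_hub *hub)
{
    assert(hub != NULL && hub->hdev != NULL);
    if (!hub->error)
        return;
    hub->error = false;
    if (model_hub_reset_old(hub) < 0)
        fprintf(stderr, "%s: reset failed, hub is gone\n", hub->hdev->name);
}
```

Calling model_hub_events_old() on a hub whose error flag is set leaves hdev->present false, which corresponds to the hub vanishing from lsusb output in step 3 above.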
Created attachment 155272: patch for RHEL4.5 (2.6.9-55.EL)
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Just for the record, I found that the patch above did not compile cleanly on RHEL-4 because usb_disconnect_nolock() was no longer called anywhere. I removed that function, built RPMs with Brew, and asked Stuart to test the Brew build, which he did successfully (thanks again). I could not find usb_disconnect_nolock() upstream; it was added to RHEL-4 by another patch. Submitted to rhkernel-list, so I am setting the state to POST.
The usb_disconnect_nolock was added to fix bug 171220.
It looks like the code that calls usb_disconnect_nolock() is hub_start_disconnect() in the -55 kernel (before the patch in comment #1 is applied). After the patch from comment #1 is applied, that code is in hub_pre_reset(). So, to apply the patch from comment #1 AND keep the fix for bug 171220, I believe all you'd need to do is apply the patch from comment #1 and then modify hub_pre_reset() to call usb_disconnect_nolock() instead of usb_disconnect().
I disagree with Stuart about usb_disconnect_nolock(). The code to disconnect the hub itself (if the reset fails) is removed completely by the patch in question. The code to disconnect the hub's children was moved from hub_reset() to hub_pre_reset(). The deadlock in bug 171220 was caused by hub_start_disconnect(), and thus cannot happen once hub_start_disconnect() is removed. I may be wrong, but it looks that way. John, please always attach the patch to the bug as posted for review. Don't make us guess later.
Pete, sorry if there was confusion about the patch. I did attach the patch that I created from Stuart's when I posted it to rhkernel-list. I wrote you an email recently where I tried to explain the code, and it crossed my mind that I should include the new patch as well, but I didn't. Perhaps that is where I went wrong. John
I would assume Pete is correct in comment #6. I didn't spend too much time looking at the code when I posted comment #5. I'll look at it again today to convince myself.
OK, yeah, I agree with Pete. With the patch from comment #1 applied to the -55 kernel, the code in hub_pre_reset() is disconnecting the hub's children. The code that caused the deadlock in bug 171220 was trying to disconnect the hub itself. With the patch from comment #1 applied to the -55 kernel, it doesn't look like the hub itself gets disconnected when there's a hub error. It will disconnect the children, try to reset the hub, and spew an error message if it can't reset the hub, but I don't see it actually disconnecting the hub itself. I'm sorry about the confusion I caused with comment #5.
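The patched flow described in this comment (disconnect the children in hub_pre_reset(), try to reset the hub, print an error if that fails, but never disconnect the hub itself) can be sketched with another hypothetical user-space model. The names below are invented for illustration and are not kernel APIs; the reset outcome is passed in as a parameter rather than computed, and re-enumeration of children after a successful reset is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_MAX_CHILDREN 4   /* arbitrary port count for the model */

/* HYPOTHETICAL model of the patched error path; invented names,
 * not kernel types or APIs. */
struct model_dev {
    const char *name;
    bool present;              /* true while the device would appear in lsusb */
};

struct model_hub {
    struct model_dev *hdev;    /* the hub's own device */
    struct model_dev *children[MODEL_MAX_CHILDREN];
    bool error;
};

/* Plays the role of hub_pre_reset(): disconnect the hub's children only. */
static void model_hub_pre_reset(struct model_hub *hub)
{
    for (size_t i = 0; i < MODEL_MAX_CHILDREN; i++) {
        if (hub->children[i] != NULL) {
            hub->children[i]->present = false;
            hub->children[i] = NULL;
        }
    }
}

/* Error branch after the patch. The reset result is supplied by the
 * caller (reset_ok) instead of coming from hardware. Returns 0 or -1. */
int model_hub_events_new(struct model_hub *hub, bool reset_ok)
{
    assert(hub != NULL && hub->hdev != NULL);
    if (!hub->error)
        return 0;
    hub->error = false;

    model_hub_pre_reset(hub);

    /* Unlike the old path, hub->hdev->present is never cleared here:
     * a failed reset only produces an error message. */
    if (!reset_ok) {
        fprintf(stderr, "%s: hub reset failed\n", hub->hdev->name);
        return -1;
    }
    return 0;
}
```

With reset_ok set to false, the children are marked absent and an error is printed, but hdev->present stays true, matching the reading above that the hub itself is no longer disconnected. Since this path never tears down the hub, it also never reaches the hub-disconnect code that deadlocked in bug 171220.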
Committed in stream U6 build 55.22. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html