Bug 1003926
Summary: | PANIC: "Oops: 0002 [#1] SMP " when plugging in a dodgy USB3 SSD HD | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | kubrick <kubrick> | ||||||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 19 | CC: | dblack, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa | ||||||||
Target Milestone: | --- | Flags: | jforbes:
needinfo?
|
||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2014-03-10 14:45:27 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
kubrick@fgv6.net
2013-09-03 14:15:44 UTC
Created attachment 793213 [details]
vmcore-dmesg.txt
Created attachment 793214 [details]
kernel crash backtrace
Sorry I'm not a kernel specialist but it I make an attempt at trying to find what's wrong, that's what I think: drivers/usb/host/xhci-ring.c: in handle_reset_ep_completion(), TRB_TO_EP_INDEX is apparently returning an unexpected value (0x40 in the case of this crash), causing: andl $0xfffffffd,0x40(%rcx) or, in the source, xhci->devs[slot_id]->eps[ep_index].ep_state &= ~EP_HALTED; to trigger a NULL pointer dereference. I don't even know if it's really possible, but shouldn't we using the "xhci_event_cmd *event" structure instead with ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1; ? I'll try that and see how it goes. Even when it doesn't crash, this USB device / hard drive systematically corrupts the USB stack: Sep 4 12:00:07 fguerrazltp kernel: [ 533.774596] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8802038cc500 Sep 4 12:00:07 fguerrazltp kernel: [ 533.774600] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8802038cc540 Sep 4 12:00:07 fguerrazltp systemd-udevd[2809]: inotify_add_watch(7, /dev/sdc, 10) failed: No such file or directory Sep 4 12:00:07 fguerrazltp systemd-udevd[2809]: inotify_add_watch(7, /dev/sdc, 10) failed: No such file or directory Sep 4 12:00:13 fguerrazltp kernel: [ 539.256996] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command Sep 4 12:00:18 fguerrazltp kernel: [ 544.463844] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command Sep 4 12:00:18 fguerrazltp kernel: [ 544.665048] usb 4-1: device not accepting address 4, error -62 Sep 4 12:00:24 fguerrazltp kernel: [ 550.229274] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command Sep 4 12:00:24 fguerrazltp kernel: [ 550.229291] xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd with unknown completion code of 24. Sep 4 12:00:29 fguerrazltp kernel: [ 555.436123] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command Sep 4 12:00:29 fguerrazltp kernel: [ 555.637347] usb 4-1: device not accepting address 6, error -62 Sep 4 12:00:35 fguerrazltp kernel: [ 561.201584] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command Sep 4 12:00:35 fguerrazltp kernel: [ 561.402824] xhci_hcd 0000:00:14.0: ERROR: unexpected command completion code 0x0. Sep 4 12:00:35 fguerrazltp kernel: [ 561.604045] usb 4-1: device not accepting address 8, error -22 Sep 4 12:00:40 fguerrazltp kernel: [ 566.609664] xhci_hcd 0000:00:14.0: Timeout while waiting for a slot Sep 4 12:00:40 fguerrazltp kernel: [ 566.609677] hub 4-0:1.0: couldn't allocate port 1 usb_device And then, when I disconnect this device and try plugging in a working USB stick: Sep 4 12:03:23 fguerrazltp kernel: [ 730.167231] xhci_hcd 0000:00:14.0: ERROR: unexpected command completion code 0x0. Sep 4 12:03:24 fguerrazltp kernel: [ 730.368462] xhci_hcd 0000:00:14.0: Setup ERROR: address device command for slot 10. Sep 4 12:03:24 fguerrazltp kernel: [ 730.569702] usb 4-1: device not accepting address 9, error -22 *********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if you issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those. Yes, still happens on 3.11.1-200.fc19 : KERNEL: /usr/lib/debug/lib/modules/3.11.1-200.fc19.x86_64/vmlinux DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 4 DATE: Thu Sep 19 11:39:34 2013 UPTIME: 00:11:49 LOAD AVERAGE: 0.93, 0.61, 0.37 TASKS: 429 NODENAME: fguerrazltp RELEASE: 3.11.1-200.fc19.x86_64 VERSION: #1 SMP Sat Sep 14 15:04:51 UTC 2013 MACHINE: x86_64 (2494 Mhz) MEMORY: 7.7 GB PANIC: WARNING: log buf data structure(s) have changed "" PID: 0 COMMAND: "swapper/2" TASK: ffff8802133bd3e0 (1 of 4) [THREAD_INFO: ffff880212c6e000] CPU: 2 STATE: TASK_RUNNING (PANIC) Created attachment 799836 [details]
kernel crash backtrace 3.11.1-200.fc19.x86_64
I haven't experienced a panic, or gotten a core or abrt report, but with $ uname -a Linux dblack 3.11.1-200.fc19.x86_64 #1 SMP Sat Sep 14 15:04:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux and a USB3 external HDD that I don't *think* is dodgy, I consistently get a failure of xhci similar to the OP's log above. Sep 24 20:41:24 dblack kernel: [ 310.762507] usb 3-1: new full-speed USB device number 2 using xhci_hcd Sep 24 20:41:35 dblack chronyd[711]: Selected source 88.190.34.44 Sep 24 20:41:43 dblack kernel: [ 315.760676] xhci_hcd 0000:0d:00.0: Timeout while waiting for address device command Sep 24 20:41:43 dblack kernel: [ 329.601351] xhci_hcd 0000:0d:00.0: Stopped the command ring failed, maybe the host is dead Sep 24 20:41:43 dblack kernel: [ 329.645507] xhci_hcd 0000:0d:00.0: Host not halted after 16000 microseconds. Sep 24 20:41:43 dblack kernel: [ 329.645509] xhci_hcd 0000:0d:00.0: Abort command ring failed Sep 24 20:41:43 dblack kernel: [ 329.645904] xhci_hcd 0000:0d:00.0: HC died; cleaning up Sep 24 20:41:43 dblack rtkit-daemon[725]: The canary thread is apparently starving. Taking action. Sep 24 20:41:43 dblack rtkit-daemon[725]: Demoting known real-time threads. Sep 24 20:41:43 dblack rtkit-daemon[725]: Successfully demoted thread 1920 of process 1889 (/usr/bin/pulseaudio). Sep 24 20:41:43 dblack rtkit-daemon[725]: Successfully demoted thread 1909 of process 1889 (/usr/bin/pulseaudio). Sep 24 20:41:43 dblack rtkit-daemon[725]: Successfully demoted thread 1903 of process 1889 (/usr/bin/pulseaudio). Sep 24 20:41:43 dblack rtkit-daemon[725]: Successfully demoted thread 1897 of process 1889 (/usr/bin/pulseaudio). Sep 24 20:41:43 dblack rtkit-daemon[725]: Successfully demoted thread 1889 of process 1889 (/usr/bin/pulseaudio). Sep 24 20:41:43 dblack rtkit-daemon[725]: Demoted 5 threads. Sep 24 20:41:48 dblack kernel: [ 334.852703] xhci_hcd 0000:0d:00.0: Timeout while waiting for address device command Sep 24 20:41:48 dblack kernel: [ 334.852713] xhci_hcd 0000:0d:00.0: Abort the command ring, but the xHCI is dead. Sep 24 20:41:48 dblack kernel: [ 335.053616] usb 3-1: device not accepting address 2, error -108 Sep 24 20:41:53 dblack kernel: [ 340.051550] xhci_hcd 0000:0d:00.0: Timeout while waiting for a slot Sep 24 20:41:53 dblack kernel: [ 340.051562] xhci_hcd 0000:0d:00.0: Abort the command ring, but the xHCI is dead. Sep 24 20:41:58 dblack kernel: [ 345.049489] xhci_hcd 0000:0d:00.0: Timeout while waiting for a slot Sep 24 20:41:58 dblack kernel: [ 345.049502] xhci_hcd 0000:0d:00.0: Abort the command ring, but the xHCI is dead. Sep 24 20:42:03 dblack kernel: [ 350.047391] xhci_hcd 0000:0d:00.0: Timeout while waiting for a slot Sep 24 20:42:03 dblack kernel: [ 350.047401] xhci_hcd 0000:0d:00.0: Abort the command ring, but the xHCI is dead. After this happens, subsequent plug/unplug of the device to the USB 3 port results in no logging whatsoever. The device can be plugged into a legacy USB port and function normally (albeit slowly). The USB 3 port no longer seems to accept connection from any USB device -- nothing is logged and no tested devices work. So far, I can't resolve this and use the USB 3 port again (for legacy devices) without rebooting the system. *********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those. As if somebody cared... I don't own the hardware that allowed be to trigger the bug anyway. *********** MASS BUG UPDATE ************** This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested. |