Bug 957500
Summary: | [USB] [abrt] BUG: unable to handle kernel NULL pointer dereference at 00000024 IP:xhci_free_dev | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mikhail <mikhail.v.gavrilov> | ||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 22 | CC: | fedora, fhrbata, gansalmon, greg.martyn, hrrares, info, itamar, jonathan, kernel-maint, madhu.chinakonda, michele, m.likholet | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i686 | ||||||
OS: | Unspecified | ||||||
Whiteboard: | abrt_hash:12df2c4fcd3c7777b1a7b49a9422c065a180642c,first=3.6.7,latest=3.9.0,Triage | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-05-27 13:41:15 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Mikhail
2013-04-28 14:36:52 UTC
Created attachment 741168 [details]
File: dmesg
Code: e8 16 ba ff ff 83 f8 ed 0f 85 bd 00 00 00 8b 86 a8 02 00 00 bb 1f 00 00 00 8b 55 f0 8b 94 82 a4 00 00 00 8d 7a 3c 90 8d 74 26 00 <83> 67 e8 fb 89 f8 81 c7 8c 00 00 00 e8 bf 4f c5 ff 83 eb 01 75 All code ======== 0: e8 16 ba ff ff call 0xffffba1b 5: 83 f8 ed cmp $0xffffffed,%eax 8: 0f 85 bd 00 00 00 jne 0xcb e: 8b 86 a8 02 00 00 mov 0x2a8(%esi),%eax 14: bb 1f 00 00 00 mov $0x1f,%ebx 19: 8b 55 f0 mov -0x10(%ebp),%edx 1c: 8b 94 82 a4 00 00 00 mov 0xa4(%edx,%eax,4),%edx 23: 8d 7a 3c lea 0x3c(%edx),%edi 26: 90 nop 27: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi 2b:* 83 67 e8 fb andl $0xfffffffb,-0x18(%edi) <-- trapping instruction 2f: 89 f8 mov %edi,%eax 31: 81 c7 8c 00 00 00 add $0x8c,%edi 37: e8 bf 4f c5 ff call 0xffc54ffb 3c: 83 eb 01 sub $0x1,%ebx 3f: 75 .byte 0x75 Code starting with the faulting instruction =========================================== 0: 83 67 e8 fb andl $0xfffffffb,-0x18(%edi) 4: 89 f8 mov %edi,%eax 6: 81 c7 8c 00 00 00 add $0x8c,%edi c: e8 bf 4f c5 ff call 0xffc54fd0 11: 83 eb 01 sub $0x1,%ebx 14: 75 .byte 0x75 http://download.eng.brq.redhat.com/pub/fedora/linux//development/19/i386/debug/k/kernel-PAE-debuginfo-3.9.0-0.rc8.git0.2.fc19.i686.rpm $ objdump -t vmlinux | grep xhci_free_dev c08064d0 g F .text 0000003d xhci_free_device_endpoint_resources c0806e30 g F .text 00000128 xhci_free_dev $ objdump -d --start-address=0xc0806e30 --stop-address=0xc0806f60 vmlinux vmlinux: file format elf32-i386 Disassembly of section .text: c0806e30 <xhci_free_dev>: c0806e30: 55 push %ebp c0806e31: 89 e5 mov %esp,%ebp c0806e33: 57 push %edi c0806e34: 56 push %esi c0806e35: 53 push %ebx c0806e36: 83 ec 14 sub $0x14,%esp c0806e39: e8 d6 35 19 00 call c099a414 <mcount> c0806e3e: 8b 88 08 01 00 00 mov 0x108(%eax),%ecx ecx = xhci c0806e44: c7 44 24 08 d8 83 9f movl $0xc09f83d8,0x8(%esp) c0806e4b: c0 c0806e4c: 89 d6 mov %edx,%esi c0806e4e: 89 4d f0 mov %ecx,-0x10(%ebp) c0806e51: 31 c9 xor %ecx,%ecx c0806e53: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp) c0806e5a: 00 c0806e5b: c7 04 24 00 00 00 00 movl $0x0,(%esp) c0806e62: 89 45 ec mov %eax,-0x14(%ebp) c0806e65: e8 16 ba ff ff call c0802880 <xhci_check_args> c0806e6a: 83 f8 ed cmp $0xffffffed,%eax c0806e6d: 0f 85 bd 00 00 00 jne c0806f30 <xhci_free_dev+0x100> c0806e73: 8b 86 a8 02 00 00 mov 0x2a8(%esi),%eax c0806e79: bb 1f 00 00 00 mov $0x1f,%ebx initialize i for the loop 0 to 31 i = 0x1f c0806e7e: 8b 55 f0 mov -0x10(%ebp),%edx edx = xhci c0806e81: 8b 94 82 a4 00 00 00 mov 0xa4(%edx,%eax,4),%edx edx = virt_dev(struct xhci_virt_device) c0806e88: 8d 7a 3c lea 0x3c(%edx),%edi 0x3c if offset to xhci_virt_device.eps[0].stop_cmd_timer edi = virt_dev->eps[0].stop_cmd_timer c0806e8b: 90 nop c0806e8c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi c0806e90: 83 67 e8 fb andl $0xfffffffb,-0x18(%edi) -0x18 is offset from stop_cmd_timer to ep_state in xhci_virt_ep c0806e94: 89 f8 mov %edi,%eax c0806e96: 81 c7 8c 00 00 00 add $0x8c,%edi edi = next stop_cmd_timer in the esp array(0x8c should be sizeof xhci_virt_ep) c0806e9c: e8 bf 4f c5 ff call c045be60 <del_timer_sync> c0806ea1: 83 eb 01 sub $0x1,%ebx i-- c0806ea4: 75 ea jne c0806e90 <xhci_free_dev+0x60> !i loop <rest of disasm is snipped because it's not important for this problem> EBX: 0000001f means first iteration(i = 0) EDI: 0000003c this should be address of xhci_virt_device.eps[0].stop_cmd_timer meaning that the virt_dev is NULL(lea 0x3c(%edx),%edi) NULL pointer dereference at 00000024 == 0x3c - 0x18 (stop_cmd_timer - ep_state) So the bottom line is that virt_dev = xhci->devs[udev->slot_id] is NULL and the trap is on virt_dev->eps[i].ep_state &= ~EP_HALT_PENDING; [ 34.489095] usb 3-2.4: new high-speed USB device number 7 using xhci_hcd [ 39.398980] xhci_hcd 0000:05:00.0: Timeout while waiting for address device command [ 41.823024] usb 3-2.4: Device not responding to set address. [ 42.020549] usb 3-2.4: device not accepting address 7, error -71 [ 46.663202] xhci_hcd 0000:05:00.0: Timeout while waiting for a slot [ 63.038765] xhci_hcd 0000:05:00.0: Stopped the command ring failed, maybe the host is dead [ 63.038783] xhci_hcd 0000:05:00.0: Abort command ring failed [ 63.039033] xhci_hcd 0000:05:00.0: HC died; cleaning up [ 63.040009] [sched_delayed] sched: RT throttling activated [ 63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19) [ 63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19) [ 63.045592] BUG: unable to handle kernel NULL pointer dereference at 00000024 xhci_alloc_dev => xhci_cancel_cmd => usb_hc_died => usb_kick_khubd Based on the messages it looks like xhci_alloc_dev failed on timeout, meaning no xhci_alloc_virt_device call and no eps array initialization. Now hub_events calls hub_port_reset and hub_port_disable reporting [ 63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19) [ 63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19) and setting USB_STATE_NOTATTACHED which later triggers hub_quiesce => ... => xhci_free_dev and we got the NULL pointer dereference. *** Bug 959016 has been marked as a duplicate of this bug. *** *** Bug 921659 has been marked as a duplicate of this bug. *** *** Bug 892935 has been marked as a duplicate of this bug. *** FYI, I am still seeing this with kernel-3.10.0-0.rc1.git7.2.fc20.x86_64. I haven't tested kernel-3.10.0-0.rc3.git0.2.fc20.x86_64 yet though. Ping? Any update on this? I don't see much on the linked thread (did it continue on the usb mailing list; I couldn't find much there). Sarah Sharp wrote that this should be fixed by the following patch http://marc.info/?l=linux-usb&m=137476546506888&w=2 I've been unable to reproduce the bug (so far) by doing the same thing as before (using my phone over USB) with 3.11.0-0.rc0.git7.1.fc20.x86_64. I haven't seen it on F19 (haven't tried), but I don't know if it's fixed there either; never saw it on F18. Nevermind, just happened again with that kernel. This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle. Changing version to '20'. More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20 The patch mentioned at c#10 has made it in kernel 3.11. Has anyone been able to reproduce it with the latest kernel? If yes can we have a backtrace/messages? Thanks, Michele The same trigger as before has caused an issue (plugging my phone with debugging on into the keyboard USB port): Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: new high-speed USB device number 6 using xhci_hcd Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: New USB device found, idVendor=04e8, idProduct=6860 Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: New USB device strings: Mfr=2, Product=3, SerialNumber=4 Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: Product: Galaxy Nexus Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: Manufacturer: samsung Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: SerialNumber: 014994AF0901501E Jan 20 16:12:48 bronto-burt xlock[23047]: Start: boeckb, boeckb, :0 Jan 20 16:12:53 bronto-burt xlock[23047]: Stop: boeckb, boeckb, :0, 2590m 42s Jan 20 16:13:36 bronto-burt su[14686]: (to root) boeckb on pts/6 Jan 20 16:13:36 bronto-burt su[14686]: pam_unix(su-l:session): session opened for user root by (uid=1000) Jan 20 16:14:00 bronto-burt kernel: usb 3-1.2: USB disconnect, device number 6 Jan 20 16:14:01 bronto-burt kernel: usb 3-1.2: new high-speed USB device number 7 using xhci_hcd Jan 20 16:14:16 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for address device command Jan 20 16:14:17 bronto-burt kernel: usb 3-1.2: Device not responding to set address. Jan 20 16:14:17 bronto-burt kernel: usb 3-1.2: device not accepting address 7, error -71 Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort command ring failed Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: HC died; cleaning up Jan 20 16:14:41 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Jan 20 16:14:41 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead. Jan 20 16:14:46 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Jan 20 16:14:46 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead. Jan 20 16:14:46 bronto-burt kernel: usb 3-1: USB disconnect, device number 2 Jan 20 16:14:46 bronto-burt kernel: usb 3-1.1: USB disconnect, device number 3 Jan 20 16:14:46 bronto-burt kernel: usb 3-1.4: USB disconnect, device number 4 Maybe it's different, but my phone is kicking my devices off the USB bus when plugged into the keyboard. kernel-3.13.0-0.rc4.git4.1.fc21.x86_64 I also have rc8 from the nodebug repo, but I have things to do before I try again. Just occurred with: 3.13.0-0.rc8.git4.2.fc21.x86_64. Same logs as the last comment. OK, so I just booted into 3.14.0-0.rc2.git4.1.fc21.x86_64 and I have something new. Probably a new bug, but I'm not sure (xhci except from the journal): Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: xHCI Host Controller Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 3 Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 45 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 46 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 47 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 48 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 49 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 50 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 51 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 52 for MSI/MSI-X Feb 16 21:24:35 bronto-burt kernel: usb usb3: Manufacturer: Linux 3.14.0-0.rc2.git4.1.fc21.x86_64 xhci_hcd Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: xHCI Host Controller Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 4 Feb 16 21:24:35 bronto-burt kernel: usb usb4: Manufacturer: Linux 3.14.0-0.rc2.git4.1.fc21.x86_64 xhci_hcd Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort command ring failed Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: HC died; cleaning up Feb 16 21:24:58 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Feb 16 21:24:58 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead. Feb 16 21:25:03 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Feb 16 21:25:03 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead. Feb 16 21:25:08 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot Feb 16 21:25:08 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead. All of my other USB ports work, just not the bus with my keyboard and mouse plugged in. I'll try moving the keyboard to another port. If that's it, I guess there's something hokey about my keyboard (Das) :( . This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22 *********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs. Fedora 22 has now been rebased to 4.2.3-200.fc22. Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23. If you experience different issues, please open a new bug report for those. Just updated my Fedora 22 to kernel 4.2.3-200. During kernel boot, I get "HC died; cleaning up". No keyboard nor mouse available and Fedora 22 simply hangs. Had to go back to kernel 4.1.10-200. |