Bug 957500

Summary: [USB] [abrt] BUG: unable to handle kernel NULL pointer dereference at 00000024 IP:xhci_free_dev
Product: [Fedora] Fedora Reporter: Mikhail <mikhail.v.gavrilov>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 22CC: fedora, fhrbata, gansalmon, greg.martyn, hrrares, info, itamar, jonathan, kernel-maint, madhu.chinakonda, michele, m.likholet
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Unspecified   
Whiteboard: abrt_hash:12df2c4fcd3c7777b1a7b49a9422c065a180642c,first=3.6.7,latest=3.9.0,Triage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-27 13:41:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: dmesg none

Description Mikhail 2013-04-28 14:36:52 UTC
Additional info:
BUG: unable to handle kernel NULL pointer dereference at 00000024
IP: [<c0806e90>] xhci_free_dev+0x60/0x130
*pdpt = 0000000000000000 *pde = f000eef3f000eef3 
Oops: 0002 [#1] SMP 
Modules linked in: ebtable_nat ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bnep bluetooth rfkill ebtable_filter ebtables ip6table_filter ip6_tables iTCO_wdt iTCO_vendor_support ppdev snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 mperf snd_i2c snd_ice17xx_ak4xxx coretemp snd_ak4xxx_adda crc32_pclmul snd_rawmidi crc32c_intel snd_ac97_codec ac97_bus serio_raw microcode i2c_i801 snd_seq snd_seq_device lpc_ich snd_pcm mfd_core snd_page_alloc snd_timer snd soundcore atl1c mei parport_pc parport vhost_net tun macvtap macvlan kvm_intel binfmt_misc kvm usb_storage i915 i2c_algo_bit drm_kms_helper drm i2c_core video uinput
Pid: 34, comm: khubd Not tainted 3.9.0-0.rc8.git0.2.fc19.i686.PAE #1 To Be Filled By O.E.M. To Be Filled By O.E.M./H61M/U3S3
EIP: 0060:[<c0806e90>] EFLAGS: 00010246 CPU: 0
EIP is at xhci_free_dev+0x60/0x130
EAX: 00000000 EBX: 0000001f ECX: f12f7000 EDX: 00000000
ESI: ec2f4000 EDI: 0000003c EBP: f1323ec8 ESP: f1323ea8
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: 00000024 CR3: 00d0f000 CR4: 000407f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process khubd (pid: 34, ti=f1322000 task=f122d940 task.ti=f1322000)
Stack:
 00000000 00000001 c09f83d8 f09f4e00 f12f7000 ec2f4000 ffffff00 f0aec034
 f1323f6c c07d97b4 00000001 c0b7f660 00000004 c0c00540 f1323f64 c09911c6
 f75ba2c0 f1323efc f122d940 f0aec0a0 00000064 f122d940 f0aafe1c f09f4e00
Call Trace:
 [<c07d97b4>] hub_thread+0x704/0x1520
 [<c09911c6>] ? __schedule+0x366/0x780
 [<c046ccf0>] ? wake_up_bit+0x20/0x20
 [<c07d90b0>] ? hub_port_debounce+0x140/0x140
 [<c046c2b4>] kthread+0x94/0xa0
 [<c0999d77>] ret_from_kernel_thread+0x1b/0x28
 [<c046c220>] ? insert_kthread_work+0x30/0x30
Code: e8 16 ba ff ff 83 f8 ed 0f 85 bd 00 00 00 8b 86 a8 02 00 00 bb 1f 00 00 00 8b 55 f0 8b 94 82 a4 00 00 00 8d 7a 3c 90 8d 74 26 00 <83> 67 e8 fb 89 f8 81 c7 8c 00 00 00 e8 bf 4f c5 ff 83 eb 01 75
EIP: [<c0806e90>] xhci_free_dev+0x60/0x130 SS:ESP 0068:f1323ea8
CR2: 0000000000000024

Comment 1 Mikhail 2013-04-28 14:36:57 UTC
Created attachment 741168 [details]
File: dmesg

Comment 2 Frantisek Hrbata 2013-05-06 14:00:55 UTC
Code: e8 16 ba ff ff 83 f8 ed 0f 85 bd 00 00 00 8b 86 a8 02 00 00 bb 1f 00 00 00 8b 55 f0 8b 94 82 a4 00 00 00 8d 7a 3c 90 8d 74 26 00 <83> 67 e8 fb 89 f8 81 c7 8c 00 00 00 e8 bf 4f c5 ff 83 eb 01 75
All code
========
   0:   e8 16 ba ff ff          call   0xffffba1b
   5:   83 f8 ed                cmp    $0xffffffed,%eax
   8:   0f 85 bd 00 00 00       jne    0xcb
   e:   8b 86 a8 02 00 00       mov    0x2a8(%esi),%eax
  14:   bb 1f 00 00 00          mov    $0x1f,%ebx
  19:   8b 55 f0                mov    -0x10(%ebp),%edx
  1c:   8b 94 82 a4 00 00 00    mov    0xa4(%edx,%eax,4),%edx
  23:   8d 7a 3c                lea    0x3c(%edx),%edi
  26:   90                      nop
  27:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  2b:*  83 67 e8 fb             andl   $0xfffffffb,-0x18(%edi)     <-- trapping instruction
  2f:   89 f8                   mov    %edi,%eax
  31:   81 c7 8c 00 00 00       add    $0x8c,%edi
  37:   e8 bf 4f c5 ff          call   0xffc54ffb
  3c:   83 eb 01                sub    $0x1,%ebx
  3f:   75                      .byte 0x75

Code starting with the faulting instruction
===========================================
   0:   83 67 e8 fb             andl   $0xfffffffb,-0x18(%edi)
   4:   89 f8                   mov    %edi,%eax
   6:   81 c7 8c 00 00 00       add    $0x8c,%edi
   c:   e8 bf 4f c5 ff          call   0xffc54fd0
  11:   83 eb 01                sub    $0x1,%ebx
  14:   75                      .byte 0x75


http://download.eng.brq.redhat.com/pub/fedora/linux//development/19/i386/debug/k/kernel-PAE-debuginfo-3.9.0-0.rc8.git0.2.fc19.i686.rpm

$ objdump -t vmlinux | grep xhci_free_dev
c08064d0 g     F .text  0000003d xhci_free_device_endpoint_resources
c0806e30 g     F .text  00000128 xhci_free_dev

$ objdump -d --start-address=0xc0806e30 --stop-address=0xc0806f60 vmlinux

vmlinux:     file format elf32-i386


Disassembly of section .text:

c0806e30 <xhci_free_dev>:
c0806e30:       55                      push   %ebp
c0806e31:       89 e5                   mov    %esp,%ebp
c0806e33:       57                      push   %edi
c0806e34:       56                      push   %esi
c0806e35:       53                      push   %ebx
c0806e36:       83 ec 14                sub    $0x14,%esp
c0806e39:       e8 d6 35 19 00          call   c099a414 <mcount>
c0806e3e:       8b 88 08 01 00 00       mov    0x108(%eax),%ecx

ecx = xhci

c0806e44:       c7 44 24 08 d8 83 9f    movl   $0xc09f83d8,0x8(%esp)
c0806e4b:       c0
c0806e4c:       89 d6                   mov    %edx,%esi
c0806e4e:       89 4d f0                mov    %ecx,-0x10(%ebp)
c0806e51:       31 c9                   xor    %ecx,%ecx
c0806e53:       c7 44 24 04 01 00 00    movl   $0x1,0x4(%esp)
c0806e5a:       00
c0806e5b:       c7 04 24 00 00 00 00    movl   $0x0,(%esp)
c0806e62:       89 45 ec                mov    %eax,-0x14(%ebp)
c0806e65:       e8 16 ba ff ff          call   c0802880 <xhci_check_args>
c0806e6a:       83 f8 ed                cmp    $0xffffffed,%eax
c0806e6d:       0f 85 bd 00 00 00       jne    c0806f30 <xhci_free_dev+0x100>
c0806e73:       8b 86 a8 02 00 00       mov    0x2a8(%esi),%eax
c0806e79:       bb 1f 00 00 00          mov    $0x1f,%ebx

initialize i for the loop 0 to 31
i = 0x1f

c0806e7e:       8b 55 f0                mov    -0x10(%ebp),%edx

edx = xhci

c0806e81:       8b 94 82 a4 00 00 00    mov    0xa4(%edx,%eax,4),%edx

edx = virt_dev(struct xhci_virt_device)

c0806e88:       8d 7a 3c                lea    0x3c(%edx),%edi

0x3c if offset to xhci_virt_device.eps[0].stop_cmd_timer
edi = virt_dev->eps[0].stop_cmd_timer

c0806e8b:       90                      nop
c0806e8c:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
c0806e90:       83 67 e8 fb             andl   $0xfffffffb,-0x18(%edi)

-0x18 is offset from stop_cmd_timer to ep_state in xhci_virt_ep

c0806e94:       89 f8                   mov    %edi,%eax
c0806e96:       81 c7 8c 00 00 00       add    $0x8c,%edi

edi = next stop_cmd_timer in the esp array(0x8c should be sizeof xhci_virt_ep)

c0806e9c:       e8 bf 4f c5 ff          call   c045be60 <del_timer_sync>
c0806ea1:       83 eb 01                sub    $0x1,%ebx

i--

c0806ea4:       75 ea                   jne    c0806e90 <xhci_free_dev+0x60>

!i loop

<rest of disasm is snipped because it's not important for this problem>

EBX: 0000001f means first iteration(i = 0)
EDI: 0000003c this should be address of xhci_virt_device.eps[0].stop_cmd_timer meaning that the virt_dev is NULL(lea    0x3c(%edx),%edi)

NULL pointer dereference at 00000024 == 0x3c - 0x18 (stop_cmd_timer - ep_state)

So the bottom line is that

virt_dev = xhci->devs[udev->slot_id] is NULL

and the trap is on

virt_dev->eps[i].ep_state &= ~EP_HALT_PENDING;

Comment 3 Frantisek Hrbata 2013-05-06 14:44:50 UTC
[   34.489095] usb 3-2.4: new high-speed USB device number 7 using xhci_hcd
[   39.398980] xhci_hcd 0000:05:00.0: Timeout while waiting for address device command
[   41.823024] usb 3-2.4: Device not responding to set address.
[   42.020549] usb 3-2.4: device not accepting address 7, error -71
[   46.663202] xhci_hcd 0000:05:00.0: Timeout while waiting for a slot
[   63.038765] xhci_hcd 0000:05:00.0: Stopped the command ring failed, maybe the host is dead
[   63.038783] xhci_hcd 0000:05:00.0: Abort command ring failed
[   63.039033] xhci_hcd 0000:05:00.0: HC died; cleaning up
[   63.040009] [sched_delayed] sched: RT throttling activated
[   63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19)
[   63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19)
[   63.045592] BUG: unable to handle kernel NULL pointer dereference at 00000024

xhci_alloc_dev => xhci_cancel_cmd => usb_hc_died => usb_kick_khubd

Based on the messages it looks like xhci_alloc_dev failed on timeout, meaning no xhci_alloc_virt_device
call and no eps array initialization. Now hub_events calls hub_port_reset and hub_port_disable reporting
[   63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19)
[   63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19)
and setting USB_STATE_NOTATTACHED which later triggers hub_quiesce => ... => xhci_free_dev
and we got the NULL pointer dereference.

Comment 4 Frantisek Hrbata 2013-05-07 08:56:04 UTC
*** Bug 959016 has been marked as a duplicate of this bug. ***

Comment 5 Frantisek Hrbata 2013-05-07 08:58:25 UTC
*** Bug 921659 has been marked as a duplicate of this bug. ***

Comment 6 Frantisek Hrbata 2013-05-07 12:35:13 UTC
*** Bug 892935 has been marked as a duplicate of this bug. ***

Comment 7 Frantisek Hrbata 2013-05-07 14:17:28 UTC
http://marc.info/?l=linux-kernel&m=136793543514532&w=2

Comment 8 Ben Boeckel 2013-06-01 01:46:31 UTC
FYI, I am still seeing this with kernel-3.10.0-0.rc1.git7.2.fc20.x86_64. I haven't tested kernel-3.10.0-0.rc3.git0.2.fc20.x86_64 yet though.

Comment 9 Ben Boeckel 2013-07-26 19:06:27 UTC
Ping? Any update on this? I don't see much on the linked thread (did it continue on the usb mailing list; I couldn't find much there).

Comment 10 Frantisek Hrbata 2013-07-27 07:55:28 UTC
Sarah Sharp wrote that this should be fixed by the following patch
http://marc.info/?l=linux-usb&m=137476546506888&w=2

Comment 11 Ben Boeckel 2013-08-06 03:57:19 UTC
I've been unable to reproduce the bug (so far) by doing the same thing as before (using my phone over USB) with 3.11.0-0.rc0.git7.1.fc20.x86_64. I haven't seen it on F19 (haven't tried), but I don't know if it's fixed there either; never saw it on F18.

Comment 12 Ben Boeckel 2013-08-08 00:18:08 UTC
Nevermind, just happened again with that kernel.

Comment 13 Fedora End Of Life 2013-09-16 13:41:50 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 20 development cycle.
Changing version to '20'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora20

Comment 14 Michele Baldessari 2013-11-30 09:31:42 UTC
The patch mentioned at c#10 has made it in kernel 3.11. 

Has anyone been able to reproduce it with the latest kernel? If yes can we have
a backtrace/messages?

Thanks,
Michele

Comment 15 Ben Boeckel 2014-01-20 23:35:04 UTC
The same trigger as before has caused an issue (plugging my phone with debugging on into the keyboard USB port):

Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: new high-speed USB device number 6 using xhci_hcd
Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: New USB device found, idVendor=04e8, idProduct=6860
Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: New USB device strings: Mfr=2, Product=3, SerialNumber=4
Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: Product: Galaxy Nexus
Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: Manufacturer: samsung
Jan 20 16:10:14 bronto-burt kernel: usb 3-1.2: SerialNumber: 014994AF0901501E
Jan 20 16:12:48 bronto-burt xlock[23047]: Start: boeckb, boeckb, :0
Jan 20 16:12:53 bronto-burt xlock[23047]: Stop: boeckb, boeckb, :0, 2590m 42s
Jan 20 16:13:36 bronto-burt su[14686]: (to root) boeckb on pts/6
Jan 20 16:13:36 bronto-burt su[14686]: pam_unix(su-l:session): session opened for user root by (uid=1000)
Jan 20 16:14:00 bronto-burt kernel: usb 3-1.2: USB disconnect, device number 6
Jan 20 16:14:01 bronto-burt kernel: usb 3-1.2: new high-speed USB device number 7 using xhci_hcd
Jan 20 16:14:16 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for address device command
Jan 20 16:14:17 bronto-burt kernel: usb 3-1.2: Device not responding to set address.
Jan 20 16:14:17 bronto-burt kernel: usb 3-1.2: device not accepting address 7, error -71
Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead
Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort command ring failed
Jan 20 16:14:36 bronto-burt kernel: xhci_hcd 0000:07:00.0: HC died; cleaning up
Jan 20 16:14:41 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Jan 20 16:14:41 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead.
Jan 20 16:14:46 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Jan 20 16:14:46 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead.
Jan 20 16:14:46 bronto-burt kernel: usb 3-1: USB disconnect, device number 2
Jan 20 16:14:46 bronto-burt kernel: usb 3-1.1: USB disconnect, device number 3
Jan 20 16:14:46 bronto-burt kernel: usb 3-1.4: USB disconnect, device number 4

Maybe it's different, but my phone is kicking my devices off the USB bus when plugged into the keyboard.

kernel-3.13.0-0.rc4.git4.1.fc21.x86_64

I also have rc8 from the nodebug repo, but I have things to do before I try again.

Comment 16 Ben Boeckel 2014-02-17 00:46:58 UTC
Just occurred with: 3.13.0-0.rc8.git4.2.fc21.x86_64. Same logs as the last comment.

Comment 17 Ben Boeckel 2014-02-17 02:38:13 UTC
OK, so I just booted into 3.14.0-0.rc2.git4.1.fc21.x86_64 and I have something new. Probably a new bug, but I'm not sure (xhci except from the journal):

Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: xHCI Host Controller
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 3
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 45 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 46 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 47 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 48 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 49 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 50 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 51 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: irq 52 for MSI/MSI-X
Feb 16 21:24:35 bronto-burt kernel: usb usb3: Manufacturer: Linux 3.14.0-0.rc2.git4.1.fc21.x86_64 xhci_hcd
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: xHCI Host Controller
Feb 16 21:24:35 bronto-burt kernel: xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 4
Feb 16 21:24:35 bronto-burt kernel: usb usb4: Manufacturer: Linux 3.14.0-0.rc2.git4.1.fc21.x86_64 xhci_hcd
Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead
Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort command ring failed
Feb 16 21:24:53 bronto-burt kernel: xhci_hcd 0000:07:00.0: HC died; cleaning up
Feb 16 21:24:58 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Feb 16 21:24:58 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead.
Feb 16 21:25:03 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Feb 16 21:25:03 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead.
Feb 16 21:25:08 bronto-burt kernel: xhci_hcd 0000:07:00.0: Timeout while waiting for a slot
Feb 16 21:25:08 bronto-burt kernel: xhci_hcd 0000:07:00.0: Abort the command ring, but the xHCI is dead.

All of my other USB ports work, just not the bus with my keyboard and mouse plugged in. I'll try moving the keyboard to another port. If that's it, I guess there's something hokey about my keyboard (Das) :( .

Comment 18 Jaroslav Reznik 2015-03-03 14:55:27 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 19 Justin M. Forbes 2015-10-20 19:39:05 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 20 Frank Cornelis 2015-10-23 07:11:29 UTC
Just updated my Fedora 22 to kernel 4.2.3-200.
During kernel boot, I get "HC died; cleaning up". No keyboard nor mouse available and Fedora 22 simply hangs.
Had to go back to kernel 4.1.10-200.