Bug 737438

Summary: general protection fault: 0000 [#1] SMP : usb_driver_release_interface() : TAINTED -------D
Product: [Fedora] Fedora Reporter: Mikko Tiihonen <mikko.tiihonen>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda, manisandro, pebolle
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: abrt_hash:2f5f069c016e6bbb92cf6cb88877da09e1cae630
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-12 14:55:15 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Mikko Tiihonen 2011-09-12 05:36:57 UTC
abrt version: 2.0.5
cmdline:        ro root=UUID=10033589-14c8-48f0-8d13-e2b4bf44fce6 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=fi selinux=0 pcie_aspm=force i915.i915_enable_rc6=1 cgroup_disable=memory dma_debug_entries=65536
comment:        Disabled wireless/bluetooth on the hard-off switch of W520 laptop right after boot-up.
kernel:         undefined
kernel_tainted: 128
kernel_tainted_long: Kernel has oopsed before.
reason:         general protection fault: 0000 [#1] SMP 
time:           Mon Sep 12 08:33:56 2011

:general protection fault: 0000 [#1] SMP 
:CPU 4 
:Modules linked in: ppdev parport_pc lp parport fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc lockd rfcomm bnep coretemp arc4 cdc_wdm uvcvideo cdc_ncm usbnet mii videodev media cdc_acm v4l2_compat_ioctl32 snd_hda_codec_hdmi btusb microcode bluetooth joydev snd_hda_codec_conexant i2c_i801 iwlagn iTCO_wdt iTCO_vendor_support mac80211 cfg80211 snd_seq snd_hda_intel snd_seq_device snd_hda_codec e1000e snd_hwdep snd_pcm snd_timer snd_page_alloc thinkpad_acpi binfmt_misc rfkill snd soundcore virtio_net kvm_intel kvm uinput sunrpc btrfs zlib_deflate libcrc32c firewire_ohci sdhci_pci firewire_core sdhci crc_itu_t mmc_core nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded: scsi_wait_scan]
:Pid: 47, comm: khubd Not tainted 3.1.0-0.rc5.git0.0.fc16.x86_64 #1 LENOVO 42844DG/42844DG
:RIP: 0010:[<ffffffff81375e75>]  [<ffffffff81375e75>] usb_driver_release_interface+0xf/0x7f
:RSP: 0018:ffff88042a2c3ba0  EFLAGS: 00010202
:RAX: 0000000000000000 RBX: ffff880417e52520 RCX: 000000000000c306
:RDX: 0000000000000006 RSI: 6b6b6b6b6b6b6b6b RDI: ffffffffa02dc140
:RBP: ffff88042a2c3bc0 R08: ffffea00102cc220 R09: 0000000000000001
:R10: 000000043d807418 R11: ffff88041d2a5668 R12: ffff8804221fa520
:R13: ffff88041d2a5668 R14: ffffffffa02dc1e8 R15: 0000000000000000
:FS:  0000000000000000(0000) GS:ffff88042ce00000(0000) knlGS:0000000000000000
:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
:CR2: 0000003d40e3c045 CR3: 0000000001a05000 CR4: 00000000000406e0
:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
:Process khubd (pid: 47, threadinfo ffff88042a2c2000, task ffff88042a1a24d0)
: 0000000000000000 ffff880417e52520 ffff8804221fa520 ffff88041d2a5668
: ffff88042a2c3bf0 ffffffffa02d9c3c ffff88042a2c3ac0 ffff8804221fa550
: ffff8804221fd388 ffff8804221fa520 ffff88042a2c3c30 ffffffff81375d9e
:Call Trace:
: [<ffffffffa02d9c3c>] btusb_disconnect+0x98/0xbf [btusb]
: [<ffffffff81375d9e>] usb_unbind_interface+0x5c/0x124
: [<ffffffff81316feb>] __device_release_driver+0x86/0xd2
: [<ffffffff8131705c>] device_release_driver+0x25/0x32
: [<ffffffff81316bbb>] bus_remove_device+0xc4/0xd4
: [<ffffffff813140de>] device_del+0x136/0x184
: [<ffffffff81373f86>] usb_disable_device+0xb6/0x1ef
: [<ffffffff8136c717>] usb_disconnect+0xec/0x160
: [<ffffffff8136e89a>] hub_thread+0x83a/0x12af
: [<ffffffff8108f4df>] ? trace_hardirqs_on+0xd/0xf
: [<ffffffff8107aa38>] ? __init_waitqueue_head+0x4b/0x4b
: [<ffffffff8136e060>] ? usb_remote_wakeup+0x64/0x64
: [<ffffffff8107a1cd>] kthread+0xa8/0xb0
: [<ffffffff8150dd84>] kernel_thread_helper+0x4/0x10
: [<ffffffff815051f4>] ? retint_restore_args+0x13/0x13
: [<ffffffff8107a125>] ? __init_kthread_worker+0x5a/0x5a
: [<ffffffff8150dd80>] ? gs_change+0x13/0x13
:Code: 00 00 00 75 08 4c 89 e7 e8 4a fe ff ff 41 5c 5b 41 5c 41 5d 41 5e 41 5f 31 c0 5d c3 55 48 89 e5 41 55 41 54 53 50 66 66 66 66 90 
:RIP  [<ffffffff81375e75>] usb_driver_release_interface+0xf/0x7f
: RSP <ffff88042a2c3ba0>

Comment 1 Paul Bolle 2011-09-26 15:21:54 UTC
0) I ran into something very similar on a (self compiled) vanilla v3.0.4 (under fc14). As far as I can remember what triggered this in my case is just suspending that machine (while bluetooth was on).

1) Some interesting lines of the log of that session:
usb 3-1: USB disconnect, device number 4
btusb_bulk_complete: hci0 urb ffff88012f957948 failed to resubmit (19)
btusb_bulk_complete: hci0 urb ffff880130ed8210 failed to resubmit (19)
btusb_intr_complete: hci0 urb ffff88013b40b318 failed to resubmit (19)
btusb_send_frame: hci0 urb ffff880132e2d108 submission failed
general protection fault: 0000 [#1] SMP 
CPU 0 
Modules linked in: rfcomm bnep nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc kvm_intel kvm uinput snd_hda_codec_analog arc4 snd_hda_intel snd_hda_codec iwl4965 snd_hwdep snd_seq snd_seq_device thinkpad_acpi btusb snd_pcm iwl_legacy e1000e snd_timer snd soundcore iTCO_wdt iTCO_vendor_support pcspkr mac80211 cfg80211 bluetooth microcode snd_page_alloc rfkill i2c_i801 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core yenta_socket i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]

Pid: 21, comm: khubd Not tainted 3.0.4-local0.fc14.x86_64 #1 LENOVO 76735GG/76735GG
RIP: 0010:[<ffffffff8137d9c5>]  [<ffffffff8137d9c5>] usb_driver_release_interface+0x12/0x82
RSP: 0018:ffff880134e27b80  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff880132646360 RCX: 0040000000004006
RDX: 0000000000000006 RSI: 6b6b6b6b6b6b6b6b RDI: ffffffffa0165010
RBP: ffff880134e27ba0 R08: ffff88013b400000 R09: 000000000000005a
R10: ffff880134e27b20 R11: ffffffffa0162d25 R12: ffff880132644240
R13: ffff88013b428000 R14: 0000000000000000 R15: ffffffffa0165010
FS:  0000000000000000(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003f43803b35 CR3: 000000012f9df000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khubd (pid: 21, threadinfo ffff880134e26000, task ffff880134e48000)
 ffff880134e27ba0 ffff880132646360 ffff880132644240 ffff88013b428000
 ffff880134e27bd0 ffffffffa0162cb9 ffff880132644270 ffff880132644270
 ffff880132644240 ffff880132642968 ffff880134e27c20 ffffffff8137d8e0
Call Trace:
 [<ffffffffa0162cb9>] btusb_disconnect+0xa1/0xc7 [btusb]
 [<ffffffff8137d8e0>] usb_unbind_interface+0x74/0x147
 [<ffffffff8131cef6>] __device_release_driver+0x86/0xd2
 [<ffffffff8131cf60>] device_release_driver+0x1e/0x2b
 [<ffffffff8131ca97>] bus_remove_device+0xcb/0xdc
 [<ffffffff8131a392>] device_del+0x138/0x18a
 [<ffffffff8137ba58>] usb_disable_device+0xb7/0x1f0
 [<ffffffff813741b7>] usb_disconnect+0xe9/0x161
 [<ffffffff813762c0>] hub_thread+0x7eb/0x120a
 [<ffffffff8104e9cd>] ? mmdrop+0x1a/0x2a
 [<ffffffff8104ec08>] ? finish_task_switch+0xac/0xdd
 [<ffffffff8107a76a>] ? wake_up_bit+0x2a/0x2a
 [<ffffffff81375ad5>] ? usb_remote_wakeup+0x64/0x64
 [<ffffffff8107a25c>] kthread+0xa0/0xa8
 [<ffffffff8108fdfe>] ? trace_hardirqs_on_caller+0x10b/0x12f
 [<ffffffff814f7464>] kernel_thread_helper+0x4/0x10
 [<ffffffff814efa54>] ? retint_restore_args+0x13/0x13
 [<ffffffff8107a1bc>] ? __init_kthread_worker+0x5b/0x5b
 [<ffffffff814f7460>] ? gs_change+0x13/0x13
Code: 4c 89 ef e8 24 fe ff ff 48 83 c4 18 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 
RIP  [<ffffffff8137d9c5>] usb_driver_release_interface+0x12/0x82
 RSP <ffff880134e27b80>
---[ end trace 4fa8e296e963698c ]---

2) Peeking at vmlinux with gdb:
(gdb) info line 484
Line 484 of "drivers/usb/core/driver.c" starts at address 0xffffffff8137d9c5 <usb_driver_release_interface+18>
   and ends at 0xffffffff8137d9cc <usb_driver_release_interface+25>.
(gdb) disass /m usb_driver_release_interface+18,+7
Dump of assembler code from 0xffffffff8137d9c5 to 0xffffffff8137d9cc:
480	{

481		struct device *dev = &iface->dev;
483		/* this should never happen, don't release something that's not ours */
484		if (!dev->driver || dev->driver != &driver->drvwrap.driver)
   0xffffffff8137d9c5 <usb_driver_release_interface+18>:	mov    0x140(%rsi),%rax

End of assembler dump.
(gdb) print ((size_t) &((struct usb_interface *)0)->dev)
$4 = 48
(gdb) print ((size_t) &((struct device *)0)->driver)
$5 = 272

3) It seems the code faults while accessing iface->dev->driver, which is at offset (48 + 272 =) 0x140 of struct usb_interface. Somehow, *iface is used after free (and after poisoning, see the contents of the RSI register). I'm not fluent enough in the btusb driver, the usb subsystem and the base driver code to pinpoint what chain of events makes that possible.

Comment 2 Paul Bolle 2011-09-26 15:28:22 UTC
Already discussed at lkml: https://lkml.org/lkml/2011/8/15/400 (with a similar analysis as I cobbled together in comment #1). No solution yet, apparently.

Comment 3 Chuck Ebbert 2011-09-28 14:52:07 UTC
From the LKML thread:
> Also, can you add some debugging statements to
> drivers/bluetooth/btusb.c?  in btusb_disconnect(), it would be good to
> see the values of intf, data, data->intf, and data->isoc.  In addition
> (with caution), data->intf->dev.driver and data->isoc->dev.driver.

Comment 4 Paul Bolle 2011-09-28 15:11:33 UTC
(In reply to comment #3)
> From the LKML thread:
> > Also, can you add some debugging statements to
> > drivers/bluetooth/btusb.c?  in btusb_disconnect(), it would be good to
> > see the values of intf, data, data->intf, and data->isoc.  In addition
> > (with caution), data->intf->dev.driver and data->isoc->dev.driver.

I already did. Nothing very suspicious showed up. To my annoyance I've also been unable to reproduce this. I hope to post a, well, analysis of the debug output at some later time.

Comment 5 Paul Bolle 2011-09-28 16:17:00 UTC
(In reply to comment #4)
> I hope to post a, well, analysis of the debug
> output at some later time.

But first a preliminary theory. What happens if

is called before

That will basically 

which would pull the "intfdata" from under "intf" (since hdev->driver_data and intf's "intfdata" are the same thing) doesn't it?

btusb_destruct() is called by the the bluetooth code, while btusb_disconnect() is called by the usb code.

Is there anything that makes sure that btusb_destruct() doesn't run too early?

Comment 6 Paul Bolle 2011-09-28 16:35:48 UTC
(In reply to comment #5) 
> Is there anything that makes sure that btusb_destruct() doesn't run too early?

And see, just before the faulting call of

we have a call to

which finally calls

which will finally invoke

which here will be

So, perhaps, every now and then, while the kernel is rather busy going into suspend to ram, that last call will already be finished before

is run, giving "intf" an already freed "intfdata". Does this make sense?

Comment 7 Paul Bolle 2011-09-28 19:49:13 UTC
(In reply to comment #6)
> Does this make sense?

0) The call to hci_unregister_dev() is of course wrapped in a neatly balanced pair of __hci_dev_hold() and __hci_dev_put(). But as far as I can see hci_unregister_dev() decrements hdev->refcnt twice:
   1) first through the hci_dev_do_close() call;
   2) then through the __hci_dev_put() call.

1) So even if hdev->refcnt is one after we've been trough btusb_open(), as it seems to be, it should be zero once we call usb_driver_release_interface().

2) But that would mean the non-crashing case - ie, the common case - would be a case of use after free, wouldn't it?

3) A fresh pair of eyes parsing this code might be needed ...

Comment 8 Josh Boyer 2012-02-28 21:58:36 UTC
Paul, Mikko, does this still happen on 3.2.7 or newer?

Comment 9 Mikko Tiihonen 2012-02-29 08:15:13 UTC
I started bluetooth pairing and switched bluetooth+wifi off with hw-switch.

The resulting stack trace looks quite much like the original one. The code seems to manage to progress two more methods further into the stack before failing.

general protection fault: 0000 [#1] SMP 
CPU 7 
Modules linked in: fuse rfcomm bnep btusb bluetooth lockd sunrpc coretemp arc4 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev cdc_ncm v4l2_compat_ioctl32 usbnet cdc_acm
ec_conexant microcode iwlwifi i2c_i801 mac80211 iTCO_wdt cfg80211 iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc e1000e snd_timer thinkpad_acpi snd soundco
an kvm_intel kvm uinput btrfs libcrc32c zlib_deflate sdhci_pci firewire_ohci sdhci firewire_core mmc_core crc_itu_t
 nouveau mxm_wmi wmi video drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]

Pid: 47, comm: khubd Not tainted 3.3.0-0.rc4.git1.4.fc17.x86_64 #1 LENOVO 42844DG/42844DG
RIP: 0010:[<ffffffffa078d175>]  [<ffffffffa078d175>] hci_conn_hash_flush+0x95/0x120 [bluetooth]
RSP: 0018:ffff880429e73b40  EFLAGS: 00010296
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffea00109da780 RDI: ffffffffa03c98e0
RBP: ffff880429e73b60 R08: ffff88042769f260 R09: 0000000100180010
R10: ffff88042769f268 R11: 0000000000000000 R12: ffff8804274f4d00
R13: ffff8804274f4bd0 R14: ffff880394438000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88042d400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f077512067c CR3: 000000041b7b5000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khubd (pid: 47, threadinfo ffff880429e72000, task ffff880429d42690)
 ffff880429e73b60 6b6b6b6b6b6b6b6b ffff8804274f4290 ffff8804274f42a0
 ffff880429e73b90 ffffffffa07866b8 ffff8804274f4290 0000000000000000
 ffff88041b6b5cd0 ffff880394438000 ffff880429e73bb0 ffffffffa078a267
Call Trace:
 [<ffffffffa07866b8>] hci_dev_do_close+0xc8/0x340 [bluetooth]
 [<ffffffffa078a267>] hci_unregister_dev+0x47/0x1a0 [bluetooth]
 [<ffffffffa03c62b2>] btusb_disconnect+0x72/0x100 [btusb]
 [<ffffffff8148d403>] usb_unbind_interface+0x53/0x180
 [<ffffffff8141215c>] __device_release_driver+0x7c/0xe0
 [<ffffffff814121ee>] device_release_driver+0x2e/0x40
 [<ffffffff814118a2>] bus_remove_device+0x102/0x180
 [<ffffffff8140eb37>] device_del+0x137/0x1c0
 [<ffffffff8148afd0>] usb_disable_device+0xb0/0x250
 [<ffffffff81481c1a>] usb_disconnect+0xaa/0x160
 [<ffffffff81484029>] hub_thread+0x4f9/0x1630
 [<ffffffff810cda3d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8108ac00>] ? wake_up_bit+0x40/0x40
 [<ffffffff81483b30>] ? usb_remote_wakeup+0x70/0x70
 [<ffffffff8108a3d7>] kthread+0xb7/0xc0
 [<ffffffff810cda3d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816a88f4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8169ecb4>] ? retint_restore_args+0x13/0x13
 [<ffffffff8108a320>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff816a88f0>] ? gs_change+0x13/0x13
Code: fd ff ff 48 8b 03 48 89 45 e8 48 8b 5d e8 e8 83 98 8f e0 85 c0 74 0f e8 7a 98 8f e0 85 c0 75 56 66 0f 1f 44 00 00 49 39 dc 74 3b <0f> b6 53 21 66 c7 43 1e 09 00 80 fa 01 74 14 73 8a be 16 00 00 
RIP  [<ffffffffa078d175>] hci_conn_hash_flush+0x95/0x120 [bluetooth]
 RSP <ffff880429e73b40>

Comment 10 Josh Boyer 2012-02-29 14:38:33 UTC
OK, let's move it to rawhide since it is still hitting the latest kernel.

Comment 11 Dave Jones 2012-03-05 22:31:23 UTC
*** Bug 797590 has been marked as a duplicate of this bug. ***

Comment 12 Paul Bolle 2012-03-06 11:16:24 UTC
(In reply to comment #8)
> Paul, Mikko, does this still happen on 3.2.7 or newer?

0) I've never been able to reproduce this.

1) For what it's worth, I seem to remember trying different things to trigger this (eg, disabling bluetooth manually while already going into suspend) for some time without any luck. I've also run with bluetooth enabled on two machines (one running F14 and now F16, one tracking Rawhide) for months on end, in which I suspended hundreds of times, without triggering this GPF even once.

1) So, from my perspective this is just an odd bug I ran into just once and which I was also unable to pinpoint in the code (see comment #1, comment #5, comment #6, and comment #7).

Comment 13 Dave Jones 2012-03-06 15:24:36 UTC
one of the bluetooth developers noted this morning that commit 3c4e0df028935618d052235ba85bc7079be13394 (not yet in mainline) might solve this. I'll backport it to f15/f16 when I figure out where that commit lives.

Comment 14 Mikko Tiihonen 2012-07-12 10:43:26 UTC
I can no longer reproduce this bug in kernel-3.5.0-0.rc6.git2.1.fc18.

The bug might have been fixed already in earlier kernels, but I have not tried to reproduce this bug on them.