Bug 1565131

Summary: Random crash with Lenovo Thunderbolt 3 dock
Product: [Fedora] Fedora
Reporter: Christian Kellner <ckellner>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 28
CC: airlied, awilliam, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, kparal, linville, mchehab, mjg59, redhat-bugs2eran, redhat-bugzilla, sgallagh, shopper2k, steved, thomas
Keywords: CommonBugs, Reopened
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: https://fedoraproject.org/wiki/Common_F28_bugs#thunderbolt-crash
Fixed In Version: kernel-4.16.5-200.fc27 kernel-4.16.5-300.fc28
Last Closed: 2018-04-30 21:18:28 UTC
Type: Bug
Bug Blocks: 1469207    
Attachments:
Description            Flags
dmesg                  none
dmesg another crash    none

Description Christian Kellner 2018-04-09 13:08:31 UTC
Created attachment 1419253
dmesg

External display attached via DisplayPort
USB keyboard and mouse attached to the Thunderbolt dock
Thunderbolt security level is USER

Linux x1.local 4.16.0-300.fc28.x86_64 #1 SMP Tue Apr 3 03:44:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[15434.727091] usb 7-4.3.1: reset high-speed USB device number 6 using xhci_hcd
[15487.079334] CPU2: Core temperature above threshold, cpu clock throttled (total events = 43)
[15487.079335] CPU6: Core temperature above threshold, cpu clock throttled (total events = 43)
[15487.079336] CPU0: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079337] CPU7: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079338] CPU3: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079339] CPU4: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079341] CPU6: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079346] CPU2: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079367] CPU5: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079367] CPU1: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.083333] CPU2: Core temperature/speed normal
[15487.083334] CPU6: Core temperature/speed normal
[15487.083335] CPU4: Package temperature/speed normal
[15487.083336] CPU0: Package temperature/speed normal
[15487.083337] CPU7: Package temperature/speed normal
[15487.083339] CPU5: Package temperature/speed normal
[15487.083339] CPU1: Package temperature/speed normal
[15487.083340] CPU3: Package temperature/speed normal
[15487.083341] CPU6: Package temperature/speed normal
[15487.083344] CPU2: Package temperature/speed normal
[17356.195303] CPU0: Core temperature above threshold, cpu clock throttled (total events = 716)
[17356.195303] CPU4: Core temperature above threshold, cpu clock throttled (total events = 716)
[17356.195305] CPU2: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195305] CPU6: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195306] CPU1: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195307] CPU7: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195308] CPU5: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195309] CPU3: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195311] CPU4: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195312] CPU0: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.196310] CPU4: Core temperature/speed normal
[17356.196310] CPU0: Core temperature/speed normal
[17356.196311] CPU6: Package temperature/speed normal
[17356.196312] CPU2: Package temperature/speed normal
[17356.196313] CPU7: Package temperature/speed normal
[17356.196313] CPU5: Package temperature/speed normal
[17356.196314] CPU3: Package temperature/speed normal
[17356.196314] CPU1: Package temperature/speed normal
[17356.196315] CPU0: Package temperature/speed normal
[17356.196316] CPU4: Package temperature/speed normal
[17391.437327] pciehp 0000:06:01.0:pcie204: Slot(1): Link Down
[17391.437850] pcieport 0000:09:03.0: Refused to change power state, currently in D3
[17391.438067] xhci_hcd 0000:0c:00.0: remove, state 1
[17391.438079] usb usb8: USB disconnect, device number 1
[17391.438083] usb 8-1: USB disconnect, device number 2
[17391.438188] xhci_hcd 0000:0c:00.0: xHCI host controller not responding, assume dead
[17391.476845] xhci_hcd 0000:0c:00.0: USB bus 8 deregistered
[17391.476919] xhci_hcd 0000:0c:00.0: remove, state 1
[17391.476928] usb usb7: USB disconnect, device number 1
[17391.476930] usb 7-3: USB disconnect, device number 3
[17391.520967] usb 7-4: USB disconnect, device number 4
[17391.520971] usb 7-4.3: USB disconnect, device number 5
[17391.520973] usb 7-4.3.1: USB disconnect, device number 6
[17391.524015] usb 7-4.3.2: USB disconnect, device number 7
[17391.565614] usb 7-4.3.3: USB disconnect, device number 8
[17391.631743] xhci_hcd 0000:0c:00.0: Host halt failed, -19
[17391.631748] xhci_hcd 0000:0c:00.0: Host not accessible, reset failed.
[17391.635891] xhci_hcd 0000:0c:00.0: USB bus 7 deregistered
[17391.653661] pcieport 0000:09:02.0: Refused to change power state, currently in D3
[17391.653804] pcieport 0000:09:01.0: Refused to change power state, currently in D3
[17391.653893] xhci_hcd 0000:0a:00.0: remove, state 4
[17391.653900] usb usb6: USB disconnect, device number 1
[17391.654106] xhci_hcd 0000:0a:00.0: USB bus 6 deregistered
[17391.654173] xhci_hcd 0000:0a:00.0: xHCI host controller not responding, assume dead
[17391.654180] xhci_hcd 0000:0a:00.0: remove, state 1
[17391.654186] usb usb5: USB disconnect, device number 1
[17391.654187] usb 5-1: USB disconnect, device number 2
[17391.672046] usb 5-4: USB disconnect, device number 3
[17391.675007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
[17391.675015] IP: tty_unregister_driver+0x9/0x80
[17391.675016] PGD 0 P4D 0 
[17391.675019] Oops: 0000 [#1] SMP PTI
[17391.675021] Modules linked in: cdc_ether usbnet r8152 mii snd_usb_audio snd_usbmidi_lib snd_rawmidi rfcomm fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw devlink ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat snd_hda_codec_hdmi rmi_smbus rmi_core arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iwlmvm snd_soc_skl iTCO_wdt mei_wdt kvm_intel iTCO_vendor_support
[17391.675056]  snd_soc_skl_ipc mac80211 kvm snd_hda_ext_core snd_soc_sst_dsp snd_hda_codec_realtek snd_soc_sst_ipc snd_soc_acpi snd_hda_codec_generic wmi_bmof snd_soc_core intel_wmi_thunderbolt snd_compress snd_pcm_dmaengine ac97_bus irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi snd_hda_intel btusb intel_cstate btrtl btbcm snd_hda_codec btintel intel_uncore intel_rapl_perf bluetooth snd_hda_core tpm_crb cfg80211 snd_hwdep thunderbolt uvcvideo snd_seq joydev snd_seq_device videobuf2_vmalloc snd_pcm videobuf2_memops nvmem_core videobuf2_v4l2 videobuf2_common videodev idma64 mei_me thinkpad_acpi media processor_thermal_device tpm_tis snd_timer tpm_tis_core i2c_i801 intel_lpss_pci mei ucsi_acpi shpchp intel_soc_dts_iosf tpm intel_lpss intel_pch_thermal snd ecdh_generic typec_ucsi
[17391.675089]  typec wmi soundcore pinctrl_sunrisepoint int3403_thermal rfkill int340x_thermal_zone pinctrl_intel int3400_thermal acpi_pad acpi_thermal_rel btrfs xor zstd_decompress zstd_compress xxhash uas usb_storage raid6_pq i915 i2c_algo_bit drm_kms_helper e1000e drm nvme ptp nvme_core crc32c_intel serio_raw pps_core video
[17391.675106] CPU: 7 PID: 29122 Comm: kworker/u16:0 Not tainted 4.16.0-300.fc28.x86_64 #1
[17391.675107] Hardware name: LENOVO 20KG0027GE/20KG0027GE, BIOS N23ET33W (1.08 ) 01/22/2018
[17391.675111] Workqueue: pciehp-1 pciehp_power_thread
[17391.675115] RIP: 0010:tty_unregister_driver+0x9/0x80
[17391.675116] RSP: 0018:ffffb85146cdfcc8 EFLAGS: 00010246
[17391.675118] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[17391.675119] RDX: ffff88bc09d59fc0 RSI: ffffdb318e722640 RDI: 0000000000000000
[17391.675121] RBP: ffff88bbd92dc230 R08: ffff88bb1c8993b8 R09: 00000001801e001b
[17391.675122] R10: ffff88bb0e6124a8 R11: 0000000000000000 R12: ffff88bbd92dc000
[17391.675123] R13: ffff88bbd92dc398 R14: 0000000000000060 R15: ffff88babe91f290
[17391.675125] FS:  0000000000000000(0000) GS:ffff88bc215c0000(0000) knlGS:0000000000000000
[17391.675126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17391.675127] CR2: 0000000000000034 CR3: 00000003e720a002 CR4: 00000000003606e0
[17391.675129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17391.675130] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17391.675131] Call Trace:
[17391.675136]  xhci_dbc_tty_unregister_driver+0x11/0x30
[17391.675139]  xhci_dbc_exit+0x2a/0x40
[17391.675142]  xhci_stop+0x50/0x1c0
[17391.675144]  usb_remove_hcd+0xf9/0x240
[17391.675147]  usb_hcd_pci_remove+0x67/0x130
[17391.675150]  pci_device_remove+0x3b/0xb0
[17391.675152]  device_release_driver_internal+0x15a/0x220
[17391.675156]  pci_stop_bus_device+0x80/0xa0
[17391.675158]  pci_stop_bus_device+0x2b/0xa0
[17391.675160]  pci_stop_bus_device+0x3c/0xa0
[17391.675162]  pci_stop_and_remove_bus_device+0xe/0x20
[17391.675164]  pciehp_unconfigure_device+0xb8/0x160
[17391.675166]  pciehp_disable_slot+0x51/0xd0
[17391.675169]  pciehp_power_thread+0x82/0xa0
[17391.675171]  process_one_work+0x187/0x340
[17391.675173]  worker_thread+0x2e/0x380
[17391.675176]  ? pwq_unbound_release_workfn+0xd0/0xd0
[17391.675178]  kthread+0x112/0x130
[17391.675181]  ? kthread_create_worker_on_cpu+0x70/0x70
[17391.675183]  ? do_syscall_64+0x74/0x180
[17391.675186]  ? SyS_exit_group+0x10/0x10
[17391.675188]  ret_from_fork+0x35/0x40
[17391.675190] Code: 31 e4 e8 4b 0b dd ff 48 83 4d 68 01 eb a6 e8 1f 2e b7 ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb <8b> 77 34 8b 7f 2c c1 e7 14 0b 7b 30 e8 46 2f d5 ff 48 c7 c7 60 
[17391.675816] RIP: tty_unregister_driver+0x9/0x80 RSP: ffffb85146cdfcc8
[17391.675817] CR2: 0000000000000034
[17391.675819] ---[ end trace ee00b4bf1a911772 ]---
[17392.111392] thinkpad_acpi: EC reports that Thermal Table has changed
[17392.259238] thinkpad_acpi: undocked from hotplug port replicator
[17392.529587] thinkpad_acpi: EC reports that Thermal Table has changed
[17393.199924] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.

Comment 1 Christian Kellner 2018-04-09 13:17:36 UTC
Created attachment 1419256
dmesg another crash

Just happened again, directly after plugging the device in.

Comment 2 Christian Kellner 2018-04-12 14:48:56 UTC
I talked to Mika from Intel. He suggested that this is caused by a bug in the DbC (xHCI Debug Capability) driver and that it can be worked around for now by disabling CONFIG_USB_XHCI_DBGCAP.
Since this is related to Thunderbolt native enumeration, he also suggested including the following patch set: https://www.spinics.net/lists/linux-pci/msg71006.html

Comment 3 Christian Kellner 2018-04-12 18:35:33 UTC
Tested locally to confirm that disabling CONFIG_USB_XHCI_DBGCAP works around the kernel oops.

Comment 4 Christian Kellner 2018-04-13 16:00:15 UTC
The proper fix has been submitted as https://patchwork.kernel.org/patch/10340045/

Comment 5 Laura Abbott 2018-04-16 16:52:59 UTC
*** Bug 1562991 has been marked as a duplicate of this bug. ***

Comment 6 Adam Williamson 2018-04-24 14:39:01 UTC
We should definitely document this for F28. It would actually have been good to fix it, but we're *really* late now unless we slip. I'm proposing it as a freeze exception (FE) in case we do slip and can pull it in.

Comment 7 Stephen Gallagher 2018-04-25 11:53:23 UTC
+1 to FE if we end up slipping. Looking at the patch, it appears to be very non-intrusive.

Comment 8 Fedora Update System 2018-04-28 16:13:22 UTC
kernel-4.16.5-200.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 9 Fedora Update System 2018-04-28 16:14:23 UTC
kernel-4.16.5-300.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 10 Fedora Update System 2018-04-29 09:41:54 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 11 Fedora Update System 2018-04-29 14:29:43 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 12 Fedora Update System 2018-04-30 16:37:02 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Adam Williamson 2018-04-30 16:45:09 UTC
This was reported for F28, not F27, so re-opening.

Comment 14 Fedora Update System 2018-04-30 21:18:28 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.