Bug 1565131 - Random crash with Lenovo Thunderbolt 3 dock
Summary: Random crash with Lenovo Thunderbolt 3 dock
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Keywords: CommonBugs, Reopened
Duplicates: 1562991
Depends On:
Blocks: F28FinalFreezeException
Reported: 2018-04-09 13:08 UTC by Christian Kellner
Modified: 2018-05-01 21:59 UTC (History)
Last Closed: 2018-04-30 21:18:28 UTC


Attachments
dmesg (124.95 KB, text/plain)
2018-04-09 13:08 UTC, Christian Kellner
dmesg another crash (137.49 KB, text/plain)
2018-04-09 13:17 UTC, Christian Kellner

Description Christian Kellner 2018-04-09 13:08:31 UTC
Created attachment 1419253 [details]
dmesg

External display attached via DisplayPort
USB keyboard and mouse attached to the Thunderbolt dock
Thunderbolt security level is USER

Linux x1.local 4.16.0-300.fc28.x86_64 #1 SMP Tue Apr 3 03:44:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[15434.727091] usb 7-4.3.1: reset high-speed USB device number 6 using xhci_hcd
[15487.079334] CPU2: Core temperature above threshold, cpu clock throttled (total events = 43)
[15487.079335] CPU6: Core temperature above threshold, cpu clock throttled (total events = 43)
[15487.079336] CPU0: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079337] CPU7: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079338] CPU3: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079339] CPU4: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079341] CPU6: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079346] CPU2: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079367] CPU5: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.079367] CPU1: Package temperature above threshold, cpu clock throttled (total events = 764)
[15487.083333] CPU2: Core temperature/speed normal
[15487.083334] CPU6: Core temperature/speed normal
[15487.083335] CPU4: Package temperature/speed normal
[15487.083336] CPU0: Package temperature/speed normal
[15487.083337] CPU7: Package temperature/speed normal
[15487.083339] CPU5: Package temperature/speed normal
[15487.083339] CPU1: Package temperature/speed normal
[15487.083340] CPU3: Package temperature/speed normal
[15487.083341] CPU6: Package temperature/speed normal
[15487.083344] CPU2: Package temperature/speed normal
[17356.195303] CPU0: Core temperature above threshold, cpu clock throttled (total events = 716)
[17356.195303] CPU4: Core temperature above threshold, cpu clock throttled (total events = 716)
[17356.195305] CPU2: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195305] CPU6: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195306] CPU1: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195307] CPU7: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195308] CPU5: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195309] CPU3: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195311] CPU4: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.195312] CPU0: Package temperature above threshold, cpu clock throttled (total events = 765)
[17356.196310] CPU4: Core temperature/speed normal
[17356.196310] CPU0: Core temperature/speed normal
[17356.196311] CPU6: Package temperature/speed normal
[17356.196312] CPU2: Package temperature/speed normal
[17356.196313] CPU7: Package temperature/speed normal
[17356.196313] CPU5: Package temperature/speed normal
[17356.196314] CPU3: Package temperature/speed normal
[17356.196314] CPU1: Package temperature/speed normal
[17356.196315] CPU0: Package temperature/speed normal
[17356.196316] CPU4: Package temperature/speed normal
[17391.437327] pciehp 0000:06:01.0:pcie204: Slot(1): Link Down
[17391.437850] pcieport 0000:09:03.0: Refused to change power state, currently in D3
[17391.438067] xhci_hcd 0000:0c:00.0: remove, state 1
[17391.438079] usb usb8: USB disconnect, device number 1
[17391.438083] usb 8-1: USB disconnect, device number 2
[17391.438188] xhci_hcd 0000:0c:00.0: xHCI host controller not responding, assume dead
[17391.476845] xhci_hcd 0000:0c:00.0: USB bus 8 deregistered
[17391.476919] xhci_hcd 0000:0c:00.0: remove, state 1
[17391.476928] usb usb7: USB disconnect, device number 1
[17391.476930] usb 7-3: USB disconnect, device number 3
[17391.520967] usb 7-4: USB disconnect, device number 4
[17391.520971] usb 7-4.3: USB disconnect, device number 5
[17391.520973] usb 7-4.3.1: USB disconnect, device number 6
[17391.524015] usb 7-4.3.2: USB disconnect, device number 7
[17391.565614] usb 7-4.3.3: USB disconnect, device number 8
[17391.631743] xhci_hcd 0000:0c:00.0: Host halt failed, -19
[17391.631748] xhci_hcd 0000:0c:00.0: Host not accessible, reset failed.
[17391.635891] xhci_hcd 0000:0c:00.0: USB bus 7 deregistered
[17391.653661] pcieport 0000:09:02.0: Refused to change power state, currently in D3
[17391.653804] pcieport 0000:09:01.0: Refused to change power state, currently in D3
[17391.653893] xhci_hcd 0000:0a:00.0: remove, state 4
[17391.653900] usb usb6: USB disconnect, device number 1
[17391.654106] xhci_hcd 0000:0a:00.0: USB bus 6 deregistered
[17391.654173] xhci_hcd 0000:0a:00.0: xHCI host controller not responding, assume dead
[17391.654180] xhci_hcd 0000:0a:00.0: remove, state 1
[17391.654186] usb usb5: USB disconnect, device number 1
[17391.654187] usb 5-1: USB disconnect, device number 2
[17391.672046] usb 5-4: USB disconnect, device number 3
[17391.675007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
[17391.675015] IP: tty_unregister_driver+0x9/0x80
[17391.675016] PGD 0 P4D 0 
[17391.675019] Oops: 0000 [#1] SMP PTI
[17391.675021] Modules linked in: cdc_ether usbnet r8152 mii snd_usb_audio snd_usbmidi_lib snd_rawmidi rfcomm fuse ccm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw devlink ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat snd_hda_codec_hdmi rmi_smbus rmi_core arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iwlmvm snd_soc_skl iTCO_wdt mei_wdt kvm_intel iTCO_vendor_support
[17391.675056]  snd_soc_skl_ipc mac80211 kvm snd_hda_ext_core snd_soc_sst_dsp snd_hda_codec_realtek snd_soc_sst_ipc snd_soc_acpi snd_hda_codec_generic wmi_bmof snd_soc_core intel_wmi_thunderbolt snd_compress snd_pcm_dmaengine ac97_bus irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi snd_hda_intel btusb intel_cstate btrtl btbcm snd_hda_codec btintel intel_uncore intel_rapl_perf bluetooth snd_hda_core tpm_crb cfg80211 snd_hwdep thunderbolt uvcvideo snd_seq joydev snd_seq_device videobuf2_vmalloc snd_pcm videobuf2_memops nvmem_core videobuf2_v4l2 videobuf2_common videodev idma64 mei_me thinkpad_acpi media processor_thermal_device tpm_tis snd_timer tpm_tis_core i2c_i801 intel_lpss_pci mei ucsi_acpi shpchp intel_soc_dts_iosf tpm intel_lpss intel_pch_thermal snd ecdh_generic typec_ucsi
[17391.675089]  typec wmi soundcore pinctrl_sunrisepoint int3403_thermal rfkill int340x_thermal_zone pinctrl_intel int3400_thermal acpi_pad acpi_thermal_rel btrfs xor zstd_decompress zstd_compress xxhash uas usb_storage raid6_pq i915 i2c_algo_bit drm_kms_helper e1000e drm nvme ptp nvme_core crc32c_intel serio_raw pps_core video
[17391.675106] CPU: 7 PID: 29122 Comm: kworker/u16:0 Not tainted 4.16.0-300.fc28.x86_64 #1
[17391.675107] Hardware name: LENOVO 20KG0027GE/20KG0027GE, BIOS N23ET33W (1.08 ) 01/22/2018
[17391.675111] Workqueue: pciehp-1 pciehp_power_thread
[17391.675115] RIP: 0010:tty_unregister_driver+0x9/0x80
[17391.675116] RSP: 0018:ffffb85146cdfcc8 EFLAGS: 00010246
[17391.675118] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[17391.675119] RDX: ffff88bc09d59fc0 RSI: ffffdb318e722640 RDI: 0000000000000000
[17391.675121] RBP: ffff88bbd92dc230 R08: ffff88bb1c8993b8 R09: 00000001801e001b
[17391.675122] R10: ffff88bb0e6124a8 R11: 0000000000000000 R12: ffff88bbd92dc000
[17391.675123] R13: ffff88bbd92dc398 R14: 0000000000000060 R15: ffff88babe91f290
[17391.675125] FS:  0000000000000000(0000) GS:ffff88bc215c0000(0000) knlGS:0000000000000000
[17391.675126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17391.675127] CR2: 0000000000000034 CR3: 00000003e720a002 CR4: 00000000003606e0
[17391.675129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17391.675130] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17391.675131] Call Trace:
[17391.675136]  xhci_dbc_tty_unregister_driver+0x11/0x30
[17391.675139]  xhci_dbc_exit+0x2a/0x40
[17391.675142]  xhci_stop+0x50/0x1c0
[17391.675144]  usb_remove_hcd+0xf9/0x240
[17391.675147]  usb_hcd_pci_remove+0x67/0x130
[17391.675150]  pci_device_remove+0x3b/0xb0
[17391.675152]  device_release_driver_internal+0x15a/0x220
[17391.675156]  pci_stop_bus_device+0x80/0xa0
[17391.675158]  pci_stop_bus_device+0x2b/0xa0
[17391.675160]  pci_stop_bus_device+0x3c/0xa0
[17391.675162]  pci_stop_and_remove_bus_device+0xe/0x20
[17391.675164]  pciehp_unconfigure_device+0xb8/0x160
[17391.675166]  pciehp_disable_slot+0x51/0xd0
[17391.675169]  pciehp_power_thread+0x82/0xa0
[17391.675171]  process_one_work+0x187/0x340
[17391.675173]  worker_thread+0x2e/0x380
[17391.675176]  ? pwq_unbound_release_workfn+0xd0/0xd0
[17391.675178]  kthread+0x112/0x130
[17391.675181]  ? kthread_create_worker_on_cpu+0x70/0x70
[17391.675183]  ? do_syscall_64+0x74/0x180
[17391.675186]  ? SyS_exit_group+0x10/0x10
[17391.675188]  ret_from_fork+0x35/0x40
[17391.675190] Code: 31 e4 e8 4b 0b dd ff 48 83 4d 68 01 eb a6 e8 1f 2e b7 ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb <8b> 77 34 8b 7f 2c c1 e7 14 0b 7b 30 e8 46 2f d5 ff 48 c7 c7 60 
[17391.675816] RIP: tty_unregister_driver+0x9/0x80 RSP: ffffb85146cdfcc8
[17391.675817] CR2: 0000000000000034
[17391.675819] ---[ end trace ee00b4bf1a911772 ]---
[17392.111392] thinkpad_acpi: EC reports that Thermal Table has changed
[17392.259238] thinkpad_acpi: undocked from hotplug port replicator
[17392.529587] thinkpad_acpi: EC reports that Thermal Table has changed
[17393.199924] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
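
The oops above can be read mechanically. The following is not part of the original report; it is a minimal Python sketch, with hypothetical names (`summarize_oops`, `TIMESTAMP`, `FRAME`), that pulls the faulting address, the faulting function, and the call-trace frames out of dmesg text in the format shown above. Frames prefixed with `?` are skipped, since the kernel marks those as unreliable.

```python
import re

# Leading "[12345.678901] " timestamp that dmesg puts on every line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
# A call-trace frame such as " xhci_stop+0x50/0x1c0", optionally prefixed by "? ".
FRAME = re.compile(r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")
FAULT = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")


def summarize_oops(dmesg_text):
    """Return the fault address, faulting function and reliable frames of an oops."""
    summary = {"fault_address": None, "ip": None, "frames": []}
    in_trace = False
    for raw in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", raw, count=1)
        fault = FAULT.search(line)
        if fault and summary["fault_address"] is None:
            summary["fault_address"] = int(fault.group(1), 16)
        elif line.startswith("IP: ") and summary["ip"] is None:
            # Keep only the function name, dropping "+offset/size".
            summary["ip"] = line[len("IP: "):].split("+", 1)[0].strip()
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace:
            frame = FRAME.match(line)
            if frame is None:
                in_trace = False  # first non-frame line ends the trace
            elif not frame.group(1):
                summary["frames"].append(frame.group(2))
    return summary


sample = """\
[17391.675007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
[17391.675015] IP: tty_unregister_driver+0x9/0x80
[17391.675131] Call Trace:
[17391.675136]  xhci_dbc_tty_unregister_driver+0x11/0x30
[17391.675139]  xhci_dbc_exit+0x2a/0x40
[17391.675176]  ? pwq_unbound_release_workfn+0xd0/0xd0
[17391.675188]  ret_from_fork+0x35/0x40
[17391.675190] Code: 31 e4
"""
result = summarize_oops(sample)
print(hex(result["fault_address"]), result["ip"], result["frames"])
```

For this trace the fault address is 0x34 inside tty_unregister_driver, reached from xhci_dbc_tty_unregister_driver. With RDI at 0 in the register dump, that is consistent with a NULL driver pointer being read at offset 0x34.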

Comment 1 Christian Kellner 2018-04-09 13:17 UTC
Created attachment 1419256 [details]
dmesg another crash

Just happened again, directly after plugging the device in.

Comment 2 Christian Kellner 2018-04-12 14:48:56 UTC
I talked to Mika from Intel. He suggested this is due to a bug in the DbC driver and that it can be worked around for now by disabling CONFIG_USB_XHCI_DBGCAP.
As this is related to Thunderbolt native enumeration, he also suggested including the following patch set: https://www.spinics.net/lists/linux-pci/msg71006.html

Comment 3 Christian Kellner 2018-04-12 18:35:33 UTC
Tested locally to confirm that disabling CONFIG_USB_XHCI_DBGCAP works around the kernel oops.
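
To see whether a given kernel build has the option enabled, the kernel config can be inspected. The following is a minimal Python sketch, not something from the original test; `config_option_state` is a hypothetical helper, and it assumes the usual kernel `.config` text format (on Fedora the config of the running kernel is normally found at /boot/config-$(uname -r)).

```python
import re


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or 'n' if it is unset or absent."""
    set_pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    for line in config_text.splitlines():
        line = line.strip()
        match = set_pattern.match(line)
        if match:
            return match.group(1)
        # Disabled options appear as a comment of this exact shape.
        if line == "# " + option + " is not set":
            return "n"
    return "n"


# Illustrative config excerpt, not copied from a real Fedora kernel.
sample = """\
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_DBGCAP=y
# CONFIG_USB_XHCI_PLATFORM is not set
"""
print(config_option_state(sample, "CONFIG_USB_XHCI_DBGCAP"))
```

A result of 'y' means the DbC code is built in, which is the configuration the workaround avoids.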

Comment 4 Christian Kellner 2018-04-13 16:00:15 UTC
The proper fix has been submitted as https://patchwork.kernel.org/patch/10340045/

Comment 5 Laura Abbott 2018-04-16 16:52:59 UTC
*** Bug 1562991 has been marked as a duplicate of this bug. ***

Comment 6 Adam Williamson 2018-04-24 14:39:01 UTC
We should definitely document this for F28. It'd actually have been good to fix it, but we're *really* late now unless we slip. I'm proposing it as an FE in case we do slip and can pull it in.

Comment 7 Stephen Gallagher 2018-04-25 11:53:23 UTC
+1 to FE if we end up slipping. Looking at the patch, it appears to be very non-intrusive.

Comment 8 Fedora Update System 2018-04-28 16:13:22 UTC
kernel-4.16.5-200.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 9 Fedora Update System 2018-04-28 16:14:23 UTC
kernel-4.16.5-300.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 10 Fedora Update System 2018-04-29 09:41:54 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-a9d6bb6a8e

Comment 11 Fedora Update System 2018-04-29 14:29:43 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-09afae3bb9

Comment 12 Fedora Update System 2018-04-30 16:37:02 UTC
kernel-4.16.5-200.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Adam Williamson 2018-04-30 16:45:09 UTC
This was reported for F28, not F27, so re-opening.

Comment 14 Fedora Update System 2018-04-30 21:18:28 UTC
kernel-4.16.5-300.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
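
To check whether a machine already runs a kernel at or above the update version named in the notices above, the release string can be compared. This is a minimal Python sketch with hypothetical names (`FIXED`, `kernel_version`, `has_fixed_kernel`). It assumes the fix is present from 4.16.5 onward, as the update notices suggest, and it compares only the upstream version, ignoring the package release suffix.

```python
import re

# Upstream version of the update referenced in this report (assumed to carry the fix).
FIXED = (4, 16, 5)


def kernel_version(release):
    """Parse the leading 'X.Y.Z' of a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return tuple(int(part) for part in match.groups())


def has_fixed_kernel(release, fixed=FIXED):
    """True if the release is at or above the assumed fixed version."""
    return kernel_version(release) >= fixed


print(has_fixed_kernel("4.16.0-300.fc28.x86_64"))  # kernel from the original report
print(has_fixed_kernel("4.16.5-300.fc28.x86_64"))  # kernel from the update
```

On a live system the release string could be taken from `platform.release()` in the Python standard library.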

