Bug 1578309 - Thunderbolt 3 docking station hangs at boot-time during xhci initialization
Summary: Thunderbolt 3 docking station hangs at boot-time during xhci initialization
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jarod Wilson
QA Contact: Ken Benoit
URL:
Whiteboard:
Depends On:
Blocks: 1663539
Reported: 2018-05-15 09:02 UTC by Vratislav Bendel
Modified: 2022-03-13 14:59 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-31 21:51:56 UTC
Target Upstream Version:
Embargoed:



Description Vratislav Bendel 2018-05-15 09:02:41 UTC
Description of problem:
A customer reported that a Thunderbolt 3 docking station doesn't work properly with RHEL 7.5 (or RHEL 7.4).
In some respects this looks like a firmware problem, and it was apparently resolved by a BIOS update together with enabling the 'Fast Boot' option in the HP BIOS.

The main purpose of this BZ is to investigate this properly and determine whether this is (or was) indeed a firmware issue, or whether there might also be a problem in our kernel/thunderbolt driver.

Please see Additional info below for more details.

Hardware environment:
Laptop: HP ZBook G4 15inch
Thunderbolt Dock: P5Q58AA

Version-Release number of selected component (if applicable):
kernel-3.10.0-862.el7.x86

How reproducible:
Always (at least at the customer's site)

Steps to Reproduce:
Boot with the TB3 docking station connected.

Actual results:
The boot process hangs for a while; once the system finally boots, devices connected to the dock don't work.

Expected results:
The system boots without a hang and everything works fine.


Additional info:

From what the customer has tested so far, the docking station apparently gets discovered as USB hub number 3 (and apparently also 4), since those devices are not discovered when booting _without_ the dock connected.

The problem is probably indicated by the following messages:
~~~~
Apr 19 13:32:21 <hostname> kernel: usb 3-1: new high-speed USB device number 2 using xhci_hcd
...
Apr 19 13:32:27 <hostname> kernel: xhci_hcd 0000:3d:00.0: Timeout while waiting for setup device command
Apr 19 13:32:27 <hostname> kernel: usb 3-1: hub failed to enable device, error -62
~~~~
(-ETIME == -62)

To me it seems that device usb 3-1 either never received the command, never replied, or never finished initialization. However, I'm not very familiar with this area, so I may have got it wrong.

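Later, 'hung_task' warnings were also reported on the code paths xhci_alloc_dev(), which presumably holds a mutex, and xhci_setup_device(), which waits for a mutex. In the traces below, kworker/0:1 is blocked in mutex_lock() called from xhci_setup_device(), and kworker/0:2 is blocked in wait_for_completion() called from xhci_alloc_dev():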
~~~~
Apr 19 13:36:14 <hostname> kernel: INFO: task kworker/0:1:55 blocked for more than 120 seconds.
Apr 19 13:36:14 <hostname> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this      message.
Apr 19 13:36:14 <hostname> kernel: kworker/0:1     D ffff8808a9275ee0     0    55      2 0x00000000
Apr 19 13:36:14 <hostname> kernel: Workqueue: usb_hub_wq hub_event
Apr 19 13:36:14 <hostname> kernel: Call Trace:
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacca37f2>] ? del_timer_sync+0x52/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad313e69>] schedule_preempt_disabled+0x29/0x70
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad311c27>] __mutex_lock_slowpath+0xc7/0x1d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0ffd6c>] ? xhci_discover_or_reset_device+0x11c/0x580
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31100f>] mutex_lock+0x1f/0x2f
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0f9ab2>] xhci_setup_device+0x62/0x7b0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0bda84>] ? hub_port_reset+0x464/0x680
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0fa213>] xhci_address_device+0x13/0x20
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0be06b>] hub_port_init+0x3cb/0xb80
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad07e8d9>] ? update_autosuspend+0x39/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad07e945>] ? pm_runtime_set_autosuspend_delay+0x45/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c1618>] hub_port_connect+0x158/0x9d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c25cf>] hub_event+0x73f/0xb60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb2dff>] process_one_work+0x17f/0x440
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb3ac6>] worker_thread+0x126/0x3c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbae31>] kthread+0xd1/0xe0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31f61d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: INFO: task kworker/0:2:62 blocked for more than 120 seconds.
Apr 19 13:36:14 <hostname> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this      message.
Apr 19 13:36:14 <hostname> kernel: kworker/0:2     D ffff8817ad62cf10     0    62      2 0x00000000
Apr 19 13:36:14 <hostname> kernel: Workqueue: usb_hub_wq hub_event
Apr 19 13:36:14 <hostname> kernel: Call Trace:
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccd92c8>] ? check_preempt_wakeup+0x148/0x250
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad312f49>] schedule+0x29/0x70
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad3108b9>] schedule_timeout+0x239/0x2c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacdf62b1>] ? __slab_free+0x81/0x2f0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad3132fd>] wait_for_completion+0xfd/0x140
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacccee80>] ? wake_up_state+0x20/0x20
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0f891e>] xhci_alloc_dev+0xee/0x2d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0bb245>] usb_alloc_dev+0x75/0x340
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacf4cc08>] ? kobject_put+0x28/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c1753>] hub_port_connect+0x293/0x9d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c25cf>] hub_event+0x73f/0xb60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb2dff>] process_one_work+0x17f/0x440
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb3ac6>] worker_thread+0x126/0x3c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbae31>] kthread+0xd1/0xe0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31f61d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
~~~~

Additionally, this hang was observed on both RHEL 7.4 and 7.5. The only difference was that on 7.5 the device was apparently discovered as Thunderbolt 3, as indicated by the following messages logged after the usb3 discovery:
(usb 3/4 were also discovered at Apr 19 13:32:21)
~~~~
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: current switch config:
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Switch: 8086:15d3 (Revision: 6,  TB Version: 2)
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max Port Number: 11 
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Config:
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:    Upstream Port Number: 5 Depth: 0 Route String: 0x0 Enabled: 1, PlugEventsDelay: 254ms
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:    unknown1: 0x0 unknown4: 0x0
Apr 19 13:32:21 <hostname> kernel: TECH PREVIEW: Thunderbolt 3 may not be fully supported.     #012Please review provided documentation for limitations.
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: 0: Thunderbolt HW version         detected: 3
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: 0: uid: 0xf037df7cc07200
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Port 0: 8086:15d3 (Revision: 6,  TB Version: 1, Type: Port (0x1))
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max hop id (in/out): 7/7
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max counters: 8
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   NFC Credits: 0x800000
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Port 1: 8086:15d3 (Revision: 6,  TB Version: 1, Type: Port (0x1))
----[ rest omitted ]----
~~~~
Note that these messages are seen only when booting _with_ the dock connected, and only on RHEL 7.5 (7.4 doesn't have the driver yet).

Since the hang looks the same with and without the thunderbolt driver messages, in my opinion the problem is not in the tech-preview driver.

Finally, the customer installed updates for AMD and the BIOS and, after some investigation with HP, arrived at the following resolution:
~~~~
<cite>
1. There is a BIOS option in the G4 laptop called "Fast Boot". Enabling this setting causes the system to boot quickly and the USB ports on the dock work as expected. So, seems like this flag bypasses some checks and allow the USB ports on the dock to be available after boot. I can now have the dock connected and start a cold boot and the system now comes up quickly with the USB ports enabled.

2. The Video graphics driver setting was initially at "Auto". This was suggested to be set at "Discrete Graphics". Changing to this setting most likely causes the NVIDIA card to be "always" used for all graphics needs. But, in this setting, when shutting down the system, there would be a double beep. HP suggested to upgrade NVIDIA driver from 390.42 to 390.48. After this driver upgrade, the double beep problem is gone.

So, the problems described in this ticket on RHEL74 setup now seem resolved, with the fixes being:
+ Enable "Fast Boot" in BIOS.
+ Upgrade NVIDIA driver to 390.48 version.
</cite>
~~~~

The customer confirmed that with this solution the hardware works as expected on both RHEL 7.4 and 7.5.
~~~~
<cite>
> In the meantime, can you please confirm that the dock works properly as expected with the 'Fast Boot' option?
Yes, I am using the Dock with 2 USB and 2 Display port devices attached and it works fine on RHEL74. Yesterday, I tested this setup using the RHEL75 HDD and it worked just the same. I did install the latest NVIDIA driver on RHEL75 too, so there was no double beep during shutdown.
</cite>
~~~~

