Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1578309

Summary: Thunderbolt 3 docking station hangs at boot-time during xhci initialization
Product: Red Hat Enterprise Linux 7 Reporter: Vratislav Bendel <vbendel>
Component: kernel Assignee: Jarod Wilson <jarod>
kernel sub component: Thunderbolt QA Contact: Ken Benoit <kbenoit>
Status: CLOSED WONTFIX Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aklimov, cww, kyin, vbendel
Version: 7.5   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-31 21:51:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1663539    

Description Vratislav Bendel 2018-05-15 09:02:41 UTC
Description of problem:
A customer reported that a Thunderbolt 3 docking station doesn't work properly with RHEL 7.5 (or RHEL 7.4).
In some respects this looks like a firmware problem, and it was apparently resolved by updating the BIOS and enabling the 'Fast boot' option in the HP BIOS.

The purpose of this BZ is mainly to investigate this properly and determine whether this is (or was) indeed a firmware issue, or whether there might also be a problem in our kernel or Thunderbolt driver.

Please see Additional info below for more details.

Hardware environment:
Laptop: HP ZBook G4 15inch
Thunderbolt Dock: P5Q58AA

Version-Release number of selected component (if applicable):
kernel-3.10.0-862.el7.x86

How reproducible:
Always (at least at customer's site)

Steps to Reproduce:
Boot with the TB3 docking station connected.

Actual results:
The boot process hangs for a while. After the system finally boots, devices connected to the dock don't work.

Expected results:
The system boots without hanging and everything works fine.


Additional info:

From what the customer has tested so far, the docking station apparently gets discovered as USB hub number 3 (and apparently also 4), since those devices are not discovered during boot _without_ the dock connected.

The problem is probably indicated by the following messages:
~~~~
Apr 19 13:32:21 <hostname> kernel: usb 3-1: new high-speed USB device number 2 using xhci_hcd
...
Apr 19 13:32:27 <hostname> kernel: xhci_hcd 0000:3d:00.0: Timeout while waiting for setup device command
Apr 19 13:32:27 <hostname> kernel: usb 3-1: hub failed to enable device, error -62
~~~~
(-ETIME == -62)
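
As a quick sanity check on that errno mapping, here is a minimal Python sketch that pulls the trailing "error -N" out of a log line and looks the number up in the local errno table. The helper name `decode_kernel_error` is hypothetical, and the lookup reflects the errno table of the host running the script (on Linux, 62 is ETIME), so it is only meaningful when run on a Linux system.

```python
import errno
import os
import re

# Matches the trailing "error -<N>" that the USB core prints on failure.
ERROR_RE = re.compile(r"error (-\d+)\s*$")


def decode_kernel_error(line):
    """Return (code, symbolic name, description) for a log line ending in
    'error -N', or None if the line carries no such suffix.

    Hypothetical helper for illustration; not part of any kernel tooling.
    """
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # The kernel reports negative errno values; look up the positive number
    # in the errno table of the machine running this script.
    name = errno.errorcode.get(-code, "UNKNOWN")
    return code, name, os.strerror(-code)


# Log line taken from the report above.
line = ("Apr 19 13:32:27 <hostname> kernel: usb 3-1: "
        "hub failed to enable device, error -62")
print(decode_kernel_error(line))
```

This only confirms the meaning of the error code (a timer expired); it says nothing about why the setup device command timed out.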

To me it seems that the device usb 3-1 either never received the command, never replied, or never finished initialization. However, I'm not very familiar with this area, so I might have gotten it wrong.

Later, 'hung_task' warnings were also reported on the code paths through xhci_alloc_dev(), which should be holding a mutex, and xhci_setup_device(), which was waiting for a mutex:
~~~~
Apr 19 13:36:14 <hostname> kernel: INFO: task kworker/0:1:55 blocked for more than 120 seconds.
Apr 19 13:36:14 <hostname> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 13:36:14 <hostname> kernel: kworker/0:1     D ffff8808a9275ee0     0    55      2 0x00000000
Apr 19 13:36:14 <hostname> kernel: Workqueue: usb_hub_wq hub_event
Apr 19 13:36:14 <hostname> kernel: Call Trace:
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacca37f2>] ? del_timer_sync+0x52/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad313e69>] schedule_preempt_disabled+0x29/0x70
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad311c27>] __mutex_lock_slowpath+0xc7/0x1d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0ffd6c>] ? xhci_discover_or_reset_device+0x11c/0x580
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31100f>] mutex_lock+0x1f/0x2f
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0f9ab2>] xhci_setup_device+0x62/0x7b0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0bda84>] ? hub_port_reset+0x464/0x680
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0fa213>] xhci_address_device+0x13/0x20
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0be06b>] hub_port_init+0x3cb/0xb80
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad07e8d9>] ? update_autosuspend+0x39/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad07e945>] ? pm_runtime_set_autosuspend_delay+0x45/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c1618>] hub_port_connect+0x158/0x9d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c25cf>] hub_event+0x73f/0xb60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb2dff>] process_one_work+0x17f/0x440
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb3ac6>] worker_thread+0x126/0x3c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbae31>] kthread+0xd1/0xe0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31f61d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: INFO: task kworker/0:2:62 blocked for more than 120 seconds.
Apr 19 13:36:14 <hostname> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 13:36:14 <hostname> kernel: kworker/0:2     D ffff8817ad62cf10     0    62      2 0x00000000
Apr 19 13:36:14 <hostname> kernel: Workqueue: usb_hub_wq hub_event
Apr 19 13:36:14 <hostname> kernel: Call Trace:
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccd92c8>] ? check_preempt_wakeup+0x148/0x250
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad312f49>] schedule+0x29/0x70
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad3108b9>] schedule_timeout+0x239/0x2c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacdf62b1>] ? __slab_free+0x81/0x2f0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad3132fd>] wait_for_completion+0xfd/0x140
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacccee80>] ? wake_up_state+0x20/0x20
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0f891e>] xhci_alloc_dev+0xee/0x2d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0bb245>] usb_alloc_dev+0x75/0x340
Apr 19 13:36:14 <hostname> kernel: [<ffffffffacf4cc08>] ? kobject_put+0x28/0x60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c1753>] hub_port_connect+0x293/0x9d0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad0c25cf>] hub_event+0x73f/0xb60
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb2dff>] process_one_work+0x17f/0x440
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb3ac6>] worker_thread+0x126/0x3c0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccb39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbae31>] kthread+0xd1/0xe0
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
Apr 19 13:36:14 <hostname> kernel: [<ffffffffad31f61d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 19 13:36:14 <hostname> kernel: [<ffffffffaccbad60>] ? insert_kthread_work+0x40/0x40
~~~~
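
To make the relationship between the two blocked workers easier to see, the traces above can be reduced to their reliable frames (those without a leading "?"). The following is a rough Python sketch under the assumption that the log uses the RHEL 7 trace format shown here. The function names `summarize_hung_tasks` and `first_xhci_frame` are hypothetical, and the sample is an abbreviated excerpt of the traces in this report with the timestamp and hostname prefix dropped.

```python
import re

# "INFO: task <name>:<pid> blocked for more than N seconds."
TASK_RE = re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds")
# A frame looks like "[<addr>] symbol+0x.../0x..."; a "?" before the symbol
# marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x")


def summarize_hung_tasks(lines):
    """Map each blocked task to its reliable stack frames, innermost first."""
    tasks = {}
    current = None
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            current = task.group(1)
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        # Skip frames outside a task report and frames flagged with "?".
        if frame and current is not None and frame.group(1) is None:
            tasks[current].append(frame.group(2))
    return tasks


def first_xhci_frame(frames):
    """Return the innermost xhci_* function in a frame list, or None."""
    for name in frames:
        if name.startswith("xhci_"):
            return name
    return None


# Abbreviated excerpt of the two traces from this report.
sample = """\
kernel: INFO: task kworker/0:1:55 blocked for more than 120 seconds.
kernel: [<ffffffffacca37f2>] ? del_timer_sync+0x52/0x60
kernel: [<ffffffffad313e69>] schedule_preempt_disabled+0x29/0x70
kernel: [<ffffffffad311c27>] __mutex_lock_slowpath+0xc7/0x1d0
kernel: [<ffffffffad31100f>] mutex_lock+0x1f/0x2f
kernel: [<ffffffffad0f9ab2>] xhci_setup_device+0x62/0x7b0
kernel: INFO: task kworker/0:2:62 blocked for more than 120 seconds.
kernel: [<ffffffffad312f49>] schedule+0x29/0x70
kernel: [<ffffffffad3108b9>] schedule_timeout+0x239/0x2c0
kernel: [<ffffffffad3132fd>] wait_for_completion+0xfd/0x140
kernel: [<ffffffffad0f891e>] xhci_alloc_dev+0xee/0x2d0
""".splitlines()

for task, frames in summarize_hung_tasks(sample).items():
    # frames[0] is where the task sleeps; the xhci frame is who put it there.
    print(f"{task}: sleeping in {frames[0]}, called from {first_xhci_frame(frames)}")
```

Read this way, one worker is stuck in wait_for_completion() under xhci_alloc_dev() while the other is stuck in mutex_lock() under xhci_setup_device(), which matches the description above. The sketch only restates what is in the log and does not prove which task holds the mutex.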

Additionally, this hang was observed on both RHEL 7.4 and 7.5. The only difference was that on 7.5 the device was apparently discovered as Thunderbolt 3, as indicated by the following messages logged after the usb3 discovery:
(usb 3/4 were also discovered at Apr 19 13:32:21)
~~~~
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: current switch config:
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Switch: 8086:15d3 (Revision: 6,  TB Version: 2)
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max Port Number: 11 
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Config:
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:    Upstream Port Number: 5 Depth: 0 Route String: 0x0 Enabled: 1, PlugEventsDelay: 254ms
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:    unknown1: 0x0 unknown4: 0x0
Apr 19 13:32:21 <hostname> kernel: TECH PREVIEW: Thunderbolt 3 may not be fully supported. #012Please review provided documentation for limitations.
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: 0: Thunderbolt HW version detected: 3
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0: 0: uid: 0xf037df7cc07200
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Port 0: 8086:15d3 (Revision: 6,  TB Version: 1, Type: Port (0x1))
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max hop id (in/out): 7/7
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   Max counters: 8
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:   NFC Credits: 0x800000
Apr 19 13:32:21 <hostname> kernel: thunderbolt 0000:06:00.0:  Port 1: 8086:15d3 (Revision: 6,  TB Version: 1, Type: Port (0x1))
----[ rest omitted ]----
~~~~
Note that these messages are seen when booting _with_ the dock connected and only on RHEL 7.5 (7.4 doesn't have the driver yet).

Since the hang looks the same with and without the Thunderbolt driver messages, in my opinion the problem is not in the tech-preview driver.

Last but not least, the customer has installed updates for AMD and the BIOS, and after some investigation with HP, they came to the following resolution:
~~~~
<cite>
1. There is a BIOS option in the G4 laptop called "Fast Boot". Enabling this setting causes the system to boot quickly and the USB ports on the dock work as expected. So, seems like this flag bypasses some checks and allow the USB ports on the dock to be available after boot. I can now have the dock connected and start a cold boot and the system now comes up quickly with the USB ports enabled.

2. The Video graphics driver setting was initially at "Auto". This was suggested to be set at "Discrete Graphics". Changing to this setting most likely causes the NVIDIA card to be "always" used for all graphics needs. But, in this setting, when shutting down the system, there would be a double beep. HP suggested to upgrade NVIDIA driver from 390.42 to 390.48. After this driver upgrade, the double beep problem is gone.

So, the problems described in this ticket on RHEL74 setup now seem resolved, with the fixes being:
+ Enable "Fast Boot" in BIOS.
+ Upgrade NVIDIA driver to 390.48 version.
</cite>
~~~~

The customer confirmed that with this solution the hardware works as expected on both RHEL 7.4 and 7.5.
~~~~
<cite>
> In the meantime, can you please confirm that the dock works properly as expected with the 'Fast Boot' option?
Yes, I am using the Dock with 2 USB and 2 Display port devices attached and it works fine on RHEL74. Yesterday, I tested this setup using the RHEL75 HDD and it worked just the same. I did install the latest NVIDIA driver on RHEL75 too, so there was no double beep during shutdown.
</cite>
~~~~