Bug 1196943 - USB autosuspend massively breaks xHCI on AMD APU chipsets
Summary: USB autosuspend massively breaks xHCI on AMD APU chipsets
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-27 04:52 UTC by James Ralston
Modified: 2017-12-12 10:10 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-12 10:10:46 UTC
Type: Bug
Embargoed:


Attachments
full dmesg output (98.16 KB, text/plain)
2015-02-27 04:52 UTC, James Ralston
output from "lspci -vvxxx" (70.99 KB, text/plain)
2015-02-27 04:53 UTC, James Ralston

Description James Ralston 2015-02-27 04:52:26 UTC
Created attachment 995907
full dmesg output

Description of problem:

I have Fedora 20 running on an ASUS A88XM-E mainboard with a Lian Li PC-9F case. The case has two front-mounted USB 3.0 ports, which I have connected to the USB 3.0 header on the mainboard.

When I initially connect a device (regardless of speed) to either of the case-mounted USB 3.0 ports, it works fine.

However, when I disconnect the device, the xhci_hcd HC dies, rendering the USB ports unusable until I reboot.

Version-Release number of selected component (if applicable):

3.18.7-100.fc20.x86_64

How reproducible:

Plug in a device to either of the case-mounted USB 3.0 ports, then unplug it.

Steps to Reproduce:
1. Plug in a device to either of the case-mounted USB 3.0 ports.
2. Unplug the device.

Actual results:

Boom goes the dynamite.

Expected results:

The xhci_hcd HC should not die.

Additional info:

Here's the dmesg output from the device connect:

[164497.533849] usb 3-2: new high-speed USB device number 10 using xhci_hcd
[164497.700493] usb 3-2: New USB device found, idVendor=0781, idProduct=5571
[164497.700503] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[164497.700508] usb 3-2: Product: Cruzer Fit
[164497.700512] usb 3-2: Manufacturer: SanDisk
[164497.700515] usb 3-2: SerialNumber: 4C532000040813108072
[164498.448412] usb-storage 3-2:1.0: USB Mass Storage device detected
[164498.448837] scsi host8: usb-storage 3-2:1.0
[164498.448995] usbcore: registered new interface driver usb-storage
[164498.468899] usbcore: registered new interface driver uas
[164499.451330] scsi 8:0:0:0: Direct-Access     SanDisk  Cruzer Fit       1.26 PQ: 0 ANSI: 6
[164499.452545] sd 8:0:0:0: Attached scsi generic sg2 type 0
[164499.457400] sd 8:0:0:0: [sdb] 62530624 512-byte logical blocks: (32.0 GB/29.8 GiB)
[164499.459638] sd 8:0:0:0: [sdb] Write Protect is off
[164499.459651] sd 8:0:0:0: [sdb] Mode Sense: 43 00 00 00
[164499.460101] sd 8:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[164499.472639]  sdb: sdb1
[164499.474381] sd 8:0:0:0: [sdb] Attached SCSI disk
[164551.092829] SELinux: initialized (dev sdb1, type vfat), uses genfs_contexts

(Again, it doesn't matter what the device is, or whether the device is a USB 3.0 device; any device triggers the bug.)

Here's what happens when I disconnect the device (in this case, by selecting "Safely remove drive" in MATE caja):

[164556.926312] usb 3-2: USB disconnect, device number 10
[164561.936829] xhci_hcd 0000:00:10.1: Stopped the command ring failed, maybe the host is dead
[164561.936873] xhci_hcd 0000:00:10.1: Host not halted after 16000 microseconds.
[164561.936880] xhci_hcd 0000:00:10.1: Abort command ring failed
[164561.936888] xhci_hcd 0000:00:10.1: HC died; cleaning up

At this point, the HC is dead until I reboot.
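
A minimal sketch for spotting this failure in a kernel log, assuming the "HC died" message format shown above. The function name xhci_dead_controllers is hypothetical and not part of any existing tool; it reads log text on stdin and prints the PCI address of each controller reported as dead.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print the PCI address
# of every xHCI controller that the kernel log reports as dead.
# Reads kernel log text on stdin, for example:  dmesg | xhci_dead_controllers
xhci_dead_controllers() {
    # Match lines such as
    #   [164561.936888] xhci_hcd 0000:00:10.1: HC died; cleaning up
    # keep only the PCI address, and de-duplicate the result.
    sed -n 's/.*xhci_hcd \([0-9a-fA-F:.]*\): HC died.*/\1/p' | sort -u
}
```

Run against the log excerpt above, this would print 0000:00:10.1; on a healthy system it prints nothing.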

I updated the firmware of the mainboard to the latest available, and it didn't change or resolve the problem.

Comment 1 James Ralston 2015-02-27 04:53:09 UTC
Created attachment 995908
output from "lspci -vvxxx"

Comment 2 James Ralston 2015-02-27 05:03:02 UTC
If there's any additional information that would be helpful in debugging this problem, let me know, and I'll see what I can do.

I know how to roll (and re-roll) RPMs, so if you can suggest a patch for this, I'll rebuild a local kernel RPM with the patch and give it a whirl.

Comment 3 Fedora Kernel Team 2015-04-28 18:31:05 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.19.5-100.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 4 James Ralston 2015-04-29 03:24:32 UTC
Yes, the bug is still present in 3.19.5-100.fc20.

Comment 5 Fedora End Of Life 2015-05-29 13:42:51 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Fedora End Of Life 2015-06-30 00:11:33 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 7 James Ralston 2015-07-09 17:53:17 UTC
Reopening, because this problem still exists in Fedora 22.

This problem is much more severe than I initially thought. The USB 3.0 ports are broken to the point of being unusable.

Specifically, almost all high-speed USB 3.0 devices simply fail to work. On the A88XM-E system (running Fedora 20 with kernel 4.0.4), here's what happens when I connect my LG G4 (not sure if the first line is related, so I'm including it):

[  253.825901] pci_pm_runtime_suspend(): hcd_pci_runtime_suspend+0x0/0x50 returns -16
[  254.028227] usb 3-2: new high-speed USB device number 2 using xhci_hcd
[  254.144879] usb 3-2: device descriptor read/all, error -71
[  254.297946] usb 3-2: new high-speed USB device number 3 using xhci_hcd
[  254.413979] usb 3-2: device descriptor read/all, error -71
[  254.566667] usb 3-2: new high-speed USB device number 4 using xhci_hcd
[  254.581420] usb 3-2: device descriptor read/all, error -71
[  254.734489] usb 3-2: new high-speed USB device number 5 using xhci_hcd
[  254.748747] usb 3-2: device descriptor read/all, error -71
[  254.748782] usb usb3-port2: unable to enumerate USB device

In contrast, when I plug the LG G4 into my Intel-based laptop (running Fedora 22, also with the 4.0.4 kernel), it works perfectly:

[165797.389963] usb 1-6: new high-speed USB device number 22 using xhci_hcd
[165797.555755] usb 1-6: New USB device found, idVendor=1004, idProduct=633e
[165797.555778] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[165797.555783] usb 1-6: Product: LGE Android Phone
[165797.555786] usb 1-6: Manufacturer: LG Electronics Inc.
[165797.555789] usb 1-6: SerialNumber: LGH811df5dac44

Same kernel (or reasonably so), same device, yet the AMD system craps the bed, and the Intel system works perfectly.

As I already stated, when I unplug *ANY* USB device on the AMD systems, the xhci_hcd driver dies about 90% of the time:

[  259.747143] xhci_hcd 0000:00:10.1: Stopped the command ring failed, maybe the host is dead
[  259.747188] xhci_hcd 0000:00:10.1: Host not halted after 16000 microseconds.
[  259.747195] xhci_hcd 0000:00:10.1: Abort command ring failed
[  259.747205] xhci_hcd 0000:00:10.1: HC died; cleaning up

Again, in contrast, the Intel system works perfectly:

[165873.817070] usb 1-6: USB disconnect, device number 22

I've seen various mutterings that on AMD systems, I need to enable the IOMMU in the BIOS / UEFI in order for USB devices to work correctly, and pass various iommu= flags on the kernel command line.  But for all combinations of IOMMU settings I tried, the USB3 ports are still broken.  (Plus, if I enable the IOMMU in the UEFI, the r8169 driver won't pass any traffic... but that's a different problem, sigh.)

Additionally, I have the exact same issue with another ASUS motherboard, an F2A85-M PRO (AMD A85X FCH / Hudson D4 chipset). Both systems are/were dual-boot Windows 7 systems, and I had no issues with the USB3 ports working under Windows.

Comment 8 James Ralston 2015-07-09 18:18:37 UTC
So, in summary:

1.  USB xHCI is badly broken on two different ASUSTek AMD APU motherboards, up through kernel 4.0.4: an A88XM-E (AMD A88X / Bolton D4 chipset), and an F2A85-M PRO (AMD A85X FCH / Hudson D4 chipset).

2.  I can recall no point at which USB xHCI actually worked correctly on either system.

3.  If I boot Windows 7, for both systems, the USB ports work perfectly.

4.  On my Latitude E6440 laptop (using the Intel QM87 Express chipset), also using the 4.0.4 kernel, USB xHCI support works perfectly.

These data strongly imply that the fault lies with the Linux xHCI driver. Specifically, one of the following is probably true:

1.  The xHCI driver is badly broken for the A85X and A88X AMD chipsets.

2.  The xHCI driver is badly broken for the A85X and A88X AMD chipsets on specific ASUSTek motherboards.

It may very well be that ASUSTek is doing something strange/wrong/stupid with their A85X/A88X implementations, and the reason I don't see the problem on Windows is because their drivers know how to cope with the behavior. (I'm opening a support request with them next.)

But in the meantime, the only thing I can point the finger at is the Linux xHCI driver, and I have an avalanche of data that suggest it's badly broken.

Could someone please, pretty please, take up this issue? This is a SHOWSTOPPER-class problem.

I posted on the linux-usb list, but so far, there have been no responses:

http://article.gmane.org/gmane.linux.usb.general/127847

Again: I know how to roll RPMs, and I'd be happy to roll my own kernel RPMs with patches to help isolate/debug this problem.

Comment 9 James Ralston 2015-07-09 20:55:57 UTC
To clarify: I tested Fedora 22 Live Workstation on the A88XM-E system, and it exhibits the same brokenness as does Fedora 20.

Comment 10 James Ralston 2015-08-15 22:28:29 UTC
Performing web searches revealed other people having the same problem. One person claimed that if he disabled CPU throttling, his USB ports worked normally. That didn't work for me, but it did put me on the right track: the problem appears to have to do with USB autosuspend.

At system boot, the kernel enables USB autosuspend by default on all USB hubs:

$ grep . /sys/bus/usb/devices/*/power/control
/sys/bus/usb/devices/1-1/power/control:on
/sys/bus/usb/devices/1-2/power/control:on
/sys/bus/usb/devices/7-1/power/control:on
/sys/bus/usb/devices/7-2/power/control:on
/sys/bus/usb/devices/usb1/power/control:auto
/sys/bus/usb/devices/usb2/power/control:auto
/sys/bus/usb/devices/usb3/power/control:auto
/sys/bus/usb/devices/usb4/power/control:auto
/sys/bus/usb/devices/usb5/power/control:auto
/sys/bus/usb/devices/usb6/power/control:auto
/sys/bus/usb/devices/usb7/power/control:auto
/sys/bus/usb/devices/usb8/power/control:auto
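
For scripting, the same check can be done without parsing grep output. This is a minimal sketch: the function name usb_autosuspend_enabled and its optional directory argument are assumptions added for illustration, not an existing utility. It prints the name of each device whose power/control is set to "auto".

```shell
#!/bin/sh
# Hypothetical helper: list USB devices whose runtime power management
# is set to "auto" (autosuspend allowed).
# $1 is the devices directory; it defaults to the real sysfs path and can
# be overridden (assumption: useful for dry runs and tests).
usb_autosuspend_enabled() {
    dir=${1:-/sys/bus/usb/devices}
    for f in "$dir"/*/power/control; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        if [ "$(cat "$f")" = "auto" ]; then
            # Strip /power/control to get the device name, e.g. usb3
            basename "$(dirname "$(dirname "$f")")"
        fi
    done
}
```

On the system shown above, this would list usb1 through usb8 and skip 1-1, 1-2, 7-1, and 7-2.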

But if I manually disable USB autosuspend:

$ ( for F in /sys/bus/usb/devices/*/power/control; do echo on >"${F}"; done )
$ grep . /sys/bus/usb/devices/*/power/control
/sys/bus/usb/devices/1-1/power/control:on
/sys/bus/usb/devices/1-2/power/control:on
/sys/bus/usb/devices/7-1/power/control:on
/sys/bus/usb/devices/7-2/power/control:on
/sys/bus/usb/devices/usb1/power/control:on
/sys/bus/usb/devices/usb2/power/control:on
/sys/bus/usb/devices/usb3/power/control:on
/sys/bus/usb/devices/usb4/power/control:on
/sys/bus/usb/devices/usb5/power/control:on
/sys/bus/usb/devices/usb6/power/control:on
/sys/bus/usb/devices/usb7/power/control:on
/sys/bus/usb/devices/usb8/power/control:on

…then all of my USB problems vanish, and all devices work properly.

I don't know whether the USB autosuspend code is doing something wrong on these chipsets, or autosuspend is just plain broken on these chipsets. But regardless, disabling autosuspend is the work-around (if not the solution).
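
The work-around can be wrapped in a small function. This is a sketch only: the name usb_autosuspend_disable and the optional directory argument are hypothetical. Writing to the real sysfs files requires root, and the setting is not persistent, so it has to be reapplied after every boot and for devices plugged in later.

```shell
#!/bin/sh
# Hypothetical helper: force every USB device's power/control to "on",
# which disables runtime autosuspend for that device.
# Needs root when run against the real sysfs tree.
# $1 overrides the devices directory (assumption: for dry runs and tests).
usb_autosuspend_disable() {
    dir=${1:-/sys/bus/usb/devices}
    rc=0
    for f in "$dir"/*/power/control; do
        [ -e "$f" ] || continue          # glob matched nothing
        # Braces keep the shell's own redirection error off the terminal.
        if ! { echo on > "$f"; } 2>/dev/null; then
            echo "cannot write $f (are you root?)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Calling usb_autosuspend_disable with no argument as root applies the same change as the loop above; a non-zero return status means at least one file could not be written.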

All of the host bridges in the lspci output I attached need to be blacklisted in the kernel.

Comment 11 Justin M. Forbes 2015-10-20 19:17:28 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 12 Fedora Kernel Team 2015-11-23 17:09:04 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 13 James Ralston 2015-11-23 19:44:18 UTC
Reopening, because this bug still exists with 4.2.3-200.fc22.

Comment 14 Fedora End Of Life 2016-07-19 20:21:05 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 15 James Ralston 2016-08-28 23:39:41 UTC
Reopening, as this is still an issue on Fedora 23.

Comment 16 Laura Abbott 2016-09-23 19:20:42 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 17 James Ralston 2016-10-02 21:55:14 UTC
The problem persists with 4.7.5-100.fc23.

Comment 18 Fedora End Of Life 2016-11-24 11:29:58 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Fedora End Of Life 2016-12-20 13:17:44 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 20 James Ralston 2017-01-01 22:52:03 UTC
Reopening, as this is still an issue on Fedora 25.

Comment 21 Laura Abbott 2017-01-17 01:10:01 UTC
*********** MASS BUG UPDATE **************
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.
 
Fedora 25 has now been rebased to 4.9.3-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.
 
If you experience different issues, please open a new bug report for those.

Comment 22 James Ralston 2017-01-17 06:17:19 UTC
Still broken.

Comment 23 Justin M. Forbes 2017-04-11 14:33:06 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 24 Justin M. Forbes 2017-04-28 17:04:36 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 25 James Ralston 2017-05-13 22:41:41 UTC
Still broken on 4.10.15-200.

Comment 26 Fedora End Of Life 2017-11-16 19:30:30 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 27 Fedora End Of Life 2017-12-12 10:10:46 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

