Created attachment 1699965
dmesg from bootup, suspend attempt

1. Please describe the problem:

My system (based on an Asus PRIME H270-PRO motherboard) fails to suspend properly under 5.7 kernels. It starts to suspend but then immediately wakes back up again.

I also noticed a bunch of PCIe AER error spam in dmesg that did not occur with 5.6-based kernels, for example:

[ 12.909890] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 12.909890] pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00003000/00002000
[ 12.909891] pcieport 0000:00:1c.0: AER: [12] Timeout
[ 12.909896] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 12.909899] pcieport 0000:00:1c.0: AER: can't find device of ID00e0
[ 12.909900] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 12.909902] pcieport 0000:00:1c.0: AER: can't find device of ID00e0
[ 12.909903] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 12.909906] pcieport 0000:00:1c.0: AER: can't find device of ID00e0
[ 12.910012] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 12.910015] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 12.910015] pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00001000/00002000
[ 12.910016] pcieport 0000:00:1c.0: AER: [12] Timeout
[ 12.910020] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 12.910023] pcieport 0000:00:1c.0: AER: can't find device of ID00e0
[ 12.910157] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: 0000:00:1c.0

Device 1c.0 is a PCI Express root port, which is connected to an ASMedia PCIe to PCI bridge:

00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 04)

2. What is the Version-Release number of the kernel:

kernel-5.7.7-200.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

kernel-5.7.6-201.fc32.x86_64 was the first version I have seen that had the problem. kernel-5.6.19-300.fc32.x86_64 works fine.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Fails every time on this system.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Have not tried.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
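For anyone triaging similar logs, here is a rough sketch (not part of the original report) of how the "error status/mask" values in the AER excerpt above can be decoded. The bit names are those of the PCIe Correctable Error Status register. The regular expression, the function names, and the two-line SAMPLE are illustrative assumptions, with SAMPLE copied from the excerpt in this report.

```python
import re
from collections import Counter

# Bit positions of the PCIe Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Matches the "device [vvvv:dddd] error status/mask=..." lines shown above.
# The pattern is an assumption based on the excerpt, not a kernel-defined format.
STATUS_RE = re.compile(
    r"pcieport (?P<dev>[0-9a-f:.]+): AER:\s+device \[[0-9a-f]{4}:[0-9a-f]{4}\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)

# Two lines taken from the dmesg excerpt in this report.
SAMPLE = """\
[ 12.909890] pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00003000/00002000
[ 12.910015] pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00001000/00002000
"""


def decode_status(status, mask):
    """Split the set status bits into (reported, masked) lists of names."""
    reported, masked = [], []
    for bit, name in CORRECTABLE_BITS.items():
        if status & (1 << bit):
            (masked if mask & (1 << bit) else reported).append(name)
    return reported, masked


def summarize(log_text):
    """Count unmasked correctable errors per (device, error name)."""
    counts = Counter()
    for match in STATUS_RE.finditer(log_text):
        status = int(match["status"], 16)
        mask = int(match["mask"], 16)
        reported, _masked = decode_status(status, mask)
        for name in reported:
            counts[(match["dev"], name)] += 1
    return counts


if __name__ == "__main__":
    for (dev, name), count in summarize(SAMPLE).items():
        print(f"{dev}: {name} x{count}")
```

With status 00003000 and mask 00002000, bit 12 (Replay Timer Timeout) is the unmasked error and bit 13 (Advisory Non-Fatal Error) is masked, which is consistent with the "[12] Timeout" line that the kernel printed.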
It appears that the AER errors are related to the suspend failure, as suspend works if the pci=noaer option is added to the kernel command line. I am guessing that these errors occurring during the suspend process are causing the machine to immediately wake up again.
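To confirm that a given boot actually picked up the workaround, something like the following sketch could be used. It only inspects a kernel command line string for the pci=noaer option mentioned above. The helper names and the example command lines in the comments are hypothetical, and the sketch assumes that pci= takes a comma-separated list of options.

```python
def pci_options(cmdline):
    """Collect all comma-separated options passed via pci=... tokens."""
    options = set()
    for token in cmdline.split():
        if token.startswith("pci="):
            options.update(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def aer_disabled(cmdline):
    """True if the command line asks the kernel to disable AER (pci=noaer)."""
    return "noaer" in pci_options(cmdline)


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(aer_disabled("root=/dev/sda1 ro quiet pci=noaer"))
    print(aer_disabled("root=/dev/sda1 ro quiet"))
```

On the affected machine, ``aer_disabled(read_cmdline())`` would be expected to return True only after rebooting with the option added.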
Reported to LKML: https://lkml.org/lkml/2020/7/10/1267
As I posted on LKML, it seems that the issue may have been caused by an upstream change that went into the 5.7 stable series to enable PCIe ASPM on PCIe to PCI bridges:

commit 66ff14e59e8a30690755b08bc3042359703fb07a
Author: Kai-Heng Feng <kai.heng.feng>
Date:   Wed May 6 01:34:21 2020 +0800

    PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges

    7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
    Linux to enable ASPM, but for some undocumented reason, it didn't enable
    ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.

    Remove this exclusion so we can enable ASPM on these links.

    The Dell OptiPlex 7080 mentioned in the bugzilla has a TI XIO2001
    PCIe-to-PCI Bridge. Enabling ASPM on the link leading to it allows the
    Intel SoC to enter deeper Package C-states, which is a significant power
    savings.

    [bhelgaas: commit log]
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207571
    Link: https://lore.kernel.org/r/20200505173423.26968-1-kai.heng.feng@canonical.com
    Signed-off-by: Kai-Heng Feng <kai.heng.feng>
    Signed-off-by: Bjorn Helgaas <bhelgaas>
    Reviewed-by: Mika Westerberg <mika.westerberg.com>

Disabling ASPM manually on this ASMedia bridge device as well as the PCIe root port it is connected to seems to resolve the problem:

setpci -s 00:1c.0 0x50.B=0x00
setpci -s 02:00.0 0x90.B=0x00
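A note on the setpci offsets: the sketch below assumes that 0x50 and 0x90 are the Link Control register of each device's PCI Express capability, whose two lowest bits are the ASPM Control field. That is an assumption about these two devices, not something stated in this report. Rather than hardcoding the offsets, the sketch walks the capability list in a raw config-space dump to find Link Control and report the ASPM setting. The function names and the synthetic_config helper are made up for illustration.

```python
from typing import Optional

PCI_STATUS = 0x06            # Status register (low byte)
PCI_STATUS_CAP_LIST = 0x10   # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # Pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # Capability ID of PCI Express
PCI_EXP_LNKCTL = 0x10        # Link Control offset within the PCIe capability
PCI_EXP_LNKCTL_ASPMC = 0x03  # ASPM Control field (bits 1:0)

ASPM_NAMES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s+L1"}


def find_pcie_capability(config: bytes) -> Optional[int]:
    """Return the offset of the PCI Express capability, or None."""
    if len(config) < 64 or not config[PCI_STATUS] & PCI_STATUS_CAP_LIST:
        return None
    pos = config[PCI_CAPABILITY_LIST] & 0xFC
    visited = set()
    # Follow the linked list, guarding against loops and truncated dumps.
    while pos >= 0x40 and pos not in visited and pos + 1 < len(config):
        visited.add(pos)
        if config[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = config[pos + 1] & 0xFC
    return None


def link_control_offset(config: bytes) -> Optional[int]:
    """Return the config-space offset of Link Control, or None."""
    cap = find_pcie_capability(config)
    if cap is None or cap + PCI_EXP_LNKCTL + 2 > len(config):
        return None
    return cap + PCI_EXP_LNKCTL


def aspm_state(config: bytes) -> Optional[str]:
    """Decode the ASPM Control field of Link Control, or None if absent."""
    offset = link_control_offset(config)
    if offset is None:
        return None
    lnkctl = int.from_bytes(config[offset:offset + 2], "little")
    return ASPM_NAMES[lnkctl & PCI_EXP_LNKCTL_ASPMC]


def synthetic_config(cap_offset: int, lnkctl: int) -> bytes:
    """Build a fake 256-byte config space with one PCIe capability (test data)."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAPABILITY_LIST] = cap_offset
    cfg[cap_offset] = PCI_CAP_ID_EXP
    cfg[cap_offset + 1] = 0  # end of capability list
    start = cap_offset + PCI_EXP_LNKCTL
    cfg[start:start + 2] = lnkctl.to_bytes(2, "little")
    return bytes(cfg)


if __name__ == "__main__":
    # Made-up layout: PCIe capability at 0x40, Link Control value 0x0042.
    fake = synthetic_config(0x40, 0x0042)
    print(hex(link_control_offset(fake)), aspm_state(fake))
```

On a live system the dump could be read from /sys/bus/pci/devices/0000:00:1c.0/config, although reading beyond the first 64 bytes generally requires root. If the offsets are indeed Link Control, a byte write of 0x00 clears the whole low byte of the register rather than only the two ASPM bits, so a read-modify-write of bits 1:0 would be the more conservative variant.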
Patch submitted upstream: https://patchwork.ozlabs.org/project/linux-pci/patch/20200722021803.17958-1-hancockrwd@gmail.com/
Patch has been merged into mainline: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b361663c5a40c8bc758b7f7f2239f7a192180e7c

I have nominated it for stable kernels as well, as the previous patch that exposed the issue was added to stable.
Fixed in build kernel-5.7.14-200.fc32: https://koji.fedoraproject.org/koji/buildinfo?buildID=1586714