Description of problem:
kernel-core-5.1.20-300.fc30.x86_64 fails to wake from suspend (STR), 5.1.19-300.fc30 did fine on the same computer. 5.2.5-200.fc30 fails in a different way.

Version-Release number of selected component (if applicable):
5.1.20-300.fc30.x86_64

How reproducible:
always

Steps to Reproduce:
1. boot and log into GNOME desktop
2. click pause symbol to suspend the computer to RAM, wait until suspended
3. press key on keyboard, or power button

Actual results:
computer tries to wake up, HDD LED blinks a bit, but console does not wake. Other computer on network cannot ping the waking computer.

Expected results:
computer wakes up properly as it used to do with kernel-core-5.1.19-300.fc30.x86_64 and earlier kernels.

Additional info:
PM tracing was enabled, the next boot returned

  [ 0.827930] PM: hash matches drivers/base/power/main.c:1021

It appears that suspend to disk still works.

Computer has an NVIDIA GeForce 1060 PCIe graphics board, but 5.1.19 and prior would suspend properly, and the 5.1.20 and 5.2.5 suspend issues also occur if nvidia kernel modules are renamed out of the way and nouveau remains blocked, so it's not an nvidia driver issue.
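For anyone wanting to repeat the PM tracing step mentioned under "Additional info", here is a minimal sketch. It assumes a kernel built with PM trace support (x86 only), so that /sys/power/pm_trace exists. The function names are made up for illustration. Be aware that pm_trace stores its hash in the real-time clock, so the system clock will be wrong after the next boot and has to be set again.

```shell
#!/bin/sh
# Hypothetical helper: turn on PM tracing before suspending (needs root).
# Assumes /sys/power/pm_trace exists on this kernel.
enable_pm_trace() {
    echo 1 | sudo tee /sys/power/pm_trace >/dev/null
}

# Hypothetical helper: keep only the pm_trace result lines from kernel
# log text read on stdin. Prints nothing if there are none.
pm_trace_matches() {
    grep -F 'hash matches' || true
}
```

Usage would be along the lines of: call enable_pm_trace, run "systemctl suspend", force a reboot after the failed wakeup, then run "dmesg | pm_trace_matches" to see which source location the hash points at.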
Note this is reproducible without the nvidia proprietary/binary drivers, and that for me, the new nvidia binary driver continues to suspend properly under 5.1.19. Note that Bugzilla search didn't turn up 1735786 when I searched for regression or suspend bugs... sorry. Setting Depends:.
Noting that with kernel 5.2.5, on the 2nd resume, it is possible for me that the desktop GUI shows up but all windows and GNOME are frozen. If a browser is open with a page, I can scroll that page but cannot do anything else. I must force-reboot the computer. (Nouveau.)
I have "git bisect"ed this on the vanilla stable kernel, the stable/linux-5.1.y branch (because I have had starting points 5.1.19 and 5.1.20 there).

The failure-inducing commit on the branch is 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (see below), which got picked from master's c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 (merged through cf2d213e49fdf47e4c10dc629a3659e0026a54b8, v5.3-rc1~167) and also got picked to stable/linux-5.2.y as 5817d78eba34f6c86f5462ae2c5212f80a013357 (v5.2.3~291). Sasha Levin's signoff is only on the stable branches, not on master.

------------------------------------------------------------
commit 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (refs/bisect/bad)
Author:    Mika Westerberg <mika.westerberg.com>  2019-06-12 12:57:38
Committer: Greg Kroah-Hartman <gregkh>  2019-07-26 09:12:37
Parent:    70cc29dba925b8a99a4917c2b5fa6702d0d496d1 (bpf: fix callees pruning callers)
Child:     a98c15177f72ae3c0a736bb324e66c279bf94899 (net: netsec: initialize tx ring on ndo_open)
Branch:    remotes/stable/linux-5.1.y
Follows:   v5.1.19
Precedes:  v5.1.20

    PCI: Add missing link delays required by the PCIe spec

    [ Upstream commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 ]

    Currently Linux does not follow PCIe spec regarding the required delays after reset. A concrete example is a Thunderbolt add-in-card that consists of a PCIe switch and two PCIe endpoints:

      +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                      +-01.0-[04-36]-- DS hotplug port
                                      +-02.0-[37]----00.0 xHCI controller
                                      \-04.0-[38-6b]-- DS hotplug port

    The root port (1b.0) and the PCIe switch downstream ports are all PCIe gen3 so they support 8GT/s link speeds.

    We wait for the PCIe hierarchy to enter D3cold (runtime):

      pcieport 0000:00:1b.0: power state changed by ACPI to D3cold

    When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the PCIe switch is put to reset and its power is re-applied. This means that we must follow the rules in PCIe 4.0 section 6.6.1.

    [...]
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki>
    Signed-off-by: Sasha Levin <sashal>

 drivers/pci/pci.c               | 29 +++++++++++++++++++----------
 drivers/pci/pci.h               |  1 +
 drivers/pci/pcie/portdrv_core.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 10 deletions(-)
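For readers who want to repeat the bisection, the following is only a sketch of how such a manual bisect could be driven, not the exact commands used above. It assumes a checkout of the stable tree with the v5.1.19 and v5.1.20 tags present. The helper names and the "resumed"/"hung" keywords are invented for illustration. Since the resume test needs a person to watch the machine, each step is marked by hand rather than with "git bisect run".

```shell
#!/bin/sh
# Hypothetical helper: start bisecting between the last good and the
# first bad stable tag (git expects the bad revision first).
bisect_begin() {
    git bisect start v5.1.20 v5.1.19
}

# Hypothetical helper: map the observed outcome of a suspend/resume
# attempt to the git-bisect subcommand that records it.
bisect_verdict() {
    case "$1" in
        resumed) echo good ;;
        hung)    echo bad ;;
        *)       echo skip ;;   # e.g. this revision did not build or boot
    esac
}

# Hypothetical helper: record the outcome for the currently checked-out
# revision; git then checks out the next candidate to build and test.
bisect_step() {
    git bisect "$(bisect_verdict "$1")"
}
```

A session would then be: bisect_begin, build and boot the checked-out kernel, suspend and try to wake it, and call "bisect_step resumed" or "bisect_step hung" until git names the first bad commit.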
It sounds like this was meant for PCIe 4.0 hardware? I have a PCIe 3.0 motherboard (https://www.msi.com/Motherboard/B350M-PRO-VDH/Specification) and the GPU should be PCIe 3.0 as well. Interesting.
MSI X370 SLI PLUS (alias MS-7A33) here, with Ryzen 7 1700 and Zotac-based NVIDIA GeForce 1060-6GB, so no PCIe 4.0 HW anywhere.
Can someone test this with the 5.3 rc kernel and see if it is broken there, or if we need to pull in an additional patch?
Justin, I did yesterday on a vanilla 5.3-rc2 so I could report where the error came from, see https://bugzilla.kernel.org/show_bug.cgi?id=204413#c2 (link was in external trackers already): 5.2.5 and 5.3-rc2 also need a "git revert" of the offending patch for me.

Bjorn Helgaas pointed Mika Westerberg to my report, see https://www.spinics.net/lists/linux-pci/msg85535.html

I haven't tested with Fedora-derived kernels other than the broken kernel-core-5.1.20-300.fc30.x86_64 from @updates yet, since the vanilla kernel fails in the same manner as Fedora's 5.1.20. Let me know (a) whether the Fedora kernel is worth testing nonetheless and (b) if yes, whether the instructions in https://fedoraproject.org/wiki/Building_a_custom_kernel#Building_a_kernel_from_the_exploded_git_trees are still current.

Note: PM testing per the 01.org instructions was fruitless; you need to do the real thing, meaning "systemctl suspend" and wakeup, to trigger the bug. Any pm_test setting other than "none" will mask it.
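Because a leftover pm_test mode hides the bug, it may help to check the setting before suspending. The sketch below assumes a kernel with PM debugging enabled, where /sys/power/pm_test lists the available modes and marks the active one with square brackets. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active pm_test mode from text in the
# format of /sys/power/pm_test, e.g. "[none] core processors ...".
active_pm_test() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Hypothetical helper: suspend for real only if no pm_test mode is set.
# If the file is absent (kernel without PM debugging), suspend directly.
real_suspend() {
    if [ -r /sys/power/pm_test ]; then
        mode=$(active_pm_test < /sys/power/pm_test)
        if [ "$mode" != "none" ]; then
            echo "pm_test is '$mode'; reset it with: echo none | sudo tee /sys/power/pm_test" >&2
            return 1
        fi
    fi
    systemctl suspend
}
```

Calling real_suspend then either performs the full suspend that triggers the bug, or refuses and says which test mode is still active.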
(In reply to Justin M. Forbes from comment #6)
> Can someone test this with the 5.3 rc kernel and see if it is broken there,
> or if we need to pull in an additional patch?

$ curl -s https://repos.fedorapeople.org/repos/thl/kernel-vanilla.repo | sudo tee /etc/yum.repos.d/kernel-vanilla.repo
$ dnf --enablerepo=kernel-vanilla-mainline update

I rebooted into kernel 5.3.0-0.rc2.git4.1.vanilla.knurd.1.fc30 x86_64 and suspended twice; the 2nd suspend fails exactly the same for me. No difference.
FEDORA-2019-a7f551b8c9 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-a7f551b8c9
Ouch. Note that upstream has chosen to remove the offending commit, see <https://bugzilla.kernel.org/show_bug.cgi?id=204413#c12>
Warning, Linux stable 5.2.6 and 5.2.7 still have this regression.
kernel-core-5.2.6-200.fc30.x86_64 seems to work for me, apparently survives two suspend/resume cycles without ill effect.
Yes, so far it does here as well =)
kernel-5.2.6-200.fc30, kernel-headers-5.2.6-200.fc30, kernel-tools-5.2.6-200.fc30 have been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-a7f551b8c9
Alright, so 5.2.6-200 still hasn't had a suspend issue. I consider it fixed. Thank you Matthias and all others who helped fix it :-)
kernel-5.2.6-200.fc30, kernel-headers-5.2.6-200.fc30, kernel-tools-5.2.6-200.fc30 have been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
Fedora's 5.2.6-200.fc30.x86_64 can properly resume from suspend again for me (because Justin reverted the offending patch). The upstream vanilla kernel has the revert queued up for 5.2.9 and 5.3-rc4. Upstream debugging is still ongoing, see the kernel.org bug.
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
5.6.6-200.fc31.x86_64 also works. What's necessary to close this bug with a proper status? "CLOSED-FIXED" isn't available to me (probably due to workflow restrictions in the bugzilla configuration)
Since this is fixed by a (kernel) update, the proper resolution is ERRATA. I'll close it with this resolution right away.