Bug 2039621 - 5.15.14 kernel doesn't sleep properly on a laptop with an integrated Radeon R5 GPU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-12 05:56 UTC by Matt Fagnani
Modified: 2022-01-22 01:31 UTC
CC List: 19 users

Fixed In Version: kernel-5.15.16-100.fc34 kernel-5.15.16-200.fc35
Last Closed: 2022-01-22 01:18:23 UTC
Type: Bug


Attachments
The journal from a boot where the kernel didn't sleep properly at the end and the system got stuck on a black screen (250.19 KB, text/plain)
2022-01-12 05:56 UTC, Matt Fagnani


Links
System: freedesktop.org Gitlab
ID: drm amd issues 1858
Private: 0
Priority: None
Status: opened
Summary: 5.15.14 kernel doesn't sleep properly on a laptop with an integrated Radeon R5 GPU
Last Updated: 2022-01-12 05:56:55 UTC

Description Matt Fagnani 2022-01-12 05:56:56 UTC
Created attachment 1850260
The journal from a boot where the kernel didn't sleep properly at the end and the system got stuck on a black screen

1. Please describe the problem:

I updated to the 5.15.14 kernel from the updates-testing repo in a Fedora 35 KDE Plasma installation on an HP laptop with an AMD A10-9620P CPU and an integrated Radeon R5 GPU. When I selected Sleep from sddm or from Plasma 5.23.4 on Wayland with 5.15.14, the screen turned black, but the system didn't sleep properly. The power LED stayed solid instead of flashing as it does during a normal sleep, and the fan became progressively louder over a few minutes. The system didn't wake up when I moved the mouse or used the touchpad, and pressing sysrq+alt+r,e,i,s,u,b had no effect. After a few minutes, I held the power button for 5 seconds to shut the system off. This problem happened 3 of 3 times with the 5.15.14 kernel. The end of the journal from when the sleep problem happened was the following.

Jan 11 23:12:06 systemd[1]: Reached target Sleep.
Jan 11 23:12:06 systemd[1]: Starting System Suspend...
Jan 11 23:12:06 systemd-sleep[1128]: Entering sleep state 'suspend'...
Jan 11 23:12:06 kernel: PM: suspend entry (deep)

I've attached the journal from a boot with the problem. The problem didn't happen with 5.15.5-5.15.13. The journal from when 5.15.13 slept normally showed many more kernel messages about various hardware suspending after those above.
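The difference between the two journals can be checked mechanically. The following is a minimal sketch, assuming the kernel log has been saved to a file and that a completed suspend cycle logs a "PM: suspend exit" line after the "PM: suspend entry" line shown above. The function name suspend_completed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every "PM: suspend entry" line in the
# saved kernel log has a matching "PM: suspend exit" line.
# Assumption: the kernel prints "PM: suspend exit" when it resumes.
suspend_completed() {
    log=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    entries=$(grep -c 'PM: suspend entry' "$log" || true)
    exits=$(grep -c 'PM: suspend exit' "$log" || true)
    [ "$entries" -gt 0 ] && [ "$entries" -eq "$exits" ]
}
```

For example, after a forced power-off the log of the previous boot could be saved with `journalctl --no-hostname -k -b -1 > dmesg.txt` and then checked with `suspend_completed dmesg.txt || echo "suspend did not complete"`.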

The 5.15.14 changelog at https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15.14 includes the following commits involving suspend and amdgpu:

b8a1293e38508a0f6664018a852f4585c22832a8 drm/amd/pm: keep the BACO feature enabled for suspend
19070d812e130c035eca07b9af9ed7867cd9df96 Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"
f55383e6b92bb574edefd2cf23f74f133d4e1263 drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
3c196f05666610912645c7c5d9107706003f67c3 drm/amdgpu: always reset the asic in suspend (v2)
fbabb82b11b4fb5cd7824fbf9fa06deff9d2b13c drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
b4391e49ac1db37810ea8ea10362d0fe111d4f46 drm/amdgpu: disable runpm if we are the primary adapter
e24c6a48c6ea1b395a4c3144363ac8467ea936f7 Revert "i2c: core: support bus regulator controlling in adapter"

I can try to bisect between 5.15.13 and 5.15.14 if that would help. I've reported this problem at https://gitlab.freedesktop.org/drm/amd/-/issues/1858

A similar problem where my system didn't sleep, reboot, or shut down properly with 5.15.2-5.15.4 also involved amdgpu. That previous problem was reported at https://bugzilla.kernel.org/show_bug.cgi?id=214921 and
https://bugzilla.redhat.com/show_bug.cgi?id=2023035

2. What is the Version-Release number of the kernel:
5.15.14-200.fc35.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes, sleep worked with 5.15.5-5.15.13. The problem first appeared with 5.15.14.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Yes, the issue occurred 3/3 times.
1. Boot a Fedora 35 KDE Plasma installation on a system with an integrated Radeon R5 GPU
2. Log in to Plasma on Wayland from sddm
3. Start konsole
4. Update to the 5.15.14 kernel. I did this with the updates-testing repo enabled, using the following commands:
sudo dnf offline-upgrade download
sudo dnf offline-upgrade reboot

5. After the update is completed, boot the 5.15.14 kernel
6. Select Sleep in sddm

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I haven't tried the latest rawhide kernel yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I've attached the journal from a boot with the problem.

Comment 1 Matt Fagnani 2022-01-13 03:31:51 UTC
I bisected between 5.15.13 and 5.15.14. git bisect gave the following final output for the first bad commit, which involves amdgpu and suspend.

3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
commit 3c196f05666610912645c7c5d9107706003f67c3
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Nov 12 11:25:30 2021 -0500

    drm/amdgpu: always reset the asic in suspend (v2)
    
    [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
    
    If the platform suspend happens to fail and the power rail
    is not turned off, the GPU will be in an unknown state on
    resume, so reset the asic so that it will be in a known
    good state on resume even if the platform suspend failed.
    
    v2: handle s0ix
    
    Acked-by: Luben Tuikov <luben.tuikov@amd.com>
    Acked-by: Evan Quan <evan.quan@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3c196f05666610912645c7c5d9107706003f67c3
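A bisect like the one described in this comment can be started roughly as follows. This is a sketch under stated assumptions, not the exact commands that were used: it assumes a clone of the stable tree with the v5.15.13 and v5.15.14 tags available, and start_bisect and the GIT override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: start_bisect is a hypothetical helper, not part of git.
# GIT can be overridden (for example GIT=echo) to preview the command.
GIT=${GIT:-git}

# start_bisect REPO GOOD BAD
# Marks BAD as bad and GOOD as good; git then checks out a midpoint commit.
start_bisect() {
    "$GIT" -C "$1" bisect start "$3" "$2"
}

# After each build-and-boot cycle, record the result by hand:
#   git bisect good    # the test kernel suspended and resumed normally
#   git bisect bad     # the test kernel hung on suspend
# When git prints "<hash> is the first bad commit", clean up with:
#   git bisect reset
```

For example, `start_bisect ~/linux v5.15.13 v5.15.14` would begin the search, where ~/linux is an assumed checkout path. Because a failed suspend here hangs the machine, each step has to be judged manually rather than with an automated `git bisect run` script.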

Comment 2 Hans de Goede 2022-01-13 09:23:29 UTC
Thank you very much for bisecting your kernel bug, that is always really helpful in getting things fixed.

In case you haven't done so already, can you please report this directly to the upstream amdgpu developers, including Alex Deucher, the author of the problematic patch?

Since you can clearly build your own kernels, they can then interact with you directly and give you patches which may fix this for you to test.

Comment 3 Matt Fagnani 2022-01-13 13:29:36 UTC
(In reply to Hans de Goede from comment #2)
> Thank you very much for bisecting your kernel bug, that is always really
> helpful in getting things fixed.
> 
> In case you haven't done so already can you please report this directly to
> the upstream amdgpu developers, including Alex Deucher, the author of the
> problematic patch?
> 
> Since you can clearly build your own kernels, they can then interact with
> you directly and give you patches which may fix this for you to test.

Alex Deucher wrote a patch at https://gitlab.freedesktop.org/drm/amd/-/issues/1858#note_1217823. I built 5.15.14 after applying 0001-drm-amdgpu-don-t-do-resets-on-APUs-which-don-t-suppo.patch, and that kernel slept and woke up normally. Thanks.

Comment 4 Justin M. Forbes 2022-01-14 17:15:46 UTC
Thanks for doing all of this. The patch was posted on the 12th and hasn't made it to any of the next trees yet, but I will pull it in once it does.

Comment 5 Fedora Update System 2022-01-20 22:36:05 UTC
FEDORA-2022-6352c313b7 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-6352c313b7

Comment 6 Fedora Update System 2022-01-20 22:36:07 UTC
FEDORA-2022-6d4082d590 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2022-6d4082d590

Comment 7 Fedora Update System 2022-01-21 06:40:03 UTC
FEDORA-2022-6d4082d590 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-6d4082d590`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-6d4082d590

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2022-01-21 18:53:11 UTC
FEDORA-2022-6352c313b7 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-6352c313b7`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-6352c313b7

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2022-01-22 01:18:23 UTC
FEDORA-2022-6d4082d590 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 10 Fedora Update System 2022-01-22 01:31:46 UTC
FEDORA-2022-6352c313b7 has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make a note of it in this bug report.
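To confirm that a machine is running a kernel at or above the fixed 5.15.16 release, the upstream version can be compared with a small helper. This is a minimal sketch, assuming GNU sort with the -V (version sort) option is available. The function name version_at_least is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is at least version $2.
# Assumption: GNU sort is installed and supports -V (version sort).
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(uname -r | cut -d- -f1)" 5.15.16 && echo "fixed kernel is running"` compares only the upstream part of the running kernel version (the text before the first hyphen).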
