Bug 2054948 - AMD Vega64 GPU Freeze when dynamic power management on
Summary: AMD Vega64 GPU Freeze when dynamic power management on
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-16 04:28 UTC by Danut Enachioiu
Modified: 2022-12-13 16:41 UTC
CC List: 19 users

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-13 16:41:04 UTC
Type: Bug
Embargoed:



Description Danut Enachioiu 2022-02-16 04:28:26 UTC
1. Please describe the problem:

On my Fedora 35 Silverblue system, with amdgpu.dpm at its default setting (on), I get occasional freezes/crashes in GPU-intensive games. The crashes don't always happen, but when they do, they typically occur a few minutes after starting the game. Other times, the games run fine for hours without a crash.

Setting amdgpu.dpm=0 fixes the problem entirely.
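To confirm that the workaround is actually active after a reboot, one can look for the parameter on the running kernel's command line. The following is a minimal Python sketch with hypothetical function names. It assumes the parameter was added as a kernel argument (on Silverblue, typically with ``rpm-ostree kargs --append=amdgpu.dpm=0``) and not through a modprobe configuration file, since modprobe options do not appear in /proc/cmdline.

```python
from typing import Optional


def amdgpu_dpm_override(cmdline: str) -> Optional[str]:
    """Return the value given to amdgpu.dpm on a kernel command line,
    or None if the parameter is not present (hypothetical helper)."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "amdgpu.dpm":
            # Assumption: if the parameter is repeated, the last one wins.
            value = val
    return value


def read_dpm_override(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running kernel's command line and look up amdgpu.dpm."""
    with open(path) as f:
        return amdgpu_dpm_override(f.read())
```

With the workaround applied, read_dpm_override() returns "0"; on a default boot with no such argument it returns None.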

More context at https://ask.fedoraproject.org/t/gpu-hang-how-could-i-investigate-fix-this/20124/10

2. What is the Version-Release number of the kernel:

kernel-5.16.8-200.fc35.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Unsure. It has happened since I started using Fedora (and Linux) for gaming, around December 2021.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Unreliably.

With a default Fedora 35 install and the same GPU (Sapphire Nitro+ Vega 64), start a GPU-intensive game and play for a few minutes. As far as I can tell, the specific game doesn't matter as long as it is demanding; examples include Doom Eternal, Civ 6, Satisfactory, and Cities Skylines.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I will research how to do this on Silverblue and try it out, but right now I don't know.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I will collect full logs if you need them, but here is the key excerpt I copied the last time it happened. There were no kernel messages for several minutes before these lines, and the log ended after the timeout.

Feb 13 02:56:46 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:46 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:48 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:50 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:50 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:53 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:55 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:55 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:57 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:59 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:59 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:57:04 fedora kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!

Comment 1 Ben Cotton 2022-11-29 17:54:09 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 2 Ben Cotton 2022-12-13 16:41:04 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

