Bug 2239807 - Fedora 39 GNOME (X11/Wayland) experiences regular screen blackouts at increasing intervals after login on amdgpu with kernel 6.5.3-300 and later
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Florian Müllner
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker AcceptedFreezeException
Depends On:
Blocks: F39FinalFreezeException
Reported: 2023-09-20 10:06 UTC by Seth Maurice-Brant
Modified: 2024-11-27 21:30 UTC (History)
CC List: 33 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-11-27 21:30:39 UTC
Type: ---
Embargoed:


Attachments
Kernel log on a session experiencing this issue (447.34 KB, text/plain)
2023-09-20 17:23 UTC, Seth Maurice-Brant
Log booting from 6.5.6-300 (427.68 KB, text/plain)
2023-10-07 18:26 UTC, Seth Maurice-Brant
Bootlog of kernel-6.7.0-0.rc1.20231117git7475e51b8796.19.fc40 (509.84 KB, text/plain)
2023-11-19 17:06 UTC, Seth Maurice-Brant
A working system bootlog using kernel 6.5.2 (367.35 KB, text/plain)
2023-11-19 17:11 UTC, Seth Maurice-Brant


Links
System | ID | Private | Priority | Status | Summary | Last Updated
freedesktop.org Gitlab | drm amd issues 2999 | 0 | None | opened | [AMD RX 6600XT] Kernel 6.5.3+ screen flickers off and then settles on black after logging into gnome-session | 2023-11-20 12:39:12 UTC

Description Seth Maurice-Brant 2023-09-20 10:06:17 UTC
When I log in to my Fedora 39 Beta system, the screen quickly starts going black for periods of up to 30 seconds. This happens only once I start interacting with the system; I would not expect the screen to black out at all. Opening software and interacting with GNOME features like the Activities menu seems to exacerbate the problem. I have flagged this as an issue with Mutter, but I am not currently sure which component is actually at fault.




[NOTE] I rolled my system back as it was unusable, but if required I can reinstall F39 to gather more information to debug this issue.

This is the first manual bug I have submitted here, so it is likely I have omitted some details and possibly not followed best practices. Please provide any feedback and I will do my best to fill everything in properly.

Reproducible: Always

Steps to Reproduce:
1. Upgrade from Fedora 38 to 39 via CLI
2. Login from GDM
3. Open Firefox, start using computer. The issue begins quickly.
Actual Results:  
The screen seems to go black for up to 30 seconds at a time, before suddenly reappearing.

(Ctrl+Alt+<Num> still works so tty is accessible)

Expected Results:  
The desktop would open up and the screen would not go black at all.

Basic Hardware Information:
GPU: AMD Radeon RX 6600 XT
Render server: Wayland
Driver: Mesa
CPU: AMD 5900 24t/12c

Comment 1 Kamil Páral 2023-09-20 14:39:41 UTC
Hey Seth! When you're booted into the system and see the blackout issues, please save the system journal ("sudo journalctl -b > journal.txt") and attach it here, thanks.

Also, when you repeatedly press F8 during early boot to see the GRUB menu, and choose an older kernel, does this still happen? Please tell us which kernel version works and which one is faulty.

There seem to be some AMD GPU issues on kernels 6.5.x. It talks just about APUs, and you don't have an APU, but it still might be the same or a related issue:
https://gitlab.freedesktop.org/drm/amd/-/issues/2830

If you rolled back to F38, perhaps you can just test the latest kernel in F38 updates-testing ("sudo dnf update 'kernel*' --enablerepo updates-testing") and see if it behaves the same way (and gather the log in that case).

Comment 2 Seth Maurice-Brant 2023-09-20 16:27:33 UTC
After further testing, I have discovered one of the patches made to F39 since yesterday has resolved this issue.

Comment 3 Seth Maurice-Brant 2023-09-20 17:23:56 UTC
Created attachment 1989739 [details]
Kernel log on a session experiencing this issue

This is clearly a kernel issue, not a mutter one. It is introduced somewhere after 6.4 and is persisting through 6.5 and 6.6.

Comment 4 Adam Williamson 2023-09-20 17:30:22 UTC
So, I see this:

Sep 20 18:12:21 tempest gnome-shell[7563]: Received an X Window System error.
                                           This probably reflects a bug in the program.
                                           The error was 'GLXBadDrawable'.
                                             (Details: serial 1538 error_code 160 request_code 152 (GLX) minor_code 29)
                                             (Note to programmers: normally, X errors are reported asynchronously;
                                              that is, you will receive the error a while after causing it.
                                              To debug your program, run it with the MUTTER_SYNC environment
                                              variable to change this behavior. You can then get a meaningful
                                              backtrace from your debugger if you break on the mtk_x_error() function.)
Sep 20 18:12:21 tempest audit[7563]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=4 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=7563 comm="gnome-shell" exe="/usr/bin/gnome-shell" sig=5 res=1
Sep 20 18:12:21 tempest org.gnome.Shell.desktop[7563]: == Stack trace for context 0x5649f42fb740 ==
Sep 20 18:12:21 tempest org.gnome.Shell.desktop[7563]: #0   5649f43c66a8 i   resource:///org/gnome/shell/ui/init.js:21 (5c76cc70ba0 @ 48)
Sep 20 18:12:21 tempest audit: BPF prog-id=125 op=LOAD
Sep 20 18:12:21 tempest audit: BPF prog-id=126 op=LOAD
Sep 20 18:12:21 tempest audit: BPF prog-id=127 op=LOAD
Sep 20 18:12:21 tempest systemd[1]: Started systemd-coredump - Process Core Dump (PID 8373/UID 0).
Sep 20 18:12:21 tempest audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-8373-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 20 18:12:21 tempest systemd-coredump[8377]: Resource limits disable core dumping for process 7563 (gnome-shell).
Sep 20 18:12:21 tempest systemd-coredump[8377]: Process 7563 (gnome-shell) of user 1000 dumped core.
Sep 20 18:12:21 tempest systemd[1]: systemd-coredump: Deactivated successfully.
Sep 20 18:12:21 tempest audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-8373-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 20 18:12:21 tempest gnome-session[7418]: gnome-session-binary[7418]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
Sep 20 18:12:21 tempest gnome-session-binary[7418]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5

which means Shell crashed on a GLXBadDrawable but we don't have the backtrace. Not sure if drm debugging would help here or not?
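
For anyone scanning a similar journal, here is a rough Python sketch that pulls "killed by signal" warnings out of a saved journal and names the signal. The function name and the regex are illustrative assumptions based on the single sample line quoted above; they are not part of any Fedora or GNOME tooling.

```python
import re
import signal

# Pattern is an assumption derived from the gnome-session warning quoted above.
KILLED_RE = re.compile(
    r"Application '(?P<app>[^']+)' killed by signal (?P<sig>\d+)"
)


def find_signal_kills(lines):
    """Return (application, signal number, signal name) for each matching line."""
    hits = []
    for line in lines:
        match = KILLED_RE.search(line)
        if not match:
            continue
        number = int(match.group("sig"))
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown"
        hits.append((match.group("app"), number, name))
    return hits


sample = [
    "Sep 20 18:12:21 tempest gnome-session-binary[7418]: WARNING: "
    "Application 'org.gnome.Shell.desktop' killed by signal 5",
    "Sep 20 18:12:21 tempest audit: BPF prog-id=125 op=LOAD",
]
print(find_signal_kills(sample))
```

On Linux, signal 5 is SIGTRAP, which matches the `sig=5` field in the audit line above. This only identifies the crash event; it does not recover the missing backtrace.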

Comment 5 Adam Williamson 2023-09-20 17:32:11 UTC
we *don't* seem to have the amdgpu errors or kernel backtrace from https://gitlab.freedesktop.org/drm/amd/-/issues/2830 here, though.

Comment 6 Adam Williamson 2023-09-20 23:43:29 UTC
https://fedoramagazine.org/announcing-fedora-39-beta/#comment-553006 is an interestingly similar report, though with Intel graphics.

Comment 7 Seth Maurice-Brant 2023-09-22 08:46:23 UTC
It could be similar, would need more info. I'm not sure if GDM has anything to do with this issue anymore however.

Comment 8 Seth Maurice-Brant 2023-09-22 08:49:40 UTC
I've removed GDM from the title of the report for now, until we can verify that it is related. The issue only occurs in GNOME Shell itself; GDM is unaffected.

Comment 9 jonathan.dundas 2023-10-02 16:48:12 UTC
I am having this issue with an AMD Ryzen 7 5800X system and AMD 6800XT video card, running Plasma with a Wayland session. My system is up to date. If I boot with the latest 6.5 kernel I can login to a normal Plasma session and work for 30-60 minutes before the screen goes blank, but audio still plays via bluetooth, and the system responds to pings. If I boot a previous kernel the system is stable and the issue does not happen. Journalctl shows for the previous boot:

Oct 02 12:37:21 xxx kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* [CRTC:82:crtc-0] flip_done timed out
Oct 02 12:37:23 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout, signaled seq=8100, emitted seq=8101
Oct 02 12:37:23 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Oct 02 12:37:23 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Oct 02 12:37:27 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Oct 02 12:37:27 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to disable gfxoff!

Comment 10 Seth Maurice-Brant 2023-10-03 08:50:15 UTC
This may be related or may be a separate issue. I am noticing the black screen issue almost immediately after logging into Gnome, whereas you are reporting normal use for 30-60 minutes before it begins, however it may be just a difference in DEs muddying the waters.

Comment 11 Adam Williamson 2023-10-03 15:32:35 UTC
Jonathan's case looks more like https://gitlab.freedesktop.org/drm/amd/-/issues/2830 , since he has the errors discussed in this report. At the moment I tend to think these are two separate issues, though it's hard to be sure.

Comment 12 Adam Williamson 2023-10-03 15:40:00 UTC
Jonathan, we have https://bugzilla.redhat.com/show_bug.cgi?id=2240859 for that bug.

Comment 13 Fedora Blocker Bugs Application 2023-10-03 15:45:43 UTC
Proposed as a Blocker for 39-final by Fedora user saluki using the blocker tracking app because:

 This bug is rendering my desktop completely unusable with F39 unless I drop to a tty and manually rollback my kernel. 

I am able to login to my system, but I am unable to complete the "Default application functionality" checks.

Comment 14 Kamil Páral 2023-10-06 08:50:21 UTC
Seth, saluki, we need more info about what exactly happens when the screen blanks. Display a clock with seconds and note exactly when the screen goes blank and when it restores. Then find the exact time frame in `sudo journalctl -b` and post it here. We have the full log already, but I don't know where to look specifically. The issue from comment 4 might or might not be relevant.

We'll also need an upstream bug report where actual developers can look at it.

Since Seth found that the issue is introduced by a kernel update, it would be great if you could identify the exact commit that caused it. I recently wrote a guide for doing exactly that; it's here:
https://kparal.wordpress.com/2023/08/15/bisecting-fedora-kernel/
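
As an illustration of narrowing a saved journal to the blackout window, the following Python sketch keeps only the lines whose timestamp falls between two noted times. It assumes the default journal text format with a leading "Mon DD HH:MM:SS" stamp and no year, so the year is passed in explicitly; the function names are hypothetical. `journalctl` itself also accepts `--since` and `--until`, which may be simpler on a live system.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if the line has none."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_in_window(lines, start, end, year):
    """Keep stamped lines within [start, end], plus their unstamped continuations."""
    selected = []
    keep = False
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is not None:
            keep = start <= stamp <= end
        if keep:
            selected.append(line)
    return selected


# Sample lines taken from comment 9; the window times are made up for the demo.
sample = [
    "Oct 02 12:37:21 xxx kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* flip_done timed out",
    "Oct 02 12:37:23 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma2 timeout",
    "Oct 02 12:37:23 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!",
    "Oct 02 12:37:27 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to disable gfxoff!",
]
window = lines_in_window(
    sample,
    start=datetime(2023, 10, 2, 12, 37, 22),
    end=datetime(2023, 10, 2, 12, 37, 25),
    year=2023,
)
for line in window:
    print(line)
```

Unstamped continuation lines (such as the indented X error text in comment 4) are kept when the stamped line before them falls inside the window.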

Comment 15 redhatbugzilla 2023-10-06 20:39:59 UTC
it's reported upstream at https://gitlab.freedesktop.org/drm/amd/-/issues/2830

upstream also says 6.5.6 fixes it https://gitlab.freedesktop.org/drm/amd/-/issues/2830#note_2116074

Comment 16 redhatbugzilla 2023-10-06 20:41:08 UTC
sorry, i just read that that's tracked separately at https://bugzilla.redhat.com/show_bug.cgi?id=2240859

Comment 17 Adam Williamson 2023-10-06 23:25:03 UTC
Yeah, we don't think these two bugs are the same, given the information we have so far.

Comment 18 Adam Williamson 2023-10-07 01:34:22 UTC
But just in case we get lucky - can affected folks test if https://bodhi.fedoraproject.org/updates/FEDORA-2023-c3bb819677 happens to fix this?

Comment 19 Seth Maurice-Brant 2023-10-07 18:19:38 UTC
(In reply to Adam Williamson from comment #18)
> But just in case we get lucky - can affected folks test if
> https://bodhi.fedoraproject.org/updates/FEDORA-2023-c3bb819677 happens to
> fix this?

Just tested, I can confirm this does not resolve the issue. I will attempt to gather more information about this issue shortly.

Comment 20 Seth Maurice-Brant 2023-10-07 18:25:51 UTC
(In reply to Kamil Páral from comment #14)
> Seth, saluki, we need more info about what exactly happens when the screen
> blanks. Display a clock with seconds and note exactly when the screen goes
> blank and when it restores. Then find the exact time frame in `sudo
> journalctl -b` and post it here. We have the full log already, but I don't
> know where to look specifically. The issue from comment 4 might or might not
> be relevant.
> 
> We'll also need an upstream bug report where actual developers can look at
> it.
> 
> Since Seth found that the issue is introduced by a kernel update, it would
> be great if you could identify the exact commit that caused it. I
> recently wrote a guide for doing exactly that; it's here:
> https://kparal.wordpress.com/2023/08/15/bisecting-fedora-kernel/

On the kernel I recently tested, the screen goes black *immediately* after login and then remains black. It did flash the desktop briefly. I will attach a relevant boot log shortly. I will also have a look at your commit identification guide and provide information on that if possible, but I cannot guarantee I will have the time.

Comment 21 Seth Maurice-Brant 2023-10-07 18:26:49 UTC
Created attachment 1992824 [details]
Log booting from 6.5.6-300

Comment 22 Seth Maurice-Brant 2023-10-07 18:57:40 UTC
I have performed some additional kernel testing to attempt to pinpoint when this issue was introduced.

I have discovered that kernel 6.5.2-301 does NOT have the issue present.
Whilst kernel 6.5.3-300 does have the issue present.

This suggests that the issue was introduced in 6.5.3.
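
The narrowing done here by hand is essentially a search for the first bad build in an ordered list. A minimal Python sketch of that idea follows, assuming the regression is monotonic (once a kernel is bad, every later one is bad too). The `first_bad` helper is hypothetical, and the recorded results are limited to what has been reported in this bug so far; in practice each "test" is a manual boot, and commit-level bisection would use `git bisect` as described in the guide linked in comment 14.

```python
def first_bad(versions, is_bad):
    """Binary search for the first bad entry in an ordered list.

    Assumes every entry after the first bad one is also bad.
    Returns None if even the newest entry is good.
    """
    if not versions or not is_bad(versions[-1]):
        return None
    low, high = 0, len(versions) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(versions[mid]):
            high = mid  # first bad entry is at mid or earlier
        else:
            low = mid + 1  # first bad entry is after mid
    return versions[low]


# Kernels in build order, with results as reported in this bug.
kernels = ["6.5.2-301", "6.5.3-300", "6.5.6-300"]
reported_bad = {"6.5.2-301": False, "6.5.3-300": True, "6.5.6-300": True}

print(first_bad(kernels, lambda version: reported_bad[version]))
```

With more candidate builds between a known-good and a known-bad kernel, the same loop keeps the number of test boots roughly logarithmic in the number of builds.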

Currently, I do not have enough time or skills to pull apart the git history and identify exactly which commit caused the issue, however I am willing to do some more work on IDing this bug over the next week or two, if someone could provide me with guidance on what specifically needs doing/what I'm looking for.

Thanks.

Comment 23 Leslie Satenstein 2023-10-08 16:06:18 UTC
Bug 2241955 - Fedora 39 Wayland initialization code has something missing -- my system crashes

Comment 24 Leslie Satenstein 2023-10-08 16:37:15 UTC
At the end of my many comments to 2241955, I pointed out one critical error message. It is a copy without a range or validity check to a display field. The field, when used, overruns its target area.

FYI 
I only test directly using real hardware, not via VMs, and therefore, my results are as above, and fatal after logon 2 or logon 3.
Warm reboots will fail.   KDE also exhibits some issues due to the same item.  Xorg testing results are OK.

Comment 25 Seth Maurice-Brant 2023-10-08 16:45:31 UTC
Thank you for your comments. 

Out of curiosity, I loaded into an Xorg session for the first time in years a moment ago and discovered that this issue is persistent there. This suggests that this bug is not Wayland specific.

Comment 26 Seth Maurice-Brant 2023-10-08 16:46:05 UTC
(I performed this testing on kernel 6.5.6)

Comment 27 Leslie Satenstein 2023-10-08 16:47:06 UTC
From my boot log.

Oct 07 15:38:36 Lockwood kernel: pcpu-alloc: s225280 r8192 d28672 u262144 alloc=1*2097152
Oct 07 15:38:36 Lockwood kernel: pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 
Oct 07 15:38:36 Lockwood kernel: pcpu-alloc: [0] 16 17 18 19 20 21 22 23 [0] 24 25 26 27 28 29 30 31 

Oct 07 15:38:36 Lockwood kernel: Kernel command line: BOOT_IMAGE=(hd10,gpt12)/vmlinuz-6.5.5-300.fc39.x86_64 root=UUID=72740d33-c04c-488c-8437-6460a45b7c78 ro resume=UUID=a391dafa-4d2f-4071-9656-932f9215efc6 rhgb quiet
Oct 07 15:38:36 Lockwood kernel: Unknown kernel command line parameters "rhgb BOOT_IMAGE=(hd10,gpt12)/vmlinuz-6.5.5-300.fc39.x86_64", will be passed to user space.

Oct 07 15:38:36 Lockwood kernel: Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
Oct 07 15:38:36 Lockwood kernel: Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
Oct 07 15:38:36 Lockwood kernel: Fallback order for Node 0: 0 
Oct 07 15:38:36 Lockwood kernel: Built 1 zonelists, mobility grouping on.  Total pages: 4105599
Oct 07 15:38:36 Lockwood kernel: Policy zone: Normal

Comment 28 Leslie Satenstein 2023-10-08 22:20:32 UTC
Bug 2241955.

Comment 29 Adam Williamson 2023-10-09 15:29:31 UTC
We have +1 / -6 in https://pagure.io/fedora-qa/blocker-review/issue/1359 , so marking rejected. For this kind of issue we have to make a judgment call on how much hardware is affected, and so far it just seems like this issue is fairly rare. If we get a strong indication this is affecting more people than we thought, the decision can be revisited.

Proposing for an FE in case we find a fix.

Comment 30 Leslie Satenstein 2023-10-09 16:35:11 UTC
Adam, I know we would like to move forward to a clean release. However,
"extremely rare" is perhaps not quite right.

This may be a red herring, but I am also testing Fedora 40 (Rawhide).
I would like to paste just the bottom part of the F40 clipboard here:
---------------------------------
## Software Information:
- **Graphics:**                                    NVA8
...
- **Firmware Version:**                            6042
- **OS Name:**                                     Fedora Linux 40 (Workstation Edition Prerelease)
- **OS Build:**                                    (null)
- **OS Type:**                                     64-bit
- **GNOME Version:**                               45.0
- **Windowing System:**                            Wayland
- **Kernel Version:**                              Linux 6.6.0-0.rc4.20231005git3006adf3be79.36.fc40.x86_64
------------------------------------
The same kernel, the same GNOME, but no issues. The logout/logon can be done multiple times without any of the issues that occur with Fedora 39. F39 was installed using the vanilla Everything.iso.

Comment 31 Adam Williamson 2023-10-09 16:53:57 UTC
Leslie, you have an NVIDIA adapter. You very likely do not have the same problem as Seth. Please don't confuse the bug report.

Seth, your latest log seems to show fossilize_replay crashing, which appears to be a Valve thing to do with Steam...is Steam set to start on boot for you? Does the bug go away if you remove or disable Steam somehow?

Comment 32 Seth Maurice-Brant 2023-10-09 17:15:51 UTC
Thanks for highlighting that crash. I've managed to avoid the crash appearing in my journal by disabling Steam's start on boot, however it has no impact on this issue.

Comment 33 Leslie Satenstein 2023-10-09 17:34:32 UTC
Adam

I am really trying to have a system that I can consistently login to, numerous times, without a reboot.
I do not intend to be a distraction, or an obstacle.
 
I chose to not install the NVIDIA adapter from rpmfusion. I do not know what vanilla everything.iso does with the NVIDIA adapter. 

I am currently responding with Fedora 39 and xorg. It works flawlessly.  

I intend to put this F39 aside and install the latest F39 Oct 9th Everything.iso and report back within 12 hrs, even though "sudo dnf update -y" is run daily against the Oct 7th installation.

FYI I use btrfs for / (30gig) and for /home (10gig).  

I will signoff now to do a fresh F39 Everything installation.  Back in 12hrs.

Comment 34 Adam Williamson 2023-10-09 17:47:50 UTC
Please don't "report back" here. This is almost certainly not your bug. Please report a bug of your own, with clear details on exactly what is going wrong, and the hardware you are running on.

Comment 35 Geoffrey Marr 2023-10-09 17:58:42 UTC
Discussed during the 2023-10-09 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedFreezeException (Final)" was made as it is a noticeable issue that cannot be fixed with an update.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-10-09/f39-blocker-review.2023-10-09-16.00.txt

Comment 36 Seth Maurice-Brant 2023-10-27 16:42:43 UTC
Still present in 6.5.9

Comment 37 Seth Maurice-Brant 2023-11-19 17:06:26 UTC
Created attachment 2000357 [details]
Bootlog of  kernel-6.7.0-0.rc1.20231117git7475e51b8796.19.fc40

Regression as of kernel-6.7.0-0.rc1.20231117git7475e51b8796.19.fc40, the issue now follows me into tty rather than being only present in gnome-session.

Comment 38 Seth Maurice-Brant 2023-11-19 17:11:31 UTC
Created attachment 2000358 [details]
A working system bootlog using kernel 6.5.2

As there are plenty of random things running on my system, to help eliminate any random or irrelevant errors, this is a bootlog where the issue is NOT present.

Comment 39 Kamil Páral 2023-11-20 10:20:28 UTC
I don't think we have an upstream report for this specific behavior yet, right? 
Seth, can you please file an upstream bug at https://gitlab.freedesktop.org/drm/amd/-/issues , attach relevant logs, and link it here?
Thanks.

Comment 40 Seth Maurice-Brant 2023-11-20 12:35:16 UTC
Upstream issue created: https://gitlab.freedesktop.org/drm/amd/-/issues/2999

Comment 41 flip101 2023-11-24 20:18:35 UTC
(In reply to jonathan.dundas from comment #9)
> I am having this issue with an AMD Ryzen 7 5800X system and AMD 6800XT video
> card, running Plasma with a Wayland session. My system is up to date. If I
> boot with the latest 6.5 kernel I can login to a normal Plasma session and
> work for 30-60 minutes before the screen goes blank, but audio still plays
> via bluetooth, and the system responds to pings. If I boot a previous kernel
> the system is stable and the issue does not happen. Journalctl shows for the
> previous boot:
> 
> Oct 02 12:37:21 xxx kernel: amdgpu 0000:0b:00.0: [drm] *ERROR*
> [CRTC:82:crtc-0] flip_done timed out
> Oct 02 12:37:23 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
> sdma2 timeout, signaled seq=8100, emitted seq=8101
> Oct 02 12:37:23 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> Process information: process  pid 0 thread  pid 0
> Oct 02 12:37:23 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
> Oct 02 12:37:27 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done
> with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
> Oct 02 12:37:27 xxx kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to disable
> gfxoff!

Hello Jonathan, could you report your findings at https://gitlab.freedesktop.org/drm/amd/-/issues/2870 ? The error code for that issue is the same, even though you are on a 6000-series card.

Comment 42 jonathan.dundas 2023-12-21 21:16:54 UTC
Adam Williamson, flip101 

The other bugs you link do not seem related to the bug I am trying to report. I get no such error message with the text 'gfx_low'. Neither https://gitlab.freedesktop.org/drm/amd/-/issues/2830 nor https://bugzilla.redhat.com/show_bug.cgi?id=2240859 look related.

This issue has occurred with every kernel release starting with the Fedora 6.5 kernel releases: I get about an hour of desktop time, then a hard crash and a blank screen. It is still happening with 6.6.7. I updated to current today for today's test/crash.

I have been stuck on kernel-6.4.15-200 if I want to run a stable system.

If there is additional information or logs that might be useful please let me know. This is running updated Fedora 38 with no proprietary drivers, Plasma Desktop in Wayland.

My crash may not be related to this issue, but it seems to be the closest fit searching bugzilla.

The relevant section from journalctl:

Dec 21 15:10:35 localhost systemd[1]: dnf-makecache.service: Consumed 1.339s CPU time.
Dec 21 15:12:38 localhost plasmashell[2788]: qt.qpa.wayland: Wayland does not support QWindow::requestActivate()
Dec 21 15:15:07 localhost kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=322881, emitted seq=322883
Dec 21 15:15:07 localhost kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process slack pid 4388 thread slack:cs0 pid 4398
Dec 21 15:15:07 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Dec 21 15:15:12 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Dec 21 15:15:12 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to disable gfxoff!
Dec 21 15:15:12 localhost plasmashell[4325]: [1221/151512.976542:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Operation not permitted (1)
Dec 21 15:15:13 localhost systemd-logind[1199]: Power key pressed short.
Dec 21 15:15:13 localhost systemd[2235]: Started dbus-:1.2-org.kde.LogoutPrompt.
Dec 21 15:15:13 localhost kwin_wayland[2586]: kf.service.services: The desktop entry file "/usr/share/applications/qemu.desktop" has Type= "Application" but has no Exec field.
Dec 21 15:15:15 localhost kernel: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:959
Dec 21 15:15:15 localhost kernel: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:959
Dec 21 15:15:17 localhost plasmashell[4325]: [121:1221/151517.080153:ERROR:command_buffer_proxy_impl.cc(319)] GPU state invalid after WaitForGetOffsetInRange.
Dec 21 15:15:18 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Dec 21 15:15:18 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:20 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
Dec 21 15:15:21 localhost kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!

Comment 43 Adam Williamson 2023-12-23 17:12:17 UTC
Hmm, yeah, it is a bit different indeed. I had another look and can't find any other report exactly like yours. I think it might be best if you could file a new report at https://gitlab.freedesktop.org/drm/amd/-/issues with all your details and logs - AFAIK we aren't really patching amdgpu at all, so this is likely an upstream issue (though if you could confirm you see it on other distro kernels it'd probably help confirm).

Comment 44 Clemens Eisserer 2024-01-14 11:35:44 UTC
My system is also suffering from this bug: Fedora 39, 6.6.9-200.fc39.x86_64, SDDM + KDE; both Wayland and Xorg are affected.

Comment 45 jonathan.dundas 2024-02-28 16:45:02 UTC
It is anecdotal and doesn't appear directly relevant to this ticket, but I do believe I found the answer to the 'black screen' hangs/crashes for my case. It seems like it was an issue with firmware or drivers, such that when power management drops the memory clock down to 96MHz and you're using a high refresh monitor (higher than 60Hz), the black screen crash could happen. After enabling overclocking with the amdgpu.ppfeaturemask kernel parameter, I used LACT to disable the 96MHz state for GPU memory. I tried using the CLI tool amdgpu-clocks, but it looks like the 6xxx series of cards are not well supported for dynamic power management. Thankfully, LACT seemed to be able to adjust 6xxx clocks properly.

The crashes continued on an up to date Fedora 39 system with current kernel, but stopped once I adjusted the powersaving mode as described above.
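
To check whether a card is sitting in the lowest memory clock state, the amdgpu driver exposes the memory clock levels in the `pp_dpm_mclk` sysfs file, with the active level marked by an asterisk. The Python sketch below parses text in that shape. The sample values are illustrative rather than taken from this system, and the usual location (`/sys/class/drm/card0/device/pp_dpm_mclk`) is an assumption, since the card index can differ between machines.

```python
import re

# Each line looks like "0: 96Mhz *"; the trailing "*" marks the active level.
STATE_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_mclk_states(text):
    """Return a list of (level index, clock in MHz, is_active) tuples."""
    states = []
    for line in text.splitlines():
        match = STATE_RE.match(line)
        if match:
            states.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return states


def active_state(states):
    """Return the active (index, MHz, True) tuple, or None if none is marked."""
    for state in states:
        if state[2]:
            return state
    return None


# Illustrative contents only; real values depend on the card.
sample_text = "0: 96Mhz *\n1: 456Mhz\n2: 673Mhz\n3: 1000Mhz\n"
print(active_state(parse_mclk_states(sample_text)))
```

On a real system the text would come from reading the sysfs file instead of `sample_text`. This only reports the current state; changing the allowed states is what LACT was used for above.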

Special thanks to the arch wiki!

Comment 46 Aoife Moloney 2024-11-27 21:30:39 UTC
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26.

Fedora Linux 39 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

