Bug 2415143
| Summary: | amdgpu: Fedora KDE amdgpu Boot-looping Crash | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Wyatt Childers <rhbugs.2o67n> |
| Component: | xorg-x11-drv-amdgpu | Assignee: | Dominik 'Rathann' Mierzejewski <dominik> |
| Status: | NEW --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 43 | CC: | acaringi, adscvr, airlied, dominik, hans, hpa, jforbes, josef, kernel-maint, linville, masami256, mchehab, negativo17, ptalbert, steved, suraj.ghimire7 |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
Description
Wyatt Childers
2025-11-14 23:49:19 UTC
These issues seem possibly related:

- https://bugzilla.redhat.com/show_bug.cgi?id=2354776
- https://bugzilla.redhat.com/show_bug.cgi?id=2359116

However, because this one is a boot loop that can only be bypassed with what I would consider "heroics", this manifestation is much more serious.

Are you using an Xorg session with KDE?

I am not; sorry, I forgot to include hardware information in general.

- Operating System: Fedora Linux 43
- KDE Plasma Version: 6.5.2
- KDE Frameworks Version: 6.19.0
- Qt Version: 6.10.0
- Kernel Version: 6.17.7-300.fc43.x86_64 (64-bit)
- Graphics Platform: Wayland
- Processors: 32 × AMD Ryzen 9 7950X 16-Core Processor
- Memory: 64 GiB of RAM (61.9 GiB usable)
- Graphics Processor 1: AMD Radeon RX 7900 XTX
- Graphics Processor 2: AMD Ryzen 9 7950X 16-Core Processor
- Manufacturer: ASUS

If I can get to SDDM (which I need to reach to get into my KDE session at all), all is fine.

There are a number of similar issues open on the upstream issue tracker: https://gitlab.freedesktop.org/drm/amd/-/issues/?sort=created_date&state=opened&search=dc_dmub_srv_log_diagnostic_data&first_page_size=30 . Could you check whether any of them match yours? In any case, this is a kernel issue, so I am reassigning it to the kernel.

Reassigning back to the xorg driver package after discussion with the kernel maintainer.

It's hard to say. Those all appear to happen well after login, but conceptually the same crash could be responsible for all of them (with different triggers), or this could be something new. So in terms of the symptom, no; but with so little information in the logs following the crash, I can't say definitively that this is "none of those things" or "one of those things".

Of note, the system has been up since the 14th without any crashes during normal use (I've played games, watched shows, browsed the web, run heavy compilation workloads, and so on). So this really does seem to trigger only during the boot process.
Okay, I tried again with today's updates and got some interesting new errors related to the crash:

```
Nov 19 11:05:12 localhost kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Nov 19 11:05:16 localhost kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Nov 19 11:05:16 localhost kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Nov 19 11:05:18 localhost kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Nov 19 11:05:24 localhost kernel: amdgpu 0000:03:00.0: amdgpu: ring_buffer_start = 00000000b23a7c87; ring_buffer_end = 00000000695ffdcc; write_frame = 0000000089fa4e2b
Nov 19 11:05:24 localhost kernel: amdgpu 0000:03:00.0: amdgpu: write_frame is pointing to address out of bounds
Nov 19 11:05:24 localhost kernel: amdgpu 0000:03:00.0: amdgpu: device lost from bus!
Nov 19 11:05:24 localhost kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Nov 19 11:05:24 localhost kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Nov 19 11:05:45 localhost kernel: amdgpu 0000:03:00.0: amdgpu: device lost from bus!
Nov 19 11:05:45 localhost kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Nov 19 11:05:45 localhost kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
```

I also had one clean boot from power off. It seems like the transitions from GRUB to Plymouth to SDDM are the danger points. In particular, if I never see Plymouth's loading spinner, I seem to be golden. However, if it does render, I only get a single frame of it and the boot is going to fail.
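When the machine does eventually reach the desktop, the kernel messages from the failed boot can be pulled back out of the journal. The sketch below is a hypothetical helper (`amdgpu_crash_lines` is a name made up for this example) that filters log text on stdin down to the amdgpu signatures quoted in the log excerpt above. It assumes persistent journaling, so that `journalctl -k -b -1` still holds the previous boot; `journalctl --list-boots` shows which offset to use if several boots failed in a row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kernel log lines that contain one of the
# amdgpu failure signatures quoted in this report. The pattern list is
# copied from the log excerpt and is not a complete catalogue of amdgpu errors.
amdgpu_crash_lines() {
  # -F: match the patterns as literal strings, not regular expressions
  grep -F \
    -e 'dc_dmub_srv_log_diagnostic_data' \
    -e "SMU: I'm not done with your previous command" \
    -e 'Failed to disable gfxoff' \
    -e 'write_frame is pointing to address out of bounds' \
    -e 'device lost from bus'
}

# Usage, assuming the failed boot is the one before the current boot:
#   journalctl -k -b -1 --no-pager | amdgpu_crash_lines
```

Filtering on fixed strings keeps the sketch simple; timestamps are preserved, so the order of DMCUB, SMU, and bus-loss messages within the failed boot stays visible.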
I tried adding `amdgpu.gfxoff=0` to my kernel arguments, going on the hint from "Failed to disable gfxoff":

`sudo grubby --update-kernel=ALL --args="amdgpu.gfxoff=0"`

However, that did not seem to improve things. "device lost from bus!" is also interesting, as it is somewhat suggestive of a hardware issue, but I'd find that hard to believe given the days-long flawless runtime once the system is successfully on the desktop.

No change with the latest updates.

Please do not add comments here unless you can point to an upstream fix. There's really nothing I can do apart from asking you to either subscribe to one of the upstream bugs or open a new one and link it here.
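Since the `amdgpu.gfxoff=0` argument seemed to have no effect, it may be worth checking that it actually reached the running kernel and that the driver exposes an option by that name; this report does not establish either. The hypothetical `check_amdgpu_args` helper below compares the `amdgpu.*` entries on a kernel command line (by default `/proc/cmdline`) against the parameter files the loaded module exposes under `/sys/module/amdgpu/parameters`. Separately, `sudo grubby --info=ALL` shows which arguments are stored for each installed kernel.

```shell
#!/usr/bin/env bash
# Sketch: list the amdgpu.* options on a kernel command line and report
# whether the loaded amdgpu module exposes a parameter file of that name.
# Caveat: parameters registered with zero sysfs permissions never appear
# under /sys/module, so "not found" is a hint to investigate, not proof.
check_amdgpu_args() {
  local cmdline=${1:-$(cat /proc/cmdline)}
  local paramdir=${2:-/sys/module/amdgpu/parameters}
  local -a args
  local arg name
  # read -a splits on whitespace without glob expansion
  read -r -a args <<< "$cmdline"
  for arg in "${args[@]}"; do
    case $arg in
      amdgpu.*)
        name=${arg#amdgpu.}   # drop the module prefix
        name=${name%%=*}      # drop any "=value" suffix
        if [ -e "$paramdir/$name" ]; then
          echo "$name: exposed in $paramdir"
        else
          echo "$name: not found in $paramdir"
        fi
        ;;
    esac
  done
}

# Usage on the affected machine (defaults to /proc/cmdline and sysfs):
#   check_amdgpu_args
```

Both inputs can be overridden as arguments, which makes it possible to check a command line copied from `grubby --info=ALL` output instead of the one currently booted.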