Bug 2362696

Summary: kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Product: [Fedora] Fedora
Reporter: Michal Nowak <mnowak>
Component: linux-firmware
Assignee: David Woodhouse <dwmw2>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 41
CC: 8ru2u4gz, dwmw2, jforbes, jwboyer, kernel-maint, laura, masouddehghani, pbrobinson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: linux-firmware-20250627-1.fc42 linux-firmware-20250627-1.fc41
Doc Type: ---
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-06-30 02:21:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description: dnf history info 115 from Apr 20
  Flags: none

Description Michal Nowak 2025-04-28 13:23:17 UTC
After linux-firmware was updated to 20250410, my Lenovo P14s (a 2024 model) with Radeon 780M Graphics hangs when I attach or detach the USB-C charger:

Apr 27 18:17:36 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 27 18:17:36 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 27 18:17:37 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 27 18:17:37 fedora kernel: amdgpu 0000:64:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:481
Apr 27 18:17:47 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out
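
To check other logs for the same pattern, the excerpt above can be reduced to three message signatures. The following is a minimal sketch, not an official diagnostic tool: the signature names and the `count_signatures` helper are made up for illustration, and the sample lines are copied from the excerpt above.

```python
import re

# Message fragments taken from the kernel log excerpt above.
# The dictionary keys are illustrative names, not kernel identifiers.
SIGNATURES = {
    "dmcub_error": re.compile(r"DMCUB error - collecting diagnostic data"),
    "reg_wait_timeout": re.compile(r"REG_WAIT timeout"),
    "flip_done_timeout": re.compile(r"flip_done timed out"),
}


def count_signatures(lines):
    """Count how often each amdgpu error signature appears in the given log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        if "amdgpu" not in line:
            continue  # ignore messages from other drivers and services
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


SAMPLE = [
    "Apr 27 18:17:36 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
    "Apr 27 18:17:36 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
    "Apr 27 18:17:37 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
    "Apr 27 18:17:37 fedora kernel: amdgpu 0000:64:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:481",
    "Apr 27 18:17:47 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out",
]

if __name__ == "__main__":
    print(count_signatures(SAMPLE))
```

The same function should work on kernel messages saved to a file, for example from `journalctl -k`, provided the lines keep the format shown above.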

Here's the upstream error: https://gitlab.freedesktop.org/drm/amd/-/issues/3913

And a fix: https://gitlab.com/kernel-firmware/linux-firmware/-/merge_requests/420

SUSE deployed the fix in February: https://bugzilla.suse.com/show_bug.cgi?id=1236196

I just downgraded amd-gpu-firmware to 20241017 and am testing the workaround.

Reproducible: Sometimes

Steps to Reproduce:
Run Fedora 41 GNOME on a laptop for a while, suspend/resume, attach/detach the charging cable.



Additional Information:
After a second crash yesterday, I lost all open Firefox tabs and windows, even with the "Open previous windows and tabs" Firefox option enabled.

Comment 1 Peter Robinson 2025-04-28 13:34:10 UTC
> And a fix:
> https://gitlab.com/kernel-firmware/linux-firmware/-/merge_requests/420
> 
> SUSE deployed the fix in February:
> https://bugzilla.suse.com/show_bug.cgi?id=1236196

That fix has been in Fedora since the 20250211 release, which includes the above MR.

> I just downgraded amd-gpu-firmware to 20241017 and am testing the workaround.

We have the upstream fix. There have been other updates since:

$ git log --format=oneline  amdgpu/dcn_3_1_4_dmcub.bin
152e5e12df704b78d1fda9e29d9c893d76db615d amdgpu: update dcn 3.1.4 firmware to 8.0.78.0
c2c0e64a1b022724dc3b1b10bba9a4ab1b60587d amdgpu: DMCUB updates for various ASICs
61d257d5a8b3303a0159ade514138d98a154248b amdgpu: DMCUB updates for various ASICs
0e16f416fa296f66c83187c2bfa2984ef0be47a0 amdgpu: revert DMCUB 3.1.4 firmware

$ git tag --contains 0e16f416fa296f66c83187c2bfa2984ef0be47a0
20250211
20250311
20250410

Are you sure it's that problem upstream and not another one?
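
The reasoning above (the revert commit is contained in the listed tags, so any package built from one of them carries it) can be sketched as a small check. This is only an illustration: `firmware_release` and `has_revert` are hypothetical helper names, the tag set is just the `git tag --contains` output quoted above, and releases tagged later would have to be added by hand.

```python
import re
from typing import Optional

# Output of `git tag --contains 0e16f416...` as quoted in this comment.
# Tags created after this comment was written are not listed here.
TAGS_WITH_REVERT = {"20250211", "20250311", "20250410"}


def firmware_release(package: str) -> Optional[str]:
    """Extract the YYYYMMDD release from a package string such as
    'amd-gpu-firmware-0:20250410-1.fc41.noarch'."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", package)
    return match.group(1) if match else None


def has_revert(package: str) -> bool:
    """Return True if the package's release is one of the tags known to contain the revert."""
    return firmware_release(package) in TAGS_WITH_REVERT


if __name__ == "__main__":
    for pkg in ("amd-gpu-firmware-0:20250410-1.fc41.noarch",
                "amd-gpu-firmware-0:20250311-1.fc41.noarch"):
        print(pkg, has_revert(pkg))
```

A set lookup is used instead of a date comparison so that the check claims nothing about releases the `git tag` output does not mention.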

Comment 2 Michal Nowak 2025-04-28 15:14:28 UTC
You must be right. I wasn't on linux-firmware 20241017 when it worked, but on 20250311:

Upgrade  amd-gpu-firmware-0:20250410-1.fc41.noarch              Group         updates
Replaced amd-gpu-firmware-0:20250311-1.fc41.noarch              Group         @System

This is when it happened the first time (Linux version 6.13.9-200.fc41.x86_64):

Apr 20 15:46:49 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:46:49 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:46:50 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:46:50 fedora kernel: amdgpu 0000:64:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:481
Apr 20 15:47:00 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out

The screen was light grey. The Caps Lock LED could be turned on and off, but I could not interact with the system by any other means. I closed the laptop lid.

Apr 20 15:48:05 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* flip_done timed out
Apr 20 15:48:05 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* [CRTC:79:crtc-0] commit wait timed out
Apr 20 15:48:05 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* flip_done timed out
Apr 20 15:48:05 fedora kernel: amdgpu 0000:64:00.0: [drm] *ERROR* [PLANE:58:plane-3] commit wait timed out
Apr 20 15:48:05 fedora kernel: Freezing user space processes failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
Apr 20 15:48:05 fedora kernel: task:KMS thread      state:D stack:0     pid:7359  tgid:7344  ppid:2744   flags:0x00004006
Apr 20 15:48:05 fedora kernel: Call Trace:
Apr 20 15:48:05 fedora kernel:  <TASK>
Apr 20 15:48:05 fedora kernel:  __schedule+0x2ad/0x5f0
Apr 20 15:48:05 fedora kernel:  schedule+0x27/0xa0
Apr 20 15:48:05 fedora kernel:  schedule_timeout+0x84/0x100
Apr 20 15:48:05 fedora kernel:  ? __pfx_process_timeout+0x10/0x10
Apr 20 15:48:05 fedora kernel:  __wait_for_common+0x8e/0x1c0
Apr 20 15:48:05 fedora kernel:  ? __pfx_schedule_timeout+0x10/0x10
Apr 20 15:48:05 fedora kernel:  drm_crtc_commit_wait+0x36/0x50
Apr 20 15:48:05 fedora kernel:  drm_atomic_helper_wait_for_dependencies+0xd2/0x100
Apr 20 15:48:05 fedora kernel:  commit_tail+0x3e/0x160
Apr 20 15:48:05 fedora kernel:  drm_atomic_helper_commit+0x11a/0x140
Apr 20 15:48:05 fedora kernel:  drm_atomic_commit+0xaf/0xe0
Apr 20 15:48:05 fedora kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Apr 20 15:48:05 fedora kernel:  drm_mode_atomic_ioctl+0x70b/0x7c0
Apr 20 15:48:05 fedora kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Apr 20 15:48:05 fedora kernel:  drm_ioctl_kernel+0xad/0x100
Apr 20 15:48:05 fedora kernel:  drm_ioctl+0x288/0x530
Apr 20 15:48:05 fedora kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Apr 20 15:48:05 fedora kernel:  amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
Apr 20 15:48:05 fedora kernel:  __x64_sys_ioctl+0x94/0xc0
Apr 20 15:48:05 fedora kernel:  do_syscall_64+0x82/0x160
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? rseq_get_rseq_cs+0x1d/0x220
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? rseq_ip_fixup+0x8d/0x1d0
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? __x64_sys_ppoll+0xf4/0x160
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? eventfd_read+0xdf/0x230
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? vfs_read+0x299/0x370
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? do_syscall_64+0x8e/0x160
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? do_syscall_64+0x8e/0x160
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? syscall_exit_to_user_mode+0x10/0x210
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? do_syscall_64+0x8e/0x160
Apr 20 15:48:05 fedora kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 20 15:48:05 fedora kernel:  ? __irq_exit_rcu+0x4c/0xe0
Apr 20 15:48:05 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 20 15:48:05 fedora kernel: RIP: 0033:0x7f4c4d2fd4ad
Apr 20 15:48:05 fedora kernel: RSP: 002b:00007f4c31f55bf0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 20 15:48:05 fedora kernel: RAX: ffffffffffffffda RBX: 00007f4c1000d8d0 RCX: 00007f4c4d2fd4ad
Apr 20 15:48:05 fedora kernel: RDX: 00007f4c31f55c90 RSI: 00000000c03864bc RDI: 000000000000000c
Apr 20 15:48:05 fedora kernel: RBP: 00007f4c31f55c40 R08: 0000000000000130 R09: 0000000000000001
Apr 20 15:48:05 fedora kernel: R10: 0000000000000015 R11: 0000000000000246 R12: 00007f4c31f55c90
Apr 20 15:48:05 fedora kernel: R13: 00000000c03864bc R14: 000000000000000c R15: 00007f4c1003a960
...
Apr 20 15:48:05 fedora kernel: PM: suspend exit
Apr 20 15:48:05 fedora kernel: PM: suspend entry (s2idle)
Apr 20 15:48:05 fedora bluetoothd[1570]: Controller resume with wake event 0x0
Apr 20 15:48:06 fedora kernel: Filesystems sync: 0.011 seconds

But the system would not wake up, so I held the power button for a while to reset it.

I found two similar reports. Mine could be a duplicate of #2360956.

https://bugzilla.redhat.com/show_bug.cgi?id=2360956
https://bugzilla.redhat.com/show_bug.cgi?id=2312366

Btw, I just upgraded to Fedora 42.

Comment 3 Michal Nowak 2025-04-28 15:22:14 UTC
Created attachment 2087627
dnf history info 115 from Apr 20

Comment 4 Michal Nowak 2025-05-01 16:31:45 UTC
Now it happened on Fedora 42. I downgraded amd-gpu-firmware to 20250311-1.fc42 and versionlocked it in dnf.

Comment 5 Michal Nowak 2025-05-02 20:20:32 UTC
I downgraded all linux-firmware-related packages to 20250311 but it did not help.

Comment 6 Michal Nowak 2025-05-02 20:24:54 UTC
Unsure if it's just a coincidence, but loupe - the new GNOME image viewer - always shows up in the log shortly before the freeze:

    loupe[10758]: vkAcquireNextImageKHR(): A swapchain no longer matches the surface properties exactly, but can still be used to present to the surface successfully. (VK_SUBOPTIMAL_KHR) (1000001003)

Or:

May 02 21:05:39 fedora systemd[2659]: Started dbus-:1.2-org.gnome.evince.Daemon.
May 02 21:05:46 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:05:48 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 1.177s CPU time, 251.7M memory peak.
May 02 21:05:49 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:06:30 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 3.854s CPU time, 329.1M memory peak.
May 02 21:06:33 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:06:37 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 2.666s CPU time, 397.5M memory peak.
May 02 21:06:40 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:06:46 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 2.785s CPU time, 401.4M memory peak.
May 02 21:06:49 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:06:54 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 2.625s CPU time, 407.3M memory peak.
May 02 21:06:55 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:07:09 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 3.102s CPU time, 397M memory peak.
May 02 21:07:14 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:07:25 fedora systemd[2659]: dbus-:1.2-org.gnome.Loupe: Consumed 2.601s CPU time, 326.1M memory peak.
May 02 21:07:34 fedora systemd[2659]: Started dbus-:1.2-org.gnome.Loupe.
May 02 21:07:36 fedora loupe[21117]: vkAcquireNextImageKHR(): A swapchain no longer matches the surface properties exactly, but can still be used to present to the surface successfully. (VK_SUBOPTIMAL_KHR) (1000001003)

Uninstalled loupe and back to eog.
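
One way to test the "shortly before the freeze" observation against a full journal is to list the loupe-related lines that fall within a time window before each DMCUB error. The sketch below assumes the short journal timestamp format shown above ("May 02 21:07:36"); the function names, the default 120-second window and the default year are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year=2025):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journal line.
    The short format carries no year, so one is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def loupe_before_errors(lines, window_seconds=120, year=2025):
    """For each DMCUB error line, collect the loupe-related lines logged
    at most window_seconds before it. Matches both loupe's own messages
    and systemd lines for the org.gnome.Loupe unit."""
    window = timedelta(seconds=window_seconds)
    loupe_seen = []  # (timestamp, line) pairs in log order
    hits = []
    for line in lines:
        try:
            when = parse_stamp(line, year)
        except ValueError:
            continue  # blank lines, '...' markers, continuation lines
        if "loupe" in line.lower():
            loupe_seen.append((when, line))
        elif "DMCUB error" in line:
            recent = [text for seen, text in loupe_seen
                      if timedelta(0) <= when - seen <= window]
            hits.append((line, recent))
    return hits
```

An error line with an empty list next to it would count against the correlation; the window size has a large effect on the result and should be chosen with care.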

Comment 7 Michal Nowak 2025-05-03 06:47:10 UTC
These two look similar to my issue:

https://gitlab.freedesktop.org/drm/amd/-/issues/2950
https://gitlab.freedesktop.org/drm/amd/-/issues/3926

Comment 8 Michal Nowak 2025-05-13 14:49:33 UTC
> Uninstalled loupe and back to eog.

Actually, I use gThumb now, but getting rid of loupe worked around the problem for me.

Comment 9 Masoud 2025-06-25 09:00:22 UTC
I have the same problem, and only by downgrading the following two packages can I get past the LUKS passphrase screen:
```
amd-gpu-firmware
amd-ucode-firmware
```

You can find some more relevant info here:
https://discussion.fedoraproject.org/t/system-not-booting-after-installation-of-kernel-0-6-15-3-200-fc42-x86-64-with-the-amdgpu-drm-error/156338/25

Comment 10 Fedora Update System 2025-06-27 19:12:29 UTC
FEDORA-2025-7feed8b25a (kernel-6.15.4-200.fc42 and linux-firmware-20250627-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-7feed8b25a

Comment 11 Fedora Update System 2025-06-27 19:12:33 UTC
FEDORA-2025-f6f8526a43 (kernel-6.15.4-100.fc41 and linux-firmware-20250627-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-f6f8526a43

Comment 12 Fedora Update System 2025-06-28 02:08:48 UTC
FEDORA-2025-7feed8b25a has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-7feed8b25a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-7feed8b25a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Fedora Update System 2025-06-28 02:33:42 UTC
FEDORA-2025-f6f8526a43 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-f6f8526a43`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-f6f8526a43

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2025-06-30 02:21:44 UTC
FEDORA-2025-7feed8b25a (kernel-6.15.4-200.fc42 and linux-firmware-20250627-1.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 15 Fedora Update System 2025-06-30 02:46:06 UTC
FEDORA-2025-f6f8526a43 (kernel-6.15.4-100.fc41 and linux-firmware-20250627-1.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make note of it in this bug report.