Bug 2317962 - AMD 780M GPU crash on kernel version 6.10.12-200.fc40.x86_64
Summary: AMD 780M GPU crash on kernel version 6.10.12-200.fc40.x86_64
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 40
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Christopher Atherton
QA Contact: Fedora Extras Quality Assurance
URL: https://forums.fedoraforum.org/showth...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-10-11 03:51 UTC by Sidhant Sharma
Modified: 2025-05-20 20:01 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-05-20 20:01:15 UTC
Type: ---
Embargoed:



Description Sidhant Sharma 2024-10-11 03:51:59 UTC
After the system has been in use for some time, it suddenly becomes extremely slow (the display appears to update at seconds per frame rather than frames per second). Audio starts glitching and mouse input is delayed. The issue is persistent and has occurred several times. My system specifications are provided below, and the kernel log from the time of the incident is also included.

inxi -F

System:
  Host: fedora Kernel: 6.10.12-200.fc40.x86_64 arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 6.2.0 Distro: Fedora Linux 40 (Workstation Edition)
Machine:
  Type: Laptop System: LENOVO product: 21F8CTO1WW v: ThinkPad T14s Gen 4
    serial: <superuser required>
  Mobo: LENOVO model: 21F8CTO1WW v: SDK0T76461 WIN
    serial: <superuser required> UEFI: LENOVO v: R2EET40W (1.21 )
    date: 07/30/2024
Battery:
  ID-1: BAT0 charge: 55.7 Wh (96.7%) condition: 57.6/57.0 Wh (101.1%)
CPU:
  Info: 8-core model: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics bits: 64
    type: MT MCP cache: L2: 8 MiB
  Speed (MHz): avg: 1346 min/max: 400/5132 cores: 1: 2992 2: 400 3: 5065
    4: 400 5: 400 6: 400 7: 2894 8: 2893 9: 400 10: 400 11: 3294 12: 400 13: 400
    14: 400 15: 400 16: 400
Graphics:
  Device-1: AMD Phoenix1 driver: amdgpu v: kernel
  Device-2: Chicony Integrated Camera driver: uvcvideo type: USB
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 24.1.3
    compositor: kwin_wayland driver: X: loaded: modesetting dri: radeonsi
    gpu: amdgpu resolution: 1536x960
  API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
    platforms: gbm,wayland,x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.7 renderer: AMD
    Radeon 780M (radeonsi gfx1103_r1 LLVM 18.1.6 DRM 3.57
    6.10.12-200.fc40.x86_64)
  API: Vulkan v: 1.3.290 drivers: N/A surfaces: xcb,xlib,wayland
Audio:
  Device-1: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor driver: snd_pci_ps
  Device-3: AMD Family 17h/19h HD Audio driver: snd_hda_intel
  API: ALSA v: k6.10.12-200.fc40.x86_64 status: kernel-api
  Server-1: PipeWire v: 1.0.8 status: active
Network:
  Device-1: Qualcomm QCNFA765 Wireless Network Adapter driver: ath11k_pci
  IF: wlp1s0 state: up mac: 76:b0:97:59:ac:72
Drives:
  Local Storage: total: 476.94 GiB used: 36.3 GiB (7.6%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFS512GEJ9X162N
    size: 476.94 GiB
Partition:
  ID-1: / size: 319 GiB used: 35.85 GiB (11.2%) fs: btrfs dev: /dev/nvme0n1p6
  ID-2: /boot size: 973.4 MiB used: 406.7 MiB (41.8%) fs: ext4
    dev: /dev/nvme0n1p5
  ID-3: /boot/efi size: 256 MiB used: 54.7 MiB (21.4%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 319 GiB used: 35.85 GiB (11.2%) fs: btrfs
    dev: /dev/nvme0n1p6
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 52.4 C mobo: N/A gpu: amdgpu temp: 47.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB note: est. available: 30.05 GiB used: 6.24 GiB (20.8%)
  Processes: 434 Uptime: 7h 8m Shell: Bash inxi: 3.3.34

dmesg:

[24760.357712] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24760.612590] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24760.872520] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.132302] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.388513] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.674352] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.932453] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24762.190262] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24978.246335] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
[24981.633460] amdgpu 0000:c3:00.0: amdgpu: MODE2 reset
[24981.671864] amdgpu 0000:c3:00.0: amdgpu: GPU reset succeeded, trying to resume
[24981.672541] [drm] PCIE GART of 512M enabled (table at 0x0000008000900000).
[24981.672788] [drm] VRAM is lost due to GPU reset!
[24981.672791] amdgpu 0000:c3:00.0: amdgpu: SMU is resuming...
[24981.674519] amdgpu 0000:c3:00.0: amdgpu: SMU is resumed successfully!
[24981.677105] [drm] DMUB hardware initialized: version=0x08004300
[24982.684006] [drm] kiq ring mec 3 pipe 1 q 0
[24982.685878] amdgpu 0000:c3:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[24982.686565] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[24982.686569] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[24982.686572] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[24982.686574] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[24982.686575] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[24982.686577] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[24982.686579] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[24982.686580] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[24982.686582] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[24982.686584] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[24982.686586] amdgpu 0000:c3:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[24982.686588] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[24982.686590] amdgpu 0000:c3:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[24982.698169] amdgpu 0000:c3:00.0: amdgpu: recover vram bo from shadow start
[24982.698171] amdgpu 0000:c3:00.0: amdgpu: recover vram bo from shadow done
[24982.698190] amdgpu 0000:c3:00.0: amdgpu: GPU reset(2) succeeded! 
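
To make the timeline in a log like the one above easier to read, here is a minimal Python sketch that counts the DMCUB errors and measures the gap between the first error and the GPU reset. It is not part of the original report. The function name `summarize_amdgpu_log` and the substring matches are illustrative assumptions based only on the message text quoted above; other kernel versions may word these messages differently.

```python
import re

# Matches the default dmesg prefix "[seconds.micros] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_amdgpu_log(lines):
    """Return the DMCUB error count and key timestamps (seconds since boot).

    Hypothetical helper: the matched substrings are copied from the log in
    this report and are an assumption, not a stable kernel interface.
    """
    summary = {
        "dmcub_errors": 0,
        "first_dmcub": None,
        "reset_begin": None,
        "reset_done": None,
    }
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(m.group(1)), m.group(2)
        if "DMCUB error" in msg:
            summary["dmcub_errors"] += 1
            if summary["first_dmcub"] is None:
                summary["first_dmcub"] = ts
        elif "GPU reset begin" in msg and summary["reset_begin"] is None:
            summary["reset_begin"] = ts
        elif "GPU reset(" in msg and "succeeded" in msg:
            summary["reset_done"] = ts
    return summary


# Three lines copied from the log above.
sample = [
    "[24760.357712] amdgpu 0000:c3:00.0: [drm] *ERROR* "
    "dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data",
    "[24978.246335] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!",
    "[24982.698190] amdgpu 0000:c3:00.0: amdgpu: GPU reset(2) succeeded! ",
]
info = summarize_amdgpu_log(sample)
print(info)
if info["first_dmcub"] is not None and info["reset_begin"] is not None:
    print("seconds from first DMCUB error to reset:",
          round(info["reset_begin"] - info["first_dmcub"], 3))
```

Feeding it the full output of `dmesg` (one line per list element) should give the same kind of summary for a longer log.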

Reproducible: Always

Steps to Reproduce:
(I am not sure that these steps reproduce the issue reliably.)
1. Resume the system from sleep/suspend.
2. Plug the machine in to power.
3. Watch YouTube videos in Firefox/Chromium and browse the web for an extended period of time (1-2 hours).
Actual Results:  
The system suddenly becomes extremely slow and glitchy, and audio starts glitching out.

Expected Results:  
The GPU driver shouldn't crash abruptly. 

A workaround that I've found is to trigger a GPU reset by reading the following file:
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
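
The `1` in the path above is a DRM minor number and may differ on other machines. Here is a minimal Python sketch, not part of the original report, that lists every `amdgpu_gpu_recover` file under the debugfs `dri` directory instead of hard-coding the index. The helper names `find_recover_nodes` and `trigger_recover` are hypothetical. The sketch assumes debugfs is mounted at `/sys/kernel/debug` and that the script runs as root, since that directory is normally unreadable to regular users.

```python
from pathlib import Path


def find_recover_nodes(debug_root="/sys/kernel/debug/dri"):
    """List amdgpu_gpu_recover files under debug_root, sorted by DRM minor.

    Returns an empty list if the directory is missing or unreadable
    (for example when not running as root).
    """
    root = Path(debug_root)
    try:
        entries = list(root.iterdir())
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return []
    nodes = []
    for entry in entries:
        candidate = entry / "amdgpu_gpu_recover"
        # Only numeric directory names are treated as DRM minors.
        if entry.name.isdigit() and candidate.is_file():
            nodes.append(candidate)
    return sorted(nodes, key=lambda p: int(p.parent.name))


def trigger_recover(node):
    """Read the node, which is what the `sudo cat` workaround does.

    Per this report, reading the file starts a GPU reset, so this call is
    deliberately separate from the listing step.
    """
    return Path(node).read_text()


if __name__ == "__main__":
    # Only list candidates; call trigger_recover() manually when needed.
    for node in find_recover_nodes():
        print(node)
```

The same file can appear under more than one minor for a single GPU, so the sketch only prints the candidates and leaves the choice, and the reset itself, to the user.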

Comment 1 Aoife Moloney 2025-04-28 13:58:26 UTC
This message is a reminder that Fedora Linux 40 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 40 on 2025-05-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '40'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 40 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 2 Aoife Moloney 2025-05-20 20:01:15 UTC
Fedora Linux 40 entered end-of-life (EOL) status on 2025-05-13.

Fedora Linux 40 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

