Bug 2317962 - AMD 780M GPU crash on kernel version 6.10.12-200.fc40.x86_64
Summary: AMD 780M GPU crash on kernel version 6.10.12-200.fc40.x86_64
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 40
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Christopher Atherton
QA Contact: Fedora Extras Quality Assurance
URL: https://forums.fedoraforum.org/showth...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-10-11 03:51 UTC by Sidhant Sharma
Modified: 2024-10-11 04:31 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description Sidhant Sharma 2024-10-11 03:51:59 UTC
After the system has been in use for some time, it suddenly becomes extremely slow (the display appears to update at seconds per frame rather than frames per second). Audio starts glitching and mouse input is delayed. The issue is persistent and has occurred several times. My system specifications are provided below, followed by the kernel log from the time of the incident.

inxi -F

System:
  Host: fedora Kernel: 6.10.12-200.fc40.x86_64 arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 6.2.0 Distro: Fedora Linux 40 (Workstation Edition)
Machine:
  Type: Laptop System: LENOVO product: 21F8CTO1WW v: ThinkPad T14s Gen 4
    serial: <superuser required>
  Mobo: LENOVO model: 21F8CTO1WW v: SDK0T76461 WIN
    serial: <superuser required> UEFI: LENOVO v: R2EET40W (1.21 )
    date: 07/30/2024
Battery:
  ID-1: BAT0 charge: 55.7 Wh (96.7%) condition: 57.6/57.0 Wh (101.1%)
CPU:
  Info: 8-core model: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics bits: 64
    type: MT MCP cache: L2: 8 MiB
  Speed (MHz): avg: 1346 min/max: 400/5132 cores: 1: 2992 2: 400 3: 5065
    4: 400 5: 400 6: 400 7: 2894 8: 2893 9: 400 10: 400 11: 3294 12: 400 13: 400
    14: 400 15: 400 16: 400
Graphics:
  Device-1: AMD Phoenix1 driver: amdgpu v: kernel
  Device-2: Chicony Integrated Camera driver: uvcvideo type: USB
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 24.1.3
    compositor: kwin_wayland driver: X: loaded: modesetting dri: radeonsi
    gpu: amdgpu resolution: 1536x960
  API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
    platforms: gbm,wayland,x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.1.7 renderer: AMD
    Radeon 780M (radeonsi gfx1103_r1 LLVM 18.1.6 DRM 3.57
    6.10.12-200.fc40.x86_64)
  API: Vulkan v: 1.3.290 drivers: N/A surfaces: xcb,xlib,wayland
Audio:
  Device-1: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor driver: snd_pci_ps
  Device-3: AMD Family 17h/19h HD Audio driver: snd_hda_intel
  API: ALSA v: k6.10.12-200.fc40.x86_64 status: kernel-api
  Server-1: PipeWire v: 1.0.8 status: active
Network:
  Device-1: Qualcomm QCNFA765 Wireless Network Adapter driver: ath11k_pci
  IF: wlp1s0 state: up mac: 76:b0:97:59:ac:72
Drives:
  Local Storage: total: 476.94 GiB used: 36.3 GiB (7.6%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFS512GEJ9X162N
    size: 476.94 GiB
Partition:
  ID-1: / size: 319 GiB used: 35.85 GiB (11.2%) fs: btrfs dev: /dev/nvme0n1p6
  ID-2: /boot size: 973.4 MiB used: 406.7 MiB (41.8%) fs: ext4
    dev: /dev/nvme0n1p5
  ID-3: /boot/efi size: 256 MiB used: 54.7 MiB (21.4%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 319 GiB used: 35.85 GiB (11.2%) fs: btrfs
    dev: /dev/nvme0n1p6
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 52.4 C mobo: N/A gpu: amdgpu temp: 47.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB note: est. available: 30.05 GiB used: 6.24 GiB (20.8%)
  Processes: 434 Uptime: 7h 8m Shell: Bash inxi: 3.3.34

dmesg:

[24760.357712] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24760.612590] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24760.872520] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.132302] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.388513] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.674352] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24761.932453] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24762.190262] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24978.246335] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
[24981.633460] amdgpu 0000:c3:00.0: amdgpu: MODE2 reset
[24981.671864] amdgpu 0000:c3:00.0: amdgpu: GPU reset succeeded, trying to resume
[24981.672541] [drm] PCIE GART of 512M enabled (table at 0x0000008000900000).
[24981.672788] [drm] VRAM is lost due to GPU reset!
[24981.672791] amdgpu 0000:c3:00.0: amdgpu: SMU is resuming...
[24981.674519] amdgpu 0000:c3:00.0: amdgpu: SMU is resumed successfully!
[24981.677105] [drm] DMUB hardware initialized: version=0x08004300
[24982.684006] [drm] kiq ring mec 3 pipe 1 q 0
[24982.685878] amdgpu 0000:c3:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[24982.686565] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[24982.686569] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[24982.686572] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[24982.686574] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[24982.686575] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[24982.686577] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[24982.686579] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[24982.686580] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[24982.686582] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[24982.686584] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[24982.686586] amdgpu 0000:c3:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[24982.686588] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[24982.686590] amdgpu 0000:c3:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[24982.698169] amdgpu 0000:c3:00.0: amdgpu: recover vram bo from shadow start
[24982.698171] amdgpu 0000:c3:00.0: amdgpu: recover vram bo from shadow done
[24982.698190] amdgpu 0000:c3:00.0: amdgpu: GPU reset(2) succeeded! 
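
For triage, the log above can be summarized mechanically. The following is a minimal Python sketch, not part of the original report: it counts the DMCUB error lines, measures the gap between the last DMCUB error and the start of the GPU reset, and measures how long the reset took. The function name summarize_amdgpu_log and the SAMPLE excerpt (a few lines copied from the log above) are illustrative assumptions.

```python
import re

# Matches the "[seconds.micros] message" layout of the dmesg lines above.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Excerpt copied from the log in this report, for illustration only.
SAMPLE = """\
[24760.357712] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24762.190262] amdgpu 0000:c3:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[24978.246335] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
[24981.671864] amdgpu 0000:c3:00.0: amdgpu: GPU reset succeeded, trying to resume
[24982.698190] amdgpu 0000:c3:00.0: amdgpu: GPU reset(2) succeeded!
"""


def summarize_amdgpu_log(text):
    """Summarize DMCUB errors and the first GPU reset found in kernel log text."""
    dmcub_times = []
    reset_begin = None
    reset_done = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped kernel line
        timestamp, message = float(match.group(1)), match.group(2)
        if "DMCUB error" in message:
            dmcub_times.append(timestamp)
        elif "GPU reset begin" in message and reset_begin is None:
            reset_begin = timestamp
        elif "GPU reset(" in message and "succeeded" in message and reset_done is None:
            # Only the final "GPU reset(N) succeeded!" line counts as completion;
            # the earlier "trying to resume" line has no parenthesis and is skipped.
            reset_done = timestamp

    have_reset = reset_begin is not None and reset_done is not None
    return {
        "dmcub_errors": len(dmcub_times),
        "reset_begin": reset_begin,
        # Seconds from the last DMCUB error to the start of the reset.
        "error_to_reset_seconds": (
            reset_begin - dmcub_times[-1]
            if dmcub_times and reset_begin is not None
            else None
        ),
        # Seconds from "GPU reset begin!" to "GPU reset(N) succeeded!".
        "reset_seconds": (reset_done - reset_begin) if have_reset else None,
    }
```

On the affected machine, the input text could be the saved output of dmesg or journalctl -k rather than the SAMPLE excerpt, e.g. summarize_amdgpu_log(open("dmesg.txt").read()), where dmesg.txt is a hypothetical file name.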

Reproducible: Always

Steps to Reproduce:
(I am not sure that these steps reliably reproduce the issue.)
1. Resume the system from sleep/suspend.
2. Plug the machine in to power.
3. Watch YouTube videos in Firefox/Chromium and browse the web for an extended period (1-2 hours).
Actual Results:  
The system suddenly becomes extremely slow and glitchy, and audio starts glitching out.

Expected Results:  
The GPU driver shouldn't crash abruptly. 

A workaround that I've found is to trigger a GPU reset by reading the following debugfs file:
sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
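
The same workaround can be scripted. This is a minimal Python sketch under stated assumptions, not a recommended fix: the helper names recover_path and trigger_gpu_recover are hypothetical, the DRI index 1 is taken from the command above and may differ on other machines, debugfs must be mounted, and the script must run as root. As with the cat command, simply reading the file is what requests the reset.

```python
from pathlib import Path


def recover_path(card_index, debugfs_root="/sys/kernel/debug/dri"):
    """Build the path to amdgpu_gpu_recover for a given DRI index (assumed layout)."""
    return Path(debugfs_root) / str(card_index) / "amdgpu_gpu_recover"


def trigger_gpu_recover(card_index=1, debugfs_root="/sys/kernel/debug/dri"):
    """Request a GPU reset by reading the debugfs file; returns whatever was read."""
    path = recover_path(card_index, debugfs_root)
    if not path.exists():
        # Wrong DRI index, debugfs not mounted, or not an amdgpu device.
        raise FileNotFoundError(f"{path} not found")
    # Reading the file is the trigger; the content itself is not interpreted here.
    return path.read_text().strip()
```

Calling trigger_gpu_recover(1) as root corresponds to the sudo cat command above. The function is deliberately not invoked at import time, since a GPU reset interrupts the running desktop session.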

