Bug 2193325 - amdgpu: GPU reset(24) failed
Summary: amdgpu: GPU reset(24) failed
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: libdrm
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-05 08:46 UTC by lejeczek
Modified: 2023-10-09 17:43 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Description lejeczek 2023-05-05 08:46:25 UTC
Description of problem:

This has been happening with recent kernel versions.
Hardware is a Lenovo ThinkPad E14 Gen 2.
It seems to happen randomly, sometimes when switching windows; no games are played on this system.
Both screens, the laptop's own and an external Dell, begin to alternate between blank/black and the desktop contents. After a few seconds they go black and stay black, and the system is unresponsive at that point.

-> $ journalctl -l -o cat -b-1
...
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe8ca245000 CR3: 0000000038428000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gmc_v9_0_hw_fini+0x71/0xa0 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x104/0x1a0 [amdgpu]
 ? amdgpu_device_ip_suspend_phase1+0x6f/0xe0 [amdgpu]
 amdgpu_device_ip_suspend+0x32/0x70 [amdgpu]
 amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
 amdgpu_device_gpu_recover+0x4c6/0xd70 [amdgpu]
 amdgpu_job_timedout+0x19e/0x250 [amdgpu]
 drm_sched_job_timedout+0x7f/0x110 [gpu_sched]
 process_one_work+0x294/0x560
 worker_thread+0x4f/0x3a0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xf5/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 812846
hardirqs last  enabled at (812845): [<ffffffff9011a334>] _raw_spin_unlock_irq+0x24/0x50
hardirqs last disabled at (812846): [<ffffffff9010f867>] __schedule+0xe37/0x1790
softirqs last  enabled at (812776): [<ffffffff8f8601e4>] blkg_async_bio_workfn+0x74/0xe0
softirqs last disabled at (812774): [<ffffffff8f8601c8>] blkg_async_bio_workfn+0x58/0xe0
---[ end trace 0000000000000000 ]---
amdgpu 0000:05:00.0: amdgpu: MODE2 reset
amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled.
[drm] PTB located at 0x000000F47FC00000
[drm] PSP is resuming...
Deleting problem directory oops-2023-05-05-10:35:44-1197-1 (dup of oops-2023-05-05-10:31:29-1197-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
[drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
amdgpu 0000:05:00.0: amdgpu: GPU reset(24) failed
amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -62
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
Deleting problem directory oops-2023-05-05-10:35:44-1197-2 (dup of oops-2023-05-05-10:31:29-1197-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-05-10:35:44-1197-3 (dup of oops-2023-05-05-10:31:29-1197-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Reported 4 kernel oopses to Abrt
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=281140, emitted seq=281142
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
amdgpu 0000:05:00.0: amdgpu: Failed to disallow df cstate
Failed to get load state of poweroff.target: Connection timed out
Failed to execute poweroff operation: Connection timed out
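
For anyone triaging a similar journal excerpt, here is a minimal Python sketch that pulls out the two details that matter most in the log above: the negative return codes and the fence counters on the ring timeout line. The names `summarize`, `LINUX_ERRNO`, `CODE_RE`, `RING_RE` and `SAMPLE` are made up for this illustration and are not part of amdgpu or systemd. The errno table holds the standard Linux values, hard-coded so the result does not depend on the platform running the script. Reading the gap between `emitted seq` and `signaled seq` as the number of fences still outstanding is my interpretation of the message, not something the log itself states.

```python
import re

# Linux errno values and their usual glibc messages, hard-coded on purpose
# so the sketch gives the same answer on any platform.
LINUX_ERRNO = {
    62: ("ETIME", "Timer expired"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Negative return codes as amdgpu prints them, e.g. "failed -62",
# "ret = -62", "Failed: -62" or "test failed (-110)".
CODE_RE = re.compile(r"(?:failed|ret =|Failed:)\s*\(?-(\d+)\)?")

# Fence counters on a ring timeout line.
RING_RE = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")

# A few lines copied from the journal excerpt above.
SAMPLE = [
    "[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62",
    "amdgpu 0000:05:00.0: amdgpu: GPU reset(24) failed",
    "amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -62",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=281140, emitted seq=281142",
]


def summarize(lines):
    """Return (sorted error codes, [(ring name, emitted - signaled)])."""
    codes = set()
    rings = []
    for line in lines:
        for match in CODE_RE.finditer(line):
            codes.add(int(match.group(1)))
        ring = RING_RE.search(line)
        if ring:
            gap = int(ring.group(3)) - int(ring.group(2))
            rings.append((ring.group(1), gap))
    return sorted(codes), rings


if __name__ == "__main__":
    codes, rings = summarize(SAMPLE)
    for code in codes:
        name, text = LINUX_ERRNO.get(code, ("unknown", "not in table"))
        print(f"-{code} = {name} ({text})")
    for ring_name, gap in rings:
        print(f"ring {ring_name}: {gap} fence(s) emitted but not signaled")
```

Both codes in the table are timeouts, so a summary like this only shows that the driver gave up waiting on a block during resume. It does not say why the block failed to respond.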

thanks, L.


Comment 1 lejeczek 2023-05-09 17:58:02 UTC
The same happens with kernel 6.3.x.

Stack trace of thread 5565:
#0  0x00007fe92887635d __poll (libc.so.6 + 0x10535d)
#1  0x00007fe927edf00a _xcb_conn_wait.part.0 (libxcb.so.1 + 0xe00a)
#2  0x00007fe927edf620 wait_for_reply (libxcb.so.1 + 0xe620)
#3  0x00007fe927ee055d xcb_wait_for_reply64 (libxcb.so.1 + 0xf55d)
#4  0x00007fe9289eb6dd _XReply (libX11.so.6 + 0x4d6dd)
#5  0x00007fe9289ebb55 XSync (libX11.so.6 + 0x4db55)
#6  0x00007fe929694304 gdk_flush (libgdk-3.so.0 + 0x39304)
#7  0x00007fe92900630f gtk_main (libgtk-3.so.0 + 0x20630f)
#8  0x000055a0509117ff main (gsd-xsettings + 0x87ff)
#9  0x00007fe928798b4a __libc_start_call_main (libc.so.6 + 0x27b4a)
#10 0x00007fe928798c0b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c0b)
#11 0x000055a050911d95 _start (gsd-xsettings + 0x8d95)

Stack trace of thread 5571:
#0  0x00007fe9287fa1d9 __futex_abstimed_wait_common (libc.so.6 + 0x891d9)
#1  0x00007fe9287fcb79 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bb79)
#2  0x00007fe917f113fd cnd_wait (radeonsi_dri.so + 0x1113fd)
#3  0x00007fe917ec05bb util_queue_thread_func (radeonsi_dri.so + 0xc05bb)
#4  0x00007fe917f1132c impl_thrd_routine (radeonsi_dri.so + 0x11132c)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5572:
#0  0x00007fe9287fa1d9 __futex_abstimed_wait_common (libc.so.6 + 0x891d9)
#1  0x00007fe9287fcb79 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bb79)
#2  0x00007fe917f113fd cnd_wait (radeonsi_dri.so + 0x1113fd)
#3  0x00007fe917ec05bb util_queue_thread_func (radeonsi_dri.so + 0xc05bb)
#4  0x00007fe917f1132c impl_thrd_routine (radeonsi_dri.so + 0x11132c)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5576:
#0  0x00007fe92887635d __poll (libc.so.6 + 0x10535d)
#1  0x00007fe928b9f3a9 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba3a9)
#2  0x00007fe928b4099f g_main_loop_run (libglib-2.0.so.0 + 0x5b99f)
#3  0x00007fe928d48472 gdbus_shared_thread_func.lto_priv.0 (libgio-2.0.so.0 + 0x11a472)
#4  0x00007fe928b6f893 g_thread_proxy (libglib-2.0.so.0 + 0x8a893)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5574:
#0  0x00007fe92887635d __poll (libc.so.6 + 0x10535d)
#1  0x00007fe928b9f3a9 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba3a9)
#2  0x00007fe928b3ea23 g_main_context_iteration (libglib-2.0.so.0 + 0x59a23)
#3  0x00007fe928b3ea79 glib_worker_main (libglib-2.0.so.0 + 0x59a79)
#4  0x00007fe928b6f893 g_thread_proxy (libglib-2.0.so.0 + 0x8a893)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5573:
#0  0x00007fe9287fa1d9 __futex_abstimed_wait_common (libc.so.6 + 0x891d9)
#1  0x00007fe9287fcb79 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bb79)
#2  0x00007fe917f113fd cnd_wait (radeonsi_dri.so + 0x1113fd)
#3  0x00007fe917ec05bb util_queue_thread_func (radeonsi_dri.so + 0xc05bb)
#4  0x00007fe917f1132c impl_thrd_routine (radeonsi_dri.so + 0x11132c)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5577:
#0  0x00007fe92887635d __poll (libc.so.6 + 0x10535d)
#1  0x00007fe928b9f3a9 g_main_context_iterate.isra.0 (libglib-2.0.so.0 + 0xba3a9)
#2  0x00007fe928b3ea23 g_main_context_iteration (libglib-2.0.so.0 + 0x59a23)
#3  0x00007fe9276075c5 dconf_gdbus_worker_thread (libdconfsettings.so + 0x75c5)
#4  0x00007fe928b6f893 g_thread_proxy (libglib-2.0.so.0 + 0x8a893)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5575:
#0  0x00007fe92887bb5d syscall (libc.so.6 + 0x10ab5d)
#1  0x00007fe928b965ee g_cond_wait (libglib-2.0.so.0 + 0xb15ee)
#2  0x00007fe928b0c04b g_async_queue_pop_intern_unlocked (libglib-2.0.so.0 + 0x2704b)
#3  0x00007fe928b71473 g_thread_pool_spawn_thread (libglib-2.0.so.0 + 0x8c473)
#4  0x00007fe928b6f893 g_thread_proxy (libglib-2.0.so.0 + 0x8a893)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)

Stack trace of thread 5570:
#0  0x00007fe9287fa1d9 __futex_abstimed_wait_common (libc.so.6 + 0x891d9)
#1  0x00007fe9287fcb79 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8bb79)
#2  0x00007fe917f113fd cnd_wait (radeonsi_dri.so + 0x1113fd)
#3  0x00007fe917ec05bb util_queue_thread_func (radeonsi_dri.so + 0xc05bb)
#4  0x00007fe917f1132c impl_thrd_routine (radeonsi_dri.so + 0x11132c)
#5  0x00007fe9287fd907 start_thread (libc.so.6 + 0x8c907)
#6  0x00007fe928883870 __clone3 (libc.so.6 + 0x112870)
ELF object binary architecture: AMD x86-64

systemd-coredump: Deactivated successfully.
SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-79306-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd-coredump: Consumed 1.124s CPU time.
BPF prog-id=135 op=UNLOAD
BPF prog-id=134 op=UNLOAD
BPF prog-id=133 op=UNLOAD
[drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
[drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
amdgpu 0000:05:00.0: amdgpu: Secure display: Generic Failure.
amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
amdgpu 0000:05:00.0: amdgpu: SMU is resuming...
amdgpu 0000:05:00.0: amdgpu: dpm has been disabled
amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x01010026
abrt-dump-journal-oops: Found oopses: 6
abrt-dump-journal-oops: Creating problem directories
[drm] kiq ring mec 2 pipe 1 q 0
amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
amdgpu 0000:05:00.0: amdgpu: GPU reset(7) failed
amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
AVC avc:  denied  { read } for  pid=79336 comm="gdb" name="renderD128" dev="devtmpfs" ino=557 scontext=system_u:system_r:abrt_t:s0-s0:c0.c1023 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
AVC avc:  denied  { read } for  pid=79336 comm="gdb" name="renderD128" dev="devtmpfs" ino=557 scontext=system_u:system_r:abrt_t:s0-s0:c0.c1023 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
AVC avc:  denied  { read } for  pid=79336 comm="gdb" name="renderD128" dev="devtmpfs" ino=557 scontext=system_u:system_r:abrt_t:s0-s0:c0.c1023 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
Deleting problem directory ccpp-2023-05-09-19:52:53.152980-5565 (dup of ccpp-2023-05-08-17:44:27.51644-6131)
Starting gvfs-daemon.service - Virtual filesystem service...
Started gvfs-daemon.service - Virtual filesystem service.
org.gnome.SettingsDaemon.XSettings.service: State 'stop-watchdog' timed out. Killing.
org.gnome.SettingsDaemon.XSettings.service: Killing process 5565 (gsd-xsettings) with signal SIGKILL.
org.gnome.SettingsDaemon.XSettings.service: Killing process 5571 (gsd-xse:disk$0) with signal SIGKILL.
Process 6131 (gsd-xsettings) crashed in __poll()
Deleting problem directory oops-2023-05-09-19:52:53-1132-0 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:52:53-1132-1 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Reported 6 kernel oopses to Abrt
Deleting problem directory oops-2023-05-09-19:52:53-1132-2 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:52:53-1132-3 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:52:53-1132-4 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
org.gnome.SettingsDaemon.XSettings.service: Processes still around after SIGKILL. Ignoring.
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=125932, emitted seq=125934
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
------------[ cut here ]------------
WARNING: CPU: 4 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 4 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc90020437928 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532d77d0 RCX: ffffffffc0abc5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff888176e99a28
RBP: ffff888176e99a28 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed102edd3345 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532d77d8
FS:  0000000000000000(0000) GS:ffff888331a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7dc43402e0 CR3: 0000000125846000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 sdma_v4_0_hw_fini+0xa8/0x170 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 4 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 4 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc900204378f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532d01f8 RCX: ffffffffc0abc5e6
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff888176e99848
RBP: ffff888176e99848 R08: 0000000000000000 R09: ffff888176e9984b
R10: ffffed102edd3309 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532d0200
FS:  0000000000000000(0000) GS:ffff888331a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7dc43402e0 CR3: 0000000125846000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gfx_v9_0_hw_fini+0x28/0x1920 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 4 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 4 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc900204378f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532d01c8 RCX: ffffffffc0abc5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff888176e99140
RBP: ffff888176e99140 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed102edd3228 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532d01d0
FS:  0000000000000000(0000) GS:ffff888331a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7dc43402e0 CR3: 0000000125846000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gfx_v9_0_hw_fini+0x39/0x1920 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 4 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 4 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc900204378f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532d01e0 RCX: ffffffffc0abc5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff888176e99aa0
RBP: ffff888176e99aa0 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed102edd3354 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532d01e8
FS:  0000000000000000(0000) GS:ffff888331a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7dc43402e0 CR3: 0000000125846000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gfx_v9_0_hw_fini+0x4a/0x1920 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
[drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
------------[ cut here ]------------
WARNING: CPU: 5 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 5 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc90020437938 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532c30e8 RCX: ffffffffc0abc5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8881060fe5c8
RBP: ffff8881060fe5c8 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed1020c1fcb9 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532c30f0
FS:  0000000000000000(0000) GS:ffff888331e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa971a09754 CR3: 0000000249a70000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gmc_v9_0_hw_fini+0x189/0x220 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 5 PID: 78677 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dummy nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt intel_rapl_common snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led edac_mce_amd snd_sof snd_hda_codec_realtek snd_hda_codec_generic snd_sof_utils kvm_amd iwlmvm snd_hda_codec_hdmi uvcvideo kvm uvc snd_seq_midi snd_soc_core snd_seq_midi_event mac80211 irqbypass videobuf2_vmalloc videobuf2_memops snd_hda_intel snd_compress snd_intel_dspcfg snd_intel_sdw_acpi rapl videobuf2_v4l2 ac97_bus snd_pcm_dmaengine snd_hda_codec btusb libarc4 snd_usb_audio videobuf2_common btrtl snd_hda_core snd_pci_ps snd_rpl_pci_acp6x btbcm videodev
 snd_usbmidi_lib btintel snd_hwdep snd_pci_acp6x think_lmi snd_seq snd_rawmidi btmtk firmware_attributes_class wmi_bmof mc snd_pci_acp5x bluetooth iwlwifi snd_seq_device snd_rn_pci_acp3x snd_acp_config thinkpad_acpi k10temp snd_pcm snd_soc_acpi cfg80211 snd_pci_acp3x i2c_piix4 ledtrig_audio platform_profile vfat snd_timer rfkill snd soundcore fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_ttm_helper ttm crc32c_intel iommu_v2 polyval_clmulni polyval_generic drm_buddy gpu_sched ghash_clmulni_intel hid_lenovo sha512_ssse3 ccp drm_display_helper nvme cec video nvme_core sp5100_tco nvme_common r8169 ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 5 PID: 78677 Comm: kworker/u12:35 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee b2 5b ea 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 af 06 e8 eb 8b e8 5a af
RSP: 0018:ffffc90020437938 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881532c17d8 RCX: ffffffffc0abc5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8881060fe4b0
RBP: ffff8881060fe4b0 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed1020c1fc96 R11: 0000000000000000 R12: ffff8881532c0000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881532c17e0
FS:  0000000000000000(0000) GS:ffff888331e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa971a09754 CR3: 0000000249a70000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 gmc_v9_0_hw_fini+0x19a/0x220 [amdgpu]
 amdgpu_device_ip_suspend_phase2+0x133/0x890 [amdgpu]
 amdgpu_device_ip_suspend+0x5e/0xd0 [amdgpu]
 amdgpu_device_pre_asic_reset+0x1df/0x820 [amdgpu]
 amdgpu_device_gpu_recover+0xb7b/0x2530 [amdgpu]
 ? __drm_err+0xe4/0x120
 ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x66/0x80
 amdgpu_job_timedout+0x43d/0x780 [amdgpu]
 ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 drm_sched_job_timedout+0x1be/0x4d0 [gpu_sched]
 process_one_work+0x87f/0x1440
 ? worker_thread+0x2b2/0x12c0
 ? __pfx_process_one_work+0x10/0x10
 ? lock_acquired+0x355/0xa00
 worker_thread+0xfb/0x12c0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2a2/0x340
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
irq event stamp: 184
hardirqs last  enabled at (183): [<ffffffffab073718>] _raw_spin_unlock_irq+0x28/0x60
hardirqs last disabled at (184): [<ffffffffab05a8df>] __schedule+0x2cff/0x5c80
softirqs last  enabled at (0): [<ffffffffa822fb09>] copy_process+0x1e39/0x6860
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
amdgpu 0000:05:00.0: amdgpu: MODE2 reset
amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 1024M enabled.
[drm] PTB located at 0x000000F47FC00000
[drm] PSP is resuming...
[drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
[drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
amdgpu 0000:05:00.0: amdgpu: Secure display: Generic Failure.
amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
amdgpu 0000:05:00.0: amdgpu: SMU is resuming...
amdgpu 0000:05:00.0: amdgpu: dpm has been disabled
amdgpu 0000:05:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x01010026
abrt-dump-journal-oops: Found oopses: 6
abrt-dump-journal-oops: Creating problem directories
[drm] kiq ring mec 2 pipe 1 q 0
amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
amdgpu 0000:05:00.0: amdgpu: GPU reset(8) failed
amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Deleting problem directory oops-2023-05-09-19:53:06-1132-0 (dup of oops-2023-05-09-19:52:16-1132-0)
org.gnome.SettingsDaemon.XSettings.service: State 'stop-post' timed out. Aborting.
org.gnome.SettingsDaemon.XSettings.service: Killing process 5565 (gsd-xsettings) with signal SIGABRT.
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:53:06-1132-1 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:53:06-1132-2 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Deleting problem directory oops-2023-05-09-19:53:06-1132-3 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
Reported 6 kernel oopses to Abrt
Deleting problem directory oops-2023-05-09-19:53:06-1132-4 (dup of oops-2023-05-09-19:52:16-1132-0)
System encountered a non-fatal error in sdma_v4_0_hw_fini()
org.gnome.SettingsDaemon.XSettings.service: State 'final-watchdog' timed out. Killing.
org.gnome.SettingsDaemon.XSettings.service: Killing process 5565 (gsd-xsettings) with signal SIGKILL.
org.gnome.SettingsDaemon.XSettings.service: Killing process 5571 (gsd-xse:disk$0) with signal SIGKILL.

Comment 2 Phil Smith 2023-05-11 23:35:19 UTC
I think this is a serious X problem affecting all(?) AMD Ryzen systems with Radeon graphics, including the recent Ryzen Lenovos, running kernels (approx.) 6.0 and later.
See
   https://gitlab.freedesktop.org/drm/amd/-/issues/2220
Search for
   kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout
in the system logs.
X crashes every day or every few days, depending on graphics activity (often when using a web browser),
or crashes really soon with
   https://testdrive-archive.azurewebsites.net/graphics/webglstresstest/
See also
   https://bugzilla.redhat.com/show_bug.cgi?id=2193110
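
A minimal sketch of the log search described above, assuming the message format quoted in this comment. The function name count_amdgpu_timeouts is made up for illustration, and the journalctl invocation in the trailing comment assumes a systemd journal that still holds the previous boot:

```shell
# Count amdgpu ring timeouts in kernel log text read from stdin.
# Assumption: lines look like
#   [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout
count_amdgpu_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| true" keeps the function quiet.
    grep -c -E 'amdgpu_job_timedout.*\*ERROR\* ring [^ ]+ timeout' || true
}

# Possible use on a live system (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | count_amdgpu_timeouts
```

A non-zero count for the boot that ended in a freeze suggests the same ring timeout is involved.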

Comment 3 lejeczek 2023-05-12 06:17:11 UTC
I get reports, if not a crash, every day. ABRT notices it but cannot form a useful, complete report.
Yes, it is badly critical & critically bad. I really do wonder whether there is a single developer in the whole of AMD who actually works and develops on AMD's own product, such as a laptop hooked up to an external screen & keyboard.

This/similar is what I see every day:
-> $ abrt i --pretty full cf4deb3
Id            cf4deb3  
Component     kernel  
Count         1  
Time          2023-05-12 06:35:47  
Command line  BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.3.1-200.fc38.x86_64+debug root=UUID=494d8fe6-4571-4258-96c3-1d6cf72e2cb8 ro rootflags=subvol=root rd.luks.uuid=luks-7ac1c56e-1e3f-4b2e-bc5b-10ad26d54dcf rhgb quiet  
Package       kernel-debug-core-6.3.1-200.fc38  
Path          /var/spool/abrt/oops-2023-05-12-06:35:45-1159-2  
              Not reportable  
              The backtrace does not contain enough meaningful function frames to be reported. It is annoying but it does not necessarily indicate a problem with your computer. ABRT will not allow you to create a report in a bug tracking system but you can contact kernel maintainers via e-mail.

-> $ abrt bt cf4deb3
WARNING: CPU: 5 PID: 12770 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Modules linked in: tls uinput rfcomm snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nft_masq nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack dummy nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr snd_sof_amd_rembrandt snd_sof_amd_renoir intel_rapl_common snd_sof_amd_acp edac_mce_amd iwlmvm snd_sof_pci snd_ctl_led snd_hda_codec_realtek snd_sof_xtensa_dsp snd_sof snd_hda_codec_generic kvm_amd mac80211 uvcvideo snd_sof_utils btusb btrtl snd_hda_codec_hdmi uvc btbcm snd_soc_core kvm btintel videobuf2_vmalloc videobuf2_memops libarc4 snd_compress snd_hda_intel ac97_bus videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common snd_intel_dspcfg irqbypass snd_intel_sdw_acpi snd_hda_codec btmtk videodev rapl snd_pci_ps snd_hda_core iwlwifi snd_rpl_pci_acp6x bluetooth mc
 snd_pci_acp6x snd_hwdep snd_seq snd_pci_acp5x thinkpad_acpi think_lmi wmi_bmof firmware_attributes_class snd_seq_device cfg80211 ledtrig_audio snd_pcm k10temp snd_rn_pci_acp3x snd_acp_config snd_soc_acpi platform_profile i2c_piix4 snd_timer snd_pci_acp3x snd rfkill soundcore vfat fat i2c_scmi acpi_cpufreq joydev squashfs loop zram dm_crypt amdgpu i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul iommu_v2 crc32_pclmul crc32c_intel polyval_clmulni polyval_generic drm_buddy hid_lenovo gpu_sched ghash_clmulni_intel sha512_ssse3 drm_display_helper nvme ccp r8169 cec nvme_core nvme_common sp5100_tco video ucsi_acpi typec_ucsi typec wmi serio_raw ip6_tables ip_tables nbd fuse
CPU: 5 PID: 12770 Comm: kworker/u12:1 Tainted: G        W          6.3.1-200.fc38.x86_64+debug #1
Hardware name: LENOVO 20T6S00W00/20T6S00W00, BIOS R1AET45W (1.21 ) 11/30/2022
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:amdgpu_irq_put+0x18e/0x1f0 [amdgpu]
Code: 11 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ee 82 3b d5 44 89 ea 48 89 de 4c 89 e7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 12 f6 ff ff <0f> 0b b8 ea ff ff ff eb d0 48 89 ef e8 c1 7f e6 d2 eb 8b e8 5a 7f
RSP: 0018:ffffc9001edffa40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88816c5430e8 RCX: ffffffffc0cbf5e6
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8881061b3050
RBP: ffff8881061b3050 R08: 0000000000000000 R09: 0000000000000003
R10: ffffed1020c3660a R11: 0000000000000003 R12: ffff88816c540000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88816c5430f0
FS:  0000000000000000(0000) GS:ffff888331e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feaac9fe306 CR3: 0000000233a70000 CR4: 0000000000350ee0
Call Trace:

Comment 4 Christopher Klooz 2023-06-02 23:10:47 UTC
With regard to the other report [1] that Phil already mentioned, you could try disabling PSR; maybe this helps you. I am currently testing it myself. I only started a few hours ago, so the fact that I have not had any freezes/crashes so far does not mean much. But if you still experience this issue, it could be worth a try.

If you want to test that, use the kernel parameter amdgpu.dcdebugmask=0x10 (see https://www.kernel.org/doc/html/latest/gpu/amdgpu/module-parameters.html and [1]; an Internet search turns up more). My notebook's internal screen also has PSR support, so even without an external screen attached, PSR could be related to the occurrences.

If you test it, feel free to let us know how it develops for you.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2193110
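
A sketch of how the parameter from this comment might be applied and verified on Fedora, assuming grubby manages the boot entries. The helper name has_dcdebugmask is hypothetical; the grubby calls are left as comments because they change the boot configuration and need root:

```shell
# Add the parameter to all installed kernels (takes effect after a reboot):
#   sudo grubby --update-kernel=ALL --args="amdgpu.dcdebugmask=0x10"
# Remove it again if it makes no difference:
#   sudo grubby --update-kernel=ALL --remove-args="amdgpu.dcdebugmask=0x10"

# Check whether a kernel command line contains the parameter.
# $1: file holding the command line (defaults to /proc/cmdline).
has_dcdebugmask() {
    # Split the command line into one argument per line, then require
    # an exact match so that e.g. 0x100 is not mistaken for 0x10.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -q -x 'amdgpu\.dcdebugmask=0x10'
}

# Possible use after rebooting:
#   has_dcdebugmask && echo "PSR-disable mask is set" || echo "parameter missing"
```

Checking the running kernel's command line this way confirms only that the parameter was passed at boot, not that it cures the freezes.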

Comment 5 Sean Estabrooks 2023-10-09 17:43:56 UTC
Just upgraded to kernel-6.5.5-200.fc38.x86_64 here and hit this problem.
Not sure how many previous kernels I skipped, but downgrading to
kernel-6.4.15-200.fc38.x86_64 made the problem disappear.

