Bug 2181426

Summary: [abrt] smum_send_msg_to_smc: WARNING: CPU: 0 PID: 244 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu] [amdgpu]
Product: [Fedora] Fedora
Reporter: Matt Fagnani <matt.fagnani>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW ---
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 38
CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, jglisse, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/fd1576ee4e1224282d2b1ef4c7ed93ecbfcc396
Whiteboard: abrt_hash:466d89878a0c644eea955a43aa8deb33ab8f1aec;VARIANT_ID=kde;
Attachments:
File: dmesg (Flags: none)

Description Matt Fagnani 2023-03-24 05:11:33 UTC
Description of problem:
I booted 6.2.8 and earlier kernels on an HP laptop with an AMD A10-9620P CPU and integrated Radeon R5 GPU in a Fedora 38 KDE Plasma installation. Occasionally, shortly after amdgpu started during boot, amdgpu warnings involving smu8_send_msg_to_smc_with_parameter timeouts appeared. During boots in which that warning happened, messages like
amdgpu: smu8_send_msg_to_smc_with_parameter(0x0007) aborted; SMU still servicing msg (0x0009)
were frequently shown in the journal. On affected boots, sddm took about 10 seconds to start instead of the normal 1-2 seconds, and Plasma took 1-2 minutes to start instead of the normal 2-3 seconds. Programs were sometimes much slower to respond on the desktop. Shutting down the system took 1-2 minutes instead of the usual 2 seconds. Those amdgpu messages were shown on the console at the end of the shutdown process after the plymouth shutdown service failed.
This problem has happened on fewer than 10% of boots over recent months. Bisecting would be difficult because, at that low frequency, I couldn't be sure which kernels are unaffected. I also reported this problem at https://gitlab.freedesktop.org/drm/amd/-/issues/2476

Steps to Reproduce:
1. Boot a Fedora 38 KDE Plasma installation updated to 2023-03-24 with updates-testing enabled on a laptop with an AMD A10-9620P CPU and integrated Radeon R5 GPU
2. Log in to Plasma 5.27.3 on Wayland from sddm
3. Reboot the system
4. Repeat 2-3 until the problem happens
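
Because the problem only shows up on a small fraction of boots, it may help to check each boot's kernel log automatically. The following is a minimal sketch, assuming the kernel log has been saved as plain text (for example from journalctl -k -b). The function name scan_kernel_log and the returned dictionary layout are hypothetical; the two patterns are taken from the messages quoted in this report.

```python
import re

# Pattern for the WARNING line quoted in this report (CPU, PID and line number vary).
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at \S*smu8_smumgr\.c:\d+ "
    r"smu8_send_msg_to_smc_with_parameter"
)

# Pattern for the follow-up journal message; captures the aborted and the pending message IDs.
ABORT_RE = re.compile(
    r"smu8_send_msg_to_smc_with_parameter\((0x[0-9a-fA-F]+)\) aborted; "
    r"SMU still servicing msg \((0x[0-9a-fA-F]+)\)"
)


def scan_kernel_log(text):
    """Count SMU timeout warnings and collect (aborted, pending) message ID pairs."""
    warnings = 0
    aborted = []
    for line in text.splitlines():
        if WARN_RE.search(line):
            warnings += 1
        match = ABORT_RE.search(line)
        if match:
            aborted.append((match.group(1), match.group(2)))
    return {
        "warnings": warnings,
        "aborted": aborted,
        # A boot counts as affected if either kind of message appears.
        "affected": warnings > 0 or bool(aborted),
    }
```

Running this on the log saved after each reboot in step 3 gives a simple yes/no for whether that boot was affected, which could make it easier to estimate how often the problem occurs on a given kernel.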

Additional info:
reporter:       libreport-2.17.8
WARNING: CPU: 0 PID: 244 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu]
Modules linked in: amdgpu drm_ttm_helper ttm crct10dif_pclmul iommu_v2 crc32_pclmul crc32c_intel hid_logitech_hidpp drm_buddy polyval_clmulni polyval_generic gpu_sched drm_display_helper ghash_clmulni_intel cec sha512_ssse3 wdat_wdt sp5100_tco r8169 video wmi hid_multitouch hid_logitech_dj serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse dm_multipath
CPU: 0 PID: 244 Comm: kworker/0:2 Not tainted 6.2.8-300.fc38.x86_64 #1
Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
RIP: 0010:smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu]
Code: 20 48 c7 c7 60 7c b9 c0 48 89 c1 48 f7 ea 48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 e2 e8 a0 18 a1 c1 <0f> 0b e9 37 ff ff ff bd ea ff ff ff e9 2d ff ff ff 66 66 2e 0f 1f
RSP: 0018:ffffa71c804efdb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8d8f090b6c00 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: 00000000ffffffc2 R08: 0000000000000000 R09: ffffa71c804efc48
R10: 0000000000000003 R11: ffffffff841447c8 R12: 0000000000000009
R13: 0000000000000000 R14: 00000003467891d5 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff8d8ff7400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdc716bc10 CR3: 000000010006c000 CR4: 00000000001506f0
Call Trace:
 <TASK>
 smum_send_msg_to_smc+0xba/0xf0 [amdgpu]
 smu8_dpm_powergate_vce+0x15a/0x180 [amdgpu]
 pp_set_powergating_by_smu+0x76/0x280 [amdgpu]
 amdgpu_dpm_set_powergating_by_smu+0x84/0xf0 [amdgpu]
 amdgpu_dpm_enable_vce+0x29/0xa0 [amdgpu]
 process_one_work+0x1c7/0x3d0
 worker_thread+0x4d/0x380
 ? _raw_spin_lock_irqsave+0x23/0x50
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe9/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>


Comment 1 Matt Fagnani 2023-03-24 05:11:38 UTC
Created attachment 1953325 [details]
File: dmesg

Comment 2 sobekrobert1 2023-07-01 19:23:49 UTC
Description of problem:
I rebooted after updating. Right after I logged in, three error notifications popped up, all of them named kernel-core.

Version-Release number of selected component:
kernel-core-6.3.8-200.fc38

Additional info:
reporter:       libreport-2.17.10
kernel:         6.3.8-200.fc38.x86_64
crash_function: smu8_send_msg_to_smc_with_parameter
reason:         WARNING: CPU: 3 PID: 53 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu] [amdgpu]
type:           Kerneloops
cmdline:        BOOT_IMAGE=(hd0,gpt5)/vmlinuz-6.3.8-200.fc38.x86_64 root=UUID=294846f6-2f8b-4bf1-9db9-a9d2c826ddc7 ro rootflags=subvol=root rhgb quiet
package:        kernel-core-6.3.8-200.fc38
runlevel:       unknown
comment:        Rebooted after updating, right after I logged in I've got 3 errors popped out, all of them named kernel-core.

Truncated backtrace:
WARNING: CPU: 3 PID: 53 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
Modules linked in: amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit polyval_clmulni drm_ttm_helper ttm sdhci_pci polyval_generic cqhci iommu_v2 ghash_clmulni_intel sha512_ssse3 drm_buddy wdat_wdt sdhci gpu_sched mmc_core drm_display_helper sp5100_tco r8169 cec video wmi serio_raw ip6_tables ip_tables fuse
CPU: 3 PID: 53 Comm: kworker/3:1 Not tainted 6.3.8-200.fc38.x86_64 #1
Hardware name: HP HP Notebook/81FA, BIOS F.37 06/22/2020
Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
RIP: 0010:smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
Code: 20 48 c7 c7 48 8e ea c0 48 89 c1 48 f7 ea 48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 e2 e8 1c c6 70 f4 <0f> 0b e9 37 ff ff ff bd ea ff ff ff e9 2d ff ff ff 66 66 2e 0f 1f
RSP: 0018:ffffb87cc02bfdb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9226cc3ebc00 RCX: 0000000000000027
RDX: ffff9227d75a1548 RSI: 0000000000000001 RDI: ffff9227d75a1540
RBP: 00000000ffffffc2 R08: 0000000000000000 R09: ffffb87cc02bfc48
R10: 0000000000000003 R11: ffffffffb7146108 R12: 0000000000000009
R13: 0000000000000000 R14: 000000027b844020 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff9227d7580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f38114e2000 CR3: 0000000137022000 CR4: 00000000001506e0
Call Trace:
 <TASK>
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? __warn+0x81/0x130
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? report_bug+0x171/0x1a0
 ? prb_read_valid+0x1b/0x30
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 smum_send_msg_to_smc+0xbe/0x100 [amdgpu]
 smu8_dpm_powergate_vce+0x15e/0x180 [amdgpu]
 pp_set_powergating_by_smu+0x7a/0x280 [amdgpu]
 amdgpu_dpm_set_powergating_by_smu+0x88/0xf0 [amdgpu]
 amdgpu_dpm_enable_vce+0x2d/0xa0 [amdgpu]
 process_one_work+0x1c7/0x3d0
 worker_thread+0x51/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xde/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
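
To compare the backtraces in this report, the call chains can be reduced to their reliable frames. In kernel stack traces, entries prefixed with "?" are addresses found on the stack that are not known to be part of the actual call chain. The following is a minimal sketch, assuming the text between <TASK> and </TASK> is passed in as a string; the function name reliable_frames is hypothetical.

```python
import re

# Matches one frame such as " ? __warn+0x81/0x130" or
# " smum_send_msg_to_smc+0xba/0xf0 [amdgpu]".
# Group 1 is the optional "?" marker, group 2 the symbol, group 3 the optional module.
FRAME_RE = re.compile(
    r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def reliable_frames(trace_text):
    """Return the symbol names of frames that are not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames
```

Applied to the 6.2.8 and 6.3.8 traces above, both reduce to the same chain from smum_send_msg_to_smc through smu8_dpm_powergate_vce and amdgpu_dpm_enable_vce down to ret_from_fork, which is consistent with the two reports describing the same warning.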