Bug 2181426 - [abrt] smum_send_msg_to_smc: WARNING: CPU: 0 PID: 244 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu] [amdgpu]
Summary: [abrt] smum_send_msg_to_smc: WARNING: CPU: 0 PID: 244 at drivers/gpu/drm/amd/...
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:466d89878a0c644eea955a43aa8...
Depends On:
Blocks:
Reported: 2023-03-24 05:11 UTC by Matt Fagnani
Modified: 2023-07-01 19:23 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments:
File: dmesg (90.77 KB, text/plain)
2023-03-24 05:11 UTC, Matt Fagnani

Description Matt Fagnani 2023-03-24 05:11:33 UTC
Description of problem:
I booted kernel 6.2.8 and earlier kernels on an HP laptop with an AMD A10-9620P CPU and integrated Radeon R5 GPU, running a Fedora 38 KDE Plasma installation. Occasionally, shortly after amdgpu started during boot, amdgpu warnings involving smu8_send_msg_to_smc_with_parameter timeouts were logged. During boots in which that warning happened, the journal frequently showed messages like "amdgpu: smu8_send_msg_to_smc_with_parameter(0x0007) aborted; SMU still servicing msg (0x0009)". On those boots, sddm took about 10 seconds to start instead of the usual 1-2 seconds, and Plasma took 1-2 minutes to start instead of the usual 2-3 seconds. Programs were sometimes much slower to respond on the desktop. Shutting down the system took 1-2 minutes instead of the usual 2 seconds. The amdgpu messages were shown on the console at the end of the shutdown process, after the plymouth shutdown service failed.
This problem has happened on fewer than 10% of boots over recent months. Bisecting would be difficult because, at such a low frequency, I couldn't be sure which kernels are unaffected. I reported this problem at https://gitlab.freedesktop.org/drm/amd/-/issues/2476
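
To put a rough number on why bisecting is hard at this failure rate, here is a minimal sketch. It assumes each boot fails independently with a fixed probability, which may not hold for this bug. The function name clean_boots_needed and the 95% confidence level are illustrative choices, not anything taken from the kernel or from ABRT.

```python
import math


def clean_boots_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Return how many consecutive clean boots are needed before a kernel
    can be called unaffected with the given confidence.

    Assumption: boots are independent and each one shows the warning with
    probability `failure_rate`. An affected kernel then survives n boots
    without a warning with probability (1 - failure_rate) ** n, and we ask
    for that probability to drop to (1 - confidence) or below.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    miss_probability = 1.0 - confidence
    return math.ceil(math.log(miss_probability) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # The report puts the rate under 10%, so 10% is the optimistic case.
    for rate in (0.10, 0.05):
        print(f"rate {rate:.0%}: {clean_boots_needed(rate)} clean boots per kernel")
```

Under these assumptions, a 10% rate needs 29 clean boots per bisect step and a 5% rate needs 59, so each step of a bisect would take dozens of reboots.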

Steps to Reproduce:
1. Boot a Fedora 38 KDE Plasma installation updated to 2023-03-24 with updates-testing enabled on a laptop with an AMD A10-9620P CPU and integrated Radeon R5 GPU
2. Log in to Plasma 5.27.3 on Wayland from sddm
3. Reboot the system
4. Repeat steps 2-3 until the problem happens
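
To tell whether a given boot hit the problem without watching the console, the kernel log text for that boot could be scanned for the two signatures quoted in this report. The sketch below is a hypothetical helper, not part of ABRT or amdgpu. It assumes the log text has already been saved to a string (for example from `journalctl -k -b`) and that the message format matches the one quoted in the description.

```python
import re

# Matches the journal message quoted in the description, e.g.
# "smu8_send_msg_to_smc_with_parameter(0x0007) aborted; SMU still servicing msg (0x0009)"
SMU_ABORT_RE = re.compile(
    r"smu8_send_msg_to_smc_with_parameter\((0x[0-9a-fA-F]+)\) aborted; "
    r"SMU still servicing msg \((0x[0-9a-fA-F]+)\)"
)

# Matches the WARNING line that names the function.
SMU_WARNING_RE = re.compile(r"WARNING: .*smu8_send_msg_to_smc_with_parameter")


def find_smu_aborts(log_text: str) -> list[tuple[int, int]]:
    """Return (aborted_msg, in_flight_msg) pairs for every abort message."""
    return [
        (int(match.group(1), 16), int(match.group(2), 16))
        for match in SMU_ABORT_RE.finditer(log_text)
    ]


def boot_is_affected(log_text: str) -> bool:
    """True if the log contains either the WARNING or an abort message."""
    return bool(SMU_WARNING_RE.search(log_text) or SMU_ABORT_RE.search(log_text))


if __name__ == "__main__":
    sample = (
        "amdgpu: smu8_send_msg_to_smc_with_parameter(0x0007) aborted; "
        "SMU still servicing msg (0x0009)\n"
    )
    print(boot_is_affected(sample), find_smu_aborts(sample))
```

Counting affected and unaffected boots this way would also give a firmer estimate of the failure rate than memory alone.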

Additional info:
reporter:       libreport-2.17.8
WARNING: CPU: 0 PID: 244 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu]
Modules linked in: amdgpu drm_ttm_helper ttm crct10dif_pclmul iommu_v2 crc32_pclmul crc32c_intel hid_logitech_hidpp drm_buddy polyval_clmulni polyval_generic gpu_sched drm_display_helper ghash_clmulni_intel cec sha512_ssse3 wdat_wdt sp5100_tco r8169 video wmi hid_multitouch hid_logitech_dj serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse dm_multipath
CPU: 0 PID: 244 Comm: kworker/0:2 Not tainted 6.2.8-300.fc38.x86_64 #1
Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
RIP: 0010:smu8_send_msg_to_smc_with_parameter+0x130/0x150 [amdgpu]
Code: 20 48 c7 c7 60 7c b9 c0 48 89 c1 48 f7 ea 48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 e2 e8 a0 18 a1 c1 <0f> 0b e9 37 ff ff ff bd ea ff ff ff e9 2d ff ff ff 66 66 2e 0f 1f
RSP: 0018:ffffa71c804efdb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8d8f090b6c00 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: 00000000ffffffc2 R08: 0000000000000000 R09: ffffa71c804efc48
R10: 0000000000000003 R11: ffffffff841447c8 R12: 0000000000000009
R13: 0000000000000000 R14: 00000003467891d5 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff8d8ff7400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdc716bc10 CR3: 000000010006c000 CR4: 00000000001506f0
Call Trace:
 <TASK>
 smum_send_msg_to_smc+0xba/0xf0 [amdgpu]
 smu8_dpm_powergate_vce+0x15a/0x180 [amdgpu]
 pp_set_powergating_by_smu+0x76/0x280 [amdgpu]
 amdgpu_dpm_set_powergating_by_smu+0x84/0xf0 [amdgpu]
 amdgpu_dpm_enable_vce+0x29/0xa0 [amdgpu]
 process_one_work+0x1c7/0x3d0
 worker_thread+0x4d/0x380
 ? _raw_spin_lock_irqsave+0x23/0x50
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe9/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
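
For comparing traces across kernel versions (offsets shift between builds, so the raw text rarely matches exactly), the frames can be reduced to their symbol names. This is a minimal sketch with hypothetical names (Frame, parse_call_trace). It assumes the `symbol+0xoffset/0xsize [module]` frame format seen in this report, where a leading "?" marks a frame the unwinder could not confirm.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    offset: int            # offset of the address inside the function
    size: int              # total size of the function
    module: Optional[str]  # e.g. "amdgpu"; None for built-in kernel code
    reliable: bool         # False when the line starts with "?"


FRAME_RE = re.compile(
    r"^\s*(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def parse_call_trace(text: str) -> list[Frame]:
    """Extract stack frames from oops text, skipping non-frame lines."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # <TASK>, register dumps, headers, and so on
        frames.append(
            Frame(
                symbol=match.group("symbol"),
                offset=int(match.group("offset"), 16),
                size=int(match.group("size"), 16),
                module=match.group("module"),
                reliable=match.group("unreliable") is None,
            )
        )
    return frames


def reliable_symbols(text: str) -> list[str]:
    """Symbol names of reliable frames only, in trace order."""
    return [frame.symbol for frame in parse_call_trace(text) if frame.reliable]
```

Two traces whose reliable_symbols() lists are equal most likely went through the same call path, even when the offsets differ between kernel builds.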


Comment 1 Matt Fagnani 2023-03-24 05:11:38 UTC
Created attachment 1953325
File: dmesg

Comment 2 sobekrobert1 2023-07-01 19:23:49 UTC
Description of problem:
I rebooted after updating. Right after I logged in, three error notifications popped up, all of them named kernel-core.

Version-Release number of selected component:
kernel-core-6.3.8-200.fc38

Additional info:
reporter:       libreport-2.17.10
kernel:         6.3.8-200.fc38.x86_64
crash_function: smu8_send_msg_to_smc_with_parameter
reason:         WARNING: CPU: 3 PID: 53 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu] [amdgpu]
type:           Kerneloops
cmdline:        BOOT_IMAGE=(hd0,gpt5)/vmlinuz-6.3.8-200.fc38.x86_64 root=UUID=294846f6-2f8b-4bf1-9db9-a9d2c826ddc7 ro rootflags=subvol=root rhgb quiet
package:        kernel-core-6.3.8-200.fc38
runlevel:       unknown
comment:        Rebooted after updating, right after I logged in I've got 3 errors popped out, all of them named kernel-core.

Truncated backtrace:
WARNING: CPU: 3 PID: 53 at drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/smu8_smumgr.c:98 smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
Modules linked in: amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit polyval_clmulni drm_ttm_helper ttm sdhci_pci polyval_generic cqhci iommu_v2 ghash_clmulni_intel sha512_ssse3 drm_buddy wdat_wdt sdhci gpu_sched mmc_core drm_display_helper sp5100_tco r8169 cec video wmi serio_raw ip6_tables ip_tables fuse
CPU: 3 PID: 53 Comm: kworker/3:1 Not tainted 6.3.8-200.fc38.x86_64 #1
Hardware name: HP HP Notebook/81FA, BIOS F.37 06/22/2020
Workqueue: events amdgpu_vce_idle_work_handler [amdgpu]
RIP: 0010:smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
Code: 20 48 c7 c7 48 8e ea c0 48 89 c1 48 f7 ea 48 89 c8 44 89 e9 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 e2 e8 1c c6 70 f4 <0f> 0b e9 37 ff ff ff bd ea ff ff ff e9 2d ff ff ff 66 66 2e 0f 1f
RSP: 0018:ffffb87cc02bfdb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9226cc3ebc00 RCX: 0000000000000027
RDX: ffff9227d75a1548 RSI: 0000000000000001 RDI: ffff9227d75a1540
RBP: 00000000ffffffc2 R08: 0000000000000000 R09: ffffb87cc02bfc48
R10: 0000000000000003 R11: ffffffffb7146108 R12: 0000000000000009
R13: 0000000000000000 R14: 000000027b844020 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff9227d7580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f38114e2000 CR3: 0000000137022000 CR4: 00000000001506e0
Call Trace:
 <TASK>
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? __warn+0x81/0x130
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? report_bug+0x171/0x1a0
 ? prb_read_valid+0x1b/0x30
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 ? smu8_send_msg_to_smc_with_parameter+0x134/0x150 [amdgpu]
 smum_send_msg_to_smc+0xbe/0x100 [amdgpu]
 smu8_dpm_powergate_vce+0x15e/0x180 [amdgpu]
 pp_set_powergating_by_smu+0x7a/0x280 [amdgpu]
 amdgpu_dpm_set_powergating_by_smu+0x88/0xf0 [amdgpu]
 amdgpu_dpm_enable_vce+0x2d/0xa0 [amdgpu]
 process_one_work+0x1c7/0x3d0
 worker_thread+0x51/0x390
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xde/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

