Bug 2116036

Summary: [abrt] dcn30_init_hw: WARNING: CPU: 0 PID: 4913 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hubbub.c:566 hubbub2_get_dchub_ref_freq+0x7e/0xa0 [amdgpu] [amdgpu]
Product: [Fedora] Fedora Reporter: Robin Tetour <Rob.Tetour>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 36 CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/07c49765b36fe00ca0996168ec81f2e3c4b993c7
Whiteboard: abrt_hash:ee5e4932845803395b1b6b58293248b4ba841282;VARIANT_ID=workstation;
Fixed In Version: Doc Type: ---
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-26 09:43:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: dmesg none

Description Robin Tetour 2022-08-06 11:41:36 UTC
Description of problem:
I don't know exactly what is causing this, but it has now happened a second time: inside GNOME I opened the automatic sleep options and the system froze, so I had to do a hard shutdown by holding down the power button.

Maybe it is caused by my eGPU, but hotplugging it without a display connected to it works fine.

The logs should be attached.

Additional info:
reporter:       libreport-2.17.1
WARNING: CPU: 0 PID: 4913 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_hubbub.c:566 hubbub2_get_dchub_ref_freq+0x7e/0xa0 [amdgpu]
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_soc_dmic iTCO_wdt intel_pmc_bxt iTCO_vendor_support mei_hdcp mei_pxp pmt_telemetry pmt_class intel_rapl_msr intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore pcspkr wmi_bmof snd_hda_codec_realtek iwlmvm snd_hda_codec_generic snd_sof_pci_intel_tgl snd_sof_intel_hda_common mac80211 soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils libarc4 snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi
 soundwire_bus ledtrig_audio vfat iwlwifi snd_soc_core fat snd_compress iwlmei i2c_i801 ac97_bus snd_hda_codec_hdmi i2c_smbus snd_pcm_dmaengine cfg80211 btusb mei_me btrtl snd_hda_intel btbcm idma64 snd_intel_dspcfg mei joydev btintel snd_intel_sdw_acpi snd_hda_codec btmtk bluetooth snd_hda_core snd_hwdep hid_sensor_als snd_seq hid_sensor_trigger hid_sensor_iio_common snd_seq_device industrialio_triggered_buffer kfifo_buf ecdh_generic industrialio snd_pcm thunderbolt snd_timer snd intel_vsec soundcore processor_thermal_device_pci processor_thermal_device processor_thermal_rfim processor_thermal_mbox processor_thermal_rapl intel_rapl_common igen6_edac ideapad_laptop platform_profile rfkill int3403_thermal int340x_thermal_zone intel_hid sparse_keymap int3400_thermal acpi_thermal_rel acpi_tad acpi_pad zram dm_crypt hid_sensor_hub intel_ishtp_hid hid_logitech_hidpp uas usb_storage hid_logitech_dj amdgpu i915 drm_buddy drm_ttm_helper r8152 ttm nvme intel_ish_ipc iommu_v2
 ucsi_acpi hid_multitouch crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched ghash_clmulni_intel nvme_core typec_ucsi serio_raw drm_dp_helper intel_ishtp typec wmi mii i2c_hid_acpi i2c_hid video pinctrl_tigerlake ip6_tables ip_tables fuse
CPU: 0 PID: 4913 Comm: kworker/0:4 Not tainted 5.18.16-200.fc36.x86_64 #1
Hardware name: LENOVO 82TK/LNVNB161216, BIOS HMCN33WW 07/22/2022
Workqueue: pm pm_runtime_work
RIP: 0010:hubbub2_get_dchub_ref_freq+0x7e/0xa0 [amdgpu]
Code: 4c 24 04 85 c9 74 23 83 3c 24 02 74 2d 8d 83 c0 63 ff ff 3d 20 4e 00 00 76 02 0f 0b 89 5d 00 48 83 c4 08 5b 5d c3 cc cc cc cc <0f> 0b 89 5d 00 48 83 c4 08 5b 5d c3 cc cc cc cc d1 eb 8d 83 c0 63
RSP: 0018:ffffbee5c4623c40 EFLAGS: 00010246
RAX: 0000000000001000 RBX: 00000000000186a0 RCX: 0000000000000000
RDX: ffffbee5c4623c44 RSI: 00000000000039df RDI: ffffa0b1d8ee0000
RBP: ffffa0b1d16683a0 R08: ffffbee5c4623c40 R09: 000000000000000c
R10: 0000000000000003 R11: ffffffffacf453e8 R12: ffffa0b1d1668000
R13: ffffa0b1cc6d5a00 R14: ffffa0b1d1668460 R15: ffffa0b2431a0000
FS:  0000000000000000(0000) GS:ffffa0b577600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557ed6b53740 CR3: 00000002d4e10004 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 dcn30_init_hw+0x444/0x7c0 [amdgpu]
 ? dm_read_reg_func+0x34/0xb0 [amdgpu]
 dc_set_power_state+0x10a/0x160 [amdgpu]
 dm_resume+0xbd/0x5b0 [amdgpu]
 amdgpu_device_ip_resume_phase2+0x4f/0xc0 [amdgpu]
 ? amdgpu_device_fw_loading+0xb9/0x130 [amdgpu]
 amdgpu_device_resume+0x7b/0x210 [amdgpu]
 ? amdgpu_dpm_is_baco_supported+0x6f/0x90 [amdgpu]
 amdgpu_pmops_runtime_resume+0x7d/0xe0 [amdgpu]
 pci_pm_runtime_resume+0xa7/0xd0
 ? pci_pm_freeze_noirq+0xe0/0xe0
 __rpm_callback+0x41/0x170
 ? pci_pm_freeze_noirq+0xe0/0xe0
 rpm_callback+0x5d/0x70
 ? pci_pm_freeze_noirq+0xe0/0xe0
 rpm_resume+0x5c1/0x7f0
 ? _raw_spin_unlock_irqrestore+0x23/0x40
 ? try_to_wake_up+0x83/0x560
 pm_runtime_work+0x6c/0xa0
 process_one_work+0x1c4/0x380
 worker_thread+0x4d/0x380
 ? process_one_work+0x380/0x380
 kthread+0xe6/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
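
When reading a trace like the one above, entries prefixed with "?" are addresses the unwinder found on the stack but could not confirm as part of the call chain, so the unprefixed entries are the ones to follow. Here the unprefixed entries run from pm_runtime_work through amdgpu_pmops_runtime_resume down to dcn30_init_hw. The following is a minimal sketch, not part of the original report, of how those entries could be pulled out of a saved dmesg excerpt; the function name reliable_frames and the sample string are illustrative assumptions.

```python
import re

# Matches trace entries such as " dcn30_init_hw+0x444/0x7c0 [amdgpu]"
# or " ? dm_read_reg_func+0x34/0xb0 [amdgpu]". The module tag is optional.
FRAME_RE = re.compile(
    r"^\s*(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for entries not marked with '?'.

    The module is None when the entry carries no [module] tag.
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group("guess"):
            frames.append((match.group("func"), match.group("module")))
    return frames


# A few lines copied from the trace in this report, used as sample input.
sample = """\
 <TASK>
 dcn30_init_hw+0x444/0x7c0 [amdgpu]
 ? dm_read_reg_func+0x34/0xb0 [amdgpu]
 dc_set_power_state+0x10a/0x160 [amdgpu]
 pci_pm_runtime_resume+0xa7/0xd0
 </TASK>
"""

for func, module in reliable_frames(sample):
    print(func, module or "(no module tag)")
```

Lines that are not trace entries, such as the <TASK> markers, are skipped because they do not match the pattern.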

Comment 1 Robin Tetour 2022-08-06 11:41:41 UTC
Created attachment 1904060 [details]
File: dmesg

Comment 2 Robin Tetour 2022-08-06 11:48:32 UTC
OK, it happened again. I now think it has something to do with the suspend options, because the display dimmed as if the computer were going to sleep, and when I moved my mouse the system froze.

Comment 3 Robin Tetour 2022-08-07 14:55:36 UTC
I tried booting my kernel with an additional parameter, i915.psr_enable=0, which I naively thought would magically fix it. I then discovered that the system does not freeze when I open the GNOME power options on the second display, only on the internal display. This time it generated a new bug log, which for some reason is not reportable, so I pushed the generated files to a temporary GitLab repo: https://gitlab.com/RobTheRealLifeAnimoo/log-dump I am going to open an issue on the GNOME GitLab just to be sure, since it happened while using GNOME and I don't recall it happening when using KDE.

Comment 4 Robin Tetour 2022-08-09 17:25:19 UTC
I added new logs to the GitLab repo. I think there is a serious problem with my system. This time it froze on the splash screen while booting, and I had to do a hard shutdown again. It generated a whopping 5 crash logs for the kernel-core package.

Comment 5 Robin Tetour 2022-08-12 14:21:11 UTC
So it is definitely an issue with power-profiles-daemon. I have not been able to reproduce any of these system crashes/freezes since I uninstalled that package. Closing this issue. I am going to open an issue on the ppd GitHub.

Comment 6 Robin Tetour 2022-08-12 22:19:42 UTC
So it looks like it is not power-profiles-daemon's fault.

I have filed a bug report, but according to its maintainer it just uses kernel APIs: https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues/104

I seriously don't know what to do now. It is getting very frustrating. I do have a somewhat stable system, but it now feels like I am just dodging the issue, which is not good in my book.

I will gladly help resolve the issue and provide log files.

If anyone knows what to do please help!!

Comment 7 Robin Tetour 2022-08-16 21:44:25 UTC
(In reply to Robin Tetour from comment #6)
> So it looks like it is not a power profiles daemon fault.
> 
> I have filed a bug report but according to it's maintainer it just uses
> kernel APIs
> https://gitlab.freedesktop.org/hadess/power-profiles-daemon/-/issues/104
> 
> I seriously don't know what to do now. It is getting very frustrating. I do
> have somehow stable system but it now feels like I am just dodging the issue
> - which is not good in my book.
> 
> I will gladly help resolving the issue and helping with log files.
> 
> If anyone knows what to do please help!!

So the system crashed just from opening the GNOME power settings, even though power-profiles-daemon is uninstalled. I have no clue what is happening. Today I tried single (e)GPU passthrough; at its core it works, but it is very unstable and caused freezes, mainly in the guest OS. It also led abrt to collect some reports: https://retrace.fedoraproject.org/faf/reports/496174/ This is probably not connected, though.

Comment 8 Robin Tetour 2022-08-22 19:17:36 UTC
It looks like the problem goes away when setting the kernel parameter intel_idle.max_cstate=1, but that is not a permanent solution since the laptop runs very hot with it.
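
When testing workarounds such as intel_idle.max_cstate=1 or i915.psr_enable=0, it helps to confirm that the parameter actually reached the running kernel before drawing conclusions from a test boot. The following is a minimal sketch of one way to check this by parsing the kernel command line, which Linux exposes in /proc/cmdline. The function names and the sample command line are illustrative assumptions, and the simple whitespace split does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags without '=' (for example "quiet") map to None.
    Only the first '=' separates name from value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Made-up sample so the sketch runs anywhere; on the affected machine,
# pass read_cmdline() to parse_cmdline() instead.
sample = "ro rhgb quiet intel_idle.max_cstate=1 i915.psr_enable=0"
params = parse_cmdline(sample)

for name in ("intel_idle.max_cstate", "i915.psr_enable"):
    print(name, "=", params.get(name, "(not set)"))
```

If a parameter shows up as not set, the boot entry was probably not updated and the test boot says nothing about that workaround.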