Created attachment 1936291
boot log from failure

1. Please describe the problem:
Running the 6.1.3 test kernel on a Lenovo L14 (AMD Ryzen 7 PRO 5875U), the USB-C-attached monitors fail to come online and the kernel throws the following splat:

[ 15.221084] fbcon: Taking over console
[ 16.046409] ------------[ cut here ]------------
[ 16.046411] WARNING: CPU: 1 PID: 116 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:3533 update_mst_stream_alloc_table+0x129/0x130 [amdgpu]
[ 16.046643] Modules linked in: nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_sof_amd_rembrandt snd_ctl_led snd_sof_amd_renoir intel_rapl_msr intel_rapl_common snd_sof_amd_acp snd_hda_codec_realtek mt7921e snd_sof_pci vfat edac_mce_amd snd_hda_codec_generic mt7921_common fat snd_hda_codec_hdmi snd_sof kvm_amd mt76_connac_lib snd_sof_utils btusb snd_hda_intel mt76 btrtl snd_intel_dspcfg kvm uvcvideo snd_intel_sdw_acpi btbcm btintel videobuf2_vmalloc snd_hda_codec btmtk snd_soc_core mac80211 irqbypass videobuf2_memops rapl snd_hda_core videobuf2_v4l2 snd_compress pcspkr videobuf2_common snd_hwdep ac97_bus think_lmi snd_pcm_dmaengine libarc4 bluetooth firmware_attributes_class wmi_bmof snd_seq videodev snd_pci_ps snd_seq_device snd_rpl_pci_acp6x snd_pci_acp6x
[ 16.046681]  cfg80211 snd_pci_acp5x mc joydev k10temp i2c_piix4 thinkpad_acpi snd_pcm snd_rn_pci_acp3x snd_acp_config ledtrig_audio snd_soc_acpi platform_profile snd_timer snd_pci_acp3x snd rfkill soundcore acpi_cpufreq hid_microsoft ff_memless cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii uas usb_storage amdgpu drm_ttm_helper ttm iommu_v2 nvme gpu_sched drm_buddy sdhci_pci nvme_core drm_display_helper cqhci sdhci video ucsi_acpi crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni hid_multitouch polyval_generic ghash_clmulni_intel sha512_ssse3 mmc_core typec_ucsi ccp serio_raw r8169 cec sp5100_tco typec nvme_common i2c_hid_acpi wmi i2c_hid ip6_tables ip_tables fuse
[ 16.046702] CPU: 1 PID: 116 Comm: kworker/1:1 Not tainted 6.1.3-200.fc37.x86_64 #1
[ 16.046704] Hardware name: LENOVO 21C5000VUS/21C5000VUS, BIOS R1YET38W (1.15 ) 10/27/2022
[ 16.046706] Workqueue: events fbcon_register_existing_fbs
[ 16.046710] RIP: 0010:update_mst_stream_alloc_table+0x129/0x130 [amdgpu]
[ 16.046833] Code: e8 03 89 c1 f3 48 a5 48 81 c4 90 00 00 00 5b 5d 41 5c c3 cc cc cc cc 41 0f b7 40 04 4d 89 19 49 89 59 08 66 41 89 41 10 eb 87 <0f> 0b e9 14 ff ff ff 0f 1f 44 00 00 55 48 89 fd 53 bb 0a 00 00 00
[ 16.046834] RSP: 0018:ffffb9cdc05d3598 EFLAGS: 00010202
[ 16.046835] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
[ 16.046836] RDX: 0000000000000000 RSI: ffffb9cdc05d3598 RDI: ffffb9cdc05d3628
[ 16.046837] RBP: ffff973dda280aa0 R08: ffffb9cdc05d3650 R09: ffffb9cdc05d33d0
[ 16.046837] R10: ffff973ddaa83c00 R11: ffff973dcc8a7ba0 R12: 0000000000000002
[ 16.046838] R13: ffff973dc94f2800 R14: ffffffffc0d5e4c0 R15: 0000000000000000
[ 16.046838] FS: 0000000000000000(0000) GS:ffff97448ee40000(0000) knlGS:0000000000000000
[ 16.046839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.046840] CR2: 000055a5de8ab000 CR3: 00000003a0010000 CR4: 0000000000750ee0
[ 16.046840] PKRU: 55555554
[ 16.046841] Call Trace:
[ 16.046843] <TASK>
[ 16.046845] dc_link_allocate_mst_payload+0x85/0x280 [amdgpu]
[ 16.046965] core_link_enable_stream+0x780/0x930 [amdgpu]
[ 16.047082] dce110_apply_ctx_to_hw+0x649/0x6f0 [amdgpu]
[ 16.047208] dc_commit_state_no_check+0x37e/0xc70 [amdgpu]
[ 16.047327] ? dc_validate_global_state+0x240/0x3e0 [amdgpu]
[ 16.047447] dc_commit_state+0x92/0x110 [amdgpu]
[ 16.047575] amdgpu_dm_atomic_commit_tail+0x4a0/0x2a90 [amdgpu]
[ 16.047704] ? dcn21_fast_validate_bw+0x39a/0x460 [amdgpu]
[ 16.047841] ? dcn21_validate_bandwidth_fp+0xfe/0x6b0 [amdgpu]
[ 16.047971] ? resource_build_scaling_params+0x93c/0xf30 [amdgpu]
[ 16.048101] ? kernel_fpu_end+0x1e/0x40
[ 16.048104] ? ___slab_alloc+0x2f1/0x930
[ 16.048107] ? drm_atomic_helper_setup_commit+0x1bc/0x840
[ 16.048110] ? dma_resv_iter_first_unlocked+0x62/0x70
[ 16.048113] ? dma_resv_get_fences+0x4f/0x200
[ 16.048115] ? preempt_count_add+0x6a/0xa0
[ 16.048117] ? _raw_spin_lock_irq+0x19/0x40
[ 16.048119] ? _raw_spin_unlock_irq+0x1b/0x40
[ 16.048120] ? wait_for_completion_timeout+0x12a/0x140
[ 16.048122] ? wait_for_completion_interruptible+0x111/0x1b0
[ 16.048123] ? dm_plane_helper_prepare_fb+0x181/0x2a0 [amdgpu]
[ 16.048249] commit_tail+0x94/0x130
[ 16.048252] drm_atomic_helper_commit+0x112/0x140
[ 16.048253] drm_atomic_commit+0x67/0xd0
[ 16.048255] ? drm_plane_get_damage_clips.cold+0x1c/0x1c
[ 16.048258] drm_client_modeset_commit_atomic+0x1e8/0x220
[ 16.048261] drm_client_modeset_commit_locked+0x56/0x160
[ 16.048262] drm_client_modeset_commit+0x21/0x40
[ 16.048263] drm_fb_helper_set_par+0x9e/0xe0
[ 16.048265] drm_fb_helper_hotplug_event+0x9f/0xe0
[ 16.048266] drm_fb_helper_set_par+0xbe/0xe0
[ 16.048268] fbcon_init+0x248/0x540
[ 16.048270] visual_init+0xcc/0x120
[ 16.048272] do_bind_con_driver.isra.0+0x1da/0x2e0
[ 16.048274] do_take_over_console+0x153/0x180
[ 16.048275] do_fbcon_takeover+0x5a/0xc0
[ 16.048277] fbcon_register_existing_fbs+0x3b/0x70
[ 16.048278] process_one_work+0x1c7/0x380
[ 16.048280] worker_thread+0x4d/0x380
[ 16.048281] ? _raw_spin_lock_irqsave+0x23/0x50
[ 16.048282] ? rescuer_thread+0x380/0x380
[ 16.048283] kthread+0xe9/0x110
[ 16.048285] ? kthread_complete_and_exit+0x20/0x20
[ 16.048286] ret_from_fork+0x22/0x30
[ 16.048289] </TASK>
[ 16.048290] ---[ end trace 0000000000000000 ]---
[ 16.397596] Console: switching to colour frame buffer device 240x67

2. What is the Version-Release number of the kernel:
6.1.3

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes, it works with the 6.0.x kernels in F37.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Reboot the machine with the 6.1 kernel.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
No idea; this is a "production" machine.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No; secure boot is enabled.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
I have attached the 6.0 and 6.1 boot logs for comparison.
Created attachment 1936292
6.0 functional monitors log

Here is a good boot with 6.0.
I guess it should be noted that this is a USB MST setup.
The splat goes away with the Rawhide 6.2-rc2 kernel, but the monitors still don't work unless the resolution is decreased to 2K. That suggests some of the DSC or HBR3/DP 1.4 changes may be what keeps the machine from enabling the pair of 4K monitors at 60 Hz.
So, I spent a bit of time testing a few things. DSC+MST doesn't work with mainline v6.0 either until https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.0.19&id=e0a89bd789cb48a44722791920b4dc2b6d409912 lands in v6.0.3; from there, the 6.0 branch is quite stable. 6.1 has lots of issues, but 6.2, as I mentioned, tends to work if the resolution of the MST monitors is lowered enough that DSC isn't needed. The upstream bug tracking much of this is https://gitlab.freedesktop.org/drm/amd/-/issues/2171, which proposes a patch that fixes my blank monitors on 6.2 by reverting the 6.1 commit 4d07b0bc ("drm/display/dp_mst: Move all payload info into the atomic state").
JFYI, upstream is well aware of this issue and has been trying to fix it for a while. I just got a series of patches from Harry that should hopefully fix this, so I expect we'll have a solution very soon.
This was fixed in 6.3 (IIRC). There were associated suspend/resume issues that required the IOMMU to be disabled, but the GPU portions appear to work in recent kernels, so this bug should probably be closed.