Bug 2158902 - 6.1+ kernels fail to bring MST attached graphics heads online with amdgpu
Summary: 6.1+ kernels fail to bring MST attached graphics heads online with amdgpu
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-06 20:50 UTC by Jeremy Linton
Modified: 2023-09-14 01:01 UTC (History)
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-14 01:01:32 UTC
Type: Bug
Embargoed:


Attachments
boot log from failure. (642.73 KB, text/plain)
2023-01-06 20:50 UTC, Jeremy Linton
6.0 functional monitors log (521.97 KB, text/plain)
2023-01-06 20:51 UTC, Jeremy Linton


Links
System: freedesktop.org Gitlab
ID: drm amd issues 2171
Private: 0
Priority: None
Status: opened
Summary: Displays behind MST hubs non-functional (regression in kernel 6.1)
Last Updated: 2023-01-18 20:50:36 UTC

Description Jeremy Linton 2023-01-06 20:50:14 UTC
Created attachment 1936291 [details]
boot log from failure.

1. Please describe the problem: 

When running the 6.1.3 test kernel on a Lenovo L14 (AMD Ryzen 7 PRO 5875U), the USB-C attached monitors fail to come online and the kernel throws the following splat:

[   15.221084] fbcon: Taking over console
[   16.046409] ------------[ cut here ]------------
[   16.046411] WARNING: CPU: 1 PID: 116 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:3533 update_mst_stream_alloc_table+0x129/0x130 [amdgpu]
[   16.046643] Modules linked in: nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_sof_amd_rembrandt snd_ctl_led snd_sof_amd_renoir intel_rapl_msr intel_rapl_common snd_sof_amd_acp snd_hda_codec_realtek mt7921e snd_sof_pci vfat edac_mce_amd snd_hda_codec_generic mt7921_common fat snd_hda_codec_hdmi snd_sof kvm_amd mt76_connac_lib snd_sof_utils btusb snd_hda_intel mt76 btrtl snd_intel_dspcfg kvm uvcvideo snd_intel_sdw_acpi btbcm btintel videobuf2_vmalloc snd_hda_codec btmtk snd_soc_core mac80211 irqbypass videobuf2_memops rapl snd_hda_core videobuf2_v4l2 snd_compress pcspkr videobuf2_common snd_hwdep ac97_bus think_lmi snd_pcm_dmaengine libarc4 bluetooth firmware_attributes_class wmi_bmof snd_seq videodev snd_pci_ps snd_seq_device snd_rpl_pci_acp6x snd_pci_acp6x
[   16.046681]  cfg80211 snd_pci_acp5x mc joydev k10temp i2c_piix4 thinkpad_acpi snd_pcm snd_rn_pci_acp3x snd_acp_config ledtrig_audio snd_soc_acpi platform_profile snd_timer snd_pci_acp3x snd rfkill soundcore acpi_cpufreq hid_microsoft ff_memless cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii uas usb_storage amdgpu drm_ttm_helper ttm iommu_v2 nvme gpu_sched drm_buddy sdhci_pci nvme_core drm_display_helper cqhci sdhci video ucsi_acpi crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni hid_multitouch polyval_generic ghash_clmulni_intel sha512_ssse3 mmc_core typec_ucsi ccp serio_raw r8169 cec sp5100_tco typec nvme_common i2c_hid_acpi wmi i2c_hid ip6_tables ip_tables fuse
[   16.046702] CPU: 1 PID: 116 Comm: kworker/1:1 Not tainted 6.1.3-200.fc37.x86_64 #1
[   16.046704] Hardware name: LENOVO 21C5000VUS/21C5000VUS, BIOS R1YET38W (1.15 ) 10/27/2022
[   16.046706] Workqueue: events fbcon_register_existing_fbs
[   16.046710] RIP: 0010:update_mst_stream_alloc_table+0x129/0x130 [amdgpu]
[   16.046833] Code: e8 03 89 c1 f3 48 a5 48 81 c4 90 00 00 00 5b 5d 41 5c c3 cc cc cc cc 41 0f b7 40 04 4d 89 19 49 89 59 08 66 41 89 41 10 eb 87 <0f> 0b e9 14 ff ff ff 0f 1f 44 00 00 55 48 89 fd 53 bb 0a 00 00 00
[   16.046834] RSP: 0018:ffffb9cdc05d3598 EFLAGS: 00010202
[   16.046835] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000000
[   16.046836] RDX: 0000000000000000 RSI: ffffb9cdc05d3598 RDI: ffffb9cdc05d3628
[   16.046837] RBP: ffff973dda280aa0 R08: ffffb9cdc05d3650 R09: ffffb9cdc05d33d0
[   16.046837] R10: ffff973ddaa83c00 R11: ffff973dcc8a7ba0 R12: 0000000000000002
[   16.046838] R13: ffff973dc94f2800 R14: ffffffffc0d5e4c0 R15: 0000000000000000
[   16.046838] FS:  0000000000000000(0000) GS:ffff97448ee40000(0000) knlGS:0000000000000000
[   16.046839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.046840] CR2: 000055a5de8ab000 CR3: 00000003a0010000 CR4: 0000000000750ee0
[   16.046840] PKRU: 55555554
[   16.046841] Call Trace:
[   16.046843]  <TASK>
[   16.046845]  dc_link_allocate_mst_payload+0x85/0x280 [amdgpu]
[   16.046965]  core_link_enable_stream+0x780/0x930 [amdgpu]
[   16.047082]  dce110_apply_ctx_to_hw+0x649/0x6f0 [amdgpu]
[   16.047208]  dc_commit_state_no_check+0x37e/0xc70 [amdgpu]
[   16.047327]  ? dc_validate_global_state+0x240/0x3e0 [amdgpu]
[   16.047447]  dc_commit_state+0x92/0x110 [amdgpu]
[   16.047575]  amdgpu_dm_atomic_commit_tail+0x4a0/0x2a90 [amdgpu]
[   16.047704]  ? dcn21_fast_validate_bw+0x39a/0x460 [amdgpu]
[   16.047841]  ? dcn21_validate_bandwidth_fp+0xfe/0x6b0 [amdgpu]
[   16.047971]  ? resource_build_scaling_params+0x93c/0xf30 [amdgpu]
[   16.048101]  ? kernel_fpu_end+0x1e/0x40
[   16.048104]  ? ___slab_alloc+0x2f1/0x930
[   16.048107]  ? drm_atomic_helper_setup_commit+0x1bc/0x840
[   16.048110]  ? dma_resv_iter_first_unlocked+0x62/0x70
[   16.048113]  ? dma_resv_get_fences+0x4f/0x200
[   16.048115]  ? preempt_count_add+0x6a/0xa0
[   16.048117]  ? _raw_spin_lock_irq+0x19/0x40
[   16.048119]  ? _raw_spin_unlock_irq+0x1b/0x40
[   16.048120]  ? wait_for_completion_timeout+0x12a/0x140
[   16.048122]  ? wait_for_completion_interruptible+0x111/0x1b0
[   16.048123]  ? dm_plane_helper_prepare_fb+0x181/0x2a0 [amdgpu]
[   16.048249]  commit_tail+0x94/0x130
[   16.048252]  drm_atomic_helper_commit+0x112/0x140
[   16.048253]  drm_atomic_commit+0x67/0xd0
[   16.048255]  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
[   16.048258]  drm_client_modeset_commit_atomic+0x1e8/0x220
[   16.048261]  drm_client_modeset_commit_locked+0x56/0x160
[   16.048262]  drm_client_modeset_commit+0x21/0x40
[   16.048263]  drm_fb_helper_set_par+0x9e/0xe0
[   16.048265]  drm_fb_helper_hotplug_event+0x9f/0xe0
[   16.048266]  drm_fb_helper_set_par+0xbe/0xe0
[   16.048268]  fbcon_init+0x248/0x540
[   16.048270]  visual_init+0xcc/0x120
[   16.048272]  do_bind_con_driver.isra.0+0x1da/0x2e0
[   16.048274]  do_take_over_console+0x153/0x180
[   16.048275]  do_fbcon_takeover+0x5a/0xc0
[   16.048277]  fbcon_register_existing_fbs+0x3b/0x70
[   16.048278]  process_one_work+0x1c7/0x380
[   16.048280]  worker_thread+0x4d/0x380
[   16.048281]  ? _raw_spin_lock_irqsave+0x23/0x50
[   16.048282]  ? rescuer_thread+0x380/0x380
[   16.048283]  kthread+0xe9/0x110
[   16.048285]  ? kthread_complete_and_exit+0x20/0x20
[   16.048286]  ret_from_fork+0x22/0x30
[   16.048289]  </TASK>
[   16.048290] ---[ end trace 0000000000000000 ]---
[   16.397596] Console: switching to colour frame buffer device 240x67



2. What is the Version-Release number of the kernel: 6.1.3 (6.1.3-200.fc37.x86_64 per the splat above)


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, it works with the 6.0.x kernels in F37.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below: 

Reboot the machine with a 6.1 kernel.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

No idea, this is a "production" machine.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No, secure boot is enabled.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I have attached the 6.0 boot and 6.1 boot logs for comparison purposes.
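For anyone comparing the two attached logs, here is a minimal Python sketch that pulls WARNING splats out of a saved kernel log and names the function on the RIP line. It relies only on the "cut here" / "end trace" markers and the RIP line format visible in the splat above. The helper names (`extract_splats`, `faulting_function`) are hypothetical and not part of any existing tool.

```python
import re

# Markers the kernel prints around a WARNING splat (as seen in the log above).
START = "[ cut here ]"
END = "[ end trace"
# Matches "RIP: 0010:func+0xOFF/0xSIZE"; the function name may contain dots.
RIP_RE = re.compile(r"RIP: [0-9a-f]+:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def extract_splats(lines):
    """Return a list of splats, each a list of log lines from start to end marker."""
    splats, current = [], None
    for line in lines:
        if START in line:
            current = [line]          # a new splat begins
        elif current is not None:
            current.append(line)
            if END in line:           # the splat is complete
                splats.append(current)
                current = None
    return splats


def faulting_function(splat):
    """Return the function named on the first RIP line of a splat, or None."""
    for line in splat:
        match = RIP_RE.search(line)
        if match:
            return match.group(1)
    return None
```

Running `extract_splats(open("dmesg.txt"))` on each log and printing `faulting_function()` for every result should show at a glance whether the 6.0 log is free of the `update_mst_stream_alloc_table` warning.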

Comment 1 Jeremy Linton 2023-01-06 20:51:00 UTC
Created attachment 1936292 [details]
6.0 functional monitors log

Here is a good boot with 6.0

Comment 2 Jeremy Linton 2023-01-06 20:58:38 UTC
I guess it should be noted that this is a USB MST setup.

Comment 3 Jeremy Linton 2023-01-09 17:59:24 UTC
This splat goes away with the rawhide 6.2-rc2 kernel, but that doesn't make the monitors work unless the resolution is decreased to 2k. So could some of the DSC or HBR3/DP1.4 tweaks be keeping the machine from enabling the pair of 4k monitors at 60Hz?
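A rough back-of-the-envelope check of why DSC would matter here, as a Python sketch. Every number in it is an assumption rather than something read from the attached logs. It assumes 24 bits per pixel, CVT reduced-blanking pixel clocks (533.25 MHz for 3840x2160@60, 241.5 MHz for 2560x1440@60), and 8b/10b coding on the DisplayPort link. It also ignores MST framing overhead. The lane count a USB-C dock negotiates is not known for this setup, so both 2 and 4 lanes are worth trying.

```python
# All figures below are illustrative assumptions, not values from this bug's logs.
LINK_RATES_GBPS = {"HBR2": 5.4, "HBR3": 8.1}  # raw per-lane DisplayPort rates
ENCODING_EFFICIENCY = 8 / 10                   # 8b/10b channel coding


def link_payload_gbps(rate, lanes):
    """Usable main-link bandwidth after 8b/10b overhead."""
    return LINK_RATES_GBPS[rate] * lanes * ENCODING_EFFICIENCY


def stream_gbps(pixel_clock_mhz, bits_per_pixel=24):
    """Uncompressed bandwidth of one stream (MST overhead ignored)."""
    return pixel_clock_mhz * 1e6 * bits_per_pixel / 1e9


def fits_without_dsc(pixel_clocks_mhz, rate, lanes):
    """True if all streams together fit on the link uncompressed."""
    needed = sum(stream_gbps(clk) for clk in pixel_clocks_mhz)
    return needed <= link_payload_gbps(rate, lanes)


UHD_60 = 533.25   # assumed CVT-RB pixel clock for 3840x2160@60, in MHz
QHD_60 = 241.5    # assumed CVT-RB pixel clock for 2560x1440@60, in MHz

print(fits_without_dsc([UHD_60, UHD_60], "HBR3", 2))  # two 4k heads, 2 lanes
print(fits_without_dsc([QHD_60, QHD_60], "HBR3", 2))  # two 2k heads, 2 lanes
```

Under these assumptions, two 4k@60 streams need about 25.6 Gbit/s. A 2-lane HBR3 link carries about 13 Gbit/s, and two 2k streams need about 11.6 Gbit/s. That would be consistent with the monitors only coming up at 2k when DSC is not working, but it is a sketch, not a measurement.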

Comment 4 Jeremy Linton 2023-01-17 23:47:56 UTC
So, I spent a bit of time testing a few things.

DSC+MST doesn't work with mainline v6.0 either, until https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.0.19&id=e0a89bd789cb48a44722791920b4dc2b6d409912 lands in v6.0.3. From there, the 6.0 branch is quite stable.

6.1 has lots of issues, but 6.2, as I mentioned, does tend to work if the resolution of the MST monitors is decreased enough that they work without DSC. The upstream bug tracking much of this is https://gitlab.freedesktop.org/drm/amd/-/issues/2171, which proposes a patch that fixes my blank monitors in 6.2 by reverting the 6.1 commit 4d07b0bc "drm/display/dp_mst: Move all payload info into the atomic state".

Comment 5 Lyude 2023-01-18 20:50:37 UTC
JFYI upstream's well aware of this issue and has been trying to fix this for a while. I just got a series of patches from Harry that should hopefully fix this, so my hope is we should have a solution for this very soon.

Comment 6 Jeremy Linton 2023-09-14 01:01:32 UTC
This was fixed in the 6.3 (IIRC) release. There were suspend/resume issues associated with it that required the IOMMU to be disabled, but the GPU portions appear to be working in recent kernels, so this should probably be closed.

