ThinkPad T14s Gen 3 AMD
AMD Ryzen 7 PRO 6850U with Radeon Graphics
kernel-6.7.3-200.fc39.x86_64
Suspend causes a deadlock. The screen goes black and does not turn back on. Keyboard lights stay on. Caps Lock does not respond, suggesting a deadlock. Nothing is logged to the journal.
Working versions: kernel-6.5.*, kernel-6.6.*
Reproducible: Always
I have the same problem on my HP 845 G9 (Same CPU)
What WiFi modules do these have out of interest?
01:00.0 Network controller: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)
	Subsystem: Foxconn International, Inc. Device e0c4
	Flags: bus master, fast devsel, latency 0, IRQ 120, IOMMU group 11
	Memory at b4000000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=32/32 Maskable+ 64bit-
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Secondary PCI Express
	Capabilities: [158] Transaction Processing Hints
	Capabilities: [1e4] Latency Tolerance Reporting
	Capabilities: [1ec] L1 PM Substates
	Kernel driver in use: ath11k_pci
	Kernel modules: ath11k_pci
I plan to do a bisect, might take a while.
01:00.0 Network controller: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)
	Subsystem: Lenovo Device 9309
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin ? routed to IRQ 92
	IOMMU group: 11
	Region 0: Memory at 98000000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: <access denied>
	Kernel driver in use: ath11k_pci
	Kernel modules: ath11k_pci
1. Disable Wifi
2. modprobe -r ath11k_pci ath11k
3. suspend and resume works
4. modprobe ath11k_pci causes deadlock
This is helpful for the bisect.
So I suspect it might be this bug: https://bugzilla.kernel.org/show_bug.cgi?id=218364
Fixed upstream with: 556857aa1d0855aba02b1c63bc52b91ec63fc2cc
A fix should be heading to a 6.7 stable release soon.
556857aa1d0855aba02b1c63bc52b91ec63fc2cc was already included in kernel-6.7.3 yet we experience this suspend crash. The crash seems to be gone from kernel-6.7.4 though. It seems they fixed something else?
Hello everyone. I just upgraded my Fedora 39 kernel to 6.7.3-200.fc39.x86_64, and I am using the Qualcomm Technologies, Inc QCNFA765 (Lenovo P16s AMD Gen 2). I tested modern standby after the kernel upgrade and a reboot, and I am currently not seeing the issue on the 6.7.3 I just received. I am available for any tests on any kernel you want me to try, if you need help.
My hardware wifi card:
gf@aesir:~$ lspci -vv -s 01:00.0
01:00.0 Network controller: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)
	Subsystem: Lenovo Device 9309
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin ? routed to IRQ 91
	IOMMU group: 12
	Region 0: Memory at 78600000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: <access denied>
	Kernel driver in use: ath11k_pci
	Kernel modules: ath11k_pci
Same issue here in Fedora 39 since installing kernel 6.7.3. Up to and including kernel 6.6.13, everything worked fine. There seems to be an issue with kernel 6.7.x (and above) in combination with an AMD RX 7800 XT graphics card, and it isn't specific to Fedora. When rebooting, the monitor goes into sleep mode. There is no way to wake it up again by pressing keys. In the background, the OS seems to boot up, so it would be possible to log into the system without seeing anything. Pressing the power button shuts down the computer. After turning it on again, the boot screen is visible and the login screen appears. Everything works until the next reboot. So the only way to prevent the monitor from going to sleep is to shut down the computer whenever a reboot is needed. The same issue appeared in Nobara 39 when kernel 6.7.0-200 was available and installed. The developer was able to fix this issue himself: with kernel 6.7.0-204 installed, the issue is fixed in Nobara 39. It looks like this commit was to blame and was reverted by the Nobara developer. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f38ac54e60562323ea4abb1bfb37d043ee23357 For now I have switched back to kernel 6.6.13 in the GRUB menu to be able to reboot my system if needed. I'm not using a Wi-Fi card! Here is the output of inxi -Fz: System: Kernel: 6.6.13-200.fc39.x86_64 arch: x86_64 bits: 64 Desktop: KDE Plasma v: 5.27.10 Distro: Fedora release 39 (Thirty Nine) Machine: Type: Desktop Mobo: Micro-Star model: MAG X570 TOMAHAWK WIFI (MS-7C84) v: 1.0 serial: <superuser required> UEFI: American Megatrends LLC.
v: 1.F0 date: 10/12/2023 CPU: Info: 12-core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP cache: L2: 6 MiB Speed (MHz): avg: 2336 min/max: 2200/4672 cores: 1: 2054 2: 2049 3: 2200 4: 2200 5: 2200 6: 3800 7: 2200 8: 2200 9: 2200 10: 2200 11: 2200 12: 2200 13: 2038 14: 2031 15: 4515 16: 2200 17: 2200 18: 2200 19: 2200 20: 2200 21: 2200 22: 2199 23: 2200 24: 2200 Graphics: Device-1: AMD Navi 32 [Radeon RX 7700 XT / 7800 XT] driver: amdgpu v: kernel Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 23.2.4 compositor: kwin_wayland driver: X: loaded: amdgpu unloaded: fbdev,modesetting,radeon,vesa dri: radeonsi gpu: amdgpu resolution: 3440x1440 API: EGL v: 1.5 drivers: radeonsi,swrast platforms: wayland,x11,surfaceless,device API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 23.3.3 renderer: AMD Radeon RX 7800 XT (radeonsi navi32 LLVM 17.0.6 DRM 3.54 6.6.13-200.fc39.x86_64) API: Vulkan v: 1.3.268 drivers: radv,llvmpipe surfaces: xcb,xlib,wayland Audio: Device-1: AMD Navi 31 HDMI/DP Audio driver: snd_hda_intel Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel API: ALSA v: k6.6.13-200.fc39.x86_64 status: kernel-api Server-1: PipeWire v: 1.0.3 status: active Network: Device-1: Realtek RTL8125 2.5GbE driver: r8169 IF: enp38s0 state: up speed: 1000 Mbps duplex: full mac: <filter> Bluetooth: Device-1: Intel AX200 Bluetooth driver: btusb type: USB Report: btmgmt ID: hci0 state: up address: <filter> bt-v: 5.2 Drives: Local Storage: total: 7.51 TiB used: 2.65 TiB (35.3%) ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO 1TB size: 931.51 GiB ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO Plus 2TB size: 1.82 TiB ID-3: /dev/sda vendor: Samsung model: SSD 850 EVO 1TB size: 931.51 GiB ID-4: /dev/sdb vendor: Seagate model: ST2000DM001-1ER164 size: 1.82 TiB ID-5: /dev/sdc vendor: Seagate model: ST2000DM001-1CH164 size: 1.82 TiB ID-6: /dev/sdd vendor: Samsung model: SSD 850 PRO 256GB size: 238.47 GiB Partition: ID-1: / size: 929.93 GiB used: 30.85 GiB 
(3.3%) fs: btrfs dev: /dev/nvme0n1p3 ID-2: /boot size: 973.4 MiB used: 352 MiB (36.2%) fs: ext4 dev: /dev/nvme0n1p2 ID-3: /boot/efi size: 598.8 MiB used: 19 MiB (3.2%) fs: vfat dev: /dev/nvme0n1p1 ID-4: /home size: 929.93 GiB used: 30.85 GiB (3.3%) fs: btrfs dev: /dev/nvme0n1p3 Swap: ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0 Sensors: System Temperatures: cpu: 41.0 C mobo: N/A gpu: amdgpu temp: 37.0 C Fan Speeds (rpm): N/A gpu: amdgpu fan: 1 Info: Processes: 715 Uptime: 18m Memory: total: 32 GiB available: 31.26 GiB used: 4.39 GiB (14.1%) Shell: Bash inxi: 3.3.31
I forgot to mention that the same issue also exists in Fedora 40 Rawhide with the 6.8.0 RC kernel!
Yes. This seems very plausible to me. Ever since I have had my own laptop, the screen has gone black from time to time. I found out that the laptop was still working fine; it's just the display that goes black. Putting the machine to sleep and waking it turns the display on again. In the system log I usually see this when it happens:
[ 5260.723233] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=14
[ 5260.723557] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
I'm using the integrated GPU that comes with the 7840U, a Radeon 780M. From the Lenovo forums it seems quite a few of us have issues with the recent Radeon GPUs :( Mark Pearson from the Lenovo team told me to add this to my kernel options:
amdgpu.dcdebugmask=0x10
For two months I have been running all my F39 kernels with that option, and the issue now happens very rarely (maybe once every 2 weeks or so, when watching YouTube or doing anything graphically intensive). My laptop came with a low-power version of the LCD, so I wonder if that's the issue: the kernel tries to tell the graphics card to do something power-related, and since I'm already using a low-power display, it fails to do something that works on standard displays with high-power/low-power modes (mine is permanently in low-power mode to reduce power use; it's an option when you order the display). Sadly, I don't have a 7800 XT, so I cannot give useful information on whether the issue is tied to that hardware.
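For anyone who wants to try the amdgpu.dcdebugmask=0x10 option mentioned above, here is a minimal sketch, assuming a stock Fedora install where grubby manages the boot entries. The function names `has_karg` and `add_dcdebugmask` are made up for illustration; `has_karg` only checks whether an argument appears in a kernel command line string, so it can be used to confirm the option took effect after a reboot.

```shell
#!/bin/sh
# has_karg CMDLINE ARG: succeed if ARG appears as a whole word in CMDLINE.
# (Hypothetical helper, not part of any existing tool.)
has_karg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# add_dcdebugmask: append the option to every installed kernel entry.
# Needs root; assumes grubby is the boot entry manager (Fedora default).
add_dcdebugmask() {
    grubby --update-kernel=ALL --args="amdgpu.dcdebugmask=0x10"
}

# Typical use, after running add_dcdebugmask as root and rebooting:
#   has_karg "$(cat /proc/cmdline)" "amdgpu.dcdebugmask=0x10" && echo "option active"
```

If the option makes things worse, `grubby --update-kernel=ALL --remove-args="amdgpu.dcdebugmask=0x10"` should undo it.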
I was mistaken. The ath11k suspend crash is not fixed in 6.7.4. Investigating...
With 6.7.4, ath11k crashes on suspend if Bluetooth is enabled. Disable Bluetooth and it doesn't crash. It does have a separate problem where data transfer becomes very slow after resume. Removing and reloading the ath11k_pci kernel module seems to be the only fix without a reboot.
Also there looks to be a GPU suspend regression reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
Blank screen when resuming from suspend. I guess the keyboard becomes unresponsive, since Caps Lock does not respond. I have to do a hard reboot. My system information is below.
Laptop model: Lenovo Slim 7 ProX 14ARH
CPU: AMD Ryzen™ 9 6900HS Creator Edition × 16
Graphics: AMD Radeon™ Graphics / NVIDIA GeForce RTX™ 3050 Laptop GPU
Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
OS: Fedora 39
Kernel: Linux 6.7.4-200.fc39.x86_64
The regression for ath11k (WCN6855) is actually in linux-firmware. Here is the fixed binary: https://gitlab.com/kernel-firmware/linux-firmware/-/commit/5217b76bed90ae86d5f3fe9a5f4e2301868cdd02
Here is the broken version string:
fw_version 0x1109996e fw_build_timestamp 2023-12-19 11:11 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.36
Here is the fixed version string:
fw_version 0x1106196e fw_build_timestamp 2024-01-12 11:30 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37
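To see which of the two builds above a machine has actually loaded, something like the following may help. It is only a sketch: `fw_build_rev` is a hypothetical helper that takes one log line in the format quoted above and prints the trailing build number of the fw_build_id field (36 or 37 in the strings above). It assumes the driver logs that line to the kernel log at probe time.

```shell
#!/bin/sh
# fw_build_rev LINE: print the number after the last dot of the fw_build_id
# field, or fail if the line has no such field. (Hypothetical helper.)
fw_build_rev() {
    case "$1" in
        *"fw_build_id "*) ;;
        *) return 1 ;;
    esac
    id=${1##*fw_build_id }      # drop everything up to and including the field name
    id=${id%% *}                # keep only the first word after it
    printf '%s\n' "${id##*.}"   # trailing component after the last dot
}

# Typical use (reading the kernel log may need root):
#   fw_build_rev "$(journalctl -k -b | grep -m1 fw_build_id)"
```

This only reports what is loaded; whether a given build behaves well on a particular machine still has to be tested.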
I opened this thread: https://discussion.fedoraproject.org/t/random-resume-after-suspend-issue-on-thinkpad-t14s-amd-gen3-radeon-680m-ryzen-7/103452/7 I think I have the bug described in the current issue. > Here is the fixed version string: fw_version 0x1106196e fw_build_timestamp 2024-01-12 11:30 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37 @mario.limonciello How can I apply this fix?
On the page I linked above, https://gitlab.com/kernel-firmware/linux-firmware/-/commit/5217b76bed90ae86d5f3fe9a5f4e2301868cdd02, there is a "view file" button. You'd need to download that firmware binary, compress it with the same compression Fedora uses (xz with crc32, IIRC) and then replace the file in the right place under /lib/firmware. I hope that Fedora can get an updated firmware binary into updates ASAP though.
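The manual replacement described above could look roughly like this. It is a sketch under assumptions: `install_fw` is a made-up helper name, the target directory /lib/firmware/ath11k/WCN6855/hw2.0 is inferred from the linux-firmware path of amss.bin and should be checked against what is actually installed, and the CRC32 integrity check follows the "xz w/ crc32 IIRC" remark rather than anything verified here.

```shell
#!/bin/sh
# install_fw SRC DEST_DIR: compress SRC with xz using a CRC32 integrity check
# and write it to DEST_DIR/amss.bin.xz, keeping a copy of any existing file.
# (Hypothetical helper; run as root when DEST_DIR is under /lib/firmware.)
install_fw() {
    src=$1
    dest_dir=$2
    # Keep the packaged firmware around so the change is easy to undo.
    if [ -f "$dest_dir/amss.bin.xz" ]; then
        cp -p "$dest_dir/amss.bin.xz" "$dest_dir/amss.bin.xz.orig" || return 1
    fi
    xz --check=crc32 --stdout "$src" > "$dest_dir/amss.bin.xz"
}

# Typical use (directory is an assumption, see above):
#   install_fw ./amss.bin /lib/firmware/ath11k/WCN6855/hw2.0
```

A later linux-firmware package update will overwrite the file again, which is the desired outcome once a fixed package ships.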
I've extensively tested linux-firmware/ath11k/WCN6855/hw2.0/amss.bin versions 23, 36 and 37 with kernels 6.6.14, 6.7.4 and 6.8.0-rc3.
* It was claimed that 37 fixes suspend broken by 36, but it does not. It seems a little better than firmware 36, but it still frequently deadlocks in 6.7.4 and 6.8.0-rc3 during suspend. Sometimes it fails the first time. Sometimes it works the first time then fails the second time. Sometimes it succeeds at suspending 10 times in a row.
* 6.6.14 doesn't fail to suspend, but firmware 37 makes data throughput very slow after suspend. Firmware 37 is also broken in that regard with 6.7.4 and 6.8.0-rc3. See notes below.
* 6.8.0-rc3 fails more often than 6.7.4 but is otherwise similar in behavior.
* Firmware version 23 from 2023-02-15 does not suffer from slow data throughput after suspend like firmware 37. This holds with 6.6.14, 6.7.4, and 6.8.0-rc3.
* Suspend deadlock behavior in 6.7.4 seems to be the same with firmware 23 and 37.
Bluetooth and power_save toggle
===============================
Disabling Bluetooth or turning power_save off seems to have some effect on the likelihood of a suspend deadlock. Unclear.
Regarding power_save quirky behavior
====================================
https://forums.lenovo.com/t5/Other-Linux-Discussions/QCNFA765-Linux-ath11k-wifi-crippled-high-latency-packet-loss-frequent-disassociations-T14s-AMD/m-p/5252399
Since kernel 6.4.* and earlier, many of us have been struggling with this flaky driver while power_save=on. With many but not all access points it would exhibit extreme packet loss. Many of us have been turning off power_save, which used to work around the problem. You can see the current status with:
iw dev wlp1s0 get power_save
After a fresh boot it starts with power_save=on. More recent kernels switch to power_save=off automatically on suspend. I'm guessing it's because somebody realized power_save was problematic.
Behavior is different pre and post-suspend
==========================================
kernel-6.6.14 with firmware 23 or 37
kernel-6.7.4 with firmware 23 or 37
Both of the above exhibit slow data throughput immediately after boot while power_save=on. I see a maximum of 1MB/sec. (I am not seeing packet loss like with kernel-6.4, but I am on a different network so I can't compare at the moment. The slow data throughput is despite the lack of packet loss.)
# iw dev wlp1s0 set power_save off
Immediately after turning power_save off I am able to achieve 15MB/sec.
kernel-6.6.14 and 6.7.4 with firmware 23
After suspend, data throughput is still 15MB/sec while power_save=off.
kernel-6.6.14 and 6.7.4 with firmware 36 or 37
After suspend, data throughput becomes CRIPPLED, around 11KB/sec with a maximum of 50KB/sec. Toggling power_save after this point doesn't fix it. Half of the time, unloading and reloading the ath11k_pci kernel module brings it back to the original state of max 1MB/sec where power_save=off can reach 15MB/sec. The other half of the time, reloading the kernel module deadlocks the machine.
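The two iw commands in the comment above can be combined into a small check-then-disable helper. This is a sketch only: the function names are made up, the interface name wlp1s0 is the one from the comment and will differ on other machines, and it assumes `iw ... get power_save` prints a line of the form "Power save: on".

```shell
#!/bin/sh
# power_save_state: read `iw dev IFACE get power_save` output on stdin and
# print just "on" or "off". (Hypothetical helper.)
power_save_state() {
    sed -n 's/^[[:space:]]*Power save:[[:space:]]*//p'
}

# disable_power_save IFACE: turn power_save off only if it is not already off.
# Changing the setting needs root.
disable_power_save() {
    dev=$1
    state=$(iw dev "$dev" get power_save | power_save_state)
    [ "$state" = "off" ] || iw dev "$dev" set power_save off
}

# Typical use:
#   disable_power_save wlp1s0
```

Since the setting reportedly comes back as "on" after a fresh boot, this would have to be rerun after each boot.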
kernel-6.6.14 linux-firmware/ath11k/WCN6855/hw2.0/amss.bin power_save=off Only this combination of kernel, firmware, and settings has been crash-free and fast for me. I have experienced zero problems with this combination.
kernel-6.6.14 linux-firmware/ath11k/WCN6855/hw2.0/amss.bin version 23 power_save=off Only this combination of kernel, firmware, and settings has been crash-free and fast for me. I have experienced zero problems with this combination.
I'm going with the assumption that the continued suspend problems are in fact the amdgpu regression. I will separately file the "suspend with firmware 37 makes ath11k slow" problem upstream.
I did the same "tests" @wtogami postulated and I came to the same conclusion on my HP 845 G9. It is not fixed (yet). Even downgrading to the Firmware from 2022 does not help. @wtogami in the kernel bugzilla someone suggested to file a separate bug. I think that is a good idea and mark it with "regression" so that Thorsten Leemhuis gets involved.
Same problem here, very annoying. Is there a workaround? rmmod'ing ath11k_pci before suspend doesn't seem to help. The machine deadlocks on the second suspend at the latest. System: Kernel: 6.7.3-200.fc39.x86_64 arch: x86_64 bits: 64 Desktop: GNOME v: 45.3 Distro: Fedora release 39 (Thirty Nine) Machine: Type: Laptop System: LENOVO product: 21CQCTO1WW v: ThinkPad T14s Gen 3 Mobo: LENOVO model: 21CQCTO1WW v: SDK0T76530 WIN UEFI: LENOVO v: R22ET65W (1.35 ) date: 08/08/2023 CPU: Info: 8-core model: AMD Ryzen 7 PRO 6850U with Radeon Graphics bits: 64 type: MT MCP cache: L2: 4 MiB Speed (MHz): avg: 616 min/max: 400/4768 cores: 1: 1397 2: 1862 3: 400 4: 400 5: 400 6: 400 7: 400 8: 400 9: 400 10: 400 11: 400 12: 400 13: 400 14: 400 15: 1397 16: 400 Graphics: Device-1: AMD Rembrandt [Radeon 680M] driver: amdgpu v: kernel Network: Device-1: Qualcomm QCNFA765 Wireless Network Adapter driver: ath11k_pci
Please upgrade your kernel to 6.7.4-200. Seems to fix the atheros issue as one of the comments says : https://bodhi.fedoraproject.org/updates/FEDORA-2024-3ca09cc1a0 Please create an account on bodhi.fedoraproject.org And run the kernel regression tests as explained here : https://fedoramagazine.org/running-fedora-kernel-regression-tests/ If you configure it properly it will upload results. And report how it goes there : https://bodhi.fedoraproject.org/updates/FEDORA-2024-3ca09cc1a0
As mentioned above there are two separate regressions.
1) The ath11k issue is fixed by the upgraded firmware binary. -36 is definitely broken for many but not all people. -37 fixes it. This needs to be updated in Fedora.
2) There is a GPU driver regression: https://gitlab.freedesktop.org/drm/amd/-/issues/3132. This is only triggered when there is activity specifically at suspend time, such as triggering the lock screen from a lid close event. It's fixed by this series, https://lore.kernel.org/amd-gfx/20240208055256.130917-1-mario.limonciello@amd.com/, of which patches 1 and 2 should be sent out in the 6.8-rc5 fixes pull request.
(In reply to Gilbert Fernandes from comment #26)
> Please upgrade your kernel to 6.7.4-200. Seems to fix the atheros issue as
> one of the comments says :
> https://bodhi.fedoraproject.org/updates/FEDORA-2024-3ca09cc1a0
kernel 6.7.4-200 does *not* fix the suspend issues. Neither disabling Bluetooth nor unloading ath11k_pci helps. The deadlock can happen at resume or suspend time.
(In reply to Mario Limonciello from comment #27)
> As mentioned above there are two separate regressions.
> 2) There is a GPU driver regression:
> https://gitlab.freedesktop.org/drm/amd/-/issues/3132. This is only
> triggered when there is activity specifically at suspend time such as
> triggering the lock screen from a lid close event. It's fixed by this
> series
> https://lore.kernel.org/amd-gfx/20240208055256.130917-1-mario.
> limonciello/ which patches 1 and 2 should be sent out to the 6.8-rc5
> fixes pull request.
These fixes are in linux-next now, and I have pulled them back so that they will be in the 6.7.5 stable update when it is released.
I seem to have the same problem on kernels >6.6.14 (e.g., 6.7.3, 6.7.4) on my Dell XPS 9320. Interestingly, this machine does not have a non-integrated GPU, and this problem started exactly when the IPU camera stopped working, so initially I thought these were related. ``` System: Kernel: 6.6.14-200.fc39.x86_64 arch: x86_64 bits: 64 Desktop: GNOME v: 45.4 Distro: Fedora Linux 39 (Workstation Edition) Machine: Type: Laptop System: Dell product: XPS 9320 v: N/A serial: <superuser required> Mobo: Dell model: 0JPN6G v: A00 serial: <superuser required> UEFI: Dell v: 1.9.0 date: 09/23/2022 CPU: Info: 12-core (4-mt/8-st) model: 12th Gen Intel Core i7-1260P bits: 64 type: MST AMCP cache: L2: 9 MiB Graphics: Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] driver: i915 v: kernel Display: wayland server: X.Org v: 23.2.4 with: Xwayland v: 23.2.4 compositor: gnome-shell driver: dri: iris gpu: i915 resolution: 1920x1200~60Hz API: OpenGL v: 4.6 vendor: intel mesa v: 23.3.5 renderer: Mesa Intel Graphics (ADL GT2) API: EGL Message: EGL data requires eglinfo. Check --recommends. Audio: Device-1: Intel Alder Lake Imaging Signal Processor driver: intel-ipu6 Device-2: Intel Alder Lake PCH-P High Definition Audio driver: sof-audio-pci-intel-tgl API: ALSA v: k6.6.14-200.fc39.x86_64 status: kernel-api Server-1: PipeWire v: 1.0.3 status: active Network: Device-1: Intel Alder Lake-P PCH CNVi WiFi driver: iwlwifi IF: wlp0s20f3 state: up mac: <filter> Bluetooth: Device-1: Intel AX211 Bluetooth driver: btusb type: USB Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: enabled,running rfk-block: hardware: no software: yes address: <filter> bt-v: 5.3 Drives: Local Storage: total: 953.87 GiB used: 66.18 GiB (6.9%) ID-1: /dev/nvme0n1 vendor: SK Hynix model: PC801 NVMe 1TB size: 953.87 GiB Swap: ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0 Info: Memory: total: 16 GiB note: est. 
available: 15.23 GiB used: 3.02 GiB (19.8%) Processes: 420 Uptime: 25m Shell: fish inxi: 3.3.32 ```
*** Bug 2264875 has been marked as a duplicate of this bug. ***
(In reply to Justin M. Forbes from comment #29) > These fixes are in linux-next now, and I have pulled them back so that they > will be in the 6.7.5 stable update when it release. My issue (Comment 10) is fixed after installing kernel 6.7.5 (https://bodhi.fedoraproject.org/updates/FEDORA-2024-88847bc77a) in Fedora 39 KDE. I'm now able to reboot my system without getting a black screen.
While I had occasional problems with suspend in the past, since a recent update I can now reproduce this issue every single time I enter suspend mode. If my ThinkPad is plugged into an external power source and the lid is closed to enter suspend mode, the screen just stays black after attempting a wake-up. If it is not plugged in, it wakes up just fine most of the time. I tried booting with an older kernel that previously worked, with the current kernel, and with the 6.8.0-0.rc4 rawhide kernel; all show the same problem. I am using a P14s Gen3 AMD with AMD Ryzen 7 PRO 6850U with Radeon Graphics. Fedora 39.
FEDORA-2024-0e9661ca97 (linux-firmware-20240220-1.fc39) has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-0e9661ca97
FEDORA-2024-355c0ca9d3 (linux-firmware-20240220-1.fc38) has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2024-355c0ca9d3
I experience exactly the same issue with a T14s Gen 3 AMD
FEDORA-2024-355c0ca9d3 has been pushed to the Fedora 38 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-355c0ca9d3` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-355c0ca9d3 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-0e9661ca97 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-0e9661ca97` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-0e9661ca97 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-0e9661ca97 (linux-firmware-20240220-1.fc39) has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.
The problem still persists with the latest linux-firmware 20240220-1.fc39. And with kernel 6.7.5, suspend mode (and network + some of the Fn keys) is not working any more.
As I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2262577#c27 there are two separate suspend-related problems that happened at about the same time. There are two kernel patches that still need to be backported.
(In reply to Mario Limonciello from comment #41)
> As I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2262577#c27
> there are two separate suspend related problems that happened at about the
> same time.
> There are two kernels patches that need to be backported still.
If you are talking about:
- drm/amd: Stop evicting resources on APUs in suspend (Mario Limonciello)
- Revert "drm/amd: flush any delayed gfxoff on suspend entry" (Mario Limonciello)
I pulled those back as soon as they hit linux-next, and they are included in the 6.7.5 fedora kernels.
In that case we must still have another problem :/
I can't trip it, but I think we're looking at a race condition problem. This isn't a solution; but at least to confirm that hypothesis can you build a Fedora test kernel for people to try that reverts 6b1adc1bd3fe38c7af00aed18086b86d13f5db8b but is otherwise the same?
(In reply to Mario Limonciello from comment #44)
> I can't trip it, but I think we're looking at a race condition problem.
>
> This isn't a solution; but at least to confirm that hypothesis can you build
> a Fedora test kernel for people to try that reverts
> 6b1adc1bd3fe38c7af00aed18086b86d13f5db8b but is otherwise the same?
What tree is that commit ID from? It doesn't exist in Linus's tree, stable 6.7.y, or the fedora tree.
Must have been a bad copy paste somehow; sorry I can't even find it locally. Here's the hash I meant: 94b1e028e15c94362420f9f3f711fafbf9d52996
https://koji.fedoraproject.org/koji/taskinfo?taskID=113897958 should finish soon, it has been building for a bit, this is the current fedora 6.7.5 with the revert if people will test.
Hello, I have a
LENOVO 21D2CTO1WW (ThinkPad Z13 Gen 1) running BIOS 1.64 (N3GET64W (1.64 ))
AMD Ryzen 7 PRO 6860Z with Radeon Graphics (family 19 model 44)
WCN6855 WLAN (fw build id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37)
Fedora Linux 39 (Workstation Edition)
which is affected by the issue described in this report (machine hangs/crashes on suspend when the lid is closed). FWIW, I rebuilt 6.8-rc5 without 94b1e028e15c94362420f9f3f711fafbf9d52996 and I no longer see any crashes: I can consistently close the lid and the machine suspends correctly. I'm willing to try 6.7.5 but I can't find it on koji.
https://koji.fedoraproject.org/koji/taskinfo?taskID=113897958 is the build in koji of 6.7.5 with that patch reverted.
(In reply to Justin M. Forbes from comment #49)
> https://koji.fedoraproject.org/koji/taskinfo?taskID=113897958 is the build
> in koji of 6.7.5 with that patch reverted.
$ cd $(mktemp -d) && koji download-build --arch=x86_64 kernel-6.7.5-201.fc39
No such build: kernel-6.7.5-201.fc39
Perhaps I'm doing something wrong, first time I use koji, bear with me.
(In reply to Alessandro from comment #50)
> (In reply to Justin M. Forbes from comment #49)
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=113897958 is the build
> > in koji of 6.7.5 with that patch reverted.
>
> $ cd $(mktemp -d) && koji download-build --arch=x86_64 kernel-6.7.5-201.fc39
> No such build: kernel-6.7.5-201.fc39
>
> Perhaps I'm doing something wrong, first time I use koji, bear with me.
Ahh yes, you can't do that with scratch builds because there could be several with the same name...
cd $(mktemp -d) && koji download-task 113897958
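For testers who have not used a scratch build before: the download above pulls every RPM the task produced, including debug and devel variants. A rough sketch of narrowing that down before installing follows. `pick_kernel_rpms` is a hypothetical helper, and the package names it keeps (kernel, kernel-core, kernel-modules, kernel-modules-core, kernel-modules-extra) are an assumption about what a bootable Fedora 39 kernel needs.

```shell
#!/bin/sh
# pick_kernel_rpms: read RPM file names on stdin and keep only the non-debug
# kernel packages: kernel, kernel-core, kernel-modules, kernel-modules-core
# and kernel-modules-extra. (Hypothetical helper.)
pick_kernel_rpms() {
    grep -E '^kernel-(core-|modules-|modules-core-|modules-extra-)?[0-9]'
}

# Typical use, in the directory where the task was downloaded:
#   cd "$(mktemp -d)" && koji download-task --arch=x86_64 113897958
#   sudo dnf install $(ls | pick_kernel_rpms | sed 's|^|./|')
```

The regular kernel stays installed, so the previous entry remains available in the boot menu if the test kernel misbehaves.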
(In reply to Justin M. Forbes from comment #51)
> (In reply to Alessandro from comment #50)
> > (In reply to Justin M. Forbes from comment #49)
> > > https://koji.fedoraproject.org/koji/taskinfo?taskID=113897958 is the build
> > > in koji of 6.7.5 with that patch reverted.
> >
> > $ cd $(mktemp -d) && koji download-build --arch=x86_64 kernel-6.7.5-201.fc39
> > No such build: kernel-6.7.5-201.fc39
> >
> > Perhaps I'm doing something wrong, first time I use koji, bear with me.
>
> Ahh yes, you can't do that with scratch builds because there could be
> several with the same name...
>
> cd $(mktemp -d) && koji download-task 113897958
Thanks Justin, that works. TIL
I rebooted into 6.7.5-201 and it seems consistent:
- I ran amd_s2idle.py --count 4 and it didn't break;
- I closed the lid three times and it didn't break;
- I suspended from the Gnome menu two times and it didn't break;
It looks good so far. The only issue I keep seeing with amd_s2idle.py is this:
Explanations for your system
🚦 ACPI BIOS Errors detected
When running a firmware component utilized for s2idle the ACPI interpreter in the Linux kernel encountered some problems. This usually means it's a bug in the system BIOS that should be fixed the system manufacturer. You may have problems with certain devices after resume or high power consumption when this error occurs.
ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GP17.XHC0.PSTA], AE_ALREADY_EXISTS (20230628/dswload2-326)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20230628/psobject-220)
Perhaps I'll have to liaise with the manufacturer on the latter, though any hints would be appreciated.
Well if the scratch build (and your rc5 test) works it confirms there is either a race condition or a mutex deadlock occurring.
Let me explain the situation:
When you close the lid or suspend from the GNOME menu it uses logind to kick off the suspend sequence.
Logind isn't synchronous, and so the kernel suspend sequence will start while userspace is still active.
During this time the lock screen will come up, DPMS engaged, etc.
What's happening is that there is some SDMA traffic at this time from the lock screen coming up or the DPMS action.
That patch that you reverted intentionally blocks GFXOFF from occurring to workaround a low level platform issue that was reported under SDMA stress.
During the suspend sequence there is a point when all pending GFXOFF requests are flushed, and I "think" that's conflicting.
Unfortunately, I can't reproduce the issue locally, so it's very hard for me to accurately hypothesize the specifics.
So this is purely a guess; but does this help? You can apply it to 6.8-rc6.
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 0058f3f7cf6e..c78aa71d8753 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -1655,7 +1655,8 @@ static void sdma_v5_2_ring_begin_use(struct amdgpu_ring *ring)
 	 * this GFXOFF will be disallowed anyway when SDMA is
 	 * active, this just makes it explicit.
 	 */
-	amdgpu_gfx_off_ctrl(adev, false);
+	if (!adev->in_s0ix)
+		amdgpu_gfx_off_ctrl(adev, false);
 }
 
 static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring)
@@ -1666,7 +1667,8 @@ static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring)
 	 * disallow GFXOFF in some cases leading to
 	 * hangs in SDMA. Allow GFXOFF when SDMA is complete.
 	 */
-	amdgpu_gfx_off_ctrl(adev, true);
+	if (!adev->in_s0ix)
+		amdgpu_gfx_off_ctrl(adev, true);
 }
 
 const struct amd_ip_funcs sdma_v5_2_ip_funcs = {
(In reply to Mario Limonciello from comment #53)
> Well if the scratch build (and your rc5 test) works it confirms there is
> either a race condition or a mutex deadlock occurring.
>
> Let me explain the situation:
> When you close the lid or suspend from the GNOME menu it uses logind to kick
> off the suspend sequence.
> Logind isn't synchronous, and so the kernel suspend sequence will start
> while userspace is still active.
> During this time the lock screen will come up, DPMS engaged, etc.
>
> What's happening is that there is some SDMA traffic at this time from the
> lock screen coming up or the DPMS action.
>
> That patch that you reverted intentionally blocks GFXOFF from occurring to
> workaround a low level platform issue that was reported under SDMA stress.
> During the suspend sequence there is a point when all pending GFXOFF
> requests are flushed, and I "think" that's conflicting.
>
> Unfortunately, I can't reproduce the issue locally, so it's very hard for me
> to accurately hypothesize the specifics.
> So this is purely a guess; but does this help? You can apply it to 6.8-rc6.
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index 0058f3f7cf6e..c78aa71d8753 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -1655,7 +1655,8 @@ static void sdma_v5_2_ring_begin_use(struct
> amdgpu_ring *ring)
>  	 * this GFXOFF will be disallowed anyway when SDMA is
>  	 * active, this just makes it explicit.
>  	 */
> -	amdgpu_gfx_off_ctrl(adev, false);
> +	if (!adev->in_s0ix)
> +		amdgpu_gfx_off_ctrl(adev, false);
>  }
>
>  static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring)
> @@ -1666,7 +1667,8 @@ static void sdma_v5_2_ring_end_use(struct amdgpu_ring
> *ring)
>  	 * disallow GFXOFF in some cases leading to
>  	 * hangs in SDMA. Allow GFXOFF when SDMA is complete.
>  	 */
> -	amdgpu_gfx_off_ctrl(adev, true);
> +	if (!adev->in_s0ix)
> +		amdgpu_gfx_off_ctrl(adev, true);
>  }
>
>  const struct amd_ip_funcs sdma_v5_2_ip_funcs = {
Thanks for your reply and for the patch, Mario. I applied it to 6.8-rc6 without reverting 94b1e028e15c94362420f9f3f711fafbf9d52996. I can consistently suspend the machine with amd_s2idle.py; however, when I close the lid the machine hangs/crashes. There is nothing in /var/lib/systemd/pstore/ and this is the last line I found in the kernel log:
Feb 27 22:14:43 kernel: PM: suspend entry (s2idle)
Is there anything else I can do to help you debug further? I'm not very familiar with the amdgpu code, but I'm willing to help you sort this out. Thanks
Thanks for checking it and confirming that patch doesn't help. Let's discuss next steps for ideas on the upstream bug report as this one is closed. If we come up with a solution we'll nominate it for stable and ping jforbes and Fedora can pick it up more quickly considering the regression.
FEDORA-2024-355c0ca9d3 (linux-firmware-20240220-1.fc38) has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report.
> I rebooted into 6.7.5-201 and it seems consistent:
>
> - I ran amd_s2idle.py --count 4 and it didn't break;
> - I closed the lid three times and it didn't break;
> - I suspended from the Gnome menu two times and it didn't break;
@alessandro.cassese I didn't see kernel 201 in https://koji.fedoraproject.org/koji/packageinfo?packageID=8
I see only:
- kernel-6.7.5-200.fc39
- and kernel-6.7.6-200.fc39
Questions:
- Do you think that the kernel kernel-6.7.6-200.fc39 contains the bugfix patch?
- Else, why isn't the kernel-6.7.5-201.fc39 available?
Best regards, Stéphane
> @alessandro.cassese I didn't see kernel 201 in
> https://koji.fedoraproject.org/koji/packageinfo?packageID=8
It was a scratch build as referenced in comment 47.
> - Do you think that the kernel kernel-6.7.6-200.fc39 contains the bugfix
> patch?
No reference to it in the changelog, so unless it's in the upstream 6.7.6 changelog, no.
> - Else, why isn't the kernel-6.7.5-201.fc39 available?
It's a scratch build with a test in it.
> > - Do you think that the kernel kernel-6.7.6-200.fc39 contains the bugfix
> > patch?
> No reference to it in the changelog so unless it's in the upstream 6.7.6 changelog no.
I applied all the Fedora 39 updates, and for the past 24 hours my initial problem seems to be fixed. See package details here: https://discussion.fedoraproject.org/t/random-resume-after-suspend-issue-on-thinkpad-t14s-amd-gen3-radeon-680m-ryzen-7/103452/19
Best regards, Stéphane