Created attachment 1525522 [details]
dmesg from failed suspend attempt

1. Please describe the problem:

Starting with kernel 4.19, my Lenovo ThinkPad X1 Carbon 5th fails to suspend to RAM. When closing the lid or executing "systemctl suspend", the screen goes black and the status LED starts to blink rapidly (just like when power is plugged in). The keyboard lights can still be toggled using Fn+space, so the firmware appears to be (partly) alive.

2. What is the Version-Release number of the kernel:

kernel-4.20.5-200.fc29.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, it still works with kernel-4.18.18-300.fc29.x86_64. It basically started failing with the first 4.19 kernel that hit updates-testing.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Yes, this happens every single time suspend is triggered with a 4.19+ kernel. I only have to boot it up and try to suspend. Even in runlevel 3.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Hmm... I'll try that and add a comment.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No, no out-of-tree modules.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
I've installed kernel-5.0.0-0.rc4.git2.1.fc30.x86_64 from koji... still fails to suspend.
Just realized I probably wasn't very clear about the implications of the failed suspend... After the screen goes blank and the LED starts blinking, there is no way to wake up or recover the system. The only way out is to hold the power button for several seconds to force it off and then power cycle.
Several of the entries at the end of the log are for the wireless device, so it might be worth seeing if disabling wireless allows suspending to complete. If the BIOS supports it, you could try disabling wireless there.
(In reply to Steve from comment #3)
> Several of the entries at the end of the log are for the wireless device, so
> it might be worth seeing if disabling wireless allows suspending to
> complete. If the BIOS supports it, you could try disabling wireless there.

I've just disabled wireless in the BIOS and booted 4.20.6-200.fc29.x86_64, but suspend still fails. The kernel log doesn't show anything interesting at the end, probably because it doesn't hit the disk :(
(In reply to Thomas Müller from comment #4)
...
> I've just disabled wireless in the BIOS and booted 4.20.6-200.fc29.x86_64,
> but suspend still fails.
...

Thanks for checking that. There is also a USB Sierra Wireless EM7455 Qualcomm Snapdragon X7 LTE-A device. Is there a way to disable it?

Snippet from attached log:

$ grep -n 'usb 1-6' dmesg_4.20.5_failedSuspend-1.txt
704:Jan 31 20:11:51 kernel: usb 1-6: new high-speed USB device number 2 using xhci_hcd
709:Jan 31 20:11:51 kernel: usb 1-6: config 1 has an invalid interface number: 12 but max is 1
710:Jan 31 20:11:51 kernel: usb 1-6: config 1 has an invalid interface number: 13 but max is 1
711:Jan 31 20:11:51 kernel: usb 1-6: config 1 has an invalid interface number: 13 but max is 1
712:Jan 31 20:11:51 kernel: usb 1-6: config 1 has no interface number 0
713:Jan 31 20:11:51 kernel: usb 1-6: config 1 has no interface number 1
714:Jan 31 20:11:51 kernel: usb 1-6: New USB device found, idVendor=1199, idProduct=9079, bcdDevice= 0.06
715:Jan 31 20:11:51 kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
716:Jan 31 20:11:51 kernel: usb 1-6: Product: Sierra Wireless EM7455 Qualcomm Snapdragon X7 LTE-A
717:Jan 31 20:11:51 kernel: usb 1-6: Manufacturer: Sierra Wireless, Incorporated
Does your BIOS have an option to set the sleep state to "Linux"?

This is for Carbon, 6th, but it mentions "BIOS version 1.30", and the attached log shows "1.36", so this might be applicable:

Lenovo ThinkPad X1 Carbon (Gen 6)
Suspend issues
https://wiki.archlinux.org/index.php/Lenovo_ThinkPad_X1_Carbon_(Gen_6)#Suspend_issues

Snippet from attached log:

$ egrep 'DMI:|ACPI.*supports' dmesg_4.20.5_failedSuspend-1.txt
Jan 31 20:11:51 kernel: DMI: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET51W(1.36) 01/11/2019
Jan 31 20:11:51 kernel: ACPI: (supports S0 S3 S4 S5)
(In reply to Steve from comment #5)
> (In reply to Thomas Müller from comment #4)
> ...
> > I've just disabled wireless in the BIOS and booted 4.20.6-200.fc29.x86_64,
> > but suspend still fails.
> ...
>
> Thanks for checking that. There is also a USB Sierra Wireless EM7455
> Qualcomm Snapdragon X7 LTE-A device. Is there a way to disable it?

Actually it already was disabled alongside wireless lan during the last boot of 4.20.6. I'll add the full kernel log from that unsuccessful experiment for reference.

(In reply to Steve from comment #6)
> Does your BIOS have an option to set the sleep state to "Linux"?
>
> This is for Carbon, 6th, but it mentions "BIOS version 1.30", and the
> attached log shows "1.36", so this might be applicable:
>
> Lenovo ThinkPad X1 Carbon (Gen 6)
> Suspend issues
> https://wiki.archlinux.org/index.php/Lenovo_ThinkPad_X1_Carbon_(Gen_6)#Suspend_issues
>
> Snippet from attached log:
>
> $ egrep 'DMI:|ACPI.*supports' dmesg_4.20.5_failedSuspend-1.txt
> Jan 31 20:11:51 kernel: DMI: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET51W(1.36) 01/11/2019
> Jan 31 20:11:51 kernel: ACPI: (supports S0 S3 S4 S5)

No, the 5th does not have this option, even with the current BIOS version. However, S3 is advertised as supported by the firmware according to the kernel log:

> $ cat dmesg_4.20.6-200.fc29.x86_64_noWifi_noLTE | grep -i "acpi: (supports"
> Feb 05 19:28:08 kernel: ACPI: (supports S0 S3 S4 S5)
Created attachment 1527433 [details] dmesg from failed suspend attempt with 4.20.6 and wifi and LTE disabled
Thanks for your followup report and for attaching the 4.20.6 output.

For the record, could you post the output from:

$ grep . /sys/power/*
(In reply to Steve from comment #9)
> Thanks for your followup report and for attaching the 4.20.6 output. For the
> record, could you post the output from:
>
> $ grep . /sys/power/*

4.18.18:

> /sys/power/disk:[disabled]
> /sys/power/image_size:6609518592
> /sys/power/mem_sleep:s2idle [deep]
> /sys/power/pm_async:1
> /sys/power/pm_debug_messages:0
> /sys/power/pm_freeze_timeout:20000
> /sys/power/pm_print_times:0
> /sys/power/pm_test:[none] core processors platform devices freezer
> /sys/power/pm_trace:0
> /sys/power/pm_trace_dev_match:acpi
> /sys/power/pm_trace_dev_match:memory
> grep: /sys/power/pm_wakeup_irq: No data available
> /sys/power/reserved_size:1048576
> /sys/power/resume:0:0
> /sys/power/resume_offset:0
> /sys/power/state:freeze mem
> /sys/power/wakeup_count:75

4.20.6:

> /sys/power/disk:[disabled]
> /sys/power/image_size:6607970304
> /sys/power/mem_sleep:s2idle [deep]
> /sys/power/pm_async:1
> /sys/power/pm_debug_messages:0
> /sys/power/pm_freeze_timeout:20000
> /sys/power/pm_print_times:0
> /sys/power/pm_test:[none] core processors platform devices freezer
> /sys/power/pm_trace:0
> /sys/power/pm_trace_dev_match:memory
> grep: /sys/power/pm_wakeup_irq: No data available
> /sys/power/reserved_size:1048576
> /sys/power/resume:0:0
> /sys/power/resume_offset:0
> /sys/power/state:freeze mem
> /sys/power/wakeup_count:1
(In reply to Thomas Müller from comment #10)
...
> > /sys/power/pm_test:[none] core processors platform devices freezer
...

Thanks for posting the /sys/power/ output.

Here is a possible debugging strategy using "pm_test".

In a terminal window, run:

$ dmesg -w

In a separate terminal window, run as root:

# sync
# cat /sys/power/pm_test
# echo devices > /sys/power/pm_test # Echo "devices" or one of the other strings in pm_test.
# cat /sys/power/pm_test # This should show "[devices]" (in square brackets).
# echo mem > /sys/power/state

Wait for about 10 seconds -- the system should automatically resume.

Documentation:

Debugging hibernation and suspend
https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
Scroll down to "2. Testing suspend to RAM (STR)".

I'm not sure how to use this information, but you don't need to do a mount before running:

# cat /sys/kernel/debug/suspend_stats

This documents the files in /sys/power/, but not "pm_test":
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power
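The pm_test steps above could be wrapped in a small helper so each level can be tried in turn. This is only a sketch: the function names `selected_choice` and `run_pm_test` and the optional directory argument are made up for illustration (the argument exists so the helper can be pointed somewhere other than /sys/power). It has to run as root, and a failing level may hang the machine, so it syncs first.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the power_dir parameter are illustrative.

# Print the currently selected entry (the one in square brackets) of a
# sysfs multiple-choice file such as /sys/power/pm_test or mem_sleep.
selected_choice() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# run_pm_test LEVEL [POWER_DIR]
# Select a pm_test level, confirm the kernel accepted it, then start a
# test suspend. LEVEL is one of the strings listed in pm_test.
run_pm_test() {
    level=$1
    power_dir=${2:-/sys/power}
    sync    # flush disks in case the test hangs the machine
    echo "$level" > "$power_dir/pm_test" || return 1
    if [ "$(selected_choice "$power_dir/pm_test")" != "$level" ]; then
        echo "pm_test did not switch to '$level'" >&2
        return 1
    fi
    echo mem > "$power_dir/state"
}
```

For example, `run_pm_test freezer`, then `devices`, `platform`, `processors` and `core`, going from the least to the most invasive level and keeping `dmesg -w` open in another terminal.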
When I execute

> echo core > /sys/power/pm_test

and then

> echo mem > /sys/power/state

the system immediately goes blank and freezes just like when I really try to activate suspend... No chance to get anything from `dmesg -w` :(

The other options (processors platform devices freezer) worked without any errors.
Correction, both "core" and "processors" fail.
(In reply to Thomas Müller from comment #13)
> Correction, both "core" and "processors" fail.

Thanks for your report.

The documentation says:

'If the "processors" test fails, the disabling/enabling of nonboot CPUs does not work (of course, this only may be an issue on SMP systems) and the problem should be reported. In that case you can also try to switch the nonboot CPUs off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and see if that works.'
https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt

Try:

# cat /sys/devices/system/cpu/cpu*/online
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > /sys/devices/system/cpu/cpu2/online
# echo 0 > /sys/devices/system/cpu/cpu3/online
# cat /sys/devices/system/cpu/cpu*/online

(NB: There is no "online" file for "cpu0".)

After that, try:

# echo processors > /sys/power/pm_test
# echo mem > /sys/power/state

For the record, the Intel i7-7600U has two cores and four threads:

$ grep 'smpboot: CPU0:' dmesg_4.20.6-noWifi_noLTE-1.txt
Feb 05 19:28:08 kernel: smpboot: CPU0: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (family: 0x6, model: 0x8e, stepping: 0x9)
https://ark.intel.com/products/97466/Intel-Core-i7-7600U-Processor-4M-Cache-up-to-3-90-GHz-
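Since a plain `cat` of cpu*/online does not say which CPU each value belongs to (and cpu0 has no such file), a small helper might make the before/after comparison easier to read. This is a rough sketch; the function names `cpu_online_states` and `set_cpu_online` and the optional base-directory argument are assumptions added for illustration, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helpers; names and the base-directory parameter are illustrative.

# cpu_online_states [BASE]
# Print one line per cpuN directory with the content of its "online" file.
cpu_online_states() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        name=${dir##*/}
        if [ -r "$dir/online" ]; then
            echo "$name $(cat "$dir/online")"
        else
            echo "$name (no online file)"
        fi
    done
}

# set_cpu_online N VALUE [BASE]
# Write VALUE (0 or 1) to cpuN/online; must be run as root on a real system.
set_cpu_online() {
    file=${3:-/sys/devices/system/cpu}/cpu$1/online
    if [ ! -w "$file" ]; then
        echo "cpu$1 has no writable online file" >&2
        return 1
    fi
    echo "$2" > "$file"
}
```

Usage would be along the lines of `cpu_online_states`, then `set_cpu_online 1 0`, then `cpu_online_states` again to compare.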
Here is a more elegant way to manage CPUs:

# lscpu -e # list
# chcpu -d 1,2,3 # disable
# lscpu -e
# chcpu -e 1,2,3 # enable

Documentation:

$ man lscpu
$ man chcpu
Well, we are coming closer to the actual problem I guess...

Initially, lscpu -e shows the following:

> CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
> 0   0    0      0    0:0:0:0       ja     3900,0000 400,0000
> 1   0    0      1    1:1:1:0       ja     3900,0000 400,0000
> 2   0    0      0    0:0:0:0       ja     3900,0000 400,0000
> 3   0    0      1    1:1:1:0       ja     3900,0000 400,0000

If I try to execute

# chcpu -d 1,2,3

or

# echo 0 > /sys/devices/system/cpu/cpu1/online

on 4.20.8-200.fc29.x86_64, the command blocks, while the system itself remains (mostly) usable. lscpu still shows the same output (i.e. all CPUs online), but if I try to read directly from /sys/devices/system/cpu/cpu1/online (i.e. `cat /sys/devices/system/cpu/cpu1/online`), that command also blocks indefinitely. Unfortunately, no message whatsoever is shown in the kernel logs. Also, reboot or poweroff no longer works and the system needs a hard reset. :(

On 4.18.18-300.fc29.x86_64 the above commands successfully take a CPU offline (and online again).
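To check for the blocking write without tying up the terminal, the write could be started in the background and polled. The sketch below is an assumption-laden illustration: `probe_write` is a made-up name, and the return value 124 for "still blocked" is just borrowed from the convention of timeout(1). It does not try to kill the stuck process, since a write stuck inside the kernel may not react to signals, and it cannot undo the hang (a hard reset may still be needed afterwards, so run `sync` first).

```shell
#!/bin/sh
# probe_write VALUE FILE [SECONDS]
# Hypothetical helper: write VALUE to FILE in a background subshell and
# report whether the write finished within SECONDS (default 5).
# Returns the write's own exit status if it finished, or 124 if it is
# still blocked. The background PID is left in $probe_pid.
probe_write() {
    value=$1
    file=$2
    wait_s=${3:-5}
    done_flag=$(mktemp) || return 1
    # The flag file only becomes non-empty after the write attempt returns.
    ( echo "$value" > "$file"; echo "$?" > "$done_flag" ) &
    probe_pid=$!
    elapsed=0
    while [ ! -s "$done_flag" ] && [ "$elapsed" -lt "$wait_s" ]; do
        sleep 1
        elapsed=$((elapsed + 1))
    done
    if [ -s "$done_flag" ]; then
        status=$(cat "$done_flag")
        rm -f "$done_flag"
        return "$status"
    fi
    echo "write to $file still blocked after ${wait_s}s (pid $probe_pid)" >&2
    return 124
}
```

For example, as root: `sync; probe_write 0 /sys/devices/system/cpu/cpu1/online 10`. A return value of 124 would correspond to the blocking behavior described above for 4.20.8, while 0 would match what 4.18.18 does.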
Thanks for testing and for your report. I suggest updating the bug summary to say something like this: "disabling secondary CPU hangs with kernel 4.19+ on Lenovo ThinkPad X1 Carbon 5th"
These messages could be related. For comparison, could you attach a log for 4.18.18-300.fc29.x86_64?

$ grep -n 'CPU.*temp' dmesg_4.20.6-noWifi_noLTE-1.txt
693:Feb 05 19:28:08 kernel: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
694:Feb 05 19:28:08 kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
695:Feb 05 19:28:08 kernel: CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
696:Feb 05 19:28:08 kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
697:Feb 05 19:28:08 kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
698:Feb 05 19:28:08 kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
701:Feb 05 19:28:08 kernel: CPU0: Core temperature/speed normal
702:Feb 05 19:28:08 kernel: CPU2: Core temperature/speed normal
703:Feb 05 19:28:08 kernel: CPU2: Package temperature/speed normal
704:Feb 05 19:28:08 kernel: CPU3: Package temperature/speed normal
705:Feb 05 19:28:08 kernel: CPU1: Package temperature/speed normal
706:Feb 05 19:28:08 kernel: CPU0: Package temperature/speed normal
Created attachment 1535625 [details]
dmesg from 4.18.18 with successful suspend

(In reply to Steve from comment #18)
> These messages could be related. For comparison, could you attach a log for
> 4.18.18-300.fc29.x86_64?

I've attached a log from 4.18.18 for reference. It also contains a successful suspend and resume at the end of the log.

I'm pretty sure those messages are unrelated, as I've always been seeing them and they also appear with 4.18.18. The X1 is quite small and its cooling seems to be a bit undersized, which is why the CPUs get throttled every now and then.
I have bisected the kernel and found the culprit (or at least something that triggers the bad behavior):

[be45bf5395e0886a93fc816bbe41a008ec2e42e2] watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug

be45bf5395e0886a93fc816bbe41a008ec2e42e2 is the first bad commit
commit be45bf5395e0886a93fc816bbe41a008ec2e42e2
Author: Peter Zijlstra <peterz>
Date:   Fri Jul 13 12:42:08 2018 +0200

    watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug

    When scheduling is delayed for longer than the softlockup interrupt
    period it is possible to double-queue the cpu_stop_work, causing list
    corruption.

    Cure this by adding a completion to track the cpu_stop_work's
    progress.

    Reported-by: kernel test robot <lkp>
    Tested-by: Rong Chen <rong.a.chen>
    Signed-off-by: Peter Zijlstra (Intel) <peterz>
    Cc: Linus Torvalds <torvalds>
    Cc: Peter Zijlstra <peterz>
    Cc: Thomas Gleixner <tglx>
    Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
    Link: http://lkml.kernel.org/r/20180713104208.GW2494@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo>

:040000 040000 6aca2dbb84bc33fe442b18b3d0a135c27adff7b9 2710af12d32e4b98df07768716689b213bce45fc M kernel
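For anyone wanting to repeat a bisection like this, the general `git bisect` workflow can be sketched as follows. The exact range and test procedure used for the result above are not stated in this report, so the `v4.18` and `v4.19` tags in the usage note are assumptions, and the wrapper names `start_bisect` and `record_result` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrappers around git bisect; run them inside a git checkout.

# start_bisect GOOD BAD
# Begin a bisection between a known-good and a known-bad revision.
start_bisect() {
    git bisect start "$2" "$1"
}

# record_result good|bad|skip
# Record the verdict for the commit that git bisect has checked out,
# after building and booting it and testing by hand.
record_result() {
    case $1 in
        good|bad|skip) git bisect "$1" ;;
        *) echo "usage: record_result good|bad|skip" >&2; return 2 ;;
    esac
}
```

In a kernel tree this might look like `start_bisect v4.18 v4.19`, then for each candidate: build, boot, try to take a CPU offline, and call `record_result good` or `record_result bad` until git prints the first bad commit. `git bisect log` shows the steps taken so far and `git bisect reset` returns to the original checkout.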
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora XX has now been rebased to 5.0.6.

Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.

If you experience different issues, please open a new bug report for those.
Good news: starting with 5.0.6 suspend is working again.