Bug 2301921 - ACPI events wake up Thinkpad laptops when they shouldn't (regression in kernel 6.10 in Qualcomm wifi driver)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 40
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2306298
Depends On:
Blocks: 2184978
Reported: 2024-07-31 08:20 UTC by Kamil Páral
Modified: 2024-09-18 11:17 UTC
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-09-18 11:17:23 UTC
Type: Bug
Embargoed:


Attachments
dmesg.txt (128.53 KB, text/plain)
2024-07-31 08:20 UTC, Kamil Páral
no flags
rpm-qa.txt (75.43 KB, text/plain)
2024-07-31 08:21 UTC, Kamil Páral
no flags
Commits skipped during "git bisect" to get a successful result on Qualcomm Wi-Fi. (5.10 KB, text/plain)
2024-08-07 19:12 UTC, Renjith Pananchikkal
no flags
Log of the issue on the Thinkpad Z13 (274.43 KB, text/plain)
2024-08-22 16:05 UTC, Timur Kristóf
no flags
Log from amd_s2idle.py: ran while plugged in, waited for sleep, then unplugged (189.30 KB, text/plain)
2024-08-24 14:13 UTC, Chris Adams
no flags


Links
Linux Kernel 219196: ath11k: Suspend broken in 6.10 and later on Lenovo platforms
Private: 0, Priority: P3, Status: NEW, Last Updated: 2024-08-26 11:23:25 UTC

Internal Links: 2266265

Description Kamil Páral 2024-07-31 08:20:22 UTC
1. Please describe the problem:

From kernel 6.10, Thinkpad P16v Gen 1 (AMD) wakes up from suspend during these actions:
* power supply connected
* power supply disconnected
* lid closed (closed, not opened)

These actions are not supposed to trigger system resume. With kernel 6.9 and earlier, these actions don't wake up the laptop [1]. From kernel 6.10, they wake it up every time.

This breaks regular workflows, like "suspend the laptop, disconnect from power, put it into your bag". If you don't have autosuspend on idle enabled in your desktop environment (I don't), it will run until it completely depletes the battery and dies. Also, it's no longer possible to have HandleLidSwitch=ignore configured in logind.conf, because then there's no way to suspend the laptop (if you suspend it manually using the power button and then close the lid, it wakes up).

The laptop uses s2idle, no other sleep mode is supported. The laptop has the latest firmware installed [2].


[1] I think I saw an infrequent race condition that sometimes woke it up on a power supply change even with older kernels. However, from 6.10, it seems to happen every time.
[2] System Firmware 0.1.52. For updating all firmware parts, a Windows machine had to be used.


2. What is the Version-Release number of the kernel:

kernel-6.10.1-200.fc40.x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

BROKEN:  kernel-6.10.1-200.fc40.x86_64
WORKING: kernel-6.9.12-200.fc40.x86_64


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

1. Close the lid to suspend the laptop, see the LED blinking to indicate sleep
2. Disconnect (or connect) AC power
3. See the LED shining, it woke up

or

1. Suspend the laptop manually by pressing the power button, see the LED blinking to indicate sleep
2. Close the lid
3. See the LED shining, it woke up


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes, kernel-6.11.0-0.rc1.20240730git94ede2a3e913.17.fc41 is also affected.


6. Are you running any modules that are not shipped directly with Fedora's kernel?

No


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Kamil Páral 2024-07-31 08:20:57 UTC
Created attachment 2043130
dmesg.txt

Comment 2 Kamil Páral 2024-07-31 08:21:01 UTC
Created attachment 2043131
rpm-qa.txt

Comment 3 Kamil Páral 2024-07-31 08:24:14 UTC
@mpearson Hey Mark, is there anything you can do from your side to fix this? Thanks.

Comment 4 Mark Pearson 2024-07-31 17:00:45 UTC
Need to check with the AMD folk - as I think this may be 'by design'. Not sure if the Lenovo FW team need involving or not.

My understanding is these ACPI events happen, but on Windows they get handled with the screen turned off and the system goes back to sleep shortly after. I think that support is still missing in Linux.

I don't know what changed in 6.10 though. If the AMD folk don't have the answer off the top of their heads I can look at doing a bisect and tracking it down. If it's something we're doing wrong in FW I can get it looked at.

Can you confirm which BIOS and EC you are using please?

Mark

Comment 5 Kamil Páral 2024-08-01 11:32:32 UTC
Thanks, Mark. BIOS is 1.52, EC is 1.07.

Comment 6 Renjith Pananchikkal 2024-08-07 19:10:33 UTC
Hi Mark,
It's definitely not an added feature.
We did some debugging and figured out that the issue was first seen in 6.10-rc1.
One of my colleagues did a "git bisect" and figured out that if 72 commits from Qualcomm (mostly ath12k & ath11k) are skipped, everything works fine.
The above was verified on Lenovo P14s with Qualcomm Wi-Fi card. I tried kernel 6.10-rc1 on multiple laptops and the failure follows Qualcomm Wi-Fi.
Two laptops with MediaTek Wi-Fi (Lenovo Z16 Gen2 and a laptop from another OEM) do not exhibit this issue.

Kind Regards,
Renjith,
AMD Client Linux.

Comment 7 Renjith Pananchikkal 2024-08-07 19:12:44 UTC
Created attachment 2043648
Commits skipped during "git bisect" to get a successful result on Qualcomm Wi-Fi.

Comment 8 Kamil Páral 2024-08-08 10:15:04 UTC
Renjith, thanks a lot for narrowing down the problem! It's also interesting to hear that this affects multiple Thinkpad models.

Mark, do you think you could raise this on an appropriate kernel mailing list, or contact the Qualcomm team directly? Or what is the best course of action now? Thanks!

Comment 9 Mark Pearson 2024-08-13 00:38:33 UTC
I've forwarded the details to Qualcomm.
As a note - tracking internally with LO-3246, and there's a similar report for the Z13 G1 on our forums that may be related.
Mark

Comment 10 Mark Pearson 2024-08-14 17:38:02 UTC
Strange - I posted this yesterday...but it didn't take :(

Initial feedback is there isn't anything obvious in those commits that should impact suspend (they were targeted at hibernate). They requested the following extra logs if possible, please:
----
enable PM debug with following steps and try again to collect logs:
        echo 1 > /sys/power/pm_debug_messages
        echo 1 > /sys/power/pm_print_times

PCI device info is helpful (it doesn't change across suspend/resume, so it can be collected at any time):
        lspci -vt

IRQ stats right BEFORE & AFTER suspend/resume
        cat /proc/interrupts
---
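One way to compare the BEFORE and AFTER /proc/interrupts snapshots requested above is a small diff script. This is a minimal sketch, not something Qualcomm or Lenovo provided: the file name irq_diff.py is hypothetical, and the parser assumes the usual /proc/interrupts layout (a header row naming the CPU columns, then one row per IRQ with per-CPU counters followed by a free-form description).

```python
# Hypothetical helper (irq_diff.py): compare two /proc/interrupts snapshots.
import sys
from pathlib import Path


def parse_interrupts(text: str) -> dict[str, tuple[int, str]]:
    """Map each IRQ label to (count summed over all CPUs, description).

    Assumes a header row naming the CPU columns, then one row per IRQ of
    the form "label: per-CPU counters... description".
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry one counter, not one per CPU
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def diff_interrupts(before: str, after: str) -> list[tuple[str, int, str]]:
    """Return (irq, increase, description) for every IRQ whose total count
    grew between the two snapshots, largest increase first."""
    old = parse_interrupts(before)
    changed = []
    for irq, (count, description) in parse_interrupts(after).items():
        delta = count - old.get(irq, (0, ""))[0]
        if delta > 0:
            changed.append((irq, delta, description))
    return sorted(changed, key=lambda row: row[1], reverse=True)


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: irq_diff.py BEFORE.txt AFTER.txt", file=sys.stderr)
        return
    before = Path(argv[1]).read_text()
    after = Path(argv[2]).read_text()
    for irq, delta, description in diff_interrupts(before, after):
        print(f"{irq:>8}  +{delta:<10} {description}")


if __name__ == "__main__":
    main(sys.argv)
```

Usage, assuming the snapshots were taken with ``cat /proc/interrupts > before.txt`` right before suspending and ``cat /proc/interrupts > after.txt`` right after the unwanted wake: ``python3 irq_diff.py before.txt after.txt``. The counters also include normal activity after resume, so the IRQs at the top of the list are only candidates for the wake source, not proof of it.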

As a note - we haven't reproduced the issue yet. Still trying.
Thanks
Mark

Comment 11 Timur Kristóf 2024-08-22 16:05:42 UTC
Created attachment 2044616
Log of the issue on the Thinkpad Z13

Hi,

I'm the person who reported a very similar issue on the Thinkpad Z13 on the Lenovo forums. Basically on the new kernel, it always wakes up after AC is plugged in/out, and even manually sleeping the laptop won't work because then it wakes up again when I close the lid.

This basically renders the laptop very annoying to use. Also makes me wonder if anyone is testing these devices with newer kernel releases. I suppose Qualcomm definitely isn't.

I've run the commands suggested by Mark and I'm attaching the resulting log to this issue; hope this helps.

Comment 12 Chris Adams 2024-08-24 14:09:09 UTC
*** Bug 2306298 has been marked as a duplicate of this bug. ***

Comment 13 Chris Adams 2024-08-24 14:12:45 UTC
I am seeing this issue on my Thinkpad T14s Gen4 AMD. Examples of ways to trigger it:
- on battery, close lid (goes to sleep), plug in (wakes)
- plugged in, suspend, unplug (wakes)

I used https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py to gather debug info; I had the system plugged in, ran as root:

./amd_s2idle.py --log s2idle_report-$(date +%F-%T).txt --duration 60 --wait 1 --count 1

When the system went to sleep, I waited a few seconds, then unplugged it, triggering an "early" wakeup. Attaching the log.

Comment 14 Chris Adams 2024-08-24 14:13:32 UTC
Created attachment 2044772
Log from amd_s2idle.py: ran while plugged in, waited for sleep, then unplugged

Comment 15 Mark Pearson 2024-08-26 01:23:23 UTC
It took me longer than it should have - last week was hectic, but I finished doing the full bisect on this on the P16v AMD this weekend and tracked down the offending commit (https://github.com/torvalds/linux/commit/166a490f59ac10340ee5330e51c15188ce2a7f8f)

I'm flagging this to Qualcomm - and will get a kernel bugzilla open for tracking.
For those seeing it on other platforms - could you please confirm that you don't see the problem if you disable WLAN? Just to make sure we're not chasing two issues.

Mark

Comment 16 Mark Pearson 2024-08-26 01:37:54 UTC
Upstream kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=219196

Comment 17 Kamil Páral 2024-08-26 11:23:25 UTC
Thanks, Mark, for finding the offending commit and filing the kernel bugzilla.

Comment 18 Timur Kristóf 2024-08-27 12:15:03 UTC
I can confirm that reverting the problematic patch indeed solves the issue (Mark sent me a kernel build to test). However, without reverting that patch, simply disabling the WiFi does NOT solve it.

Would it be possible to ask the Fedora Kernel team to revert that patch until a proper fix is found upstream?

Comment 19 Chris Adams 2024-08-27 13:34:40 UTC
I tried a revert on top of Fedora's 6.10.6-200.fc40 RPM - there was a simple-looking conflict, but I guess not so simple. While it fixed the problem (no unwanted wake when asleep+plugged in), it also caused a kernel error for "Hardware became unavailable upon resume." and wifi didn't work. The conflict was in the resume function, so I guess I didn't get that right (and didn't have time to investigate more).

Comment 20 Mark Pearson 2024-08-27 13:49:16 UTC
My understanding is the Fedora kernel team, quite sensibly, don't like pulling in fixes that aren't accepted upstream.
Qualcomm are looking at this with some urgency - particularly as the 6.11 window is closing so they want to get the fix before then. I have an internal ticket with them as well as the upstream kernel bugzilla.

As a side note, I'm hoping there will be a fix or revert and they'll tag any fix as a stable backport so Fedora automatically gets it. But if for some reason they don't, I will do an MR for the Fedora 6.10 kernel.

One reason I'm hesitant to just blindly revert this is that it's possible it's not the driver's fault, and it's just uncovering something wrong in Lenovo FW. I don't _think_ that's the case here, but from the reports I've seen so far, only Lenovo devices are impacted. I can't rule it out yet.

My recommendation for now is to run 6.9 - and we'll have a better idea in the next week of what the fix will be.

Chris - just FYI, I was building from Linus's tree rather than the Fedora kernel-ark so I wonder if that's the reason? In my case it was a simple git revert on top of Linus's 6.10 tag. I do have a kernel-ark based build running for something else I'm working on - I'll check it out on there.

Mark

Comment 21 Timur Kristóf 2024-08-27 19:07:06 UTC
> One reason I'm hesitant to just blindly revert this is that it's possible it's not the driver's fault, and it's just uncovering something wrong in Lenovo FW. I don't _think_ that's the case here, but from the reports I've seen so far, only Lenovo devices are impacted. I can't rule it out yet.

That makes sense.

> My recommendation for now is to run 6.9 - and we'll have a better idea in the next week of what the fix will be.

Thanks, will do.

> Chris - just FYI, I was building from Linus's tree rather than the Fedora kernel-ark so I wonder if that's the reason? In my case it was a simple git revert on top of Linus's 6.10 tag.

AFAIU the 6.10 tag corresponds to 6.10.0, while Chris was reverting on top of 6.10.6; there may have been other code changes between those two versions that conflict with the revert.

Comment 22 Chris Adams 2024-09-13 22:58:51 UTC
I saw this had been added to upstream 6.10.10 and that version is in updates-testing for F40, so I gave it a try on my Thinkpad. It does fix the issue for me.

I understand this may not be a permanent solution upstream, but it's working for me for now.

Comment 23 Kamil Páral 2024-09-18 11:17:23 UTC
I also tested kernel 6.11.0-63.fc41 on Fedora 41, and the problem is also fixed there. Thanks everyone for helping to resolve this! I think we can now close this bug as fixed (kernel 6.10.10 will be in stable F40 today or tomorrow).

