Bug 2162013 - Suspend immediately resumes with kernel 6.1
Summary: Suspend immediately resumes with kernel 6.1
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: x86_64
OS: Linux
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2023-01-18 15:15 UTC by Nathan Smythe
Modified: 2023-12-07 14:56 UTC
Last Closed: 2023-12-07 14:56:57 UTC
Type: Bug


Attachments
kernel log (112.90 KB, text/plain)
2023-01-18 15:15 UTC, Nathan Smythe
acpi dump result (994.70 KB, text/plain)
2023-01-18 15:54 UTC, Nathan Smythe
Debug kernel log (327.57 KB, text/plain)
2023-01-19 03:08 UTC, Nathan Smythe
dmidecode output (16.13 KB, text/plain)
2023-01-19 04:35 UTC, Nathan Smythe

Description Nathan Smythe 2023-01-18 15:15:54 UTC
Created attachment 1938929
kernel log

1. Please describe the problem:

System (System76 Pangolin (pang11) laptop with AMD Ryzen 5 5500U) immediately resumes when trying to suspend while running a 6.1 kernel

2. What is the Version-Release number of the kernel:

6.1.6-200.fc37.x86_64
6.1.5-200.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, works fine when booting into 6.0.18-300.fc37.x86_64


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Power->suspend while booted into a 6.1 kernel

Comment 1 Mario Limonciello 2023-01-18 15:21:45 UTC
Can you please reproduce the issue with this script and provide the log that it saves?
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py

Comment 2 Mario Limonciello 2023-01-18 15:26:47 UTC
Sorry; I looked at your log and the suspend was issued using "deep" not "s2idle".  So you won't be able to use that script to capture debug information.
You'll need to look at /sys/power/pm_wakeup_irq to determine which IRQ caused the wakeup, and then figure out next steps from there.

Comment 3 Nathan Smythe 2023-01-18 15:33:43 UTC
Thanks for the reply.

cat /sys/power/pm_wakeup_irq
7

Comment 4 Mario Limonciello 2023-01-18 15:39:56 UTC
OK thanks.

That's typically the GPIO controller on AMD machines.  You could confirm this from /proc/interrupts.
If that's correct, please do the following:

1) Capture an acpidump for your system and attach it to the issue.  This will let us attempt to determine which GPIO is associated with which device.
2) Add https://github.com/torvalds/linux/commit/1d66e379731f79ae5039a869c0fde22a4f6a6a91 to your kernel.
3) Turn on dynamic debugging for the pinctrl-amd driver.  Here is information on how to do that if you're not familiar with it: https://www.kernel.org/doc/html/next/admin-guide/dynamic-debug-howto.html
4) Reproduce the issue and attach a new kernel log. It should tell you which GPIOs were active during resume.
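
For step 3, a minimal sketch of turning the driver's debug messages on at runtime. It assumes the kernel was built with CONFIG_DYNAMIC_DEBUG and that debugfs is mounted at /sys/kernel/debug. The function name and its optional control-file argument are made up for illustration; the query itself matches call sites by source file name.

```shell
# Enable debug output for the pinctrl-amd driver via dynamic debug.
# Assumption: debugfs is mounted at /sys/kernel/debug (the usual location).
enable_pinctrl_amd_debug() {
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"
    # "+p" turns printing on for every debug call site in the matched file.
    echo 'file pinctrl-amd.c +p' > "$control"
}
```

Run `enable_pinctrl_amd_debug` as root before suspending, then read the kernel log after resume. The setting does not survive a reboot.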

Comment 5 Nathan Smythe 2023-01-18 15:53:04 UTC
lsdev shows pinctrl_amd using irq 7.

/proc/interrupts shows:
 7:      14329          0          0          0          0       2724          0          0          0          0          0          0  IR-IO-APIC    7-fasteoi   pinctrl_amd
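
The lookup above can be scripted roughly as follows. This is only a sketch: the function name is made up, and both file paths are parameters so it can be tried against saved copies of the files.

```shell
# Print the /proc/interrupts line for the IRQ that last woke the system.
wakeup_irq_handler() {
    irq_file="${1:-/sys/power/pm_wakeup_irq}"
    interrupts="${2:-/proc/interrupts}"
    # Reading fails if no wakeup IRQ has been recorded yet.
    irq=$(cat "$irq_file") || return 1
    # Each line starts with optional spaces, the IRQ number, then a colon.
    grep -E "^ *${irq}:" "$interrupts"
}
```

The last column of the printed line names the driver that registered the interrupt.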

1) I'll attach the dump after I submit this.
2-4) Looks like I'll need to compile a kernel? I'll have to do these tonight after work.

Comment 6 Nathan Smythe 2023-01-18 15:54:13 UTC
Created attachment 1938940
acpi dump result

Comment 7 Mario Limonciello 2023-01-18 15:59:09 UTC
> 1) I'll attach the dump after I submit this.

Thanks.  It won't be useful to analyze until after we see which GPIO caused the wakeup.

> 2-4) Looks like I'll need to compile a kernel? I'll have to do these tonight after work.

It's included in 6.1.7.  If you have access to a distribution binary built with 6.1.7 you can avoid compiling your own kernel.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=046c9972dd40775cff4d12294ca14b0b624f2c35

Comment 8 Hans de Goede 2023-01-18 16:48:27 UTC
(In reply to Mario Limonciello from comment #7)

Mario, thank you for taking a look at this.

> It's included in 6.1.7.  If you have access to a distribution binary built
> with 6.1.7 you can avoid compiling your own kernel.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=046c9972dd40775cff4d12294ca14b0b624f2c35

Usually Fedora is pretty quick at picking up new stable releases. It is probably easiest (and certainly the most hassle-free) to just wait for that.

ATM I'm not seeing anything yet, but I expect 6.1.7 to be available here within 24 hours:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

(and in updates-testing soon after that)

For some quick instructions for installing a kernel directly from koji (the Fedora build system), see:

https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Comment 9 Nathan Smythe 2023-01-19 03:08:33 UTC
Created attachment 1939071
Debug kernel log

Comment 10 Nathan Smythe 2023-01-19 03:09:46 UTC
I've attached a kernel log that hopefully has the debugging information enabled correctly.

Comment 11 Mario Limonciello 2023-01-19 03:49:40 UTC
It looks like GPIO #9, which is for your touchpad. This might actually be very similar to the issue that was raised VERY recently in kernel 6.2:
https://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git/commit/?h=gpio/for-current&id=4cb786180dfb5258ff3111181b5e4ecb1d4a297b

Nathan:
Can you please check /sys/bus/i2c/devices/i2c-ELAN0415\:00/power/wakeup?  Is this enabled or disabled for you by default?

If it's enabled, can you please check if "echo disabled > /sys/bus/i2c/devices/i2c-ELAN0415\:00/power/wakeup" fixes it?

Hans:

Did Fedora's kernel pick up

1796f808e4bb2 ("HID: i2c-hid: acpi: Stop setting wakeup_capable")?

If so, that could explain this.  Otherwise something in userspace must have changed the policy. I double-checked that upstream 6.1.7 didn't pick it up.

Comment 12 Nathan Smythe 2023-01-19 04:12:51 UTC
(In reply to Mario Limonciello from comment #11)
> It looks like GPIO #9, which is for your touchpad. This might actually be
> very similar to the issue that was raised VERY recently in kernel 6.2:
> https://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git/commit/?h=gpio/for-current&id=4cb786180dfb5258ff3111181b5e4ecb1d4a297b
> 
> Nathan:
> Can you please check /sys/bus/i2c/devices/i2c-ELAN0415\:00/power/wakeup?  Is
> this enabled or disabled for you by default?
> 
> If it's enabled, can you please check if "echo disabled >
> /sys/bus/i2c/devices/i2c-ELAN0415\:00/power/wakeup" fixes it?
> 

It was a slightly different path (i2c-FTCS1000), but it was enabled by default. Setting it to disabled fixes it for both the 6.1.7 kernel and the Fedora 6.1.6 kernel. Setting it back to enabled causes the immediate wakeup behavior to resume.
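
Since the device name differs between machines, here is a rough sketch for finding it and flipping the policy. Both function names are made up, and the base directory is a parameter only so the functions can be tried outside of /sys.

```shell
# List the wakeup policy of every I2C device that exposes one.
list_i2c_wakeup() {
    base="${1:-/sys/bus/i2c/devices}"
    for f in "$base"/*/power/wakeup; do
        [ -e "$f" ] || continue
        dev=${f#"$base"/}
        # Print "<device name> <enabled|disabled>".
        printf '%s %s\n' "${dev%%/*}" "$(cat "$f")"
    done
}

# Set the policy for one device.
# $1 = device name as printed above, $2 = enabled|disabled, $3 = optional base dir
set_i2c_wakeup() {
    echo "$2" > "${3:-/sys/bus/i2c/devices}/$1/power/wakeup"
}
```

Writing to sysfs needs root, and the value is reset at the next boot, so this is a workaround for testing rather than a fix.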

Comment 13 Nathan Smythe 2023-01-19 04:19:59 UTC
(In reply to Mario Limonciello from comment #11)
> It looks like GPIO #9, which is for your touchpad. This might actually be
> very similar to the issue that was raised VERY recently in kernel 6.2:
> https://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git/commit/?h=gpio/for-current&id=4cb786180dfb5258ff3111181b5e4ecb1d4a297b

This looks like a likely candidate. The System76 laptop has a Clevo part number of NL50NU.

Comment 14 Mario Limonciello 2023-01-19 04:23:29 UTC
> It was a slightly different path (i2c-FTCS1000), but it was enabled by default. Setting it to disabled fixes it for both the 6.1.7 kernel and the Fedora 6.1.6 kernel. Setting it back to enabled causes the immediate wakeup behavior to resume.

I don't understand why the policy defaults to enabled on 6.1.6+ for you.
It's supposed to be disabled in 6.1.y.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hid/i2c-hid/i2c-hid-acpi.c?h=linux-6.1.y#n110

Could something in your userspace be changing it?

> This looks like a likely candidate. The System76 laptop has a Clevo part number of NL50NU.

Oh, really?  Then yeah they probably made the same mistake if it's the same H/W and BIOS design.  Is there any better way to identify it's the same family?  Can I see dmidecode output?

Comment 15 Nathan Smythe 2023-01-19 04:35:44 UTC
Created attachment 1939078
dmidecode output

Comment 16 Nathan Smythe 2023-01-19 04:41:06 UTC
(In reply to Mario Limonciello from comment #14)

> I don't understand why the policy defaults to enabled on 6.1.6+ for you.
> It's supposed to be disabled in 6.1.y.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hid/i2c-hid/i2c-hid-acpi.c?h=linux-6.1.y#n110
> 
> Could something in your userspace be changing it?

I don't know of anything. The only way I can think of to test would be to try a fresh install.
 
> > This looks like a likely candidate. The System76 laptop has a Clevo part number of NL50NU.
> 
> Oh, really?  Then yeah they probably made the same mistake if it's the same
> H/W and BIOS design.  Is there any better way to identify it's the same
> family?  Can I see dmidecode output?

I've attached the output, but it looks like everything is rebranded with System76. The only thing I see is the sticker on the back saying:
"Clevo CO Code: NL50NU"

Comment 17 Nathan Smythe 2023-01-19 04:58:23 UTC
It looks to me from what I can find from old product listings that the NL50RU is Ryzen 4500U/4700U and the NL50NU is Ryzen 5500U/5700U. No specs on the BIOS/motherboard, but it doesn't seem unreasonable that they would be at least very similar.

Comment 18 Mario Limonciello 2023-01-19 05:03:12 UTC
Oh... I get why you're reproducing it even in 6.1.y.  It's because your machine is in S3 mode, not S2idle mode and that code doesn't run in S3 mode.
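
For anyone wanting to check which mode their own machine uses, a small sketch follows. The function name is made up. /sys/power/mem_sleep lists the supported suspend variants with the active one in square brackets; "deep" is the S3 case.

```shell
# Print the suspend variant the kernel will use for "mem" suspend.
# Example file content: s2idle [deep]
active_mem_sleep() {
    # Keep only the word inside the square brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}
```

Per comment 2, the reporter's log showed "deep", which is why the s2idle-only code path never ran.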

OK, so first off here's a solution via a quirk. There are 3 patches in this branch, 2 of them on their way to 6.2-rc for the other Clevo system, and the third is unique for your system.
https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013
Please confirm this works for you.

Before I upstream this, I would really like it if you could bisect between 6.0.18 and 6.1.5 to find the original cause for your system, to see whether the quirk is really the best solution or not.

Comment 19 Mario Limonciello 2023-01-19 05:16:40 UTC
Actually I have a pretty good guess at the cause.

b38f2d5d9615 ("i2c: acpi: Use ACPI wake capability bit to set wake_irq")

That landed in 6.1-rc1.  Instead of bisecting, if you revert it does the behavior go away?  If so, then I think my quirk proposal is the right way to do it, and will look forward to the results with it.

Comment 20 Nathan Smythe 2023-01-19 05:23:23 UTC
(In reply to Mario Limonciello from comment #18)

> OK, so first off here's a solution via a quirk. There are 3 patches in this
> branch, 2 of them on their way to 6.2-rc for the other Clevo system, and the
> third is unique for your system.
> https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013
> Please confirm this works for you.

Thanks for being so fast with this, but I'm going to have to do some reading on how to manually compile and install the kernel. I probably won't be able to get it done until this weekend.

> Before I upstream this, I would really like it if you could bisect between
> 6.0.18 and 6.1.5 to find the original cause for your system, to see whether
> the quirk is really the best solution or not.

I'm not really sure what this means. If you mean to try and find the code change(s) between the two that causes the behavior, I'll give it a try but it may be beyond my skills.


> Actually I have a pretty good guess at the cause.

> b38f2d5d9615 ("i2c: acpi: Use ACPI wake capability bit to set wake_irq")

> That landed in 6.1-rc1.  Instead of bisecting, if you revert it does the behavior go away?  If so, then I think my quirk proposal is the right way to do it, and will look forward to the results with it.

I'm more confident I can figure this out, but I've still got some learning to do...

Comment 21 Mario Limonciello 2023-01-19 05:33:45 UTC
@Hans,

Can you maybe make Nathan two binary Fedora kernels to test this stuff so he doesn't have to learn how to compile, revert and bisect?

1) 6.1.7 + revert of b38f2d5d9615
2) 6.1.7 + 3 patches posted to https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013

> Thanks for being so fast with this, but I'm going to have to do some reading on how to manually compile and install the kernel. I probably won't be able to get it done until this weekend.

OK.

> I'm not really sure what this means. If you mean to try and find the code change(s) between the two that causes the behavior, I'll give it a try but it may be beyond my skills.

I think if the revert helps, there is no need to bisect, because that confirms the root cause.  You would only need to bisect if the revert doesn't change anything.

For your learning in the future, here is how bisecting is done: https://docs.kernel.org/admin-guide/bug-bisect.html

> I'm more confident I can figure this out, but I've still got some learning to do...

OK.  At a high level:

1) # git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/
2) # git checkout linux-6.1.y
3) # git revert b38f2d5d9615
4) Copy your kernel config from the distro kernel into .config
5) Build the kernel
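
The steps above can be sketched as one script. Treat it as an outline rather than a tested recipe: the `run` helper, the function name and the DRY_RUN switch are made up here, and the config path assumes the distro installs its kernel config as /boot/config-<release>. `git revert` also needs a git user name and email to be configured.

```shell
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

build_with_revert() {
    commit="${1:-b38f2d5d9615}"
    run git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux &&
    run cd linux &&
    run git checkout linux-6.1.y &&
    run git revert --no-edit "$commit" &&
    # Assumption: the running distro kernel's config is in /boot.
    run cp "/boot/config-$(uname -r)" .config &&
    # Accept defaults for options that are new relative to that config.
    run make olddefconfig &&
    run make -j"$(nproc)"
}
```

Installing the result is typically `make modules_install install` as root, followed by a reboot into the new kernel.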

If Hans can't compile the kernels for you, I'll look forward to your results of these two tests next week.

Comment 22 Nathan Smythe 2023-01-19 15:04:34 UTC
(In reply to Mario Limonciello from comment #21)

> 1) 6.1.7 + revert of b38f2d5d9615
> 2) 6.1.7 + 3 patches posted to
> https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013
> 

I was able to get partway through the tests:

1) 6.1.7 build without the revert: suspend fails (expected). I still need to build with the revert tonight.
2) Suspend works correctly with this. The version string was 6.2.0-rc4, but the branch was rhbz-2162013 so hopefully that was correct.

Comment 23 Hans de Goede 2023-01-19 15:48:01 UTC
> @Hans Can you maybe make Nathan two binary Fedora kernels to test this stuff so he doesn't have to learn how to compile, revert and bisect?

I see that Nathan has already figured out how to build his own kernels. So I guess there no longer is a need for this.

Nathan if you do need me to build a set of kernel rpms for you after all, let me know.

Comment 24 Mario Limonciello 2023-01-19 16:08:00 UTC
> 1) 6.1.7 build without the revert: suspend fails (expected). I still need to build with the revert tonight.

OK.

> 2) Suspend works correctly with this. The version string was 6.2.0-rc4, but the branch was rhbz-2162013 so hopefully that was correct.

Yeah that's correct.

Comment 25 Mario Limonciello 2023-01-20 00:54:59 UTC
Can you please have a try with this branch to see if it also solves the problem for you?
https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013-gitlab-2357-v2

Comment 26 Nathan Smythe 2023-01-20 05:44:50 UTC
(In reply to Hans de Goede from comment #23)
> > @Hans Can you maybe make Nathan two binary Fedora kernels to test this stuff so he doesn't have to learn how to compile, revert and bisect?
> 
> I see that Nathan has already figured out how to build his own kernels. So I
> guess there no longer is a need for this.
> 
> Nathan if you do need me to build a set of kernel rpms for you after all,
> let me know.

It looks like I've got it working, but thank you for the offer.

Comment 27 Nathan Smythe 2023-01-20 05:48:13 UTC
Okay, here is a full summary of the builds so far:

1) 6.1.7 build - suspend fails
2) 6.1.7 + revert of b38f2d5d9615 build - suspend works
3) https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013 - suspend works
4) https://gitlab.freedesktop.org/superm1/linux/-/tree/mlimonci/rhbz-2162013-gitlab-2357-v2 - suspend works

Comment 28 Mario Limonciello 2023-01-20 05:52:29 UTC
Thanks!  Currently leaning towards #4, pending testing results for the other two machines that regressed as well and some further discussion on whether it's the right solution.

Comment 29 Mario Limonciello 2023-01-20 21:04:28 UTC
The discussion has pointed towards a different (more general) solution.  Can you see if this also works for you?
https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci/rhbz-2162013-gitlab-2357-v4/

Comment 30 Nathan Smythe 2023-01-21 01:42:02 UTC
(In reply to Mario Limonciello from comment #29)
> The discussion has pointed towards a different (more general) solution.  Can
> you see if this also works for you?
> https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci/rhbz-2162013-gitlab-2357-v4/

Yes, suspend works with this version.

Comment 31 Aoife Moloney 2023-11-23 01:00:13 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 32 Aoife Moloney 2023-12-07 14:56:57 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

