Bug 2050036 - Framework laptop: 5.16.5 breaks s2idle sleep
Summary: Framework laptop: 5.16.5 breaks s2idle sleep
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-02-03 05:01 UTC by Dimitris
Modified: 2022-03-10 09:35 UTC (History)
22 users (show)

Fixed In Version: kernel-5.16.11-200.fc35 kernel-5.16.11-100.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-27 03:21:43 UTC
Type: Bug


Attachments (Terms of Use)
kernel log (186.78 KB, text/plain)
2022-02-03 05:01 UTC, Dimitris
no flags Details
updated kernel log (104.34 KB, text/plain)
2022-02-03 05:09 UTC, Dimitris
no flags Details

Description Dimitris 2022-02-03 05:01:22 UTC
Created attachment 1858809 [details]
kernel log

1. Please describe the problem:

Laptop no longer able to enter s2idle

2. What is the Version-Release number of the kernel:

5.16.5-200.fc35.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Has worked without issues up to 5.15.18-200.fc35

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Press power button to put the machine to modern standby.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Haven't tested yet

6. Are you running any modules that not shipped with directly Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Dimitris 2022-02-03 05:09:41 UTC
Created attachment 1858810 [details]
updated kernel log

Apologies, the previous kernel log was from a boot with mem_sleep_default=deep.  This is one without it.

Comment 2 Justin M. Forbes 2022-02-04 22:41:10 UTC
The logs are less useful. Some questions that might help narrow this down.  Does it work with 5.16.0/1/2/3/4? If it came in with a 5.16 stable update, it will be easier to narrow down than if it came in at some point during the 5.16 development cycle.  Either way, it would be handy to know exactly which version brought in the issue.  Also, does it work with 5.17-rc from rawhide? That will tell us if this is something that is already fixed upstream, and we just need to find out which patch to backport.

If it does not work with 5.16.0 or any of the 5.16 series, it might be worth trying the 5.16-rc kernels to see which one first exhibits the issue. All of them should be available from koji: https://koji.fedoraproject.org/koji/packageinfo?packageID=8

Comment 3 Dimitris 2022-02-05 19:51:29 UTC
WIP update: Tried 5.16.{1,2,3,4} from koji, all have the same behavior as 5.16.5.  In case this is a hint, the only consistent difference I've noticed so far in `dmesg` between 5.15.18 succeeding and the 5.16 series failing to go into s2idle is:

$ diff <(cat dmesg-s2idle-5.15.18 |awk -F '] ' '{print $2}') <(cat dmesg-s2idle-5.16.3 |awk -F '] ' '{print $2}')
6,7c6,7
< Filesystems sync: 0.015 seconds
< Freezing user space processes ... (elapsed 0.003 seconds) done.
---
> Filesystems sync: 0.014 seconds
> Freezing user space processes ... (elapsed 0.002 seconds) done.
9c9
< Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
---
> Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
11c11
< PM: suspend devices took 0.472 seconds
---
> PM: suspend devices took 0.360 seconds
14a15,17
> iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 1, ret=-1
> iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 2, ret=-1
> iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 3, ret=-1
16c19
< PM: resume devices took 0.292 seconds
---
> PM: resume devices took 0.293 seconds
19a23
> mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])


iwlwifi in this case is driving a:

aa:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)

Bus 003 Device 004: ID 8087:0032 Intel Corp. AX210 Bluetooth


Re: the 5.17-rc rawhide series, I assume those are the ones with .fc36 in their name in koji, correct?

Comment 4 Dimitris 2022-02-05 20:14:22 UTC
Tried 5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36, same problem persists.

Comment 5 Dimitris 2022-02-05 22:29:50 UTC
Also tried 5.16.0-60.fc36 and 5.16.0-0.rc2.18.fc36, same results.  So it does look like something introduced in the 5.16 cycle.

Comment 6 Marko Bevc 2022-02-07 13:32:27 UTC
Seems something similar on T490s, when trying to wake up from sleep:
```
[  151.203713] Bluetooth: hci0: Firmware revision 0.1 build 60 week 44 2021
[  151.262744] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  151.263740] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  151.264744] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  151.265730] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  151.266729] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  151.267727] Bluetooth: hci0: Failed to read codec capabilities (-56)
[  153.460602] wlp0s20f3: authenticated
[  153.472778] wlp0s20f3: associated
[  153.507141] IPv6: ADDRCONF(NETDEV_CHANGE): wlp0s20f3: link becomes ready
[  176.128164] usb 1-10: USB disconnect, device number 18
[  176.128362] Bluetooth: hci0: HCI reset during shutdown failed
[  348.961987] PM: suspend entry (deep)
[  348.980776] Filesystems sync: 0.018 seconds
[  349.038552] Freezing user space processes ... 
[  369.045787] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
[  369.045847] task:alsactl         state:D stack:    0 pid: 1534 ppid:     1 flags:0x00000004
[  369.045858] Call Trace:
[  369.045862]  <TASK>
[  369.045870]  __schedule+0x2d6/0x10b0
[  369.045887]  ? get_nohz_timer_target+0x18/0x1a0
[  369.045901]  ? lock_timer_base+0x61/0x80
[  369.045909]  ? _raw_spin_unlock_irqrestore+0x25/0x40
[  369.045915]  ? __mod_timer+0x1f4/0x390
[  369.045924]  schedule+0x4e/0xc0
[  369.045932]  schedule_timeout+0x7a/0x130
[  369.045943]  ? __bpf_trace_tick_stop+0x10/0x10
[  369.045953]  snd_power_ref_and_wait+0x91/0xf0 [snd]
[  369.045976]  ? wake_up_q+0x90/0x90
[  369.045983]  snd_ctl_elem_info+0x48/0x190 [snd]
[  369.046004]  snd_ctl_elem_info_user+0x45/0xa0 [snd]
[  369.046029]  snd_ctl_ioctl+0x1bd/0x7e0 [snd]
[  369.046049]  ? security_file_ioctl+0x2f/0x50
[  369.046057]  __x64_sys_ioctl+0x7f/0xb0
[  369.046066]  do_syscall_64+0x38/0x90
[  369.046075]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  369.046082] RIP: 0033:0x7f48a059d31b
[  369.046090] RSP: 002b:00007ffc38a44ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  369.046098] RAX: ffffffffffffffda RBX: 00007ffc38a45020 RCX: 00007f48a059d31b
[  369.046102] RDX: 00007ffc38a44f00 RSI: 00000000c1105511 RDI: 0000000000000009
[  369.046106] RBP: 00007ffc38a457d0 R08: 0000000000000000 R09: 0000000000000000
[  369.046110] R10: 000056089bd74fb0 R11: 0000000000000246 R12: 00007ffc38a457e0
[  369.046114] R13: 000056089bd58290 R14: 00007ffc38a44f00 R15: 000056089bd60780
[  369.046123]  </TASK>

[  369.046387] OOM killer enabled.
[  369.046388] Restarting tasks ... done.
[  369.098407] PM: suspend exit
[  369.098488] PM: suspend entry (s2idle)
[  369.143772] Filesystems sync: 0.045 seconds
[  369.143932] Freezing user space processes ... 
[  389.151417] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
[  389.151475] task:alsactl         state:D stack:    0 pid: 1534 ppid:     1 flags:0x00004004
[  389.151487] Call Trace:
[  389.151491]  <TASK>
[  389.151499]  __schedule+0x2d6/0x10b0
[  389.151517]  ? get_nohz_timer_target+0x18/0x1a0
[  389.151530]  ? lock_timer_base+0x61/0x80
[  389.151538]  ? _raw_spin_unlock_irqrestore+0x25/0x40
[  389.151544]  ? __mod_timer+0x1f4/0x390
[  389.151552]  schedule+0x4e/0xc0
[  389.151560]  schedule_timeout+0x7a/0x130
[  389.151571]  ? __bpf_trace_tick_stop+0x10/0x10
[  389.151582]  snd_power_ref_and_wait+0x91/0xf0 [snd]
[  389.151605]  ? wake_up_q+0x90/0x90
[  389.151611]  snd_ctl_elem_info+0x48/0x190 [snd]
[  389.151632]  snd_ctl_elem_info_user+0x45/0xa0 [snd]
[  389.151657]  snd_ctl_ioctl+0x1bd/0x7e0 [snd]
[  389.151677]  ? security_file_ioctl+0x2f/0x50
[  389.151685]  __x64_sys_ioctl+0x7f/0xb0
[  389.151694]  do_syscall_64+0x38/0x90
[  389.151703]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  389.151711] RIP: 0033:0x7f48a059d31b
[  389.151719] RSP: 002b:00007ffc38a44ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  389.151727] RAX: ffffffffffffffda RBX: 00007ffc38a45020 RCX: 00007f48a059d31b
[  389.151731] RDX: 00007ffc38a44f00 RSI: 00000000c1105511 RDI: 0000000000000009
[  389.151735] RBP: 00007ffc38a457d0 R08: 0000000000000000 R09: 0000000000000000
[  389.151739] R10: 000056089bd74fb0 R11: 0000000000000246 R12: 00007ffc38a457e0
[  389.151743] R13: 000056089bd58290 R14: 00007ffc38a44f00 R15: 000056089bd60780
[  389.151752]  </TASK>

[  389.152002] OOM killer enabled.
[  389.152004] Restarting tasks ... done.
[  389.197963] PM: suspend exit
[  389.279394] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[  390.862956] wlp0s20f3: authenticated
[  390.869287] wlp0s20f3: RX AssocResp from 4c:5e:0c:39:de:cd (capab=0x431 status=0 aid=2)
[  390.873614] wlp0s20f3: associated
[  390.896853] IPv6: ADDRCONF(NETDEV_CHANGE): wlp0s20f3: link becomes ready
```

Comment 7 Dimitris 2022-02-07 15:38:44 UTC
@marko.bevc@gmail.com did you mean to follow my other bug? https://bugzilla.redhat.com/show_bug.cgi?id=2051038

Comment 8 Marko Bevc 2022-02-07 18:20:48 UTC
It does seem related as well!

Comment 9 Dimitris 2022-02-08 04:27:18 UTC
No change with 5.16.7 from updates-testing.

Comment 10 Dimitris 2022-02-12 07:42:25 UTC
Another framework user bisected this and it seems to be due to an addition of a new _OSI entry: https://community.frame.work/t/fedora-35-kernel-5-16-5-s2idle-and-maybe-wifi-issues/14704/13

Adding "acpi_osi=!Windows 2020" to kernel arguments seems to work around the problem.

What's the practice for this type of issue - in addition to working with the vendor to address it in the firmware?  Does the kernel implement quirks for this type of thing?

Comment 11 Jun Aruga 2022-02-19 10:24:56 UTC
Thank you for working on this issue.
It was reported that the following is the upstream patch [1] to fix this issue at [2].

[1] https://patchwork.kernel.org/project/alsa-devel/patch/20220214125711.20531-1-tiwai@suse.de/
[2] https://community.frame.work/t/fedora-35-kernel-5-16-5-s2idle-and-maybe-wifi-issues/14704/22

Comment 12 Fedora Update System 2022-02-23 21:43:58 UTC
FEDORA-2022-952bb7b856 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-952bb7b856

Comment 13 Fedora Update System 2022-02-23 21:44:10 UTC
FEDORA-2022-edbd74424e has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2022-edbd74424e

Comment 14 Fedora Update System 2022-02-24 23:41:04 UTC
FEDORA-2022-952bb7b856 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-952bb7b856`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-952bb7b856

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2022-02-24 23:55:48 UTC
FEDORA-2022-edbd74424e has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-edbd74424e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-edbd74424e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Dimitris 2022-02-25 03:46:52 UTC
Still reproduces here under 5.16.11-200.fc35.x86_64 if I omit the "acpi_osi=!Windows 2020" workaround

Comment 17 Fedora Update System 2022-02-25 16:52:11 UTC
FEDORA-2022-952bb7b856 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 18 Dimitris 2022-02-25 22:03:26 UTC
Sorry, this is not fixed, although the workaround is still effective.

Comment 19 Justin M. Forbes 2022-02-25 22:52:14 UTC
Then the patch from comment 11 is not the proper fix, as that patch was included

Comment 20 Dimitris 2022-02-25 23:38:25 UTC
Correct, that wasn't related to this issue; it's related/should be addressing bug 2051038

Comment 21 Fedora Update System 2022-02-27 03:21:43 UTC
FEDORA-2022-edbd74424e has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 22 Dimitris 2022-02-27 07:54:40 UTC
Apologies, reopening again, I guess there's workflow tied to that patch that auto-closes this?

Comment 23 Hans de Goede 2022-03-07 08:11:51 UTC
(In reply to Dimitris from comment #22)
> Apologies, reopening again, I guess there's workflow tied to that patch that
> auto-closes this?

Yeah, the kernel-update with that patch was tagged as fixing this, since that update is now in the stable repo, there should be no more auto-closures of this bug.

Comment 24 Hans de Goede 2022-03-09 16:30:30 UTC
Note this is a generic comment added to several bugs at once:

There have been quite a few different bug reports now about suspend/resume issues with kernel versions >= 5.16.10, there are some reports that this is caused by the patch titled "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE".

So to help debugging this further I've done 2 Fedora test kernel builds with that patch reverted:

5.16.13: https://koji.fedoraproject.org/koji/taskinfo?taskID=83901424
5.17-rc7: https://koji.fedoraproject.org/koji/taskinfo?taskID=83901915

Note these are still building atm, this should be finished in a couple of hours. See here for some generic instructions for installing a kernel directly from koji (the Fedora build-system): https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If you are seeing suspend/resume issues with the recent Fedora kernels, please give both these builds a try and report back how they work for you.

Comment 25 Dimitris 2022-03-10 03:18:45 UTC
Tried both test kernels, no change:  I can only get to pre-5.16.5 behavior by appending "acpi_osi=!Windows 2020" to the kernel command line - see comment 10 too.

Comment 26 Hans de Goede 2022-03-10 09:35:43 UTC
(In reply to Dimitris from comment #25)
> Tried both test kernels, no change:  I can only get to pre-5.16.5 behavior
> by appending "acpi_osi=!Windows 2020" to the kernel command line - see
> comment 10 too.

Thank you for testing. I've access to a Framework laptop myself, I'm a bit busy atm but I will try to reproduce this when I can make some time and then investigate.


Note You need to log in before you can comment on or make changes to this bug.