Bug 2266265 - Thinkpad P16v sometimes resumes with CPU frequency locked to 400-500 MHz
Summary: Thinkpad P16v sometimes resumes with CPU frequency locked to 400-500 MHz
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2184978
Reported: 2024-02-27 11:36 UTC by Kamil Páral
Modified: 2024-09-24 02:03 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
kernel log after resume with stuck low frequencies (162.45 KB, text/plain)
2024-02-27 11:36 UTC, Kamil Páral
lscpu.txt (3.16 KB, text/plain)
2024-02-27 11:40 UTC, Kamil Páral
lspci.txt (4.09 KB, text/plain)
2024-02-27 11:40 UTC, Kamil Páral


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2301921 0 unspecified CLOSED ACPI events wake up Thinkpad laptops when they shouldn't (regression in kernel 6.10 in Qualcomm wifi driver) 2024-09-18 11:17:22 UTC

Description Kamil Páral 2024-02-27 11:36:49 UTC
Created attachment 2019129
kernel log after resume with stuck low frequencies

1. Please describe the problem:

My ThinkPad P16v gen1 laptop sometimes resumes with its CPU frequency locked to the 400 MHz - 544 MHz range. Even when I put load on the CPU, the frequency doesn't go above 544 MHz (in the normal case, I can easily see 4.5 GHz), and it stays at 400 MHz when idle. The system is then slow and unresponsive, so the problem is easy to spot. I have to reboot to "fix" the issue.

This bug has already been seen by two different people on two different laptops (both ThinkPad P16v), so it's not a hardware fault in my particular device. It's probably common to (at least) all ThinkPad P16v laptops.

When the system is stuck to low frequencies, I see this output from cpupower:

$ sudo cpupower frequency-info
analyzing CPU 14:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 14
  CPUs which need to have their frequency coordinated by software: 14
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 5.76 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 5.76 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 544 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 220. Maximum Frequency: 5.76 GHz.
    AMD PSTATE Nominal Performance: 145. Nominal Frequency: 3.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 42. Lowest Non-linear Frequency: 1.10 GHz.
    AMD PSTATE Lowest Performance: 16. Lowest Frequency: 400 MHz.
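
For anyone who wants to check for the same state without cpupower, here is a minimal Python sketch that reads the per-CPU frequency from the cpufreq sysfs files (scaling_cur_freq, reported in kHz). It is not part of the original report. The 600 MHz cut-off and the function names are assumptions chosen for illustration, based only on the 400-544 MHz range described above. Idle cores sit at 400 MHz even on a healthy system, so the check is only meaningful while the CPU is under load.

```python
from pathlib import Path

# Assumption: a frequency at or below this counts as "stuck". The report saw
# 400-544 MHz, so 600 MHz is an illustrative cut-off, not a kernel constant.
STUCK_CEILING_KHZ = 600_000


def read_cur_freqs(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: current frequency in kHz} read from cpufreq sysfs."""
    freqs = {}
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        cpu_name = path.parent.parent.name  # e.g. "cpu14"
        freqs[cpu_name] = int(path.read_text().strip())
    return freqs


def all_stuck(freqs_khz, ceiling_khz=STUCK_CEILING_KHZ):
    """True only if at least one CPU was found and every CPU is at or below the ceiling."""
    return bool(freqs_khz) and all(f <= ceiling_khz for f in freqs_khz.values())


if __name__ == "__main__":
    freqs = read_cur_freqs()
    for cpu, khz in freqs.items():
        print(f"{cpu}: {khz / 1000:.0f} MHz")
    print("looks stuck" if all_stuck(freqs) else "looks normal")
```

Run it while something CPU-heavy is running. If every core still reports a value near 400-544 MHz, the system is likely in the state described in this bug.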



2. What is the Version-Release number of the kernel:

kernel-6.7.6-200.fc39.x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

This is a new laptop, I haven't used older kernels with it.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Unfortunately it's random. Suspend the laptop and resume it. In most cases, it works as expected, but sometimes, this bug occurs and only low CPU frequencies are available.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I can test if needed, but it might take a long time before I'm able to say whether it's affected or not.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

The kernel log is attached. One resume happened at 07:06:31 and another at 10:27:51. In both cases, the CPU frequency was locked to 400-500 MHz.
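
To locate resume points in a saved kernel log such as the dmesg.txt produced by the journalctl command above, a small Python sketch like the following may help. It is not part of the original report. It assumes the kernel marks each resume with a "PM: suspend exit" message, and the function names are hypothetical. The exact wording of that message can differ between kernel versions, so adjust the pattern if nothing matches.

```python
import re

# Assumption: the kernel logs "PM: suspend exit" when it finishes resuming.
RESUME_MARKER = re.compile(r"PM: suspend exit")


def resume_lines(log_lines):
    """Return the log lines that mark a resume from suspend."""
    return [line.rstrip("\n") for line in log_lines if RESUME_MARKER.search(line)]


def resume_lines_from_file(path):
    """Read a saved kernel log and return its resume marker lines."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return resume_lines(log)
```

For example, resume_lines_from_file("dmesg.txt") returns one line per resume, and the timestamps on those lines show where in the log to start reading.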

Comment 1 Kamil Páral 2024-02-27 11:40:43 UTC
Created attachment 2019130
lscpu.txt

Comment 2 Kamil Páral 2024-02-27 11:40:46 UTC
Created attachment 2019131
lspci.txt

Comment 3 Kamil Páral 2024-02-27 11:44:48 UTC
@mpearson Hey Mark, this is something that you might be interested in, perhaps. Thanks.

Comment 4 Mark Pearson 2024-02-27 14:06:54 UTC
Ack - we're already looking at this one. The internal ticket is LO-2468.

The FW team is having trouble reproducing it with the images we certified the platform with, so they are pointing at the kernel, which doesn't yet make sense to me. Normally these issues are FW related. We're continuing to debug to try and narrow down the issue.
Thanks for the report and details - it's useful to have some other logs to review.

Mark

Comment 5 Kamil Páral 2024-02-28 08:40:37 UTC
Thanks, Mark. If you want me to provide any further debugging logs, just tell me how. It happened to me again this morning (that's the second time in 6 days of using this laptop), so there seems to be a decent chance of triggering it every few days.

Comment 6 Mark Pearson 2024-03-04 13:55:18 UTC
As a note, I reproduced this on my system. An easy repro is to suspend, unplug from power, and resume.
Using this, I narrowed the breakage down to between 6.4 and 6.5-rc1. I did a bisect, and it looks like this commit is causing the issue:
https://github.com/torvalds/linux/commit/b5539eb5ee70257520e40bb636a295217c329a50

I'm working with AMD on determining best next steps - but for now this is looking like a kernel regression issue.

Mark

Comment 7 Kamil Páral 2024-07-30 12:40:59 UTC
I've fully updated my ThinkPad P16v [1] and used kernel-6.10.1-200.fc40, and the issue is now different. I can no longer use the "suspend, unplug from power, and resume" reproducer, because any time I connect or disconnect AC power during suspend, the laptop immediately resumes. This is quite annoying (it breaks the "close laptop, unplug, put it into your bag" workflow), and it also precludes testing any fix for this bug. The issue might or might not still be there, but with the current behavior, I can't tell.

[1] System Firmware 0.1.52

Comment 8 Kamil Páral 2024-07-31 08:25:17 UTC
(In reply to Kamil Páral from comment #7)
> I can no longer use "suspend, unplug from power,
> and resume" reproducer, because any time I connect or disconnect AC power
> during suspend, the laptop immediately resumes.

I've reported this problem separately as bug 2301921.

Comment 9 Kamil Páral 2024-09-24 02:03:18 UTC
Now that bug 2301921 has been resolved, I was able to re-test this. Unfortunately, this is still an issue, with exactly the same symptoms and exactly the same reproducer (see comment 6). Tested on Fedora 41 with kernel-6.11.0-63.fc41.

