Bug 1578072 - New F28 SATA AHCI LPM MOBILE POLICY causes suspend to fail on Lenovo ThinkPad W541
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 28
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-14 18:25 UTC by Mark Thacker
Modified: 2019-05-28 21:52 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 21:52:00 UTC
Type: Bug
Embargoed:


Attachments
modified grub.conf that turns off AHCI and enables proper suspend (7.45 KB, text/plain)
2018-05-14 18:25 UTC, Mark Thacker
dmesg output from grdryn (75.22 KB, text/plain)
2018-05-30 10:08 UTC, Gerard Ryan
dmesg output after suggested ahci.mobile_lpm_policy=0 (75.37 KB, text/plain)
2018-05-30 10:29 UTC, Gerard Ryan

Description Mark Thacker 2018-05-14 18:25:46 UTC
Created attachment 1436444 [details]
modified grub.conf that turns off AHCI and enables proper suspend

Description of problem:
Fedora 28 with kernel 4.16.7-300.fc28.x86_64 fails to suspend properly when the power button is pressed UNLESS ahci.mobile_lpm_policy=0 is added to the kernel boot command line. Pressing the power button begins the suspend sequence: the LCD panel turns off and the keyboard backlight turns off, but the system does not suspend and no control is available. A hard power-off is required to regain the system.

Tested also with test kernel 4.17.0-0.rc4.git0.1.fc29.x86_64 with same result.

System is running nouveau video drivers on xorg.

Version-Release number of selected component (if applicable):
Fedora 28 with kernel 4.16.7-300.fc28.x86_64
Fedora 28 with test kernel 4.17.0-0.rc4.git0.1.fc29.x86_64

How reproducible:
100% reproducible. If the new AHCI mobile LPM policy is not disabled (ahci.mobile_lpm_policy=0), the system will fail to suspend properly.

Steps to Reproduce:
1. Install Fedora 28 on Lenovo W541
2. Press power button to initiate suspend
3. System will hang during suspend process

Actual results:
Suspend starts, LCD panel suspends, keyboard backlight suspends, but then system hangs. The power light is still on as is the keyboard speaker/microphone light. System becomes unresponsive, fan is running and the only solution is to hold down power button until a hard power stop occurs.

Expected results:
As with previous Fedora 27 and RHEL releases, pressing the power button, or closing the lid, should result in a suspend of the system.

Additional info:
Adding the following option to the kernel boot command line seems to be a good workaround, as it turns off the new AHCI link power management (LPM) policy for SATA devices:
  ahci.mobile_lpm_policy=0
With this line in place, normal suspend / resume works fine in Fedora 28 with either kernel mentioned in this bz.
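
To confirm that the workaround actually reached the running kernel, one option is to look for the exact option string in the kernel command line. The following is a minimal sketch and not part of the original report: the function name `has_kernel_param` is made up for illustration, and it reads /proc/cmdline unless a command line is passed in explicitly.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline (the running kernel).
# The function name is hypothetical, chosen only for this example.
has_kernel_param() {
    param=$1
    if [ "$#" -ge 2 ]; then
        cmdline=$2
    else
        cmdline=$(cat /proc/cmdline 2>/dev/null)
    fi
    # Split on whitespace and compare each word exactly, so a near-miss
    # spelling does not count as a match.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

if has_kernel_param "ahci.mobile_lpm_policy=0"; then
    echo "ahci.mobile_lpm_policy=0 is on the kernel command line"
else
    echo "ahci.mobile_lpm_policy=0 is NOT on the kernel command line"
fi
```

On Fedora, one way to add the option persistently (instead of editing the grub configuration by hand) should be `grubby --update-kernel=ALL --args="ahci.mobile_lpm_policy=0"`, run as root.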

Comment 1 Hans de Goede 2018-05-23 08:21:18 UTC
What is the output of:

cat /sys/class/dmi/id/bios_version 

And please shortly after a fresh boot run:

dmesg > dmesg.log

And attach the generated dmesg.log file here.

Comment 2 Gerard Ryan 2018-05-30 10:07:16 UTC
I experience the same symptoms, but I'm not sure if I've got the same cause or not, since Mark's suggested fix didn't seem to help in my case (I'll try a couple more times, in case I had a typo. If it does start to help, I'll reply here again).

Also, I upgraded from Fedora 27, rather than installing Fedora 28 from scratch. My current kernel version is kernel-4.16.12-300.fc28.x86_64, but I've seen it on some of the other kernel versions that have come in as upgrades to Fedora 28 in the past 2-ish weeks.

I _do_ seem to be able to successfully suspend if I go to a virtual terminal with ctrl+alt+f2 before logging into GNOME, and running `systemctl suspend`. However, the suspend button on the GNOME desktop once logged-in, results in the crash as Mark describes it.

(In reply to Hans de Goede from comment #1)
> What is the output of:
> 
> cat /sys/class/dmi/id/bios_version

$ cat /sys/class/dmi/id/bios_version
GNET85WW (2.33 )

> And please shortly after a fresh boot run:
> 
> dmesg > dmesg.log
> 
> And attach the generated dmesg.log file here.

I'll attach my dmesg output as dmesg_grdryn.log. One thing you'll see in there, are 3 new "ACPI Error" messages that I see before I get prompted for my disk encryption password. I'm not sure if they're at all related since I don't understand them in the slightest, but if I remember correctly, they didn't appear on this machine for older Fedora versions:

[    0.033178] ACPI Error: Needed type [Reference], found [Integer] 000000007dccfc0f (20180105/exresop-103)
[    0.033249] ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for [OpcodeName unavailable] (20180105/dswexec-461)
[    0.033314] ACPI Error: Method parse/execution failed \_PR.CPU0._PDC, AE_AML_OPERAND_TYPE (20180105/psparse-550)

Comment 3 Gerard Ryan 2018-05-30 10:08:14 UTC
Created attachment 1445750 [details]
dmesg output from grdryn

Comment 4 Gerard Ryan 2018-05-30 10:29:37 UTC
Created attachment 1445769 [details]
dmesg output after suggested ahci.mobile_lpm_policy=0

I've just tried the suggested fix again (see new attachment, please tell me if I'm doing it wrong!) with mixed results:

I logged in to the desktop, logged the new dmesg as seen in the new attachment, then successfully suspended, so some success there! However, I then resumed, waited a few seconds, tried suspending again, and hit the same crash.

I've had this machine for a couple of years now (Thinkpad W541), and suspend has always worked on it in older Fedora releases. I also know a few others who have the same machine who will be upgrading soon (if they haven't already), so I'll try to be as responsive as possible here if there are other things you'd like me to try! :)

Comment 5 Hans de Goede 2018-05-30 10:35:23 UTC
To check if the passing of ahci.mobile_lpm_policy=0 is working correctly do:

cat /sys/bus/scsi/devices/host?/scsi_host/host?/link_power_management_policy

This should show:

max_performance
max_performance
max_performance

It may show max_performance more or fewer than 3 times, that is fine, but it should only show max_performance. If it only shows max_performance then you have correctly passed ahci.mobile_lpm_policy=0.
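
The same check can be scripted so that it gives a yes/no answer regardless of how many hosts are present. This is only a sketch under the assumption that every policy file holds one value per line; the helper name `all_max_performance` is hypothetical.

```shell
#!/bin/sh
# all_max_performance: reads one policy value per line on stdin and
# succeeds only if there is at least one line and every line is exactly
# "max_performance". The function name is illustrative.
all_max_performance() {
    count=0
    while IFS= read -r policy; do
        count=$((count + 1))
        [ "$policy" = "max_performance" ] || return 1
    done
    [ "$count" -gt 0 ]
}

# Feed it the same sysfs files as the cat command above.
if cat /sys/bus/scsi/devices/host?/scsi_host/host?/link_power_management_policy 2>/dev/null \
        | all_max_performance; then
    echo "all hosts report max_performance"
else
    echo "at least one host is not at max_performance (or no hosts were found)"
fi
```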

Comment 6 Hans de Goede 2018-05-30 10:36:14 UTC
Also please try blacklisting the nouveau kernel module. I would not be surprised if this is a nouveau bug being exposed due to power-management changes.

Comment 7 Gerard Ryan 2018-05-30 11:27:42 UTC
(In reply to Hans de Goede from comment #5)
> To check if the passing of ahci.mobile_lpm_policy=0 is working correctly do:
> 
> cat /sys/bus/scsi/devices/host?/scsi_host/host?/link_power_management_policy
> 
> This should show:
> 
> max_performance
> max_performance
> max_performance
> 
> It may show max_performance more or fewer than 3 times, that is fine, but it
> should only show max_performance. If it only shows max_performance then you
> have correctly passed ahci.mobile_lpm_policy=0.

My kernel boot command line is now the following:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.16.12-300.fc28.x86_64 root=/dev/mapper/luks-0c1a16dd-902c-4b15-8a72-bc31777b689b ro rd.luks.uuid=luks-0c1a16dd-902c-4b15-8a72-bc31777b689b rd.md.uuid=fe10bb26:f2502109:376a93e6:3a4ef20a rd.luks.uuid=luks-c7eaa5fb-05a1-4b53-aa31-1c43be645ebd rd.lvm.lv=vg1/swap rhgb quiet LANG=en_US.UTF-8 ahci.mobile_lcm_policy=0 modprobe.blacklist=nouveau

but the output of that command doesn't show max_performance:

[grdryn@w541 ~]$ cat /sys/bus/scsi/devices/host?/scsi_host/host?/link_power_management_policy
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm
med_power_with_dipm

(In reply to Hans de Goede from comment #6)
> Also please try blacklisting the nouveau kernel module. I would not be
> surprised if this is a nouveau bug being exposed due to power-management
> changes.

This seems to have fixed the issue for me -- in the above command line, I've got Mark's suggested `ahci.mobile_lpm_policy=0`, which from your info doesn't seem to have done what it should have; and also `modprobe.blacklist=nouveau` which has done what it should (`lsmod | grep nouveau` doesn't return any output). I can now suspend reliably again. Thanks a lot Hans!
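
For anyone repeating this test: `lsmod | grep nouveau` also matches module names that merely contain "nouveau". A stricter check can compare the first field of /proc/modules exactly. The sketch below assumes the usual /proc/modules layout (module name first on each line); the function name `module_loaded` is made up for this example.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if a module called exactly NAME is listed in MODULES_FILE
# (default: /proc/modules, where the name is the first field per line).
# The function name is hypothetical.
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    while read -r mod _rest; do
        [ "$mod" = "$name" ] && return 0
    done < "$modules_file"
    return 1
}

if module_loaded nouveau; then
    echo "nouveau is loaded; the blacklist did not take effect"
else
    echo "nouveau is not loaded"
fi
```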

Comment 8 Gerard Ryan 2018-05-30 11:29:02 UTC
I've just realized that I _did_ have a typo in my kernel command line:

ahci.mobile_lpm_policy=0 (the intended option)
ahci.mobile_lcm_policy=0 (what I actually typed)
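
A misspelled module option like this one does not take effect and is easy to overlook. As a rough sanity check, the option names on the command line can be compared against the parameters the driver exposes in sysfs. This is a sketch under two assumptions: that the ahci driver lists mobile_lpm_policy under /sys/module/ahci/parameters/, and that the option is spelled with the same underscores as the sysfs file. Parameters that a driver does not export to sysfs would be reported as unknown too, so a hit is only a hint. The function name `unknown_module_params` is hypothetical.

```shell
#!/bin/sh
# unknown_module_params MODULE CMDLINE [PARAM_DIR]
# Prints every "MODULE.name" option in CMDLINE whose name has no matching
# file in PARAM_DIR (default: /sys/module/MODULE/parameters).
# The function name is illustrative only.
unknown_module_params() {
    module=$1
    cmdline=$2
    param_dir=${3:-/sys/module/$module/parameters}
    for word in $cmdline; do
        case $word in
            "$module".*)
                name=${word#"$module".}   # strip the "ahci." prefix
                name=${name%%=*}          # strip "=value"
                [ -e "$param_dir/$name" ] || echo "$module.$name"
                ;;
        esac
    done
}

# Report suspicious ahci.* options on the running kernel's command line.
unknown_module_params ahci "$(cat /proc/cmdline 2>/dev/null)"
```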

Comment 9 Gerard Ryan 2018-05-31 09:13:24 UTC
My fan seems to be running harder since I blacklisted nouveau. If any nouveau people would like me to try any fixes to whatever bug this might be, I'm happy to spend some time at it! :)

Comment 10 Hans de Goede 2018-05-31 09:50:57 UTC
Changing component to nouveau (the xorg-x11-drv-nouveau component is also used to track nouveau kernel driver bugs).

Comment 11 David Mitchell 2018-05-31 14:12:28 UTC
I am seeing some other issues with Wayland (see Bug #1579859 https://bugzilla.redhat.com/show_bug.cgi?id=1579859 ) on the Lenovo W541. I haven't tested it since I set ahci.mobile_lpm_policy and the problem went away, but it's possible the bug only presents itself when running Wayland (which is the default) with the nouveau driver, as opposed to X-Windows.

If it's valuable I could try removing the ahci.mobile_lpm_policy kernel option and using XWindows to see if the same hard freeze issue presents itself, or if this is only restricted to Wayland?

Comment 12 Hans de Goede 2018-05-31 15:38:48 UTC
(In reply to David Mitchell from comment #11)
> If it's valuable I could try removing the ahci.mobile_lpm_policy kernel
> option, and using XWindows to see if the same hard freeze issue presents
> itself, or if this is only restricted to Wayland?

That would be somewhat valuable. Chances are the problem is not Wayland or Xorg specific, but if you want to try it, it certainly cannot hurt.

Comment 13 Hans de Goede 2018-06-24 12:24:38 UTC
Note that Mark Thacker reports in bug 1571330 that installing the latest BIOS update from Lenovo fixes the SATA LPM triggered problems he was seeing.

Comment 14 Mark Thacker 2018-07-10 15:26:54 UTC
(In reply to Hans de Goede from comment #13)
> Note that Mark Thacker reports in bug 1571330 that installing the latest
> BIOS update from Lenovo fixes the SATA LPM triggered problems he was seeing.

Indeed, it generally did fix the issues. HOWEVER, I did find an additional issue in that, even with the latest BIOS, the system would not resume if it changed its dock state from when it was put to sleep.

Thus, I have had to disable power management again to be able to handle my daily use case for transitioning from dock to undocked while asleep (and vice versa).

The dmidecode output has been uploaded as a private attachment to this BZ.

Comment 15 Hans de Goede 2018-07-11 08:17:12 UTC
(In reply to Mark Thacker from comment #14)
> (In reply to Hans de Goede from comment #13)
> > Note that Mark Thacker reports in bug 1571330 that installing the latest
> > BIOS update from Lenovo fixes the SATA LPM triggered problems he was seeing.
> 
> Indeed, it generally did fix the issues. HOWEVER, I did find an additional
> issue in that, even with the latest BIOS, the system would not resume if it
> changed its dock state from when it was put to sleep.
> 
> Thus, I have had to disable power management again to be able to handle my
> daily use case for transitioning from dock to undocked while asleep (and
> vice versa).

Bummer :|  I will contact you privately for escalating this to Lenovo.

Comment 16 Gerard Ryan 2018-07-11 10:29:31 UTC
I think I've had the same experience as Mark above after upgrading the BIOS firmware, except I'm using the workaround of blacklisting nouveau instead of disabling power management.

This is probably not related, but around the same time as I did the firmware update, the built-in webcam has a strong green tint (I guess this could be some hardware failure, or related to a package upgrade from Fedora).

Comment 17 Mark Thacker 2018-07-16 13:25:59 UTC
Update to all involved:

I re-installed Fedora 28 Workstation on my Lenovo W541 today, this time with UEFI Boot only enabled to see if this was an issue with UEFI versus BIOS.

Result : No change. Suspend/resume doesn't function properly if the laptop changes between docked/undocked or undocked/docked while suspended.

Workaround : Still disabling Power Management in the grub.cfg file with "ahci.mobile_lpm_policy=0" at the end of the vmlinuz command line.

For the record :
Fedora 28 Workstation
Kernel : 4.17.5-200.fc28.x86_64
Xorg Server
Gnome
BIOS : GNET87WW (2.35 )
Graphics Drivers : built-in nouveau

Comment 18 Ben Cotton 2019-05-02 21:18:36 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Ben Cotton 2019-05-28 21:52:00 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

