Bug 1571330 - CONFIG_SATA_MOBILE_LPM_POLICY=3 makes laptop hang when changing screen brightness
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Depends On:
Blocks:
Reported: 2018-04-24 14:24 UTC by Oyvind Saether
Modified: 2019-12-05 04:14 UTC
CC List: 45 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-05-09 16:48:39 UTC
Type: Bug
Embargoed:


Attachments
dmidecode output for T450s (15.52 KB, text/plain)
2018-06-20 14:45 UTC, Pierre
X250 BIOS 1.31 dmidecode (16.98 KB, text/plain)
2018-06-20 15:28 UTC, Dimitris
DMI Decode L450 (14.67 KB, text/plain)
2018-06-21 12:39 UTC, Bruno Lavoie
dmidecode output for T440s (14.08 KB, text/plain)
2018-07-25 14:01 UTC, lukes
Gigabyte Brix i7-5500 dmidecode (22.64 KB, text/plain)
2019-12-05 04:14 UTC, Erik Sejr

Description Oyvind Saether 2018-04-24 14:24:36 UTC
CONFIG_SATA_MOBILE_LPM_POLICY=3 makes my laptop hang and freeze completely when changing screen brightness, and also makes it hang when redshift is running. No logs, no SSH, nothing; it just locks up completely.

A fix is to add ahci.mobile_lpm_policy=1 to the kernel boot command line.
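To check that the workaround is actually active after rebooting, the running kernel's command line in /proc/cmdline can be inspected. A minimal POSIX-shell sketch; `lpm_cmdline_value` is a hypothetical helper name, and it assumes the parameter appears as one space-separated word. On Fedora, `sudo grubby --update-kernel=ALL --args="ahci.mobile_lpm_policy=1"` is one way to add the parameter persistently.

```shell
# Print the value given to ahci.mobile_lpm_policy= on a kernel command line,
# or nothing if the parameter is absent. Reads /proc/cmdline unless a
# command-line string is passed as $1 (useful for testing).
# If the parameter is repeated, the last occurrence is printed.
lpm_cmdline_value() {
    cmdline=${1-$(cat /proc/cmdline)}
    printf '%s\n' "$cmdline" |
        tr ' ' '\n' |
        sed -n 's/^ahci\.mobile_lpm_policy=//p' |
        tail -n 1
}
```

Run with no argument, it prints `1` when the workaround above is in place and nothing when it is absent.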

This became a problem when upgrading to Fedora 28 with kernel 4.16.2. After wasting a lot of time, I found CONFIG_SATA_MOBILE_LPM_POLICY=3 to be the cause. I can also reproduce it with kernel 4.15.17, left over from Fedora 27, by running (as root)
for powercontrol in /sys/class/scsi_host/host*/link_power_management_policy ; do
  echo 'min_power' > "$powercontrol"
done
and then changing the screen brightness with the laptop hotkeys. It looks like CONFIG_SATA_MOBILE_LPM_POLICY=3 has been a problem for a long time (with my laptop, anyway); it just hasn't been the default setting.

I find this behavior bizarre and don't understand it. It can be reproduced consistently. Why a SATA min_power link policy would cause a hard freeze when changing the screen brightness is beyond me, but that's what happens.

Fedora should reconsider having CONFIG_SATA_MOBILE_LPM_POLICY=3 as a default kernel configuration setting. I'm fine with using ahci.mobile_lpm_policy=1 now but figuring out what was wrong did waste too much of my time.
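For reference, the number in CONFIG_SATA_MOBILE_LPM_POLICY (and in ahci.mobile_lpm_policy) maps onto the policy names used in sysfs. The lookup below assumes the numbering from the kernel's Kconfig help text of this era (0 keep firmware settings, 1 maximum performance, 2 medium power, 3 medium power with DIPM, 4 minimum power); later kernels may add or renumber policies, so treat it as a sketch. `lpm_policy_name` is a hypothetical helper.

```shell
# Translate a numeric LPM policy (CONFIG_SATA_MOBILE_LPM_POLICY or
# ahci.mobile_lpm_policy) into the name used by
# /sys/class/scsi_host/host*/link_power_management_policy.
# ASSUMPTION: numbering as documented around kernel 4.15/4.16; later kernels
# may add or renumber policies, so check the Kconfig help of your kernel.
lpm_policy_name() {
    case $1 in
        0) echo "keep_firmware_settings" ;;  # label for this sketch, not a sysfs value
        1) echo "max_performance" ;;
        2) echo "medium_power" ;;
        3) echo "med_power_with_dipm" ;;
        4) echo "min_power" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Under that numbering, the Fedora 28 default of 3 is med_power_with_dipm, while min_power is 4.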

Comment 1 Hans de Goede 2018-04-24 14:42:02 UTC
Hi,

Thank you for reporting this bug.

First of all, some background on this: enabling SATA LPM by default saves about 1 W to 1.5 W of idle power consumption on any modern laptop with a SATA disk. 1 W to 1.5 W is a lot of power; note that this change has not been made lightly, see:
https://hansdegoede.livejournal.com/18412.html

And:
https://fedoraproject.org/wiki/Changes/ImprovedLaptopBatteryLife#How_To_Test

Which has a table with all systems on which this was tested (unfortunately not many people responded to my call for testing).
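To put the 1 W to 1.5 W figure into battery-life terms: idle runtime is roughly battery capacity (Wh) divided by average draw (W). A small arithmetic sketch; the 45 Wh capacity and 7.5 W idle draw in the example are hypothetical round numbers, not measurements from this report.

```shell
# Idle runtime (hours) = battery capacity (Wh) / average draw (W).
runtime_hours() {
    awk -v wh="$1" -v w="$2" 'BEGIN { printf "%.2f\n", wh / w }'
}

# Extra runtime (hours) gained by cutting the draw by $3 watts.
# $1 = capacity in Wh, $2 = draw without LPM in W, $3 = saving in W.
runtime_gain_hours() {
    awk -v wh="$1" -v w="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", wh / (w - s) - wh / w }'
}
```

With those assumed numbers, `runtime_hours 45 7.5` gives 6.00 hours, and a 1.5 W saving (`runtime_gain_hours 45 7.5 1.5`) adds 1.50 hours, i.e. about 25% more idle runtime.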

With all that said, let's take a look at fixing this bug for you. First of all, can you run:

dmesg | grep UDMA

This should output something like this:

[    0.919752] ata2.00: ATA-8: WDC WD10EACS-00D6B1, 01.01A01, max UDMA/133

This gives me the model and firmware version of your disk/SSD, so that I can blacklist LPM for it.
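The two fields a disk-based quirk needs, model and firmware version, can be pulled out of that line mechanically. A sketch assuming the `ATA-<n>: <model>, <firmware>, max UDMA/...` layout shown above; `ata_disk_ids` is a hypothetical helper that reads dmesg text on standard input.

```shell
# Extract "model | firmware" from libata identification lines such as
#   [    0.919752] ata2.00: ATA-8: WDC WD10EACS-00D6B1, 01.01A01, max UDMA/133
# Lines without an "ATA-<n>:" field (controller lines, "configured for ...")
# are ignored.
ata_disk_ids() {
    sed -n 's/.*ata[0-9.]*: ATA-[0-9]*: \(.*\), \([^,]*\), max UDMA.*/\1 | \2/p'
}
```

For example, `dmesg | ata_disk_ids` prints one `model | firmware` line per detected ATA disk (dmesg may need root where kernel.dmesg_restrict is set).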

I find the interaction with changing brightness really strange though, so maybe a motherboard / machine specific quirk might be better.

On which brand/model laptop are you seeing this?

And can you please run: "lspci -nn" and copy and paste the output here?

Comment 2 Oyvind Saether 2018-04-25 07:00:54 UTC
This is a Lenovo-G50-80, Intel(R) Core(TM) i7-5500U CPU. 

I do not think blacklisting LPM for the SSD is the way to go on this one. Anyway, here it is:
$ dmesg | grep UDMA
[    0.666908] ata1: SATA max UDMA/133 abar m2048@0xc1218000 port 0xc1218100 irq 45
[    0.977408] ata1.00: ATA-9: Samsung SSD 750 EVO 500GB, MAT01B6Q, max UDMA/133

I seriously doubt it's SSD/HDD specific. I'll even bother verifying this, probably later today, with some ancient 2.5" spinning rust. Something tells me the SSD or HDD connected won't make a difference. I do think blacklisting based on something else would be preferable. Not sure what.

As I wrote when I opened this bug: it's a strange one. Strange to me, anyway. It could be something specific to this model's Lenovo motherboard, or to some Lenovo motherboards, that makes the system hang/freeze when SATA power saving is enabled and the screen brightness is changed. Perhaps an electrical engineer at Lenovo would know. I'm guessing more people will run into this if it's somehow the chipset. The only thing I know for sure is that it wasn't a problem prior to Fedora 28 with the 4.16.x kernel and that ahci.mobile_lpm_policy=1 solves it. And it's 100% reproducible, instantly, every time.

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Broadwell-U Host Bridge -OPI [8086:1604] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 5500 [8086:1616] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Broadwell-U Audio Controller [8086:160c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI Controller [8086:9cb1] (rev 03)
00:16.0 Communication controller [0780]: Intel Corporation Wildcat Point-LP MEI Controller #1 [8086:9cba] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation Wildcat Point-LP High Definition Audio Controller [8086:9ca0] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #1 [8086:9c90] (rev e3)
00:1c.2 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 [8086:9c94] (rev e3)
00:1c.3 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #4 [8086:9c96] (rev e3)
00:1f.0 ISA bridge [0601]: Intel Corporation Wildcat Point-LP LPC Controller [8086:9cc3] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] [8086:9c83] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Wildcat Point-LP SMBus Controller [8086:9ca2] (rev 03)
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 10)
03:00.0 Network controller [0280]: Intel Corporation Wireless 3160 [8086:08b4] (rev 93)

Comment 3 Hans de Goede 2018-04-25 11:18:01 UTC
Ok, thank you for the info. I will wait for you to report back with the results of testing with a normal HDD instead of a SSD.

Comment 4 Oyvind Saether 2018-04-25 19:49:12 UTC
A Fedora 28 installation on an ancient Seagate 160 GB 5400 RPM HDD produced the exact same result (Fedora install was done on another system). Popping the drive into the laptop and logging in and turning the screen brightness up and down made the laptop totally freeze.

It's clearly not the SSD or HDD, so I don't know what, if anything, this should be blacklisted against. It could be a hardware bug triggered by SATA power saving that's specific to the Lenovo-G50-80 with the Intel(R) Core(TM) i7-5500U CPU (this seems too strange a bug to be a common problem?).

Perhaps the simplest course of action is to leave the bug open for a few months and see if anyone else shows up with the same/very similar problem?

Comment 5 Hans de Goede 2018-04-27 09:32:24 UTC
I just checked and the drivers/ata/ahci.c code already has a whole bunch of system-id based quirks (instead of disk-id based quirks).

So I think it would be appropriate to add a system-id based quirk for the Lenovo-G50-80 to never allow LPM on this laptop.

Can you run:

sudo dmidecode > dmidecode.log

And then *email* me the generated dmidecode.log file at hdegoede? The reason I'm asking to do this by email is because the dmi info of your machine also contains some uniquely identifying info like a serial-number which you may not want to have public in bugzilla.

Comment 6 Hans de Goede 2018-04-28 18:32:52 UTC
Thank you for the DMI info.

In the meantime I've been getting more bug reports by email from users of various Lenovo 50 series laptops (ThinkPad X250, T450s) with a similar issue.

Yet I also have a report of this working fine on a T450s from a while back, can you try downloading and installing an older kernel, say 4.14 and do the:

for powercontrol in /sys/class/scsi_host/host*/link_power_management_policy ; do
  echo 'min_power' > $powercontrol
done

Thing there? You can find older Fedora kernels here:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

And generic instructions for testing kernels from koji here:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If you can check whether 4.14 or 4.13 works with min_power, then we can go from there, because I don't really want to blacklist LPM on all Lenovo 50 series models.
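To confirm which policy each SATA host is actually using before and after running the loop above, the same sysfs files can be read back. A read-only sketch; `show_lpm_policies` is a hypothetical name, and its optional directory argument exists only so it can be tried against a fake tree.

```shell
# Print "hostN: <policy>" for every SCSI host that exposes an LPM policy.
# $1 optionally overrides the sysfs directory (default /sys/class/scsi_host).
show_lpm_policies() {
    base=${1:-/sys/class/scsi_host}
    for f in "$base"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue    # glob matched nothing, or file unreadable
        host=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$host" "$(cat "$f")"
    done
}
```

Reading these files needs no root; only writing them, as in the loop, does.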

Comment 7 Hans de Goede 2018-04-28 21:12:51 UTC
p.s.

Please also try a 4.17-rc2 build from koji; perhaps we get lucky and the underlying issue has been fixed in the meantime.

Comment 8 Oyvind Saether 2018-04-29 18:03:14 UTC
It's all bad.

The latest 4.14 kernel, linux-4.14.37, failed to build (make bzImage) for some reason:
pager.c:36:12: error: passing argument 2 to restrict-qualified parameter aliases with argument 4 [-Werror=restrict]
  select(1, &in, NULL, &in, NULL);
A brief search indicates it's something related to GCC 8.

Ignoring that kernel I tried 4.13.0-1.fc28.x86_64 from
https://koji.fedoraproject.org/koji/buildinfo?buildID=965691
and kernel 4.13.0 has this problem with both min_power and med_power_with_dipm

I also tried today's linus from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git with make oldconfig using 4.16.4 fedora config and my 4.17.0-rc2linus.git-seohyun+ kernel has the same problem with both min_power and med_power_with_dipm.

I did notice that med_power_with_dipm makes it freeze immediately when turning the screen brightness up and down while min_power took like a minute before a freeze. Or, it was random. Either way, it quickly freezes. I guess I'll have to stick with ahci.mobile_lpm_policy=1 for now since all the kernels 4.13-4.17git are bad.

As for blacklisting:
Handle 0x0002, DMI type 2, 15 bytes
Version: SDK0J40700 WIN
^^ Blacklisting by the motherboard in this thing could be an option.

Last unimportant detail: I know there's a BIOS update for this machine. It's an eeew .exe file. This machine did come with a wintendo license, but it seems like a lot of trouble to infect it with Windows and update the BIOS just to see if it helps. I don't want to.

Comment 9 Hans de Goede 2018-05-02 15:31:58 UTC
Hi Oyvind,

Thank you for all the testing. I'm currently also trying to get more input from users who are seeing LPM related issues on other 50 series Lenovo models, to be continued.

Regards,

Hans

Comment 10 Andreas Farre 2018-05-03 08:59:57 UTC
Hi,

I'm fairly certain that I have this issue as well. I'm on a Lenovo W541 laptop.

Comment 11 Hans de Goede 2018-05-03 11:07:42 UTC
(In reply to Andreas Farre from comment #10)
> I'm fairly certain that I have this issue as well. I'm on a Lenovo W541
> laptop.

Try adding ahci.mobile_lpm_policy=0 to your kernel command line. If the problem then goes away, your issue is related; if that does not fix things, you have an unrelated issue.

Comment 12 Andreas Farre 2018-05-03 13:50:46 UTC
(In reply to Hans de Goede from comment #11)
>
> Try adding ahci.mobile_lpm_policy=0 to your kernel command line, if the
> problem then goes away then your issue is related. If that does not fix
> things then you've an unrelated issue.

I thought I could trigger it consistently with but not without ahci.mobile_lpm_policy=0, but now it seems I can't. I'll have a look at filed bugs and see if I can find something that fits, otherwise I'll report a new one. 
Thanks.

Comment 13 Hans de Goede 2018-05-07 10:51:07 UTC
Oyvind,

In the meantime I've received reports from several Lenovo 50 series users that they are not affected, so I'm trying to figure out whether the people who are having issues with LPM on Lenovo 50 series have anything in common.

Can you provide me with the following info?

1) "cat /sys/class/dmi/id/bios_version /sys/class/dmi/id/bios_date" output
2) Are you using a dock and if yes, then which dock model and what is the
dock firmware version? (I don't know how to query this, if you
don't know either just letting me know if you use a dock would be great)
3) Which desktop environment are you using (the crashes might be GPU load
related).
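The DMI details asked for in question 1, plus the vendor and model strings a system-id based quirk would match on, are also exposed as individual sysfs files. These are normally world-readable (unlike the serial-number files), so they can be gathered without running dmidecode as root. A sketch; `dmi_summary` is a hypothetical helper. On ThinkPads the human-readable model name usually appears in product_version rather than product_name.

```shell
# Print the DMI identification fields used in this thread, one per line.
# Reads /sys/class/dmi/id by default; $1 may point at another directory
# (e.g. a fake tree for testing). Unreadable or missing fields show "n/a".
dmi_summary() {
    dir=${1:-/sys/class/dmi/id}
    for field in sys_vendor product_name product_version bios_version bios_date; do
        if [ -r "$dir/$field" ]; then
            value=$(cat "$dir/$field")
        else
            value="n/a"
        fi
        printf '%s: %s\n' "$field" "$value"
    done
}
```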

Regards,

Hans

Comment 14 Mark Thacker 2018-05-07 14:23:28 UTC
I can confirm that adding ahci.mobile_lpm_policy=0 to the kernel boot line DID FIX an issue with my Lenovo W541 laptop that refused to suspend when the power button was pressed.

Issue: Press the power button to suspend. Display goes blank and the keyboard backlight turns off, but the system does NOT power down; the fan keeps running, and I am unable to regain control with a key press or a power button press.

Adding the above line addressed the issue immediately.

Again, system is Lenovo W541 running Fedora 28. Issue did NOT appear in Fedora 27 nor did it appear when F27 was upgraded to the 4.16 kernel. Only after a complete system upgrade to Fedora 28.

Hope this helps.

Comment 15 Oyvind Saether 2018-05-08 22:08:55 UTC
1) $ cat /sys/class/dmi/id/bios_version /sys/class/dmi/id/bios_date
B0CN95WW
07/31/2015

Lenovo does offer some kind of .exe file BIOS update.
https://pcsupport.lenovo.com/se/en/products/laptops-and-netbooks/lenovo-g-series-laptops/g50-80/downloads/ds102231

2) Never owned or used a dock.

3) XFCE4. Xorg not Wayland.

4) Not that I think it matters, but: I have replaced the 1080p eeeew TN screen panel with a 1080p IPS one. I replaced the included 4GB DDR3 SO-DIMM with 2x8GB. I also replaced the included 128 GB SSD with a 512 GB one. And I obviously fixed the stock Windows infection with Fedora. Everything else is original.

Comment 16 Hans de Goede 2018-05-09 07:48:25 UTC
(In reply to Mark Thacker from comment #14)
> I can confirm that adding ahci.mobile_lpm_policy=0 to the kernel boot line
> DID FIX an issue with my Lenovo W541 laptop that refused to suspend when the
> power button was pressed.

Hmm, that seems only somewhat related to LPM. Enabling LPM allows the uncore (everything integrated into a CPU nowadays which is not a CPU core, so the PCI-E root, DRAM controller, GPU, etc.) to reach lower power states (higher-numbered PC-states). I guess that this is somehow triggering a problem.

Can you file a bug for this at:

https://bugzilla.kernel.org/enter_bug.cgi?product=ACPI

Please?

Comment 17 Hans de Goede 2018-05-09 08:00:08 UTC
(In reply to Oyvind Saether from comment #15)
> 1) $ cat /sys/class/dmi/id/bios_version /sys/class/dmi/id/bios_date
> B0CN95WW
> 07/31/2015
> 
> Lenovo does offer some kind of .exe file BIOS update.
> https://pcsupport.lenovo.com/se/en/products/laptops-and-netbooks/lenovo-g-
> series-laptops/g50-80/downloads/ds102231
> 
> 2) Never owned or used a dock.
> 
> 3) XFCE4. Xorg not Wayland.
> 
> 4) Not that I think it matters, but. I have replaced the 1080p eeeew TN
> screen panel with a 1080p IPS. I replaced the included 4GB DDR3 SO-DIMM with
> 2x8GB. Also replaced the included 128 GB SSD with 512 GB. And I obviously
> fixed the stock windows infection with Fedora. Everything else is original.

Erm, replacing the RAM and the disks is fine, but replacing the panel and then getting a hard freeze when changing the brightness sounds like it might be part of the problem here.

As mentioned in the previous comment, enabling LPM allows the uncore (everything integrated into a CPU nowadays which is not a CPU core, so the PCI-E root, DRAM controller, GPU, etc.) to reach lower power states (higher-numbered PC-states). I guess there is an issue with changing brightness on your replacement panel while the CPU is in high-numbered PC-states (package C-states).

As such I'm tempted to close this bug and suggest you keep using ahci.mobile_lpm_policy=1 as a workaround for your unique laptop.

Comment 18 Oyvind Saether 2018-05-09 16:48:39 UTC
I guess it's not too far-fetched to assume this bug is unique to my laptop, so let's close the bug until/unless someone else indicates otherwise.

Comment 19 Mark Thacker 2018-05-09 16:50:33 UTC
Close it if you like, but the 'fix' of disabling ahci addressed a big issue that I had with my Lenovo not being able to suspend properly.

Comment 20 Oyvind Saether 2018-05-09 18:05:37 UTC
(In reply to Mark Thacker from comment #19)
I have the impression that the default ahci.mobile_lpm_policy does cause all kinds of (other) problems, that your bug absolutely should be looked into, and that perhaps your motherboard and/or something else should be put on a blacklist so it defaults to ahci.mobile_lpm_policy=0 or ahci.mobile_lpm_policy=1. I just don't think it has the same cause (if my bug is indeed caused by replacing the eeeew TN panel with an IPS one).

Do feel free to re-open this bug or file another. It's up to you.

Comment 21 Hans de Goede 2018-05-10 08:05:45 UTC
(In reply to Mark Thacker from comment #19)
> Close it if you like, but the 'fix' of disabling ahci addressed a big issue
> that I had with my Lenovo not being able to suspend properly.

Right, but that is another bug. Given the 100% reproducible connection between changing brightness and the freeze Oyvind is seeing, it seems likely that his bug, which is what this report is about, is caused by the LCD panel in his machine having been replaced with a non-OEM part.

As mentioned already the main reason for enabling SATA LPM is that it allows the uncore to achieve much lower power-states, typically the lowest state reached changes from PC2 to PC7, which results in a huge gain in battery life. This is a change which affects *runtime* power-management of the disks. So most people with LPM related problems see them during runtime, not during suspend resume, so I doubt the root cause of your issue is SATA LPM. Not having SATA LPM keeps the uncore awake, which likely papers over your issue (just as it likely helps avoid the issue caused by Oyvind's panel swap).

Your W541 is a laptop with hybrid Intel / NVIDIA graphics, those have never worked really well with Linux and recent models (including your model) are even worse.

Anyways please file a new bug for this.

In this new bug report, please include testing results from the latest 4.17 kernel: https://koji.fedoraproject.org/koji/buildinfo?buildID=1080118
Install instructions here: https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

And also please add a note there if you are using the nouveau or nvidia-binary driver.

Comment 22 Hans de Goede 2018-05-11 09:14:44 UTC
One last note about the W541 problem: this might be related to:
https://bugzilla.kernel.org/show_bug.cgi?id=199057

It probably isn't, but you could try adding: "mem_sleep_default=deep" to the kernel commandline and see if that helps.
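mem_sleep_default=deep selects which variant is used for suspend-to-RAM; the active choice is the bracketed entry in /sys/power/mem_sleep (for example `s2idle [deep]`). A sketch for checking that the parameter took effect; `current_mem_sleep` is a hypothetical helper. As root, writing `deep` to that file switches the variant at runtime without rebooting, provided the firmware offers it.

```shell
# Print the currently selected suspend variant, i.e. the bracketed word in
# /sys/power/mem_sleep ("s2idle [deep]" -> "deep").
# $1 optionally overrides the file path, for testing.
current_mem_sleep() {
    file=${1:-/sys/power/mem_sleep}
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}
```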

Comment 23 Mark Thacker 2018-05-14 18:27:06 UTC
FYI that I have opened a new bug related to the ThinkPad W541 having suspend issues. Also note that I tried with the 4.17.xxx test kernel and still had the same issue. Workaround is to just turn off ahci for this system.

New Bug : https://bugzilla.redhat.com/show_bug.cgi?id=1578072

Comment 24 David Mitchell 2018-05-29 20:53:28 UTC
I am seeing this issue as well on my Lenovo W541, setting the kernel options doesn't seem to fix it, it happens most often after I lock my machine or it goes idle and locks. Once I unlock, it will hard freeze within a second of displaying the screen. I am using BIOS version GNET71WW (2.19 ). Unfortunately, the update utility for the BIOS appears to be for Windows only.

Comment 25 Hans de Goede 2018-05-29 22:46:10 UTC
(In reply to David Mitchell from comment #24)
> I am seeing this issue as well on my Lenovo W541, setting the kernel options
> doesn't seem to fix it, it happens most often after I lock my machine or it
> goes idle and locks. Once I unlock, it will hard freeze within a second of
> displaying the screen. I am using BIOS version GNET71WW (2.19 ).
> Unfortunately, the update utility for the BIOS appears to be for Windows
> only.

Ok, thank you for the info. Please use the new bug 1578072 for further tracking of the W541 issue.

Comment 26 Pierre 2018-06-01 11:29:31 UTC
I'm coming from here: https://fedoraproject.org/wiki/Common_F28_bugs#lpm-hang

Running a T450s with up-to-date Fedora 28. Glad I've found the above article: booting with "ahci.mobile_lpm_policy=0" seems to have solved the issue on my side.

I used to have several complete freezes per day with heavily loaded browsers (both FF and Chrome with dozens of tabs), but it's now been happily running a whole day without any hiccup.

Comment 27 Hans de Goede 2018-06-01 11:45:01 UTC
Hi,

(In reply to Pierre from comment #26)
> I'm coming from here: https://fedoraproject.org/wiki/Common_F28_bugs#lpm-hang
> 
> Running a T450s with up-to-date Fedora 28. Glad I've found the above
> article: booting with "ahci.mobile_lpm_policy=0" seems to have solved the
> issue on my side.
> 
> I used to have several complete freezes per day with heavily loaded browsers
> (both FF and Chrome with dozens of tabs), but it's now been happily running
> a whole day without any hiccup.

May I ask what BIOS version your T450s is running?

cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version

And also what disk you are using:

dmesg | grep UDMA

Regards,

Hans

Comment 28 Pierre 2018-06-01 11:53:30 UTC
(In reply to Hans de Goede from comment #27)
> Hi,
> 
> (In reply to Pierre from comment #26)
> > [...]
> 
> May I ask what version BIOS your T450s is running?  :
> 
> cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
> 
> And also what disk you are using:
> 
> dmesg | grep UDMA
> 
> Regards,
> 
> Hans

Hi Hans,

Sure! Here we go:

Latest BIOS:
 03/15/2018
 JBET69WW (1.33 )

Disk:
[0.710642] ata1: SATA max UDMA/133 abar m2048@0xf123c000 port 0xf123c100 irq 40
[1.026960] ata1.00: ATA-9: SAMSUNG MZ7LN512HCHP-000L1, EMT05L0Q, max UDMA/133
[1.028456] ata1.00: configured for UDMA/133

All the best
--Pierre

Comment 29 Hans de Goede 2018-06-01 12:01:43 UTC
Pierre,

Thanks for the info, so you are on a recent BIOS using a quite new SSD which works fine with LPM for other users AFAIK.

So I believe that what you are seeing is some unrelated problem being exposed by LPM allowing the uncore (anything but the CPU cores, including the GPU) to reach deeper power-saving states.

ahci.mobile_lpm_policy=0 is a workaround as it stops the uncore from reaching its deep power-saving states, which in turn are likely triggering a bug elsewhere.

I suggest you keep using ahci.mobile_lpm_policy=0 for now and retry without it when the 4.17 kernel becomes available in updates-testing.

Regards,

Hans

Comment 30 Pierre 2018-06-01 12:11:44 UTC
Hans,

Thanks for the detailed info and for your efforts!
Yes, I'll keep it that way until 4.17 comes out.

Greets
--Pierre

Comment 31 Randy Barlow 2018-06-04 13:38:41 UTC
I also see hangs on my 3rd gen Lenovo X1 Carbon:

[rbarlow@ohm ~]$ cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
04/11/2018
N14ET47W (1.25 )
[rbarlow@ohm ~]$ dmesg | grep UDMA
[    0.577446] ata4: SATA max UDMA/133 abar m2048@0xf113c000 port 0xf113c280 irq 42
[    0.889039] ata4.00: ATA-9: SAMSUNG MZNLN256HCHP-000L7, EMT22L6Q, max UDMA/133
[    0.892263] ata4.00: configured for UDMA/133 

I do not notice problems when changing screen brightness. What I've observed is that the hangs only happen when I am on battery power - this doesn't happen when I'm docked, which is most of the time.

Comment 32 Bruno Lavoie 2018-06-08 12:56:37 UTC
Hello, 

just landed here from this page: https://fedoraproject.org/wiki/Common_F28_bugs#Certain_laptops_.28Lenovo.29_hang_randomly

I can confirm that the proposed change fixes the random, unexplained freezes my laptop was having.

Lenovo L450

$ cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
03/02/2015
JDET49WW (1.11 )

$ dmesg | grep UDMA
[    1.717966] ata1: SATA max UDMA/133 abar m2048@0xe123c000 port 0xe123c100 irq 42
[    2.035907] ata1.00: ATA-9: SanDisk SDSSDXPS240G, X21200RL, max UDMA/133
[    2.038691] ata1.00: configured for UDMA/133
[  960.211682] ata1.00: configured for UDMA/133
[ 3447.710232] ata1.00: configured for UDMA/133
[ 7479.598571] ata1.00: configured for UDMA/133
[28359.578642] ata1.00: configured for UDMA/133
[30233.558549] ata1.00: configured for UDMA/133
[37197.420146] ata1.00: configured for UDMA/133

Should I update my BIOS ?

Comment 33 Hans de Goede 2018-06-09 11:31:08 UTC
(In reply to Bruno Lavoie from comment #32)
> Hello, 
> 
> just landed here from this page:
> https://fedoraproject.org/wiki/Common_F28_bugs#Certain_laptops_.28Lenovo.
> 29_hang_randomly
> 
> I can confirm that the proposed change fix and my laptop stopped freezing
> without any good reasons.
> 
> Lenovo L450
> 
> $ cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
> 03/02/2015
> JDET49WW (1.11 )

That is very old; can you try updating your BIOS and see if that makes the problem go away? So far I have 2 reports of 450 models with SanDisk SSDs having this freeze, and both involve a really old BIOS, whereas other 450 model users (including users with a SanDisk SSD) don't see any issues.

Regards,

Hans

Comment 34 H.J. Lu 2018-06-09 15:58:00 UTC
I ran into a similar issue on an Intel NUC NUC6i5SYB:

https://bugzilla.redhat.com/show_bug.cgi?id=1574777

Comment 35 Hans de Goede 2018-06-10 08:07:31 UTC
(In reply to H.J. Lu from comment #34)
> I ran into the similar issue on Intel NUC NUC6i5SYB:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1574777

And looking at that bug you fixed it by updating the BIOS, that is good to know, thank you.

Comment 36 Bruno Lavoie 2018-06-11 13:34:38 UTC
(In reply to Hans de Goede from comment #33)
> (In reply to Bruno Lavoie from comment #32)
> > Hello, 
> > 
> > just landed here from this page:
> > https://fedoraproject.org/wiki/Common_F28_bugs#Certain_laptops_.28Lenovo.
> > 29_hang_randomly
> > 
> > I can confirm that the proposed change fix and my laptop stopped freezing
> > without any good reasons.
> > 
> > Lenovo L450
> > 
> > $ cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
> > 03/02/2015
> > JDET49WW (1.11 )
> 
> That is very old, can you try updating your BIOS and see if that makes the
> problem go away? I sofar have 2 reports of 450 models with Sandisk SSDs
> having this freeze and both involve a really old BIOS, where as other 450
> model users (including users with a Sandisk SSD) don't see any issues.
> 
> Regards,
> 
> Hans

Hi Hans,

I updated my BIOS this past weekend and have had no problems since. Running on battery since I'm at work.

In fact, it's good news! Unless you hear otherwise from me, you can take it for granted that the BIOS update solved the problem.

By the way, thanks for your great work.

Bruno

Comment 37 Mark Thacker 2018-06-12 19:29:56 UTC
(In reply to Mark Thacker from comment #14)
> I can confirm that adding ahci.mobile_lpm_policy=0 to the kernel boot line
> DID FIX an issue with my Lenovo W541 laptop that refused to suspend when the
> power button was pressed.
> 
> Issue : Press power button to suspend. Display goes blank, keyboard
> backlight turns off, system does NOT power down, fan still running, unable
> to retain control with keyboard press or power button press.
> 
> Adding the above line addressed the issue immediately.
> 
> Again, system is Lenovo W541 running Fedora 28. Issue did NOT appear in
> Fedora 27 nor did it appear when F27 was upgraded to the 4.16 kernel. Only
> after a complete system upgrade to Fedora 28.
> 
> Hope this helps.

Adding in an update. 

I've updated to the latest bios and the latest F28 and the issue seems to be gone! I have removed the ahci disabling command from my kernel command line and the system is properly handling suspend and resume.

For the record :
GNOME running on XORG
Fedora 28 running 4.16.14-300.fc28.x86_64
BIOS version : GNET87WW (2.35 )

Comment 38 Hans de Goede 2018-06-13 08:02:05 UTC
Hi,

(In reply to Mark Thacker from comment #37)
> Adding in an update. 
> 
> I've updated to the latest bios and the latest F28 and the issue seems to be
> gone! I have removed the ahci disabling command from my kernel command line
> and the system is properly handling suspend and resume.
> 
> For the record :
> GNOME running on XORG
> Fedora 28 running 4.16.14-300.fc28.x86_64
> BIOS version : GNET87WW (2.35 )

That is great news, thank you.

Regards,

Hans

Comment 39 Hans de Goede 2018-06-20 14:11:32 UTC
Hi All,

I've been discussing Lenovo 50 series models needing a new BIOS with upstream, and the plan is to add an LPM blacklist for older BIOS versions which will also log a warning asking users to update.

To make this list I need dmidecode output. Can you please run
sudo dmidecode > dmidecode.log

And attach the generated file here? Note the file will contain your device's serial number, so you may want to edit it to remove that and/or make the attachment private.
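One way to strip the serial number before attaching is to filter the dmidecode text. A sketch assuming the usual `Serial Number:` and `UUID:` lines; `redact_dmidecode` is a hypothetical helper. Other fields (asset tags, for example) may still be identifying, so a quick read-through before attaching is still worthwhile.

```shell
# Blank out per-unit identifiers in dmidecode text before posting it.
# Model, version and BIOS fields needed for a quirk are left untouched.
# Reads stdin, writes stdout.
redact_dmidecode() {
    sed -e 's/^\([[:space:]]*Serial Number:\).*/\1 REDACTED/' \
        -e 's/^\([[:space:]]*UUID:\).*/\1 REDACTED/'
}
```

Usage: `sudo dmidecode | redact_dmidecode > dmidecode.log`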

I'm looking for dmidecode output for:

X250
T450s
L450
W541

So if you have one of those, please run dmidecode and attach the output here.
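The planned blacklist boils down to one decision per machine: match the model via DMI, compare the BIOS date against the first release known to behave, and leave LPM off (with a warning) if it is older. The shell sketch below illustrates only that decision and is not the kernel implementation; the model strings and cutoff dates are placeholders to be filled in from the collected dmidecode output.

```shell
# HYPOTHETICAL illustration of a per-model BIOS cutoff check.
# Returns 0 ("too old, keep LPM off") or 1 ("fine, or model not listed").
# Model names and cutoff dates are PLACEHOLDERS, not upstream values.
# $1 = DMI product_version, $2 = DMI bios_date in MM/DD/YYYY format.
bios_too_old() {
    model=$1 bios_date=$2
    case $model in
        "ThinkPad X250")  cutoff=20180101 ;;   # placeholder cutoff
        "ThinkPad T450s") cutoff=20180101 ;;   # placeholder cutoff
        *) return 1 ;;                          # model not on the list
    esac
    # Reorder MM/DD/YYYY into YYYYMMDD so dates compare as integers.
    ymd=$(echo "$bios_date" | awk -F/ '{ printf "%04d%02d%02d", $3, $1, $2 }')
    [ "$ymd" -lt "$cutoff" ]
}
```

Comparing dates rather than version strings sidesteps the different version schemes across models (JBET69WW, JDET66WW, GNET87WW and so on).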

Regards,

Hans

Comment 40 Pierre 2018-06-20 14:45:03 UTC
Created attachment 1453229 [details]
dmidecode output for T450s

Comment 41 Dimitris 2018-06-20 15:28:57 UTC
Created attachment 1453234 [details]
X250 BIOS 1.31 dmidecode

ThinkPad X250, I'm at BIOS 1.31 (latest as of now) and booting with command line:

BOOT_IMAGE=/vmlinuz-4.16.16-300.fc28.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.luks.uuid=luks-<uuid> rd.lvm.lv=fedora/swap resume=/dev/mapper/fedora-swap acpi_osi=Linux rhgb quiet LANG=en_US.UTF-8

Since upgrading to F28 over the weekend I've run for many hours on AC, and several hours on battery.  Both included suspend/resume cycles.  Haven't had any crashes.

Comment 42 Bruno Lavoie 2018-06-21 12:39:43 UTC
Created attachment 1453444 [details]
DMI Decode L450

BIOS VERSION
04/20/2018
JDET66WW (1.28 )

Comment 43 Joseph Moragrega 2018-06-23 22:34:58 UTC
This same thing is happening to me. I have a Lenovo W541; the workaround of the kernel parameter did the trick for me.

Is there a bugreport or something that I can provide for the troubleshooting ?

Comment 44 Hans de Goede 2018-06-24 12:19:13 UTC
Hi Joseph,

(In reply to Joseph Moragrega from comment #43)
> This same thing is happening to me. I have a Lenovo W541 , the workaround of
> the kernel parameter did the trick for me. 
> 
> Is there a bugreport or something that I can provide for the troubleshooting
> ?

This is a known issue. I'm working on a kernel fix to automatically disable LPM on various Lenovo 50 series models, including the W541, when they use an older BIOS. I've been collecting dmidecode info on affected models and I still need W541 dmidecode output before I can write the patch.

Can you please run:

sudo dmidecode > dmidecode.log

And attach the generated file here? Note the file will contain your device's serial number, so you may want to edit it to remove that and/or make the attachment private.

After doing that, try upgrading your W541 to the latest BIOS from Lenovo; that should fix things without you needing to specify the kernel command-line parameter.

Thanks & Regards,

Hans

Comment 46 Scott Reed 2018-07-05 15:40:45 UTC
I am using a Lenovo ThinkPad X270. I added ahci.mobile_lpm_policy=0 as a kernel parameter and I am still experiencing the issue where my mouse pointer freezes for a few seconds. It is pretty intermittent. The issue has only appeared since upgrading from FC27 to FC28. Thanks!

Comment 47 Pierre 2018-07-06 09:19:00 UTC
(In reply to Hans de Goede from comment #29)
> Pierre,
> 
> Thanks for the info, so you are on a recent BIOS using a quite new SSD which
> works fine with LPM for other users AFAIK.
> 
> So I believe that what you are seeing is some unrelated problem being
> exposed by LPM allowing the uncore (anything but the CPU cores, including
> the GPU) reaching deeper powersaving states.
> 
> ahci.mobile_lpm_policy=0 is a workaround as it stops the uncore from reaching
> its deep powersaving states, which in turn are likely triggering a bug
> elsewhere.
> 
> I suggest you keep using ahci.mobile_lpm_policy=0 for now and retry without
> it when the 4.17 kernel becomes available in updates-testing.
> 
> Regards,
> 
> Hans

FYI, I've just tried to run 4.17.3 without the workaround. The system froze after about one hour.

Comment 48 Mark Thacker 2018-07-10 10:40:30 UTC
(In reply to Hans de Goede from comment #39)
> Hi All,
> 
> I've been discussing Lenovo 50 series models needing a new BIOS with
> upstream and the plan is to add an LPM blacklist for older BIOS versions
> which will also log a warning asking users to update.
> 
> To make this list I need dmidecode output. Can you please run
> sudo dmidecode > dmidecode.log
> 
> And attach the generated file here? Note the file will contain your device's
> serial number, so you may want to edit it to remove that and/or make the
> attachment private.
> 
> I'm looking for dmidecode output for:
> 
> X250
> T450s
> L450
> W541
> 
> So if you have one of those, please run dmidecode and attach the output here.
> 
> Regards,
> 
> Hans

I'm still noticing an issue with my W541 when it switches between docked and undocked states (in either direction) while the system is asleep.

The system freezes upon opening the laptop lid if you change its docking status.

If you don't change the status (i.e. close the lid when it's undocked and then resume later with it still undocked) everything is fine.

Using the kernel parameter to disable LPM solves this problem.
(Note that I previously, and incorrectly, reported that the latest kernel fixes this issue; I hadn't tried the docking/undocking test at that time.)

In a nutshell: I have the latest BIOS for the W541 and have attached my dmidecode output here (as a private attachment).

	Version: GNET87WW (2.35 )
	Release Date: 04/09/2018

Comment 50 Pierre 2018-07-16 11:26:47 UTC
FWIW, I'm running kernel 4.17.5 with ahci.mobile_lpm_policy=0 and the system froze on me twice recently.

I'm probably too cool, which freezes the CPU through the keyboard. :-)

Anyway, I'm going back to booting 4.17.4 (or .3) for a while.

Comment 51 William Cohen 2018-07-17 13:46:51 UTC
I have also noticed freezes on my Lenovo P51 after I upgraded to Fedora 28 and the 4.17.4 and 4.17.5 kernels. The freezes seemed to happen most commonly when I was backing up my internal SSD to an external USB 3.0 hard drive. I wonder if the BIOS might be doing something behind the scenes that is incompatible with Linux. I have turned off a BIOS power management setting to see if that improves the situation. In particular, I turned off PCI Express Power Management, mentioned on page 88 of https://download.lenovo.com/pccbbs/mobiles_pdf/p51_ug_en.pdf

Comment 52 Pierre 2018-07-17 13:53:41 UTC
(In reply to Pierre from comment #50)
> FWIW, I'm running kernel 4.17.5 with ahci.mobile_lpm_policy=0 and the system
> froze on me twice recently.
> 
> I'm probably too cool, which freezes the CPU through the keyboard. :-)
> 
> Anyway, going back to boot 4.17.4 (or .3) for a while.

Freezes happening with 4.17.4 as well.
Now under 4.17.3 and crossing fingers. (Which mkaes it hadr to tpye corrcetly!)

Comment 53 Jeremy Cline 2018-07-17 14:04:52 UTC
Hi Will, Pierre, you're likely hitting https://bugzilla.redhat.com/show_bug.cgi?id=1598462. It'll be fixed in 4.17.7.

Comment 54 Pierre 2018-07-17 14:12:04 UTC
Hi Jeremy,
Might well be it, yep. Thanks for the update!

Comment 55 Kevin Jude Concessao 2018-07-20 16:57:37 UTC
Lenovo Ideapad 320 freezes occasionally on 4.17.x but runs fine on 4.16.x

Comment 56 lukes 2018-07-25 13:37:41 UTC
The same issue on a ThinkPad T440s. It's driving me crazy!!!

Comment 57 lukes 2018-07-25 14:01:30 UTC
Created attachment 1470522
dmidecode output for T440s

Comment 58 David Demelier 2018-08-13 06:41:52 UTC
Happens to me as well on an X1 Carbon 5th gen (2017).

Comment 59 Hans de Goede 2018-08-13 08:47:57 UTC
(In reply to David Demelier from comment #58)
> Happens to me as well on a x1 carbon 5th gen (2017).

What happens as well?

1) What is the problem you are seeing ?
2) Have you tried using ahci.mobile_lpm_policy=0 on the kernel commandline as a workaround ?
3) Does this workaround help ?
4) What is the output of "cat /sys/class/dmi/id/bios_date" ?
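Question 4, together with the link policy each SATA host is actually running with, can be checked in one go. This is a sketch only. lpm_report is a hypothetical helper, and its optional argument (default /sys) exists only so it can be pointed at a copy of the sysfs tree. It prints the BIOS date and version, then the current link_power_management_policy of every SCSI host that exposes one.

```shell
# Hypothetical helper: print BIOS date/version and the current ALPM policy
# of every SCSI host that exposes one. $1 defaults to /sys; it is only a
# parameter so the function can be pointed at a copy of the tree.
lpm_report() {
    sysroot=${1:-/sys}
    for field in bios_date bios_version; do
        f="$sysroot/class/dmi/id/$field"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$field" "$(cat "$f")"
        fi
    done
    for p in "$sysroot"/class/scsi_host/host*/link_power_management_policy; do
        [ -r "$p" ] || continue    # an unmatched glob stays literal; skip it
        h=$(basename "$(dirname "$p")")
        printf '%s: %s\n' "$h" "$(cat "$p")"
    done
}

# Usage: lpm_report
```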

Comment 60 David Demelier 2018-08-13 08:58:07 UTC
1) It freezes randomly after several minutes of use, and it's impossible to reboot. The machine is not pingable.

2) I've enabled it, and it seems to work.

4) 09/27/2017

Comment 61 Hans de Goede 2018-08-13 10:12:28 UTC
(In reply to David Demelier from comment #60)
> 1) It freezes randomly after several minutes of use, and it's impossible to reboot.
> Machine is not pingable.
> 
> 2) I've enabled it, it seems to work

OK, so let's see what disk you are using. Please run:

dmesg | grep UDMA

And paste the output here.

Are you using the disk the machine came originally with, or did you replace it?

> 4) 09/27/2017

Please try upgrading your BIOS and see if that helps.

Comment 62 Dirk Arnold 2018-08-26 21:16:01 UTC
I've got a 3rd Gen X1 Carbon with the BIOS updated to 06/11/2018 that hangs from time to time.  ahci.mobile_lpm_policy=0 appears to fix it.  dmesg | grep UDMA gives me this:

[    0.805921] ata4: SATA max UDMA/133 abar m2048@0xf113c000 port 0xf113c280 irq 40
[    1.122485] ata4.00: ATA-9: SAMSUNG MZNLN128HCGR-000L1, EMT22L0Q, max UDMA/133
[    1.125259] ata4.00: configured for UDMA/133

It's the original disk.

Comment 63 Hans de Goede 2018-08-27 08:59:04 UTC
Hello Dirk,

(In reply to Dirk Arnold from comment #62)
> I've got a 3rd Gen X1 Carbon with the BIOS updated to 06/11/2018 that hangs
> from time to time.  ahci.mobile_lpm_policy=0 appears to fix it.  dmesg |
> grep UDMA gives me this:
> 
> [    0.805921] ata4: SATA max UDMA/133 abar m2048@0xf113c000 port 0xf113c280
> irq 40
> [    1.122485] ata4.00: ATA-9: SAMSUNG MZNLN128HCGR-000L1, EMT22L0Q, max
> UDMA/133
> [    1.125259] ata4.00: configured for UDMA/133
> 
> It's the original disk.

Do you still have Windows on the machine (multi-boot)? If so, can you try running this updater for the SSD firmware?

https://support.lenovo.com/nl/en/downloads/ds038904

Regards,

Hans

Comment 64 Devin Cofer 2018-09-13 04:28:24 UTC
Same issue on Lenovo ThinkPad X1 Carbon Gen 6.
BIOS v1.30 from a few days ago.

ahci.mobile_lpm_policy=0 fixes the problem.

dmesg | grep -i UDMA shows nothing.

Disk is a PCI-E NVMe Samsung PM981 (OEM).

Comment 65 Hans de Goede 2018-09-13 12:59:32 UTC
Hi,

(In reply to Devin Cofer from comment #64)
> Same issue on Lenovo ThinkPad X1 Carbon Gen 6.
> BIOS v1.30 from a few days ago.
> 
> ahci.mobile_lpm_policy=0 fixes the problem.
> 
> dmesg | grep -i UDMA shows nothing.
> 
> Disk is a PCI-E NVMe Samsung PM981 (OEM).

OK, that is weird. When no SATA disks are in use, the only difference ahci.mobile_lpm_policy makes is whether the controller gets turned off or left in "waiting for hotplug" mode. So if that option makes the difference, the real culprit is likely somewhere else: powering down the SATA controller allows deeper power-saving states (higher PC states) to be reached, which then exposes a bug elsewhere (*)

Can you do:

ls -l /sys/class/scsi_device/

And post the output, to make sure you really don't have any SATA devices attached?
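A rough way to do the same check without reading the symlinks by eye is sketched below. It assumes, as in the listings in this bug, that SATA-attached devices have an "ataN" component in their sysfs path. sata_devices is a hypothetical helper. An empty result means no SATA device is attached, which is the NVMe-only case discussed here.

```shell
# Hypothetical helper: print the SCSI devices whose sysfs link target runs
# through an "ataN" port, i.e. devices attached via libata/AHCI.
# Assumption: SATA devices always have such a path component, as in the
# "../../devices/pci0000:00/0000:00:1f.2/ata1/host0/..." links in this bug.
sata_devices() {
    dir=${1:-/sys/class/scsi_device}
    for d in "$dir"/*; do
        [ -L "$d" ] || continue    # also skips the literal glob if dir is empty
        case $(readlink "$d") in
            */ata[0-9]*) basename "$d" ;;
        esac
    done
}

# Usage: sata_devices    (no output means no SATA device is attached)
```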

Regards,

Hans


*) or at least that is what I think is happening.

Comment 66 Devin Cofer 2018-09-14 14:02:57 UTC
Apologies for adding noise here.

I thought this fixed the long lag on poweroff, but it seems it was not the cause of that.

I have since wiped my Fedora partition so I will be unable to help troubleshoot.

Comment 67 Benjamin Salchow 2019-02-02 12:14:26 UTC
Sorry for writing to a closed bug, but I had similar problems with my Lenovo ThinkPad W540 (Fedora 29 4.20.5-200.fc29.x86_64)

Lenovo ThinkPad W540
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K1100M] (rev a1)
DDR3 32 GB 1600

After adding ahci.mobile_lpm_policy=0 to the kernel command line, everything works fine: no more freezes on standby, docking or user switching.


ls -l /sys/class/scsi_device/
Output:
lrwxrwxrwx. 1 root root 0  1. Feb 20:04 0:0:0:0 -> ../../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/scsi_device/0:0:0:0
lrwxrwxrwx. 1 root root 0  1. Feb 20:04 5:0:0:0 -> ../../devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/scsi_device/5:0:0:0

dmesg | grep -i UDMA
Output:
[    3.312400] ata1: SATA max UDMA/133 abar m2048@0xb3a3c000 port 0xb3a3c100 irq 26
[    3.312412] ata6: SATA max UDMA/133 abar m2048@0xb3a3c000 port 0xb3a3c380 irq 26
[    3.626153] ata1.00: ATA-9: SAMSUNG MZ7TE512HMHP-000L1, EXT06L0Q, max UDMA/133
[    3.630354] ata6.00: ATAPI: HL-DT-ST DVDRAM GU90N, LU20, max UDMA/133
[    3.631133] ata1.00: configured for UDMA/133
[    3.638867] ata6.00: configured for UDMA/133
[ 3170.906350] ata1.00: configured for UDMA/133
[ 3170.924006] ata6.00: configured for UDMA/133
[ 4853.867412] ata1.00: configured for UDMA/133
[ 4853.872529] ata6.00: configured for UDMA/133

cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
Output:
04/09/2018
GNET87WW (2.35 )

(Latest version; the docking station also has the latest version.)


My workaround is to add "ahci.mobile_lpm_policy=0" to GRUB_CMDLINE_LINUX="...."; since then everything has been stable. None of the earlier freezes showed up in any logs, so it's really hard to figure out what the issue is. Still, this line solved the freezes for me.
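On Fedora, the same persistent change can also be made with grubby. After a reboot, it is worth confirming that the running kernel actually received the parameter. This is a sketch only. lpm_workaround_active is a hypothetical helper, and its optional argument (default /proc/cmdline) exists only for testing.

```shell
# Persist the workaround for all installed kernels on Fedora (needs root):
#   sudo grubby --update-kernel=ALL --args="ahci.mobile_lpm_policy=0"
# and to undo it later:
#   sudo grubby --update-kernel=ALL --remove-args="ahci.mobile_lpm_policy=0"

# Hypothetical helper: succeed if the running kernel was booted with the
# workaround. $1 defaults to /proc/cmdline and exists only for testing.
lpm_workaround_active() {
    # -F: literal match (the "." is not a regex wildcard); -w: whole word,
    # so a longer value such as ahci.mobile_lpm_policy=01 does not match.
    grep -qwF 'ahci.mobile_lpm_policy=0' "${1:-/proc/cmdline}"
}

# Usage: lpm_workaround_active && echo "LPM workaround active"
```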

Thank you very much

Comment 68 Hans de Goede 2019-02-03 09:05:44 UTC
Hi,

(In reply to Benjamin Salchow from comment #67)
> Sorry for writing to a closed bug, but I had similar problems with my Lenovo
> ThinkPad W540 (Fedora 29 4.20.5-200.fc29.x86_64)
> [    3.626153] ata1.00: ATA-9: SAMSUNG MZ7TE512HMHP-000L1, EXT06L0Q, max

Thank you for your bug report. I've just submitted a patch upstream adding your SSD model and firmware version to the LPM blacklist in the kernel. It will take a while for this to trickle down into the Fedora kernels (through the stable 4.20.x releases), but eventually you should be able to drop the kernel command-line option.
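The blacklist entry described here is keyed on the drive's model and firmware strings. libata prints both in its identify line, for example "ATA-9: SAMSUNG MZ7TE512HMHP-000L1, EXT06L0Q, max UDMA/133" above. The sketch below is a hypothetical helper that pulls those two strings out of dmesg. It assumes the line format quoted in this bug and deliberately skips ATAPI devices such as DVD drives.

```shell
# Hypothetical helper: turn libata identify lines such as
#   ata1.00: ATA-9: SAMSUNG MZ7TE512HMHP-000L1, EXT06L0Q, max UDMA/133
# into "MODEL|FIRMWARE". ATAPI lines ("ataN.00: ATAPI: ...") do not match
# the "ATA-<n>:" pattern and are skipped. Assumes the format quoted in this bug.
ata_model_fw() {
    sed -nE 's/.*ata[0-9]+\.[0-9]+: ATA-[0-9]+: (.*), ([^,]*), max UDMA.*/\1|\2/p'
}

# Usage (dmesg may need sudo if dmesg_restrict is set):
#   dmesg | ata_model_fw
```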

Regards,

Hans

Comment 69 Teresa e Junior 2019-05-20 07:00:07 UTC
I was going to send you an email, but decided to report this publicly for other people to find. I've been struggling with hard lockups on my Lenovo G40-80 since November 2018, as shown on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1798961, and someone posted there about "ahci.mobile_lpm_policy=0" a week ago. I can confirm 6 days of uptime with no issues with this kernel option.

teresaejunior@laptop ~> ls -l /sys/class/scsi_device/
total 0
lrwxrwxrwx 1 root root 0 mai 13 23:50 0:0:0:0 -> ../../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/scsi_device/0:0:0:0/
lrwxrwxrwx 1 root root 0 mai 13 23:50 1:0:0:0 -> ../../devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/scsi_device/1:0:0:0/

teresaejunior@laptop ~> dmesg | grep UDMA
[    1.332588] ata1: SATA max UDMA/133 abar m2048@0xc1218000 port 0xc1218100 irq 46
[    1.332590] ata2: SATA max UDMA/133 abar m2048@0xc1218000 port 0xc1218180 irq 46
[    1.647247] ata2.00: ATAPI: HL-DT-ST DVDRAM GUC0N, T.02, max UDMA/133
[    1.648995] ata2.00: configured for UDMA/133
[    1.651386] ata1.00: ATA-8: ST1000LM024 HN-M101MBB, 2BA30001, max UDMA/100
[    1.657689] ata1.00: configured for UDMA/100

teresaejunior@laptop ~> cat /sys/class/dmi/id/bios_date /sys/class/dmi/id/bios_version
05/07/2015
B0CN79WW

Comment 70 Hans de Goede 2019-05-21 11:33:12 UTC
(In reply to Teresa e Junior from comment #69)
> I was going to send you an email, but decided to report this publicly for
> other people to find. I've been struggling with hard lockups on my Lenovo
> G40-80 since November 2018, as shown on
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1798961, and someone
> posted there about "ahci.mobile_lpm_policy=0" a week ago. I can confirm 6
> days of uptime with no issues with this kernel option.
> 
> [    1.647247] ata2.00: ATAPI: HL-DT-ST DVDRAM GUC0N, T.02, max UDMA/133
> [    1.651386] ata1.00: ATA-8: ST1000LM024 HN-M101MBB, 2BA30001, max UDMA/100

The question is which of these two devices is causing the freezes (likely the HDD).
Can you try booting without ahci.mobile_lpm_policy=0 and then disabling ALPM on
just the HDD by doing:

sudo  sh -c "echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy"

Then see whether things are still stable?
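This step assumes host0 is the HDD. The sketch below checks which hostN a given disk sits on before you write to its policy file. It relies on SCSI device names starting with the host number (H:C:T:L), as in the "0:0:0:0 -> .../ata1/host0/..." listings above. host_map is a hypothetical helper, and its optional argument (default /sys) is only there for testing.

```shell
# Hypothetical helper: for every SCSI device, print its host, model string and
# that host's current ALPM policy. Relies on device names being H:C:T:L with
# H the host number. $1 defaults to /sys and exists only for testing.
host_map() {
    sysroot=${1:-/sys}
    for d in "$sysroot"/class/scsi_device/*; do
        [ -r "$d/device/model" ] || continue
        name=$(basename "$d")
        h=${name%%:*}                                  # "0:0:0:0" -> "0"
        polfile="$sysroot/class/scsi_host/host$h/link_power_management_policy"
        if [ -r "$polfile" ]; then policy=$(cat "$polfile"); else policy="n/a"; fi
        printf 'host%s %s [%s]\n' "$h" "$(cat "$d/device/model")" "$policy"
    done
}

# Usage: host_map
```

Writing max_performance then only needs to target the host printed next to the HDD. Like any sysfs write, that setting is lost at reboot.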

Comment 71 Teresa e Junior 2019-05-27 18:34:29 UTC
(In reply to Hans de Goede from comment #70)
> can you try booting without ahci.mobile_lpm_policy=0 and then disabling alpm
> on just the hdd by doing:
> 
> sudo  sh -c "echo max_performance >
> /sys/class/scsi_host/host0/link_power_management_policy"
> 
> And see if things are still stable then?

Hello! I've been testing my system with the changes you requested for the last 6 days, and I haven't seen any problems yet!

Comment 72 Hans de Goede 2019-06-11 14:30:43 UTC
(In reply to Teresa e Junior from comment #71)
> (In reply to Hans de Goede from comment #70)
> > can you try booting without ahci.mobile_lpm_policy=0 and then disabling alpm
> > on just the hdd by doing:
> > 
> > sudo  sh -c "echo max_performance >
> > /sys/class/scsi_host/host0/link_power_management_policy"
> > 
> > And see if things are still stable then?
> 
> Hello! I've been testing my system with the changes you requested for the
> last 6 days, and I haven't seen any problems yet!

Thank you for testing this. I've just submitted a patch upstream adding your HDD model to the LPM blacklist, so future kernels should automatically avoid using LPM on that disk. In the meantime you can keep using your current workaround.

Comment 73 Daniel Simoes 2019-07-06 17:38:15 UTC
Hey, 

Lenovo ThinkPad X270 running Fedora 30 here. It was constantly freezing at random; after booting with the ahci.mobile_lpm_policy=0 kernel parameter, the problem was solved. :)

Comment 74 Hans de Goede 2019-07-10 15:22:45 UTC
(In reply to Daniel Simoes from comment #73)
> Lenovo ThinkPad x270 running fedora 30 here. It was freezing randomly
> constantly, after booting with the ahci.mobile_lpm_policy=0 kernel parameter
> the problem was solved. :)

Hmm, the X270 is quite new. I'm surprised that it has a SATA SSD and not an NVMe one, and since it is new I'm also somewhat surprised it has this issue.

Anyway, let's start by gathering some data and go from there.

Is your BIOS at the latest version? If not first please try upgrading your BIOS.

If that does not help, we will probably need to blacklist your SSD from ALPM support. To do this I need the output of the following command:


dmesg | grep -i UDMA

Comment 75 Daniel Simoes 2019-07-10 16:23:26 UTC
(In reply to Hans de Goede from comment #74)
> (In reply to Daniel Simoes from comment #73)
> > Lenovo ThinkPad x270 running fedora 30 here. It was freezing randomly
> > constantly, after booting with the ahci.mobile_lpm_policy=0 kernel parameter
> > the problem was solved. :)
> 
> Hmm, the X270 is quite new, I'm surprised that that has a SATA SSD and not a
> NVME one and also since it is new I'm somewhat surprised it has this issue.
> 
> Anyways let start with gathering some data and going from there.
> 
> Is your BIOS at the latest version? If not first please try upgrading your
> BIOS.
> 
> If that does not help, I guess we probably need to blacklist your SSD
> from ALPM support, to do this I need the output of the following command:
> 
> 
> dmesg | grep -i UDMA

Hello Hans, thanks for the reply.

It has an NVMe drive; you can check the configuration below:

description: NVMe disk
                   product: THNSF5512GPUK TOSHIBA
                   physical id: 0
                   logical name: /dev/nvme0n1
                   size: 476GiB (512GB)
                   capabilities: gpt-1.00 partitioned partitioned:gpt

I just noticed the BIOS is not at the latest version: mine is R0IET55W (1.33) and the current release is 1.36 (R0IET58W).

Do you think it is a good idea to remove the ahci.mobile_lpm_policy=0 kernel parameter, update the BIOS and see if it still freezes?

Oh, and the "dmesg | grep -i UDMA" command returns nothing.

Comment 76 Hans de Goede 2019-07-11 09:37:06 UTC
(In reply to Daniel Simoes from comment #75)
> It has a NVME, you can check the configs below:
> 
> description: NVMe disk
>                    product: THNSF5512GPUK TOSHIBA
>                    physical id: 0
>                    logical name: /dev/nvme0n1
>                    size: 476GiB (512GB)
>                    capabilities: gpt-1.00 partitioned partitioned:gpt
> 
> I just noticed the bios is not at the latest version, mine is: R0IET55W
> (1.33) and the current release is at 1.36(R0IET58W).
> 
> Do you think it is a good idea to remove the ahci.mobile_lpm_policy=0 kernel
> parameter, update the bios and see if it still freezes?
> 
> Oh, and "dmesg | grep -i UDMA" command returns nothing

Since there is no DVD drive in this machine and your disk is an NVMe SSD, you are not using SATA at all, so it is expected that the "dmesg | grep -i UDMA" command returns nothing.

This basically means that the ahci.mobile_lpm_policy=0 kernel parameter is not doing anything, since there are no SATA links to apply it to (I think).

So I think the freezes stopping after you set the ahci.mobile_lpm_policy=0 parameter is a coincidence, and you should try running without it. Upgrading the BIOS is usually a good idea regardless.

Comment 77 Erik Sejr 2019-11-29 01:15:52 UTC
I'm presently on Fedora 29 with a Gigabyte BRIX GB-BXi7-5500. After upgrading to Fedora 28 I hit this issue, which made the system lock up hard, usually during disk operations (like downloading packages). Eventually I did a clean install of Fedora 29 and ended up with the same problem. I had tried so many things to get a kernel message or a crash dump or SOMETHING to give me a clue as to what was going on, without any success. I had lived with the problem for over a year and had almost given up and replaced this machine with an RPi when I came across the F28 common bugs page while looking one last time for a solution.

I added ahci.mobile_lpm_policy=0 and BAM! It has not locked up since.

It is an mSATA SSD:

dmesg | grep -i UDMA
[    0.760255] ata4: SATA max UDMA/133 abar m2048@0xf7219000 port 0xf7219280 irq 45
[    2.110033] ata4.00: ATA-9: Samsung SSD 850 EVO mSATA 120GB, EMT41B6Q, max UDMA/133
[    2.113250] ata4.00: configured for UDMA/133

Comment 78 Hans de Goede 2019-11-29 10:06:38 UTC
(In reply to Erik Sejr from comment #77)
> I'm presently on Fedora 29 with a Gigabyte BRIX GB-BXi7-5500. After
> upgrading to Fedora 28 I had this issue which would result in the system
> locking up hard usually during disk operations (like downloading packages).
> Eventually I did a clean install of Fedora 29 and ended up with the same
> problem. I had tried so many things to get a kernel message or a crash dump
> or SOMETHING to give me a clue as to what was going on without any success.
> I had just lived with the problem for over a year and almost given up and
> replaced this machine with a RPI when I came across the F28 common bugs
> report looking one last time to solve this problem.
> 
> I added ahci.mobile_lpm_policy=0 and BAM! It has not locked up since.
> 
> It is a mSATA SSD:
> 
> dmesg | grep -i UDMA
> [    0.760255] ata4: SATA max UDMA/133 abar m2048@0xf7219000 port 0xf7219280
> irq 45
> [    2.110033] ata4.00: ATA-9: Samsung SSD 850 EVO mSATA 120GB, EMT41B6Q,
> max UDMA/133
> [    2.113250] ata4.00: configured for UDMA/133

Erik,

Thank you for your bug report. I'm not sure that adding your disk to the LPM blacklist is the right thing to do. Your system uses a 5th generation Intel CPU, and we have seen lots of cases where LPM-triggered issues (enabling LPM allows the CPU to reach much deeper power-saving states when idle) are not the fault of the disk. Specifically, on many Lenovo models a BIOS update fixed these issues. Can you check whether there is a BIOS update available for your machine? I realize that applying it while you are running Linux may be a pain, but for starters just checking whether a newer BIOS is available would be good. There were some issues with the mSATA models of the Samsung 830 SSD series, but yours is an 850, which is pretty new, so I don't really expect any LPM issues with your SSD.

Regards,

Hans

Comment 79 Erik Sejr 2019-12-02 16:07:34 UTC
Hi Hans,
Thanks for the reply. There are no further BIOS updates for the BRIX i7-5500; the last one, revision F4, was released in 2015. I battled for many hours to find a way to update the firmware of the drive itself, but alas it was a losing battle. I was able to get the data center version of the Samsung update tool to run, but the only firmware update for the 850 EVO on the Samsung website said nothing about the mSATA version. Attempting to apply the non-mSATA 850 EVO update via the Samsung tool results in an error saying it is not the correct drive.

It's up to you, of course, whether you add this drive; I'm happy my problem is solved. LPM is not important in my use case, as this machine is not battery powered (it is a NUC I use as a home theatre PC). But I can say with certainty that this has solved my issue. When I ran a 'dnf update' on this machine to download 500+ MB of package updates, it would lock up 5-6 times during the downloads, and each time I would need to reboot, delete the zero-sized/partially downloaded packages in the cache and restart the DNF update. After making this change I was able to update it in one go, as well as compile and install the latest version of MythTV, without a single lockup. I had not been able to do that since Fedora 28 was installed on the machine.

Regards,
Erik

Comment 80 Hans de Goede 2019-12-02 18:31:14 UTC
Hi Erik,

Thinking more about this: although we cannot be 100% sure whether it is your mobo or your SSD, it is much more likely that your mobo is causing the hangs when LPM is active. As such, we should probably just blacklist LPM support for your mobo.

Can you please run: "sudo dmidecode > dmidecode.log" and then attach dmidecode.log here ?

Regards,

Hans

Comment 81 Erik Sejr 2019-12-05 04:14:27 UTC
Created attachment 1642269
Gigabyte Brix i7-5500 dmidecode

dmidecode output as requested

