Bug 1351943 - Almost never boot on Lenovo T460s since 4.5.7-300.fc24.x86_64
Status: CLOSED DUPLICATE of bug 1353103
Product: Fedora
Classification: Fedora
Component: kernel
Version: 24
Hardware: x86_64 Linux
Priority: unspecified, Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Reported: 2016-07-01 04:29 EDT by Maxime Coquelin
Modified: 2016-07-08 14:23 EDT (History)
Last Closed: 2016-07-08 08:32:42 EDT
Type: Bug
Attachments
Successful kernel boot with 4.6.3-300.fc24.x86_64 (52.06 KB, text/plain)
2016-07-01 04:29 EDT, Maxime Coquelin
lshw sanitized output (17.22 KB, text/plain)
2016-07-01 04:29 EDT, Maxime Coquelin
Description Maxime Coquelin 2016-07-01 04:29:05 EDT
Created attachment 1174842 [details]
Successful kernel boot with 4.6.3-300.fc24.x86_64

Description of problem:
The kernel fails to boot and prints no log output, even with 'quiet' replaced by 'debug' on the kernel command line.

With the debug parameter, the only message seen is:
Probing EDD (edd=off to disable)... OK

Then the machine hangs.

With the debug option added, I managed to boot 4.6.3-300.fc24.x86_64 once.
I do not see any problem in the kernel log from that boot (see attachment).

Version-Release number of selected component (if applicable):
The problem first appears with 4.5.7-300.fc24.x86_64; 4.5.5-300.fc24.x86_64 boots fine.

The versions I tested:
 - 4.5.5-300.fc24.x86_64 : OK
 - 4.5.7-300.fc24.x86_64 : KO
 - 4.6.3-300.fc24.x86_64 : KO

How reproducible:
The boot succeeded only once in 20 tries.
That one attempt was with the debug parameter enabled.

Steps to Reproduce:
1. Boot a kernel >= 4.5.7-300.fc24.x86_64


Actual results:
The machine does not boot.

Expected results:
The machine boots

Additional info:
I haven't tested the mainline kernel, but can do so on request.
I haven't tried any bisection yet either.

Let me know if you need more information, I can also test patches if needed.

Thanks,
Maxime
Comment 1 Maxime Coquelin 2016-07-01 04:29 EDT
Created attachment 1174843 [details]
lshw sanitized output
Comment 2 Josh Boyer 2016-07-01 08:23:18 EDT
There isn't enough information to work from here unfortunately.  If you remove the 'rhgb' option from the cmdline, do you see further output?
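The command-line changes discussed so far (dropping 'rhgb' and swapping 'quiet' for 'debug') can be sketched as a small shell helper. This is only an illustration: debug_cmdline is a hypothetical function name, not part of any Fedora tool, and it only rewrites a string rather than touching the bootloader configuration.

```shell
#!/bin/sh
# Sketch: rewrite a kernel command line for a more verbose boot.
# debug_cmdline is a hypothetical helper used for illustration only.
debug_cmdline() {
    # $1 = original kernel command line
    out=""
    for word in $1; do
        case "$word" in
            rhgb) continue ;;         # drop the graphical boot splash
            quiet) word="debug" ;;    # replace 'quiet' with 'debug'
        esac
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

# Example (prints the rewritten line for the running kernel):
#   debug_cmdline "$(cat /proc/cmdline)"
```

The resulting string would still have to be entered by hand, for example by editing the entry in the GRUB menu at boot time.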
Comment 3 Maxime Coquelin 2016-07-01 12:29:05 EDT
Removing 'rhgb' does not help, there is no further output.
I will try to reproduce with mainline Kernel, and let you know about the results.
Comment 4 Maxime Coquelin 2016-07-01 17:08:23 EDT
Josh,

I tried mainline Kernel tag v4.5.7, and it boots fine (tried 4 times).
I will continue testing with patches that are applied on top of it.
Comment 5 Fabio Alessandro Locati 2016-07-02 06:08:55 EDT
Hello,

I have the T460s too and I'm having exactly the same problem.
Comment 6 Simone 2016-07-03 04:51:55 EDT
I had the same problem with Lenovo T460
4.5.7-300.fc24.x86_64 : OK
4.6.3-300.fc24.x86_64 : KO

I (apparently) solved the problem by updating the BIOS to the latest version
# dmidecode -s bios-version
R06ET38W (1.12)
# dmidecode -s bios-release-date
06/01/2016

So far I have booted successfully 5 out of 5 times :)

I hope this helps,
Simo
Comment 7 Fabio Alessandro Locati 2016-07-03 06:01:34 EDT
I have updated my BIOS from version 1.05 to version 1.13 and it seems to have fixed the problem. Thanks a lot Simone for the hint :)

sudo dmidecode -s bios-version
N1CET45W (1.13 )
sudo dmidecode -s bios-release-date
06/02/2016
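The dmidecode strings quoted in this thread carry the BIOS version in parentheses. A minimal sketch for extracting and comparing it follows. The helper names bios_version_number and version_at_least are hypothetical, and the comparison assumes GNU sort with the -V option. The value 1.13 is only what T460s reporters in this thread found to work; other models use their own version numbering.

```shell
#!/bin/sh
# Sketch: pull the numeric version out of a `dmidecode -s bios-version`
# string and compare it against a minimum. Both helpers are hypothetical.

bios_version_number() {
    # $1 = string such as "N1CET45W (1.13 )"
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9.]*\)[[:space:]]*).*/\1/p'
}

version_at_least() {
    # Succeeds if version $1 >= version $2 (relies on GNU `sort -V`).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on a real machine (needs root for dmidecode):
#   v=$(bios_version_number "$(sudo dmidecode -s bios-version)")
#   version_at_least "$v" 1.13 && echo "BIOS $v is at least 1.13"
```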
Comment 8 Maxime Coquelin 2016-07-03 16:43:06 EDT
Thanks a lot Simone!
Like Fabio, I did the BIOS update and can confirm it solves the problem.

Ticket can now be closed.

Regards,
Maxime
Comment 9 Fabio Alessandro Locati 2016-07-03 16:50:21 EDT
I think this should _NOT_ be closed. We cannot ask thousands of people to update their BIOS because one of our patches creates problems. If a patch creates the problem, the patch should be fixed.
Comment 10 Maxime Coquelin 2016-07-04 03:27:05 EDT
Yes, you are right Fabio.
I am reopening the ticket.
Can we safely downgrade the BIOS?
I would need to downgrade to reproduce the problem.
Comment 11 Maxime Coquelin 2016-07-04 03:46:20 EDT
Simone, do you know more about the root cause?
Or were you just lucky that it fixed the problem?

Thanks,
Maxime
Comment 12 Fabio Alessandro Locati 2016-07-04 03:54:07 EDT
I'm not sure whether downgrades are safe. IIRC, with Lenovo BIOSes they are unless stated otherwise [0].
I can try to loop in some more people who have not updated their BIOS and see if it's possible to debug this further, though I think the only feasible way to do so is trial and error... long and boring :(.

Looking at the changelog for the T460s firmware [1], we can notice only the following elements:


<1.13>
 UEFI: 1.13 / ECP: 1.09
- (New) Added support of Remote Secure Erase with Intel AMT.
- (New) Updated the CPU microcode.
- (Fix) Fixed an issue where 16-bit application might not work in command prompt.

<1.11>
 UEFI: 1.11 / ECP: 1.09
- (New) Updated the CPU microcode.
- (New) New POST logo support.
- (Fix) Fixed Ethernet LAN Option ROM might not work properly with PoE Ethernet
        Switch.
- (Fix) Fixed an issue where legacy boot might fail with NVMe device after trying
        to boot from other device.
- (Fix) Added a workaround for Trackpoint does not work on pre-boot environment.
- (Fix) Thermal function was improved.

<1.08>
 UEFI: 1.08 / ECP: 1.08
- (New) Updated the CPU microcode.
- (New) Support legacy boot on system with NVMe SSD. (Driver is needed.)
- (Fix) Fixed an issue where "Boot Order Lock" in ThinkPad Setup cannot
        be changed to "Enabled".
- (Fix) Fixed an issue where NVMe SSD system cannot wake from sleep state
        when Hard Disk Password is installed.

<1.06>
 UEFI: 1.06 / ECP: 1.06
- (New) Updated the CPU microcode.
- (New) System can be turned on without AC adapter even if bottom cover
        was opened.
- (New) Added support for Windows 10 Device Guard feature.
- (Fix) Fixed an issue where error message may show on Windows 7 32 bit
        system with hard disk password after resume from hibernation.

Aside from Windows-specific stuff and Logo changes, I notice that:
- There have been multiple updates for the CPU microcode
- Two fixes for the NVMe SSD (in 1.08 and 1.11)

When I was trying to troubleshoot this with Simone on IRC, my first guess was a problem accessing the disk, so I'm still fairly confident that the second category of fixes could be related. The other option is that the CPU is not recognised properly, which seems odd/improbable to me.
A similar fix is present in the 1.10 version of the T460 firmware [2].

@Maxime: Have you noticed any patch in the NVMe subsystem?


[0] For instance, look at version 1.17 of the ThinkPad X250 https://download.lenovo.com/pccbbs/mobiles/n10ur10w.txt
[1] https://download.lenovo.com/pccbbs/mobiles/n1cur06w.txt
[2] https://download.lenovo.com/pccbbs/mobiles/r06uj38d.txt
Comment 13 Maxime Coquelin 2016-07-04 05:02:59 EDT
@Fabio, No, I haven't noticed any patch related to NVMe.

Looking at the comment history, it looks like I forgot to save a comment I wrote on Saturday afternoon.

What I was saying is that I retried the mainline kernel, and it didn't boot either.
This time the laptop was not in its docking station, and it booted neither mainline v4.5.5 nor v4.5.7.

I retried with the laptop docked, and this time the boot was not 100% successful.
I managed to boot mainline v4.5.5/v4.5.7 after about 4 attempts.

So I think we should not focus only on the patches applied on top of the mainline kernel,
even though 4.5.5-300.fc24.x86_64 boots 100% of the time, with the laptop docked or not.

Thanks,
Maxime
Comment 14 Simone 2016-07-04 05:20:05 EDT
Moreover, I should point out that 4.5.7-300.fc24.x86_64 was working fine on my Lenovo T460 even before the BIOS update. We have the following:

T460
 4.5.5-300.fc24.x86_64 : OK
 4.5.7-300.fc24.x86_64 : OK
 4.6.3-300.fc24.x86_64 : KO
T460s
 4.5.5-300.fc24.x86_64 : OK
 4.5.7-300.fc24.x86_64 : KO
 4.6.3-300.fc24.x86_64 : KO

This may help in the debugging process...
Simo
Comment 15 Fabio Alessandro Locati 2016-07-04 06:36:41 EDT
@Simone: Do you know which version of the BIOS you had before the update?
Comment 16 Fabio Alessandro Locati 2016-07-04 11:20:59 EDT
If anyone wants to take a look at the changes from 4.5.5 to 4.5.7, they can be found at: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/?id=v4.5.7&id2=v4.5.5&context=3&ignorews=0&dt=1
Comment 17 Maxime Coquelin 2016-07-04 11:54:47 EDT
@Fabio,
as I mentioned in comment 13, I had trouble booting both mainline v4.5.5 and v4.5.7, whereas the Fedora kernel 4.5.5-300.fc24.x86_64 booted on every attempt I made...
Comment 18 Simone 2016-07-04 16:31:46 EDT
@Fabio, yes, it was the first one, i.e. v. 1.07
Comment 19 Thomas Moschny 2016-07-06 17:42:43 EDT
To add a datapoint, here a T460s with BIOS 1.11 boots 4.5.7-300.fc24.x86_64 fine and hangs with kernel-4.6.3-300.fc24.x86_64.
Comment 20 Josh Boyer 2016-07-06 18:20:53 EDT
For those who cannot boot, what version of microcode_ctl do you have installed?  We've identified that some Skylake machines cannot boot with the most recent microcode_ctl update because their system's BIOS/UEFI firmware seems incompatible.

You can try booting with 'dis_ucode_ldr' on the kernel command line.  If that works and you have microcode_ctl-2.1-12.fc23 installed, then that is likely the problem.  Unfortunately, since it is a conflict between the CPU microcode and the system firmware, the only viable solutions are to either boot with ucode loading disabled or to update the firmware.
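As a rough sketch of how one might check whether a boot actually ran with the loader disabled, the helper here tests a command-line string for a whole-word parameter. has_cmdline_param is a hypothetical name used only for illustration; rpm and grubby are the standard Fedora tools, shown in comments rather than executed.

```shell
#!/bin/sh
# Sketch: test whether a kernel command line contains a given parameter.
# has_cmdline_param is a hypothetical helper, not an existing utility.
has_cmdline_param() {
    # $1 = parameter name, $2 = kernel command line string
    case " $2 " in
        *" $1 "*) return 0 ;;   # present as a whole word
        *) return 1 ;;          # absent (or only part of another word)
    esac
}

# Example checks on a running system (not executed here):
#   has_cmdline_param dis_ucode_ldr "$(cat /proc/cmdline)" \
#       && echo "microcode loader was disabled for this boot"
#   rpm -q microcode_ctl     # installed microcode_ctl package version
#
# To make the parameter persistent for all installed kernels:
#   sudo grubby --update-kernel=ALL --args="dis_ucode_ldr"
```

Matching whole words matters here, since a misspelled parameter is silently ignored by the kernel.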
Comment 21 Thomas Moschny 2016-07-07 03:04:57 EDT
I can confirm that on this T460s with BIOS 1.11 booting 4.6.3-300.fc24.x86_64 works when adding 'dis_ucode_ldr' to the kernel cmdline, while microcode_ctl-2.1-12.fc24.x86_64 is installed.
Comment 22 Josh Boyer 2016-07-07 07:27:18 EDT
(In reply to Thomas Moschny from comment #21)
> I can confirm that on this T460s with BIOS 1.11 booting
> 4.6.3-300.fc24.x86_64 works when adding 'dis_ucode_ldr' to the kernel
> cmdline, while microcode_ctl-2.1-12.fc24.x86_64 is installed.

Thank you.  That confirms that for your machine at least, it is not a kernel issue but a problem with the microcode/firmware interaction.
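To see the microcode/firmware interaction from the running system, one could compare the microcode revision reported with and without 'dis_ucode_ldr'. A minimal sketch, assuming the usual x86 /proc/cpuinfo layout with a "microcode" field; microcode_revision is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the first microcode revision found in cpuinfo-style text
# read from stdin. microcode_revision is a hypothetical helper.
microcode_revision() {
    awk -F': *' '/^microcode[ \t]*:/ { print $2; exit }'
}

# Example on a running x86 system:
#   microcode_revision < /proc/cpuinfo
# Booting once with dis_ucode_ldr and once without, then comparing the two
# values, shows whether the OS-supplied microcode update was applied.
```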
Comment 23 Sandra Thieme 2016-07-07 07:55:25 EDT
I also have the described problem on my ThinkPad X260 since kernel 4.6.3-300.fc24.x86_64 with microcode_ctl 2.1-12.fc24, and BIOS version 1.19.

I can confirm that booting with 'dis_ucode_ldr' added to the cmdline works well.
Comment 24 Thomas Moschny 2016-07-07 10:46:26 EDT
(In reply to Josh Boyer from comment #22)
> (In reply to Thomas Moschny from comment #21)
> > I can confirm that on this T460s with BIOS 1.11 booting
> > 4.6.3-300.fc24.x86_64 works when adding 'dis_ucode_ldr' to the kernel
> > cmdline, while microcode_ctl-2.1-12.fc24.x86_64 is installed.
> 
> Thank you.  That confirms that for your machine at least, it is not a kernel
> issue but a problem with the microcode/firmware interaction.

And after upgrading to BIOS 1.13, I can boot the 4.6.3-300.fc24.x86_64 without disabling the microcode loader.
Comment 25 Yogendra Jog 2016-07-08 04:09:11 EDT
I encountered the same issue with the latest F23 and F24 kernels on a Lenovo T460s.

F23 = kernel-4.5.7-202.fc23.x86_64 

F24 = kernel-4.6.3-300.fc24.x86_64


After reading this bug I upgraded the BIOS to

$ sudo dmidecode -s bios-version
N1CET45W (1.13 )

$ sudo dmidecode -s bios-release-date
06/02/2016

I was able to boot with F23 kernel-4.5.7-202.fc23.x86_64.  

I have now upgraded to F24 kernel-4.6.3-300.fc24.x86_64 and it boots without any issues.  

So BIOS upgrade worked perfectly.

Thank you very much.
Comment 26 Josh Boyer 2016-07-08 08:32:42 EDT
This is almost certainly the microcode_ctl issue we've recently discovered.  I'm going to duplicate it to that bug.

*** This bug has been marked as a duplicate of bug 1353103 ***
Comment 27 Jared Hocutt 2016-07-08 14:23:44 EDT
I have also been hitting this issue on my T460s with BIOS 1.11. Due to the known graphics instability I've been trying different kernels (even 4.7.x from rawhide), and here is my experience.

I was able to boot without changes:
- 4.5.7-300.fc24.x86_64
- 4.7.0-0.rc4.git1.1.fc25.x86_64
- 4.7.0-0.rc4.git0.1.fc25.x86_64

I was hitting the problem described in this bug:
- 4.7.0-0.rc5.git3.1.fc25.x86_64
- 4.6.3-300.fc24.x86_64

For the kernels I was hitting this problem with, I was able to add "dis_ucode_ldr" and get them to boot successfully. I then updated my BIOS to 1.13 (the latest I could find on the Lenovo site) and am able to boot all of the kernels listed above without any modifications.
