Bug 1361183 - Kernel 4.6.4-301.fc24.x86_64 doesn't boot on Dell Precision 7510
Summary: Kernel 4.6.4-301.fc24.x86_64 doesn't boot on Dell Precision 7510
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: microcode_ctl
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Anton Arapov
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1353103
Blocks:
Reported: 2016-07-28 13:00 UTC by Donny Davis
Modified: 2017-03-10 14:52 UTC
CC List: 37 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1353103
Environment:
Last Closed: 2017-03-10 14:52:03 UTC
Type: Bug
Embargoed:


Attachments
Thu Jul 28 09:35:05 EDT 2016 - dmesg output (72.45 KB, text/plain)
2016-07-28 13:35 UTC, Donny Davis


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1360705 0 unspecified CLOSED [abrt] WARNING: CPU: 0 PID: 5075 at drivers/cpufreq/cpufreq.c:2173 cpufreq_update_policy+0x102/0x150 2021-02-22 00:41:40 UTC

Description Donny Davis 2016-07-28 13:00:10 UTC
The Dell Precision 7510 boots to a blank screen with a blinking cursor 3 out of 4 times. I must hard reboot the system (hold down the power button) and try again. This happens both on and off the dock, and there is no output to capture from the kernel logs. The system has the latest BIOS update.

http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=7HGW6&fileId=3544296403&osCode=W764&productCode=precision-m7510-workstation&languageCode=en&categoryId=BI

Issue: The system will not boot, as described in the cloned bug below; I just get a blank screen with a blinking cursor.

How reproducible:
Happens 3 out of 4 times.

Additional info:
UEFI is disabled, NVIDIA graphics switching is disabled

uname -r 

4.6.4-301.fc24.x86_64

Package microcode_ctl-2:2.1-13.fc24.x86_64 is already installed

cat /proc/cpuinfo

vendor_id	: GenuineIntel
cpu family	: 6
model		: 94
model name	: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping	: 3
microcode	: 0x8a
cpu MHz		: 1074.199
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs		:
bogomips	: 5427.89
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual


+++ This bug was initially created as a clone of Bug #1353103 +++

Description of problem:
Doesn't boot. I just get a blinking cursor even after removing 'rhgb quiet' from kernel cmdline.

This laptop is Skylake with Intel graphics; see DMI info.

4.5.7-200 works.

Version-Release number of selected component (if applicable):
kernel-4.5.7-202.x64

How reproducible:
always

Additional info:
UEFI issue? (Some failure mode where there's no boot/kernel output at all)

--- Additional comment from Andy Grover on 2016-07-06 02:39:07 EDT ---

installed 4.6.3-200 from koji and it doesn't work either.

--- Additional comment from Josh Boyer on 2016-07-06 08:02:36 EDT ---

If 4.5.7-200 works, that makes very little sense.  The changes between -200 and -202 are very very limited in scope to CVE fixes only.

What else was updated between the time -200 and -202 were installed?  It should be in your update log.

--- Additional comment from Andy Grover on 2016-07-06 13:04:44 EDT ---

This looks suspicious to me:

initramfs-4.5.6-200.fc23.x86_64.img:                     gzip compressed data, max compression, from Unix
initramfs-4.5.7-200.fc23.x86_64.img:                     gzip compressed data, max compression, from Unix
initramfs-4.5.7-202.fc23.x86_64.img:                     ASCII cpio archive (SVR4 with no CRC)
initramfs-4.6.3-200.fc23.x86_64.img:                     ASCII cpio archive (SVR4 with no CRC)

why would this have changed?

--- Additional comment from Andy Grover on 2016-07-06 13:06 EDT ---



--- Additional comment from Josh Boyer on 2016-07-06 13:12:22 EDT ---

(In reply to Andy Grover from comment #3)
> This looks suspicious to me:
> 
> initramfs-4.5.6-200.fc23.x86_64.img:                     gzip compressed
> data, max compression, from Unix
> initramfs-4.5.7-200.fc23.x86_64.img:                     gzip compressed
> data, max compression, from Unix
> initramfs-4.5.7-202.fc23.x86_64.img:                     ASCII cpio archive
> (SVR4 with no CRC)
> initramfs-4.6.3-200.fc23.x86_64.img:                     ASCII cpio archive
> (SVR4 with no CRC)
> 
> why would this have changed?

That's actually a good find.  The file type reported for 4.5.6 and 4.5.7-200 is indicative of an initramfs that lacks early microcode updates tacked on.  The latter two indicate that it does have microcode tacked on.

Looking at your dnf log, we find that microcode_ctl was updated between 4.5.7-200 and 4.5.7-202.

    Upgraded microcode_ctl-2:2.1-10.fc23.x86_64                       @updates

I'm now wondering if that is explicitly the problem here and the ucode that is loaded early (and it is very early in the boot process) is causing the issues.
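
The file-type difference noted above can be checked mechanically. The following is a minimal sketch, not part of the original report: the helper name classify_initramfs is hypothetical, and it only interprets the description string that file(1) prints. A leading uncompressed cpio archive suggests that early microcode was prepended, but an initramfs built without compression would look the same, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# classify_initramfs is a hypothetical helper (not from the bug report).
# It maps a file(1) description string to a guess about early microcode.
classify_initramfs() {
    case "$1" in
        *"cpio archive"*)    echo "early-microcode" ;;
        *"compressed data"*) echo "no-early-microcode" ;;
        *)                   echo "unknown" ;;
    esac
}

# Usage: classify every readable initramfs image under /boot.
for img in /boot/initramfs-*.img; do
    [ -r "$img" ] || continue
    printf '%s: %s\n' "$img" "$(classify_initramfs "$(file -b "$img")")"
done
```

Applied to the listing in comment 3, the two gzip images would be reported as lacking early microcode and the two cpio images as carrying it.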

--- Additional comment from Josh Boyer on 2016-07-06 13:15:38 EDT ---

Anton, have you heard anything about the new Intel microcode update causing boot issues on skylake platforms?

--- Additional comment from Josh Boyer on 2016-07-06 13:29:02 EDT ---

Andy, can you try booting with 'dis_ucode_ldr' added to the kernel command line?

--- Additional comment from Andy Grover on 2016-07-06 14:07:34 EDT ---

(In reply to Josh Boyer from comment #7)
> Andy, can you try booting with 'dis_ucode_ldr' added to the kernel command
> line?

Works!

--- Additional comment from Josh Boyer on 2016-07-06 14:14:58 EDT ---

(In reply to Andy Grover from comment #8)
> (In reply to Josh Boyer from comment #7)
> > Andy, can you try booting with 'dis_ucode_ldr' added to the kernel command
> > line?
> 
> Works!

Well, that's both good and bad.  It's good because we know the cause.  It's bad because if I'm understanding the bugs I found in Arch and Debian, the only way to fix it is via a BIOS/UEFI update for your machine and that has to come from your vendor.

So dis_ucode_ldr is the workaround, but I'm not sure there's going to be a solution beyond "update your firmware when the vendor fixes it."

--- Additional comment from Josh Boyer on 2016-07-06 14:17:19 EDT ---

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1432557.html
https://bugs.archlinux.org/task/49806

Debian and Arch bugs.

--- Additional comment from Andy Grover on 2016-07-06 15:00:00 EDT ---

OK thanks for the info. FWIW my vendor did have a bios update, and it now reports:

microcode: CPU0 sig=0x406e3, pf=0x80, revision=0x88

whereas before revision was 0x82. Still needs the dis_ucode_ldr to boot.

OK I'll stay on top of future vendor updates.

--- Additional comment from Anton Arapov on 2016-07-07 05:00:05 EDT ---

Andy, Josh, ... there is indeed no way to fix this until Intel fixes it in microcode. Do we want to revert this change? Or can we temporarily blacklist/disable it?

--- Additional comment from Josh Boyer on 2016-07-07 07:25:22 EDT ---

I'm not aware of any way to blacklist it.  The only ways I know to disable it are per-machine solutions, like the dis_ucode_ldr cmdline option or rebuilding the initramfs so that the microcode is not included.
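
As an illustration of the per-machine command-line workaround, the sketch below checks whether a kernel command line already carries dis_ucode_ldr. The function name has_dis_ucode_ldr is hypothetical. The grubby invocations are shown only as comments because they need root and change the boot configuration; they assume grubby is installed, as it normally is on Fedora.

```shell
#!/bin/sh
# has_dis_ucode_ldr is a hypothetical helper: it succeeds when the given
# kernel command line contains the dis_ucode_ldr option as a separate word.
has_dis_ucode_ldr() {
    case " $1 " in
        *" dis_ucode_ldr "*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Usage: report the state of the running kernel.
if [ -r /proc/cmdline ]; then
    if has_dis_ucode_ldr "$(cat /proc/cmdline)"; then
        echo "the kernel microcode loader is disabled for this boot"
    else
        echo "the kernel microcode loader is enabled for this boot"
    fi
fi

# To add or remove the option persistently (assumes grubby is installed):
#   sudo grubby --update-kernel=ALL --args="dis_ucode_ldr"
#   sudo grubby --update-kernel=ALL --remove-args="dis_ucode_ldr"
```

Remember to remove the option again once a firmware or microcode_ctl update makes it unnecessary, since it also blocks future microcode fixes from loading.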

--- Additional comment from Joonas Kylmälä on 2016-07-07 09:05:38 EDT ---

I had this same problem on a Lenovo ThinkPad X260 laptop (the same processor: Skylake i5-6200U) with the 4.6.3-300.fc24.x86_64 kernel. Upgrading the BIOS/UEFI and reinstalling Fedora 24 in UEFI mode let me boot the system normally. I am not sure whether the BIOS/UEFI upgrade or switching Fedora to UEFI mode did the trick, but from what you say it seems to have been the BIOS/UEFI upgrade. Also, note that with the 4.5.5-300.fc24.x86_64 kernel Fedora booted normally with the old BIOS/UEFI version, so some regression has happened between 4.5.5-300 and 4.6.3-300.

--- Additional comment from Josh Boyer on 2016-07-07 09:10:29 EDT ---

(In reply to Joonas Kylmälä from comment #14)
> I had this same problem on Lenovo ThinkPad x260 laptop (the same processor:
> skylake i5-6200U) with 4.6.3-300.fc24.x86_64 kernel. Upgrading BIOS/UEFI and
> reinstalling Fedora 24 with UEFI mode let me boot the system normally. Not
> sure if the BIOS/UEFI upgrade did the trick or changing the Fedora to UEFI
> mode but according to you it seems like the BIOS/UEFI upgrade did the trick.
> Also, to note, with the 4.5.5-300.fc24.x86_64 kernel Fedora booted normally
> with the old BIOS/UEFI version, so some regression has happened between
> 4.5.5-300 and 4.6.3-300.

No, this is not a kernel problem.  What happened in your case is that you reinstalled.  The installation media uses an initramfs that does not contain the problematic microcode.

There is nothing we can do in the kernel to fix this.

--- Additional comment from Jonas Thiem on 2016-07-07 11:38:16 EDT ---

Lenovo Thinkpad Yoga 260 is also affected. What do I need to do to fix this? Upgrade the UEFI/BIOS firmware?

--- Additional comment from Josh Boyer on 2016-07-07 11:38:45 EDT ---



--- Additional comment from Josh Boyer on 2016-07-07 11:53:20 EDT ---

(In reply to Jonas Thiem from comment #16)
> Lenovo Thinkpad Yoga 260 is also affected. What do I need to do to fix this?
> Upgrade the UEFI/BIOS firmware?

If one is available, it is certainly worth a try updating it.

--- Additional comment from Andy Grover on 2016-07-07 13:09:06 EDT ---

ok let me get this straight:

1. CPUs have bugs
2. Intel fixes some of these bugs with microcode updates that need to be reloaded every time after poweroff
3. The system firmware can install a microcode update
4. microcode_ctl tries to install some (more recent?) microcode update

Josh, Anton: You're saying the reason for this problem is the microcode we're installing in step 4 is bad? Or the new version is assuming some precondition is met by the firmware so that I need a new firmware rev to work with the most current microcode rev?

The current solutions are to either
a. Add 'dis_ucode_ldr' to the kernel command line so #4 is skipped
b. uninstall microcode_ctl and rebuild initrd with 'dracut -f --kver 4.5.7-202.fc23.x86_64'

with the understanding that we now might not have the latest, greatest microcode (we're solely relying on our firmware to install it, which it may not)

Yes?

--- Additional comment from Josh Boyer on 2016-07-07 13:16:59 EDT ---

(In reply to Andy Grover from comment #19)
> ok let me get this straight:

Pretty close.  Some clarifications.

> 1. CPUs have bugs
> 2. Intel fixes some of these bugs with microcode updates that need to be
> reloaded every time after poweroff
> 3. The system firmware can install a microcode update
> 4. microcode_ctl tries to install some (more recent?) microcode update

Should be:

3. The system firmware is often shipped with microcode included, and loads it during system initialization before even starting the bootloader, etc.
4. Future microcode can be released stand-alone, which can then be loaded by the kernel very early in kernel boot which (normally) allows a machine to get the latest microcode without having to do a full system firmware update.
5. microcode_ctl is the package that distributes said microcode releases
6. Dracut will include microcode in the initramfs if it is present at the time of initramfs creation, and the kernel will load it from that extremely early in boot.

> Josh, Anton: You're saying the reason for this problem is the microcode
> we're installing in step 4 is bad? Or the new version is assuming some
> precondition is met by the firmware so that I need a new firmware rev to
> work with the most current microcode rev?

It is difficult to tell which one of those scenarios is true.  It might be a bit of both, but the latter is given more credence since the new ucode seems to work with some firmware updates from some vendors. 

> The current solutions are to either
> a. Add 'dis_ucode_ldr' to the kernel command line so #4 is skipped
> b. uninstall microcode_ctl and rebuild initrd with 'dracut -f --kver
> 4.5.7-202.fc23.x86_64'

(or downgrade microcode_ctl rather than uninstall it)

and

c. find a system firmware update from your vendor and apply that to see if it works.

> with the understanding that we now might not have the latest, greatest
> microcode (we're solely relying on our firmware to install it, which it may
> not)
> 
> Yes?

You essentially have that all correct, yes.

--- Additional comment from Erik van Pienbroek on 2016-07-07 14:46:36 EDT ---

I can confirm that this issue also exists for the HP Elitebook 850 G3.
BIOS versions 1.04 and 1.05 are affected by this bug. Updating to BIOS version 1.07 resolves the issue.

--- Additional comment from Adam Williamson on 2016-07-07 17:15:46 EDT ---



--- Additional comment from Martin Horauer on 2016-07-08 04:03:23 EDT ---

A BIOS update for my Lenovo T460s fixed this issue.

http://thinkwiki.de/BIOS-Update_ohne_optisches_Laufwerk_unter_Linux

--- Additional comment from mlaverdiere on 2016-07-08 08:23:39 EDT ---

On an Asus UX305CA, upgrading the BIOS to version 300 has solved the non-booting problem with kernel 4.6.3 on Fedora 24 (I have always been able to boot with kernel 4.5.7 though).

--- Additional comment from Josh Boyer on 2016-07-08 08:32:42 EDT ---



--- Additional comment from Josh Boyer on 2016-07-15 09:20:26 EDT ---



--- Additional comment from Richard Chan on 2016-07-17 22:47:29 EDT ---



--- Additional comment from Richard Chan on 2016-07-17 23:14:11 EDT ---

On an Asus UX305UA, removing "load_video" from grub menu works.
Curious: why does load_video trigger the failure?

UX305UA does not have an updated BIOS :-(

BOOT_IMAGE=/vmlinuz-4.6.4-301.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap video=1920x1080 LANG=en_US.UTF-8


microcode_ctl-2.1-12.fc24.x86_64


[    0.000000] microcode: microcode updated early to revision 0x8a, date = 2016-04-06
[    0.724234] microcode: CPU0 sig=0x406e3, pf=0x80, revision=0x8a
[    0.724608] microcode: CPU1 sig=0x406e3, pf=0x80, revision=0x8a
[    0.724976] microcode: CPU2 sig=0x406e3, pf=0x80, revision=0x8a
[    0.725475] microcode: CPU3 sig=0x406e3, pf=0x80, revision=0x8a
[    0.725857] microcode: Microcode Update Driver: v2.01 <tigran.co.uk>, Peter Oruba
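
To compare the revision before and after a firmware or package update, the value can be pulled out of log lines like the ones above. This is a minimal sketch; ucode_revisions is a hypothetical helper that reads dmesg-style text on standard input and relies only on the "revision=0x.." field format shown in this report.

```shell
#!/bin/sh
# ucode_revisions (hypothetical helper) prints each distinct microcode
# revision found in "revision=0x.." fields of dmesg-style input.
ucode_revisions() {
    sed -n 's/.*revision=\(0x[0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Usage on a live system (dmesg may require root on some configurations):
#   dmesg | grep microcode | ucode_revisions
# The per-CPU value is also exposed in /proc/cpuinfo:
#   grep -m1 '^microcode' /proc/cpuinfo
```

A single line of output means all CPUs report the same revision; more than one line would indicate a mismatch worth mentioning in the report.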

--- Additional comment from Richard Chan on 2016-07-18 00:28:45 EDT ---

Sorry for the noise; load_video was a red herring.

The "success" actually came from booting into 4.5.7-300 first and then warm booting into 4.6.4-301. In that case the early microcode seemed to work.

--- Additional comment from Rich Jankowski on 2016-07-18 20:24:17 EDT ---

The BIOS update fixed this issue on my 2016 X1 Carbon.

--- Additional comment from Josh Boyer on 2016-07-19 10:08:00 EDT ---



--- Additional comment from Sandro Bonazzola on 2016-07-20 15:27:18 EDT ---

(In reply to Martin Horauer from comment #23)
> A BIOS update for my Lenovo T460s fixed this issue.
> 
> http://thinkwiki.de/BIOS-Update_ohne_optisches_Laufwerk_unter_Linux

Can you provide English instructions?
I have a T460s with the exact same issue.

BIOS Information
        Vendor: LENOVO
        Version: N1CET37W (1.05 )
        Release Date: 01/15/2016

--- Additional comment from Martin Horauer on 2016-07-20 15:47:01 EDT ---

If you have Windows on your T460s go to the official Lenovo page and download the latest BIOS update along with their update utility.

If not (as in my case), you can use the commands listed on the German wiki page above. The steps are:

(1) Download the latest BIOS Update, e.g.: 

https://download.lenovo.com/pccbbs/mobiles/n1cur06w.iso

(2) Obtain geteltorito, a script that extracts the El Torito boot image from an ISO:

wget https://userpages.uni-koblenz.de/~krienke/ftp/noarch/geteltorito/geteltorito.pl

(3) Create a bootable image:

perl geteltorito.pl -o thinkpadbios.img n1cur06w.iso

(4) Insert an empty USB stick and run the following commands, replacing sdX with the device name under which your USB stick shows up. Everything on that device will be overwritten.

sudo dd if=thinkpadbios.img of=/dev/sdX bs=1M
sync

(5) Boot from the USB stick and do the BIOS update.

Cross your fingers and you are (hopefully) done.
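
Because step (4) overwrites whatever device is named, a small guard around the dd call can help. This is only a sketch, assuming the image is called thinkpadbios.img as above; write_image is a hypothetical helper, and it refuses any target that is not a block device or that appears in /proc/mounts.

```shell
#!/bin/sh
# write_image (hypothetical helper): copy an image to a USB stick only if the
# target is a block device and none of its partitions are currently mounted.
write_image() {
    img="$1"
    dev="$2"
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    if grep -q "^$dev" /proc/mounts; then
        echo "$dev is mounted; unmount it first" >&2
        return 1
    fi
    sudo dd if="$img" of="$dev" bs=1M && sync
}

# Usage (replace /dev/sdX with the real device name of the USB stick):
#   write_image thinkpadbios.img /dev/sdX
```

The guard does not know which disk is the USB stick, so checking the device name with lsblk beforehand is still advisable.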

--- Additional comment from Martin Horauer on 2016-07-20 15:57:53 EDT ---

(In reply to Sandro Bonazzola from comment #32)
> (In reply to Martin Horauer from comment #23)
> > A BIOS update for my Lenovo T460s fixed this issue.
> > 
> > http://thinkwiki.de/BIOS-Update_ohne_optisches_Laufwerk_unter_Linux
> 
> Can you provide english instructions?
> I've a T460s and have the same exact issue.
> 
> BIOS Information
>         Vendor: LENOVO
>         Version: N1CET37W (1.05 )
>         Release Date: 01/15/2016

Sorry I should have replied. See my comment 33.

--- Additional comment from Stefan Midjich on 2016-07-21 03:32:10 EDT ---

I had the same problem on a ThinkPad X260, but I followed Martin Horauer's instructions, substituting the Lenovo BIOS update ISO for my own laptop model.

After the BIOS upgrade I deleted the dis_ucode_ldr workaround from the boot parameters, and everything worked fine even with the latest kernel.

--- Additional comment from Christian Horn on 2016-07-21 04:04:14 EDT ---

(In reply to Martin Horauer from comment #23)
> A BIOS update for my Lenovo T460s fixed this issue.

+1
Installing the currently available BIOS 1.13 on the T460s fixes the issue.  The T460s here was shipped just 2 weeks ago, but still with a BIOS from March that had the issue.

--- Additional comment from Fedora Update System on 2016-07-21 06:17:29 EDT ---

microcode_ctl-2.1-13.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-17e40fd8da

--- Additional comment from Fedora Update System on 2016-07-21 06:17:54 EDT ---

microcode_ctl-2.1-13.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a596f3c268

--- Additional comment from Fedora Update System on 2016-07-21 14:48:28 EDT ---

microcode_ctl-2.1-13.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a596f3c268

--- Additional comment from Fedora Update System on 2016-07-21 14:52:16 EDT ---

microcode_ctl-2.1-13.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-17e40fd8da

--- Additional comment from Andreas Tunek on 2016-07-22 02:39:54 EDT ---

microcode_ctl-2.1-13.fc24 and Linux 4.6.4-301.fc24 work together.

--- Additional comment from Andreas Tunek on 2016-07-22 02:40:57 EDT ---

On my Asus UX305CA with old bios.

--- Additional comment from Fedora Update System on 2016-07-22 14:21:24 EDT ---

microcode_ctl-2.1-13.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

--- Additional comment from  on 2016-07-23 06:51:11 EDT ---

Just to clarify, can I safely update via dnf now or do I need to apply the microcode first?

--- Additional comment from Edgar Hoch on 2016-07-23 19:20:09 EDT ---

(In reply to imbacen from comment #44)
> Just to clarify, can I safely update via dnf now or do I need to apply the
> microcode first?

Yes, you can safely update via dnf.

Package microcode_ctl-2.1-13.fc24 contains no package-specific scriptlet, so an update only changes the files on disk. Microcode is only loaded at boot, so nothing changes on the CPU at package installation.

To use the new microcode, the initrd for the kernel you boot needs to be (re)created after the new microcode_ctl package is installed. This is done automatically by a newly installed kernel package, or you can do it manually using dracut (see "man dracut"). For example, for the currently running kernel, run

sudo dracut --force
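
If several kernels are installed, each has its own initramfs, and only regenerated images pick up the new microcode. The sketch below prints (rather than runs) one dracut command per installed kernel. rebuild_commands is a hypothetical helper, and the /boot/initramfs-<version>.img naming is assumed from the file listing earlier in this report.

```shell
#!/bin/sh
# rebuild_commands (hypothetical helper): for each kernel version given as an
# argument, print the dracut command that would regenerate its initramfs.
rebuild_commands() {
    for kver in "$@"; do
        printf 'dracut --force /boot/initramfs-%s.img %s\n' "$kver" "$kver"
    done
}

# Usage: list commands for every kernel that has a modules directory.
for dir in /lib/modules/*/; do
    [ -d "$dir" ] || continue
    rebuild_commands "$(basename "$dir")"
done
```

Review the printed commands first, then run the ones you want with sudo.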

Comment 1 Donny Davis 2016-07-28 13:27:06 UTC
I switched to the rawhide repo, and installed kernel 4.7.0-0.rc7.git4.1.fc25.x86_64

and installed microcode_ctl 
Package microcode_ctl-2:2.1-13.fc25.x86_64

I also disabled sleep states in the BIOS. 

I have to shut the machine down for a little while to reproduce the error consistently; it seems to happen 100% of the time when the machine is cold.

I will report back with findings. 

If the issue is resolved with the new kernel please close. 

I will also downgrade the kernel back to the reported one in the bug report

4.6.4-301.fc24.x86_64

and try to reproduce with sleep states disabled to isolate the error to either sleep states or kernel. 

Thanks
 Donny D

Comment 2 Anton Arapov 2016-07-28 13:30:53 UTC
Donny, could you attach your dmesg output here please.

Comment 3 Donny Davis 2016-07-28 13:35:21 UTC
Created attachment 1185111 [details]
Thu Jul 28 09:35:05 EDT 2016 - dmesg output

Comment 4 Donny Davis 2016-07-28 14:01:50 UTC
I shut the machine down for 30 minutes (or so) and the issue is still present on kernel 4.7.0. However, it now locks up about 50% of the time.

Thank you

Donny D

Comment 5 Donny Davis 2016-07-28 14:26:15 UTC
After a downgrade to the default kernel in F24, the issue remains with sleep states disabled (I was really hoping that was the issue).

Freeze at blinking cursor is now back to 3 out of 4 times. 

There is no output that I can find anywhere that points to the issue, or shows what is failing. 

quiet is turned off, and there is still no output. 

If there is anything else I can gather to help with this bug report, please let me know and I will promptly provide.

Thanks

Donny D

Comment 6 Andy Lutomirski 2016-07-30 15:27:08 UTC
Can you add earlyprintk=efi,keep and remove quiet and try booting again?

Comment 8 Donny Davis 2017-03-10 14:21:35 UTC
I have moved on to F25, so this is no longer an issue for me. The F25 kernel does not have this problem.

