Bug 1846119

Summary: [rhel-8.3.0] skylake (06-4e-03) microcode update hangs
Product: Red Hat Enterprise Linux 8 Reporter: Jeff Bastian <jbastian>
Component: microcode_ctl Assignee: Eugene Syromiatnikov <esyr>
Status: CLOSED ERRATA QA Contact: Jeff Bastian <jbastian>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.2 CC: skozina
Target Milestone: rc Keywords: ZStream
Target Release: 8.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: microcode_ctl-20200609-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1846133 1846134 1848438 1848439 1848440 Environment:
Last Closed: 2020-11-04 01:45:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1819241, 1848438, 1848439, 1848440    

Description Jeff Bastian 2020-06-10 20:00:03 UTC
Description of problem:
Microcode revision 0xdc for CPU 06-4e-03 (Skylake) can cause the system to freeze when applying the early microcode update.  Late updates seem to be ok, at least on this system.

[root@intel-skylake-y-01 ~]# lscpu | egrep -i -e family -e model -e stepping
CPU family:          6
Model:               78
Model name:          Intel(R) Core(TM) m5-6Y54 CPU @ 1.10GHz
Stepping:            3

[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/cpu0/microcode/version 
0x2d

[root@intel-skylake-y-01 ~]# yum -y update microcode_ctl
...

[root@intel-skylake-y-01 ~]# rpm -q microcode_ctl
microcode_ctl-20191115-4.20200602.2.el8_2.x86_64

[root@intel-skylake-y-01 ~]# rpm -q --provides microcode_ctl | grep iucode_rev.*06-4e-03
iucode_rev(fname:intel/06-4e-03;cpuid:000406e3;pf_mask:0xc0) = 0xdc
iucode_rev(fname:intel/06-4e-03;platform_id:0x40) = 0xdc
iucode_rev(fname:intel/06-4e-03;platform_id:0x80) = 0xdc

[root@intel-skylake-y-01 ~]# echo 1 > /sys/devices/system/cpu/microcode/reload

[root@intel-skylake-y-01 ~]# cat /sys/devices/system/cpu/cpu0/microcode/version 
0xdc

[root@intel-skylake-y-01 ~]# dracut --force --early-microcode

[root@intel-skylake-y-01 ~]# reboot
...

<<<HANG AFTER GRUB SCREEN>>>


Version-Release number of selected component (if applicable):
microcode_ctl-20191115-4.20200602.2.el8_2.x86_64

How reproducible:
always

Steps to Reproduce:
1. install RHEL-8.2.0
2. apply z-stream update to microcode_ctl-20191115-4.20200602.2.el8_2.x86_64
3. dracut --force --early-microcode
4. reboot

Actual results:
hang after grub

Expected results:
normal boot

Additional info:
Upstream bug report:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/31
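
The 06-4e-03 identifier used in this report is the CPU family, model and stepping written as two-digit hex values, while lscpu and /proc/cpuinfo print them in decimal (6, 78, 3). A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of microcode_ctl.

```shell
#!/bin/sh
# Hypothetical helpers, not part of microcode_ctl.

# Format decimal family/model/stepping values as ff-mm-ss in hex.
cpu_signature() {
    printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
}

# Read the values of the first CPU from a cpuinfo-style file
# (defaults to /proc/cpuinfo) and print its signature.
cpu_signature_from_cpuinfo() {
    cpuinfo=${1:-/proc/cpuinfo}
    family=$(awk -F': *' '/^cpu family/ {print $2; exit}' "$cpuinfo")
    # Match "model" but not "model name".
    model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' "$cpuinfo")
    stepping=$(awk -F': *' '/^stepping/ {print $2; exit}' "$cpuinfo")
    cpu_signature "$family" "$model" "$stepping"
}
```

With the values from the lscpu output above, `cpu_signature 6 78 3` prints 06-4e-03.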

Comment 1 Eugene Syromiatnikov 2020-06-16 17:44:19 UTC
06-5e-03 is to be blacklisted as well per [1].

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/31#issuecomment-644885826

Comment 6 Jeff Bastian 2020-07-01 20:47:06 UTC
Verified with microcode_ctl-20200609-2.el8

By default, the microcode update is blacklisted on 06-4e-03 and 06-5e-03 systems and thus the tasks fail (since they're not running the latest microcode):
https://beaker.engineering.redhat.com/recipes/8508203
https://beaker.engineering.redhat.com/recipes/8508205

If the microcode update is forced (with the /etc/microcode_ctl/ucode_with_caveats/force-intel-06-4e-03 and force-intel-06-5e-03 files), the 06-4e-03 system hangs (which is why it's blacklisted), while on the 06-5e-03 system the update happens to work and the tasks pass:

https://beaker.engineering.redhat.com/recipes/8508204
https://beaker.engineering.redhat.com/recipes/8508206
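
For reference, creating the force files could look roughly like the sketch below. It assumes that an empty file is enough, and that the second file follows the same force-intel-06-XX-YY naming as the first; the function name and the optional root argument (useful for a dry run in a scratch directory) are made up for illustration. Check the microcode_ctl caveats documentation before relying on it, and keep in mind that forcing the update on 06-4e-03 is exactly what makes the system hang.

```shell
#!/bin/sh
# Hypothetical helper, not part of microcode_ctl.
# Creates the force files for both blacklisted signatures under the
# given root directory (defaults to "/").
force_ucode_caveats() {
    root=${1:-/}
    dir="${root%/}/etc/microcode_ctl/ucode_with_caveats"
    mkdir -p "$dir" || return 1
    for sig in 06-4e-03 06-5e-03; do
        # Assumption: an empty file is sufficient to force the update.
        touch "$dir/force-intel-$sig" || return 1
    done
}

# Example (as root, on a machine that is allowed to hang):
#   force_ucode_caveats
#   dracut --force --early-microcode   # as in the description above
```

Further steps may be needed before the forced microcode ends up in the initramfs; the dracut call is only the one used in the description of this bug.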

Comment 7 Jeff Bastian 2020-07-01 20:51:33 UTC
(In reply to Jeff Bastian from comment #6)
> If the microcode update is forced (with
> /etc/microcode_ctl/ucode_with_caveats/force-intel-06-4e-03 and
> force-intel-06-5e-03 files) then the 06-4e-03 system hangs

Note: interrupting grub and adding dis_ucode_ldr to the kernel command line allows the system to boot.
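
Once the system is up, a quick way to confirm that it booted with the early loader disabled is to look for the parameter in /proc/cmdline. A minimal sketch follows; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains dis_ucode_ldr as a separate word.
has_dis_ucode_ldr() {
    case " $1 " in
        *" dis_ucode_ldr "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_dis_ucode_ldr "$(cat /proc/cmdline)" && echo "early microcode loading disabled"
#
# To keep the parameter across reboots on RHEL 8, grubby can be used:
#   grubby --update-kernel=ALL --args="dis_ucode_ldr"
```

Adding the parameter at the grub prompt only affects the current boot, so the grubby step is worth considering until a fixed microcode_ctl is installed.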

Comment 10 errata-xmlrpc 2020-11-04 01:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (microcode_ctl bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4489