Bug 1630681

Summary: Installing microcode_ctl-2:2.1-29.16.el7_5.x86_64 locks my X1 Carbon 6th
Product: Red Hat Enterprise Linux 7 Reporter: W. Trevor King <wking>
Component: microcode_ctl Assignee: Eugene Syromiatnikov <esyr>
Status: CLOSED WORKSFORME QA Contact: Rachel Sibley <rasibley>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.5 CC: dexter, skozina, wking
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-07 19:46:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
KBL-U/Y microcode, rev. 0x8e
none
CFL-U43e microcode file, rev. 0x96
none
KBL-U/Y microcode, rev. 0x8e
none
CFL-U43e microcode file, rev. 0x96
none
license, as it requires doing so none

Description W. Trevor King 2018-09-19 05:35:42 UTC
Description of problem:

Updating with yum hung my RHEL 7.5 CSB.  The screen froze, but still displayed its previous content.  The fan spun up briefly and then settled back down.  The keyboard and mouse were unresponsive.  Rebooting got me a working system back, but attempting to redo the failed yum transaction hung the system again.  So I'm giving up on the update and filing this bug ;).

Version-Release number of selected component (if applicable):

$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux Workstation"
VERSION="7.5 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Workstation"
VARIANT_ID="workstation"
VERSION_ID="7.5"
PRETTY_NAME="Red Hat Enterprise Linux Workstation 7.5 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.5:GA:workstation"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.5"
$ sudo yum history info 46
Loaded plugins: changelog, fs-snapshot, priorities, refresh-packagekit, rhnplugin, rpm-warm-cache, verify
This system is receiving updates from RHN Classic or Red Hat Satellite.
Transaction ID : 46
Begin time     : Tue Sep 18 21:43:39 2018
Begin rpmdb    : 1893:5858c214524454e24e6ea5907cbb0e7a0c2000f8
User           : root <root>
Return-Code    : ** Aborted **
Command Line   : -y -e 0 update
Transaction performed with:
    Installed     rpm-4.11.3-32.el7.x86_64                @CSB-RHEL75-updates/7.5
    Installed     yum-3.4.3-158.el7.noarch                @CSB-RHEL75-updates/7.5
    Installed     yum-metadata-parser-1.1.4-10.el7.x86_64 @CSB-RHEL75-updates/7.5
    Installed     yum-rhn-plugin-2.0.1-10.el7.noarch      @CSB-RHEL75-updates/7.5
Packages Altered:
 ** Updated SpiderOakGroups-1:7.1.0-1.x86_64       @CSB-RHEL75-updates/7.5
    Update                  1:7.3.0-1.el7.x86_64   installed
 ** Updated firefox-52.8.0-1.el7_5.x86_64          @CSB-RHEL75-updates/7.5
    Update          60.2.0-1.el7_5.x86_64          installed
 ** Updated microcode_ctl-2:2.1-29.2.el7_5.x86_64  @CSB-RHEL75-updates/7.5
 ** Update                2:2.1-29.16.el7_5.x86_64 installed
history info
$ rpm -q microcode_ctl
microcode_ctl-2.1-29.2.el7_5.x86_64
microcode_ctl-2.1-29.16.el7_5.x86_64

I've been able to get the SpiderOakGroups and Firefox updates through since, which is why I suspect the error is the microcode update (also, it seems unlikely that either of the other two could hang my system).

How reproducible:

It happened in both of my two attempts at the upgrade, so it's 100% reproducible so far.

Steps to Reproduce:
1. $ sudo yum remove microcode_ctl-2.1-29.16.el7_5.x86_64  # to convince yum that only 2.1-29.2 is installed
2. $ sudo yum upgrade -y  # to trigger an attempt to re-install 2.1-29.16

Actual results:

System hangs.

Expected results:

Successful update.

Additional notes:

$ uname -a
Linux trking.remote.csb 3.10.0-891.el7.x86_64 #1 SMP Mon May 21 14:10:11 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep 'model\|microcode' /proc/cpuinfo  | sort | uniq
microcode	: 0x84
model		: 142
model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$ sudo lshw | head -n5
trking.remote.csb           
    description: Notebook
    product: 20KGS23S00 (LENOVO_MT_20KG_BU_Think_FM_ThinkPad X1 Carbon 6th)
    vendor: LENOVO
    version: ThinkPad X1 Carbon 6th

Comment 2 Eugene Syromiatnikov 2018-09-20 18:28:24 UTC
That's pretty strange. The only non-trivial things that microcode_ctl does during installation are calling dracut and triggering a microcode reload; may I ask you to try to run it ("dracut -f -v") manually?

And since microcode_ctl-2.1-29.16.el7_5.x86_64 contains updated versions of 06-8e-09 and 06-8e-0a microcode files, may I ask to try supplying them manually (by copying to /lib/firmware/updates, for example) and triggering late microcode load (sudo sh -c "echo 1 > /sys/devices/system/cpu/microcode/reload")?

The other thing that might help nail down the cause of the issue is to switch to a VT console and try performing the actions there, in case it is a kernel panic.

BTW, kernel-3.10.0-891.el7 has some issues related to microcode loading, so I would suggest updating it to 3.10.0-862.14.1 (or newer) or 3.10.930 (or newer) at some point.
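
To see which of the two files mentioned above (06-8e-09 or 06-8e-0a) applies to a given machine, the file name can be derived from the family, model, and stepping fields of /proc/cpuinfo. The following is a minimal sketch, not part of microcode_ctl: ucode_file_from_cpuinfo is a hypothetical helper name, and it assumes the usual "field : value" layout with decimal values.

```shell
# Hypothetical helper: derive the intel-ucode file name
# (family-model-stepping, two hex digits each) from a cpuinfo-style file.
ucode_file_from_cpuinfo() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # /proc/cpuinfo reports these fields in decimal; use the first CPU's values.
    # The patterns are anchored so that "model" does not match "model name".
    family=$(awk -F': *' '/^cpu family[ \t]*:/ {print $2; exit}' "$cpuinfo")
    model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' "$cpuinfo")
    stepping=$(awk -F': *' '/^stepping[ \t]*:/ {print $2; exit}' "$cpuinfo")
    printf '%02x-%02x-%02x\n' "$family" "$model" "$stepping"
}
```

With family 6 and model 142 (as reported in the description) the name starts with 06-8e; the last component depends on the stepping, which the description does not show.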

Comment 3 Eugene Syromiatnikov 2018-09-20 18:29:51 UTC
Created attachment 1485257 [details]
KBL-U/Y microcode, rev. 0x8e

Comment 4 Eugene Syromiatnikov 2018-09-20 18:30:43 UTC
Created attachment 1485259 [details]
CFL-U43e microcode file, rev. 0x96

Comment 5 Eugene Syromiatnikov 2018-09-20 18:31:37 UTC
Created attachment 1485260 [details]
KBL-U/Y microcode, rev. 0x8e

Comment 6 Eugene Syromiatnikov 2018-09-20 18:32:02 UTC
Created attachment 1485261 [details]
CFL-U43e microcode file, rev. 0x96

Comment 7 Eugene Syromiatnikov 2018-09-20 18:32:33 UTC
Created attachment 1485262 [details]
license, as it requires doing so

Comment 8 W. Trevor King 2018-11-08 23:31:28 UTC
Sorry for the delay, I forgot to set up Bugzilla notifications :/.

> ... may I ask to try to run it ("dracut -f -v") manually?

$ sudo dracut -f -v
[sudo] password for trking: 
Executing: /sbin/dracut -f -v
dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
*** Including module: bash ***
*** Including module: nss-softokn ***
*** Including module: i18n ***
*** Including module: network ***
*** Including module: ifcfg ***
*** Including module: drm ***
*** Including module: plymouth ***
*** Including module: crypt ***
*** Including module: dm ***
Skipping udev rule: 64-device-mapper.rules
Skipping udev rule: 60-persistent-storage-dm.rules
Skipping udev rule: 55-dm.rules
*** Including module: kernel-modules ***
*** Including module: lvm ***
Skipping udev rule: 64-device-mapper.rules
Skipping udev rule: 56-lvm.rules
Skipping udev rule: 60-persistent-storage-lvm.rules
*** Including module: resume ***
*** Including module: rootfs-block ***
*** Including module: terminfo ***
*** Including module: udev-rules ***
Skipping udev rule: 40-redhat-cpu-hotplug.rules
Skipping udev rule: 91-permissions.rules
*** Including module: biosdevname ***
*** Including module: systemd ***
*** Including module: usrmount ***
*** Including module: base ***
*** Including module: fs-lib ***
*** Including module: shutdown ***
*** Including modules done ***
*** Installing kernel module dependencies and firmware ***
*** Installing kernel module dependencies and firmware done ***
*** Resolving executable dependencies ***
*** Resolving executable dependencies done***
*** Hardlinking files ***
*** Hardlinking files done ***
*** Stripping files ***
*** Stripping files done ***
*** Generating early-microcode cpio image contents ***
*** Constructing GenuineIntel.bin ****
*** Store current command line parameters ***
*** Creating image file ***
*** Creating microcode section ***
*** Created microcode section ***
*** Creating image file done ***
*** Creating initramfs image file '/boot/initramfs-3.10.0-891.el7.x86_64.img' done ***

> And since microcode_ctl-2.1-29.16.el7_5.x86_64 contains updated versions of 06-8e-09 and 06-8e-0a microcode files, may I ask to try supplying them manually (by copying to /lib/firmware/updates, for example) and triggering late microcode load (sudo sh -c "echo 1 > /sys/devices/system/cpu/microcode/reload")?

$ rpm2cpio /var/cache/yum/x86_64/7Workstation/stage-rhel-x86_64-workstation-7.5/packages/microcode_ctl-2.1-29.el7.x86_64.rpm | cpio -idm
3155 blocks
$ sudo cp -v lib/firmware/intel-ucode/06-8e-0* /lib/firmware/updates/
‘lib/firmware/intel-ucode/06-8e-09’ -> ‘/lib/firmware/updates/06-8e-09’
‘lib/firmware/intel-ucode/06-8e-0a’ -> ‘/lib/firmware/updates/06-8e-0a’
$ sudo sh -c "echo 1 > /sys/devices/system/cpu/microcode/reload"
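
The reload above prints nothing, so the transcript does not show whether the revision actually changed. A minimal sketch for checking that, assuming the "microcode" field of /proc/cpuinfo reflects the currently loaded revision; ucode_revisions is a hypothetical helper name:

```shell
# Hypothetical helper: print the distinct microcode revisions reported
# in a cpuinfo-style file (one line per distinct revision).
ucode_revisions() {
    awk -F': *' '/^microcode[ \t]*:/ {print $2}' "${1:-/proc/cpuinfo}" | sort -u
}

# Usage on a live system (needs root; the write may hang an affected machine):
#   before=$(ucode_revisions)
#   sudo sh -c 'echo 1 > /sys/devices/system/cpu/microcode/reload'
#   after=$(ucode_revisions)
#   echo "before: $before, after: $after"
```

If the two values are equal, the late load did not apply a newer revision.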

> BTW, kernel-3.10.0-891.el7 has some issues related to microcode loading, so I would suggest updating it to 3.10.0-862.14.1 (or newer) or 3.10.930 (or newer) at some point.

I'm just on the stock CSB pipeline.  I expect newer kernels will just come down those pipes to me later?

I don't have the old microcode_ctl-2.1-29.16.el7_5.x86_64 RPM, but the current microcode_ctl-2.1-29.el7.x86_64 seems to install without problems.  Probably not worth worrying about the .16.el7_5 anymore.

Comment 9 Arjen Heidinga 2019-05-20 12:44:13 UTC
I am seeing exactly this on my Fedora laptop. The kernel version does not seem to matter. The machine failed to boot at some point last week, and I was left clueless.
Directly after grub I was left with a black (dark-purple) screen and a frantic fan. I tried every trick in the book, but I was unable to get any characters printed on my screen.

Only this morning (after a reinstall) did it occur to me that it might be the microcode. I booted an older kernel, used --no-early-microcode, and presto.
Running 'echo 1 > /sys/devices/system/cpu/microcode/reload' hangs the system instantly, even before the terminal prints the line when pasting it with a newline.

$ uname -a
Linux arjenbook 5.0.16-300.fc30.x86_64 #1 SMP Tue May 14 19:33:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ grep '^model name' /proc/cpuinfo | head -1
model name	: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

$ rpm -qa | grep microcode
microcode_ctl-2.1-29.fc30.x86_64

Comment 10 Eugene Syromiatnikov 2019-05-20 14:34:25 UTC
(In reply to Arjen Heidinga from comment #9)
> $ uname -a
> Linux arjenbook 5.0.16-300.fc30.x86_64 #1 SMP Tue May 14 19:33:09 UTC 2019
> x86_64 x86_64 x86_64 GNU/Linux
> 
> $ grep '^model name' /proc/cpuinfo | head -1
> model name	: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
> 
> $ rpm -qa | grep microcode
> microcode_ctl-2.1-29.fc30.x86_64

Please note that this is a RHEL 7 microcode_ctl late microcode loading-related bug, not an early/late microcode loading Fedora bug.

Regarding your issue, the model name mentioned has CPUID of 0x806eb (family 0x6, model 0x8e, stepping 0xb). Its microcode first appeared in microcode-20190312.tar.gz[1], packaged in microcode_ctl-2.1-28.fc30.x86_64[2], at revision 0xa4, and was then updated to revision 0xb8 in microcode-20190514.tar.gz[3], packaged in microcode_ctl-2.1-28.fc30.x86_64[4]. The last of these updates may have bumped the shipped microcode revision to one greater than the one stored and applied in the system's firmware.

I would suggest updating your system's firmware (in case it includes a newer microcode revision for the aforementioned CPUID) and reporting the issue to your system vendor (Lenovo) and/or your CPU vendor (Intel) along with the firmware-provided microcode revision, especially considering the presence of a bug that mentions the same CPU model[5], since OS-driven microcode update may lead to issues in some cases and is provided on "best effort" basis as a convenience.

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20190312
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=1265831
[3] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20190514
[4] https://koji.fedoraproject.org/koji/buildinfo?buildID=1267845
[5] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/1
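
The CPUID breakdown given in this comment (0x806eb as family 0x6, model 0x8e, stepping 0xb) can be reproduced with a little bit arithmetic. This is a minimal sketch, assuming the standard Intel signature layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20); decode_cpuid is a hypothetical helper name, not part of any package:

```shell
# Hypothetical helper: decode an Intel CPUID signature into
# family-model-stepping, which is also the intel-ucode file name format.
decode_cpuid() {
    sig=$(( $1 ))
    stepping=$((   sig         & 0xf ))
    model=$((     (sig >> 4)   & 0xf ))
    family=$((    (sig >> 8)   & 0xf ))
    ext_model=$((  (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))
    # The extended model applies to families 0x6 and 0xf.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    # The extended family applies only to family 0xf.
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    printf '%02x-%02x-%02x\n' "$family" "$model" "$stepping"
}

decode_cpuid 0x806eb   # prints 06-8e-0b
```

The same arithmetic maps 0x806e9 and 0x806ea to the 06-8e-09 and 06-8e-0a files discussed earlier in this bug.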

Comment 11 Arjen Heidinga 2019-05-21 09:55:05 UTC
I see now that my issue is indeed unrelated. I thought it was very similar, but it appears I was wrong.
Thank you for the pointers, they are much appreciated. Apologies for the noise I created.

Comment 12 Eugene Syromiatnikov 2019-11-18 02:12:01 UTC
Hello, there was a new microcode_ctl release recently (the latest revision of the 06-8e-09 and 06-8e-0a microcode files is now 0xc6). Has the new microcode affected the observed behaviour, by chance?

Comment 13 Eugene Syromiatnikov 2020-05-07 19:46:16 UTC
Closing per comment 8.

Comment 14 Red Hat Bugzilla 2023-09-14 04:38:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days