Bug 1833037

Summary: [RHEL 7] Narrow down the SKX (SKL-SP/X/W/D) microcode blacklist
Product: Red Hat Enterprise Linux 7 Reporter: Eugene Syromiatnikov <esyr>
Component: microcode_ctl Assignee: Eugene Syromiatnikov <esyr>
Status: CLOSED ERRATA QA Contact: Jeff Bastian <jbastian>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.9 CC: skozina
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: microcode_ctl-2.1-65.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-29 20:12:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1788592    

Description Eugene Syromiatnikov 2020-05-07 17:14:55 UTC
Currently, due to [1], all microcode updates past revision 0x2000064 are disabled by default on SKX (1st generation Xeon Scalable Platform, Skylake Scalable Platform, SKL-SP, SKL-X, SKL-W, SKL-D; FF-MM-SS 06-55-04, CPUID 0x50654). This blacklist is, in fact, too broad, as the issue affects only the Workstation (SKL-W, SKX-W) and HEDT (SKL-X, SKX-X, Basin Falls) segments. As was suggested quite some time ago, these segments can be differentiated by bits 5..3 of the CAPID0 field of the PCU registers device (Bus 1, Device 30, Function 3, VID:DID 8086:2083)[2]. (Some information regarding this device's fields is available for BDX[3], but not for SKX[4], and the two are indeed different.) Implementing a more precise filter avoids leaving the latest CPU errata/CVEs unmitigated by default in the more common case of SKL-SP server segment CPUs.
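To make the identifiers above concrete, here is a minimal Python sketch (not the microcode_ctl implementation) that decodes a CPUID signature such as 0x50654 into the FF-MM-SS form 06-55-04 and extracts bits 5..3 of a CAPID0 value. The CAPID0 register offset within the PCU device's config space is not stated in this report, so `read_config_dword()` takes it as a hypothetical `offset` parameter; the PCI domain 0000 in the example address is also an assumption, and the mapping of bit values to segments is not covered here.

```python
from __future__ import annotations

from pathlib import Path


def decode_cpuid_signature(sig: int) -> tuple[int, int, int]:
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended family field is only added for base family 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model field applies to base families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def ff_mm_ss(sig: int) -> str:
    """Format a signature the way intel-ucode file names do, e.g. '06-55-04'."""
    family, model, stepping = decode_cpuid_signature(sig)
    return f"{family:02x}-{model:02x}-{stepping:02x}"


def capid0_segment_bits(capid0: int) -> int:
    """Return bits 5..3 of a CAPID0 value as a 3-bit integer."""
    return (capid0 >> 3) & 0x7


def read_config_dword(bdf: str, offset: int) -> int:
    """Read a little-endian 32-bit value from a PCI device's sysfs config space.

    Both `bdf` (e.g. the assumed '0000:01:1e.3' for Bus 1, Device 30,
    Function 3) and `offset` (the CAPID0 offset) are hypothetical inputs.
    Unprivileged reads only return the first 64 bytes of config space.
    """
    data = Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()
    if len(data) < offset + 4:
        raise ValueError("config space too short; root access may be required")
    return int.from_bytes(data[offset:offset + 4], "little")
```

For example, `ff_mm_ss(0x50654)` yields `"06-55-04"`, matching the FF-MM-SS notation used above. On a multi-socket system each socket exposes its own PCU device, so a real filter would presumably need to check every instance.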

Steps to Reproduce:
1. Install the microcode_ctl package, version 2.1-61.el7 or higher, on a system with an SKL-SP server segment CPU (Intel Xeon 81xx).
2. Check dmesg/syslog for disclaimers.
3. Check the microcode revision populated in /lib/firmware/KERNEL_VERSION/intel-ucode and in the early initramfs.
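As a complement to step 3, the revision the kernel actually loaded can be read from the `microcode` field of /proc/cpuinfo. The following is a minimal Python sketch under that assumption; the helper names and the threshold constants are illustrative, taken from the revisions quoted in this report.

```python
from __future__ import annotations

OLD_REV = 0x2000064  # revision used while the blacklist applies (per this report)
NEW_REV = 0x2000065  # revision expected on SKL-SP once the filter is narrowed


def microcode_revisions(cpuinfo_text: str) -> set[int]:
    """Collect the distinct 'microcode' revisions listed in /proc/cpuinfo text."""
    revs: set[int] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # Values look like '0x2000065'; int(..., 16) accepts the 0x prefix.
            revs.add(int(value.strip(), 16))
    return revs


def meets_revision(cpuinfo_text: str, expected: int = NEW_REV) -> bool:
    """True if every listed CPU reports a revision of at least `expected`."""
    revs = microcode_revisions(cpuinfo_text)
    return bool(revs) and min(revs) >= expected
```

Usage: `meets_revision(open("/proc/cpuinfo").read())`. With the later microcode_ctl-2.1-73.el7 package, pass `expected=0x2006906` instead.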

Actual results:
The disclaimer is present, and the older (0x2000064) microcode revision is used.

Expected results:
No disclaimer is present, and the newer (0x2000065) microcode revision is used.

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/21
[2] https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf#page
[3] https://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-datasheet-vol-2.html , page 82
[4] https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/2nd-gen-xeon-scalable-spec-update.pdf , page 57

Comment 6 Jeff Bastian 2020-06-29 19:30:10 UTC
Revision 0x2006906 of the microcode for 06-55-04 CPUs fixes the hang issues, so this change was reverted [*] and this CPU model is no longer blacklisted. The updated microcode revision is available in microcode_ctl-2.1-73.el7; see bug 1826589.

[*] http://pkgs.devel.redhat.com/cgit/rpms/microcode_ctl/commit/?h=rhel-7.9&id=2e60d157a329

Comment 7 Jeff Bastian 2020-06-29 19:31:50 UTC
Oops, forgot the link to the Beaker job where 06-55-04 was successfully updated to microcode revision 0x2006906 by default:

https://beaker.engineering.redhat.com/recipes/8478436#task112216361

Comment 9 errata-xmlrpc 2020-09-29 20:12:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (microcode_ctl bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3968