Bug 1758382
Summary: | Dell PowerEdge M820 servers panicking after microcode / firmware upgrades | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Joshua Baker <jobaker> |
Component: | microcode_ctl | Assignee: | Eugene Syromiatnikov <esyr> |
Status: | CLOSED ERRATA | QA Contact: | Jeff Bastian <jbastian> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.10 | CC: | ionut.jula, ionutjula, jmario, kwalker, sjohnsto, skozina, toneata |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | microcode_ctl-1.17-33.17.el6_10 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-10-16 08:54:15 UTC | Type: | Bug |
Description
Joshua Baker
2019-10-04 00:56:24 UTC
> microcode_ctl-1.17-33.9.el6_10.x86_64 having firmware revision=0x42d
Judging by the server model (Dell PowerEdge M820) and microcode_ctl-1.17-33.14.el6_10.x86_64-provided microcode revision being 0x718, I would presume that the CPU model is Intel Xeon E5-46xx (CPUID 0x206d7, FF-MM-SS 06-2d-07, codename Sandy Bridge-EP) and the respective microcode version for it in microcode_ctl-1.17-33.9.el6_10.x86_64 is 0x714.
Otherwise this microcode revision looks like the one for Ivy Bridge-EP (CPUID 0x306e4, FF-MM-SS 06-3e-04), possibly Intel Xeon E5-46xx v2.
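For reference, the FF-MM-SS strings quoted above follow from the CPUID signature using the standard family/model/stepping bit layout. The following is a minimal sketch of that decoding, not anything shipped in microcode_ctl; the function names are hypothetical and chosen only for illustration.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID.1:EAX signature into (family, model, stepping).

    Hypothetical helper for illustration; uses the standard x86 layout.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model is only used for base family 6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def ff_mm_ss(eax):
    """Format a signature as the FF-MM-SS string used in microcode file names."""
    return "%02x-%02x-%02x" % decode_cpuid_signature(eax)


if __name__ == "__main__":
    for sig in (0x206D7, 0x306E4):
        print(hex(sig), "->", ff_mm_ss(sig))
```

With the two signatures mentioned in this comment, the sketch yields 06-2d-07 for 0x206d7 and 06-3e-04 for 0x306e4.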
So, considering the above, there are the following questions:
* Are the issues observed on kernel-2.6.32-754.15.1+ with microcode_ctl-1.17-33.11+ (that's the microcode_ctl RPM release that brings MDS-enabled IVB-EP microcode revision 0x42e)?
* Considering that both the kernel and microcode updates are MDS-related, are the issues observed on SNB-EP machines with kernel-2.6.32-754.15.1+, microcode_ctl-1.17-33.14, and the mds=off kernel parameter?
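To answer the questions above, it helps to record which microcode revision is actually loaded and what the kernel reports for MDS on each affected machine. The sketch below is one possible way to collect that. It assumes the running kernel exposes a `microcode` field in /proc/cpuinfo and the /sys/devices/system/cpu/vulnerabilities/mds file, which may not hold for every kernel build; the helper names are hypothetical.

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        match = re.match(r"microcode\s*:\s*(0x[0-9a-fA-F]+)", line)
        if match:
            revisions.add(int(match.group(1), 16))
    return revisions


def read_if_present(path):
    """Return the stripped contents of a file, or None if it does not exist."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None


if __name__ == "__main__":
    # Assumption: these paths exist on the system under test.
    cpuinfo = read_if_present("/proc/cpuinfo") or ""
    revs = sorted(microcode_revisions(cpuinfo))
    print("microcode revisions:", [hex(r) for r in revs] or "not reported")
    print("mds status:", read_if_present("/sys/devices/system/cpu/vulnerabilities/mds"))
    print("kernel cmdline:", read_if_present("/proc/cmdline"))
```

More than one revision in the output would mean the CPUs are not all running the same microcode, which is worth noting alongside the kernel and microcode_ctl package versions.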
After reading through the lengthy customer case, I agree with Eugene's triage steps posted in the case (appended below). This feels like the verw instruction's flushing behavior is the trigger.
Comments from Eugene Syromiatnikov on next steps:
microcode_ctl-1.17-33.14.el6_10.x86_64 that brings 0x718 microcode release.
So, since both the microcode_ctl and kernel updates are MDS-related, I would suggest checking the following cases:
* Downgraded microcode_ctl package (this makes sense only if OS-driven upgrades are actually used and the system firmware doesn't already have microcode revision 0x718) and kernel-2.6.32-754.15.1+.
* Updated microcode_ctl and a pre-2.6.32-754.15.1 kernel.
* Updated microcode_ctl and a 2.6.32-754.15.1+ kernel with mds=off.
If none of these cases leads to hangs, I would suspect issues in the VERW instruction implementation on SNB-EP.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:3090

*** Bug 1774134 has been marked as a duplicate of this bug. ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.