Description of problem:
Host xxx moved to Non-Operational state as host does not meet the cluster's minimum CPU level. Missing CPU features : spec_ctrl
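For context on the message above: a host is marked Non-Operational when the cluster's CPU type requires flags that the host does not report, and the "Missing CPU features" list is the difference between those two sets. The following is a hypothetical Python sketch of that comparison, not the oVirt engine's actual code; the function names and the example flag sets are made up for illustration.

```python
from __future__ import annotations

from typing import Optional


def missing_cpu_features(cluster_required: set[str], host_flags: set[str]) -> list[str]:
    """Return the required flags the host does not report, sorted for stable output."""
    return sorted(cluster_required - host_flags)


def non_operational_reason(host: str, cluster_required: set[str], host_flags: set[str]) -> Optional[str]:
    """Build an event message like the one above, or None if the host qualifies."""
    missing = missing_cpu_features(cluster_required, host_flags)
    if not missing:
        return None
    return (
        f"Host {host} moved to Non-Operational state as host does not meet "
        f"the cluster's minimum CPU level. Missing CPU features : {', '.join(missing)}"
    )


# Hypothetical flag sets: the cluster CPU type needs spec_ctrl, the host lacks it.
required = {"sse4_2", "avx2", "spec_ctrl"}
reported = {"sse4_2", "avx2", "fma"}
print(non_operational_reason("xxx", required, reported))
```

With these sets the printed message ends in "Missing CPU features : spec_ctrl", i.e. the host is rejected as soon as a single required flag is absent.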
Version-Release number of selected component (if applicable):
How reproducible:
100% at cu
Steps to Reproduce:
1. Upgrade host to rhvh-220.127.116.11
2. Activate the host.
3. The host goes Non-Operational in a cluster with an IBRS CPU type.
Actual results:
Host goes to Non-Operational state due to missing CPU features: spec_ctrl.
Expected results:
Host should be active.
This issue was reported in Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1624453, but only one of the flag issues was resolved; the spec_ctrl flag is still missing. As suggested there, I am opening a new bug here.
There is also an upstream bug reporting the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=1595378
Please add the same details and investigate in the same way as in the original bug; otherwise this report is not very helpful.
Please also attach an SQL dump of the corresponding host and cluster tables. Thanks.
To follow up, please provide, at a minimum, the output of lscpu, an SQL dump or screenshot of the cluster CPU level, and the host CPU details as reported by RHVM.
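When reading the lscpu output, the key question is whether spec_ctrl appears on the "Flags:" line. A minimal sketch, assuming lscpu prints a single "Flags:" line (as current util-linux versions do); parse_lscpu_flags is a hypothetical helper and the sample text is made up.

```python
from __future__ import annotations


def parse_lscpu_flags(lscpu_output: str) -> set[str]:
    """Return the CPU flags from the 'Flags:' line of lscpu output (empty if absent)."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()


# Abbreviated, made-up sample; paste the real lscpu output from the host instead.
sample = """Architecture:          x86_64
Model name:            Intel(R) Xeon(R) CPU
Flags:                 fpu sse4_2 avx2 ibpb
"""
flags = parse_lscpu_flags(sample)
print("spec_ctrl present:", "spec_ctrl" in flags)
```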
It's still the same thing: the CPU is plain Haswell-noTSX without any microcode update (record 3 in your list seems to be updated; record 4 is not).
rhvh 4.2.7 ships with microcode_ctl-2.1-47.el7.x86_64, which should be up to date unless some change in 7.6 disabled the default mitigation.
Please post the dmesg output and the contents of /proc/cmdline.
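One thing to look for in /proc/cmdline is a parameter that stops microcode from loading or turns off Spectre v2 mitigation. The sketch below scans the command line for a small, illustrative (not exhaustive) set of such parameters; the list is an assumption, including the RHEL 7 tunables noibrs/noibpb, and should be checked against the documentation for the running kernel. The helper name is hypothetical.

```python
from __future__ import annotations

# Illustrative, not exhaustive: parameters that stop microcode loading or
# switch off Spectre v2 mitigation. Verify against the running kernel's docs.
SUSPECT_PARAMS = {
    "dis_ucode_ldr": "disables the microcode loader",
    "spectre_v2=off": "disables Spectre v2 mitigation",
    "nospectre_v2": "disables Spectre v2 mitigation",
    "noibrs": "assumed RHEL 7 tunable: disables IBRS use",
    "noibpb": "assumed RHEL 7 tunable: disables IBPB use",
}


def suspicious_cmdline_params(cmdline: str) -> list[str]:
    """Return the tokens of a kernel command line found in SUSPECT_PARAMS, in order."""
    return [token for token in cmdline.split() if token in SUSPECT_PARAMS]


# Made-up example line; on the host, read the text of /proc/cmdline instead.
example = "BOOT_IMAGE=/vmlinuz root=/dev/mapper/rhvh-root rhgb quiet"
for token in suspicious_cmdline_params(example):
    print(token, "->", SUSPECT_PARAMS[token])
```

An empty result only means none of these particular parameters is set; it does not prove that microcode loading works.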
Additionally, please confirm that 'rpm -q microcode_ctl' shows the version above.
You still do not have the latest microcode applied (the CPU is running revision 0x38); AFAICT the latest revision in that microcode_ctl package is 0x3d.
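The running revision is visible in the "microcode" field of /proc/cpuinfo. A sketch of the comparison described above, assuming 0x3d is the expected revision for this CPU (revisions are only comparable between CPUs with the same family/model/stepping signature); the helper names are hypothetical.

```python
from __future__ import annotations


def microcode_revisions(cpuinfo: str) -> set[int]:
    """Collect the distinct 'microcode' revisions listed in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # base 16 accepts the '0x' prefix
    return revisions


def microcode_is_current(cpuinfo: str, expected: int) -> bool:
    """True only if every CPU reports a revision at least as new as `expected`."""
    revisions = microcode_revisions(cpuinfo)
    return bool(revisions) and min(revisions) >= expected


# Shape of the relevant /proc/cpuinfo lines on the affected host (other fields omitted).
sample = "processor\t: 0\nmicrocode\t: 0x38\nprocessor\t: 1\nmicrocode\t: 0x38\n"
print(sorted(hex(r) for r in microcode_revisions(sample)))  # ['0x38']
print(microcode_is_current(sample, expected=0x3d))  # False
```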
Ryan, is it possible that the dracut image in rhvh is wrong? The microcode_ctl version looks OK; it just doesn't seem to be getting applied.
It's possible, but would require user intervention.
Kumar, can the customer please try unpacking the initrd to verify which firmware files are present?
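Early microcode is normally prepended to the initramfs as an uncompressed cpio archive in "newc" format, with the Intel blob at kernel/x86/microcode/GenuineIntel.bin. dracut's lsinitrd can also list image contents; as an alternative, here is a minimal Python sketch that lists that leading archive. It assumes the image starts with an uncompressed newc early cpio, and build_newc is a hypothetical helper included only to make the example self-contained.

```python
from __future__ import annotations

NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 eight-digit hex fields
TRAILER = "TRAILER!!!"
INTEL_UCODE = "kernel/x86/microcode/GenuineIntel.bin"


def _align4(n: int) -> int:
    return (n + 3) & ~3


def list_early_cpio(image: bytes) -> dict[str, int]:
    """Map each entry of the leading newc cpio archive to its size, stopping at the trailer."""
    entries = {}
    off = 0
    while off + HEADER_LEN <= len(image):
        header = image[off:off + HEADER_LEN]
        if header[:6] != NEWC_MAGIC:
            raise ValueError(f"no newc cpio header at offset {off}")
        filesize = int(header[54:62], 16)   # c_filesize
        namesize = int(header[94:102], 16)  # c_namesize, includes the trailing NUL
        name_start = off + HEADER_LEN
        name = image[name_start:name_start + namesize - 1].decode()
        data_start = _align4(name_start + namesize)
        off = _align4(data_start + filesize)
        if name == TRAILER:
            break
        entries[name] = filesize
    return entries


def has_intel_microcode(image: bytes) -> bool:
    return list_early_cpio(image).get(INTEL_UCODE, 0) > 0


def build_newc(files: dict[str, bytes]) -> bytes:
    """Hypothetical test helper: build a minimal newc archive (regular files only)."""
    out = bytearray()
    for name, data in [*files.items(), (TRAILER, b"")]:
        encoded = name.encode() + b"\0"
        fields = [0] * 13
        fields[1] = 0o100644        # c_mode: regular file
        fields[6] = len(data)       # c_filesize
        fields[11] = len(encoded)   # c_namesize
        out += NEWC_MAGIC + b"".join(b"%08x" % v for v in fields)
        out += encoded
        out += b"\0" * (_align4(len(out)) - len(out))
        out += data
        out += b"\0" * (_align4(len(out)) - len(out))
    return bytes(out)


demo = build_newc({"early_cpio": b"1\n", INTEL_UCODE: b"\x00" * 64})
print(list_early_cpio(demo))  # {'early_cpio': 2, 'kernel/x86/microcode/GenuineIntel.bin': 64}
print(has_intel_microcode(demo))  # True
```

On a host, the same functions would be applied to the bytes of the initramfs used by the booted RHVH layer; an absent or empty GenuineIntel.bin would be consistent with the host still running the old revision.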
Moving back to Node, since the engine is doing what it's supposed to do.
cshao@ reproduced this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1624453#c17
However, with the same rhvh versions (from rhvh-18.104.22.168-0.20180622 to rhvh-22.214.171.124-0.20180813), I could not reproduce this issue with rhvm-126.96.36.199-0.1.el7ev, so it may be related to the older rhvm-4.2.5.
According to https://bugzilla.redhat.com/show_bug.cgi?id=1624453#c17, QE will set qa_ack+.
Moving to 4.3.2, since this was not identified as a blocker for 4.3.1.
The bug is fixed in rhvh-188.8.131.52-0.20190225.0 with rhvm-184.108.40.206-0.1.el7ev
# imgbase layout
1. Install rhvh-220.127.116.11-0.20180622.0
2. Add rhvh to rhvm
3. Upgrade rhvh to rhvh-18.104.22.168-0.20190225.0 from rhvm side
After step 3, rhvh is active in rhvm.
Moving status to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.