Bug 1652519

Summary: host does not meet the cluster's minimum CPU level. Missing CPU features : spec_ctrl
Product: Red Hat Enterprise Virtualization Manager Reporter: Kumar Mashalkar <kmashalk>
Component: redhat-virtualization-host    Assignee: Yuval Turgeman <yturgema>
Status: CLOSED ERRATA QA Contact: Huijuan Zhao <huzhao>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.7    CC: cshao, dfediuck, huzhao, kmashalk, lsvaty, michal.skrivanek, nlevy, qiyuan, rbarry, rdlugyhe, rhodain, sbonazzo, sirao, weiwang, yaniwang, ycui, yturgema
Target Milestone: ovirt-4.3.1   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Previously, during an upgrade, dracut running inside the chroot did not detect the cpuinfo and kernel config files because /proc was not mounted and /boot was bind-mounted. As a result, the correct microcode was missing from the initramfs. The current release bind-mounts /proc into the chroot and removes the --hostonly flag, so both AMD and Intel microcode are included in the initramfs and the host boots correctly after an upgrade. (A rough sketch of this regeneration step follows the header fields below.)
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-08 12:32:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
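
For reference, the regeneration step described in the Doc Text boils down to something like the following; the layer mount point and kernel version are placeholders, not the exact imgbased code:

# mount --bind /proc /mnt/new-layer/proc
# chroot /mnt/new-layer dracut --force --kver <kernel-version-in-the-new-layer>
# umount /mnt/new-layer/proc

With /proc visible inside the chroot, and without --hostonly, dracut includes the early-microcode images for both AMD and Intel instead of only what it can detect on the build host.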

Description Kumar Mashalkar 2018-11-22 09:38:57 UTC
Description of problem:
Host xxx moved to Non-Operational state as host does not meet the cluster's minimum CPU level. Missing CPU features : spec_ctrl

Version-Release number of selected component (if applicable):
imgbased-1.0.29-1.el7ev.noarch

How reproducible:
100% at the customer site

Steps to Reproduce:
1. Upgrade host to rhvh-4.2.7.3
2. Activate the host.
3. The host goes Non-Operational in a cluster with the IBRS CPU type.

Actual results:
Host goes to Non-Operational state due to missing CPU features : spec_ctrl

Expected results:
Host should be active.

Additional info:
This issue was reported in Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1624453, but only one flag issue was resolved there; the spec_ctrl flag is still missing. As suggested in that bug, I am opening a new Bugzilla here.

I can also see one upstream Bugzilla reporting the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=1595378
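
As an illustration, the symptom can be confirmed on the host itself with something like:

# grep -wo spec_ctrl /proc/cpuinfo | sort -u

If this prints nothing, the updated microcode is not active, and the engine reports the same "Missing CPU features : spec_ctrl" error and moves the host to Non-Operational.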

Comment 1 Michal Skrivanek 2018-11-23 06:04:37 UTC
You need to add the same details and investigate in the same way as in the original bug; otherwise this report is not very helpful.

Also, please add an SQL dump of the corresponding host and cluster tables. Thanks.

Comment 2 Ryan Barry 2018-11-23 06:49:27 UTC
To follow up, please provide, at a minimum, the output of lscpu, an SQL dump or screenshot of the cluster CPU level, and the host CPU details as reported by RHVM.
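
For example, something along these lines should cover it; the engine database, table, and column names below are assumptions based on a typical 4.2 schema and may need adjusting:

# lscpu
# su - postgres -c "psql engine -c 'SELECT name, cpu_name FROM cluster;'"
# su - postgres -c "psql engine -c 'SELECT vds_id, cpu_model, cpu_flags FROM vds_dynamic;'"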

Comment 4 Michal Skrivanek 2018-11-23 08:47:21 UTC
It's still the same thing: the CPU is plain Haswell-noTSX without any microcode update (record 3 in your list seems to be updated, record 4 does not).

Comment 5 Ryan Barry 2018-11-28 23:25:01 UTC
RHVH 4.2.7 ships with microcode_ctl-2.1-47.el7.x86_64, which should be up to date unless some change in RHEL 7.6 disabled the default mitigation.

Please post dmesg and the output of /proc/cmdline

Additionally, please ensure that 'rpm -q microcode_ctl' shows the version above
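
Roughly, the requested data can be collected with:

# dmesg | grep -i microcode
# cat /proc/cmdline
# rpm -q microcode_ctl

The rpm query should return microcode_ctl-2.1-47.el7.x86_64, and the dmesg lines show whether an early microcode update was actually applied at boot.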

Comment 7 Michal Skrivanek 2018-12-10 11:30:43 UTC
You still do not have the latest microcode applied (running 0x38); AFAICT the latest in that microcode_ctl is 0x3d.
Ryan, is it possible it's wrong in the dracut image of RHVH? The microcode_ctl version looks OK; it's just not getting applied, it seems.
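
For reference, the revision the CPU is currently running can be read directly on the host, for example:

# grep -m1 microcode /proc/cpuinfo
# cat /sys/devices/system/cpu/cpu0/microcode/version

Per the comment above, either should report 0x3d once the update shipped in that microcode_ctl is applied; 0x38 means the CPU is still on the older revision.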

Comment 8 Ryan Barry 2018-12-10 11:53:07 UTC
It's possible, but would require user intervention.

Kumar, can the customer please try unpacking the initrd to verify the firmware files are present?
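
One way to do that without fully unpacking the image is lsinitrd; the path below is illustrative, since on RHVH the initramfs sits under the layer's directory in /boot:

# lsinitrd /boot/<layer>/initramfs-<kver>.img | grep -i -e 'early cpio' -e microcode

If no GenuineIntel.bin (or AuthenticAMD.bin) entry shows up under kernel/x86/microcode, the firmware is indeed missing from the initrd.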

Comment 9 Ryan Barry 2018-12-10 13:39:41 UTC
Moving back to Node, since the engine is doing what it's supposed to be doing

Comment 16 Huijuan Zhao 2018-12-20 06:44:01 UTC
cshao@ reproduced this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1624453#c17
But with the same RHVH versions (from rhvh-4.2.4.3-0.20180622 to rhvh-4.2.5.2-0.20180813), I did not reproduce this issue with rhvm-4.2.7.5-0.1.el7ev, so it may be related to the older rhvm-4.2.5.

According to https://bugzilla.redhat.com/show_bug.cgi?id=1624453#c17, QE will flag qa_ack+

Comment 21 Sandro Bonazzola 2019-02-18 07:57:56 UTC
Moving to 4.3.2 since this has not been identified as a blocker for 4.3.1.

Comment 23 Huijuan Zhao 2019-02-26 06:10:20 UTC
The bug is fixed in rhvh-4.3.0.5-0.20190225.0 with rhvm-4.2.8.2-0.1.el7ev

Test version:
# imgbase layout
rhvh-4.2.4.3-0.20180622.0
 +- rhvh-4.2.4.3-0.20180622.0+1
rhvh-4.3.0.5-0.20190225.0
 +- rhvh-4.3.0.5-0.20190225.0+1

Test steps:
1. Install rhvh-4.2.4.3-0.20180622.0
2. Add rhvh to rhvm
3. Upgrade rhvh to rhvh-4.3.0.5-0.20190225.0 from rhvm side

Test results:
After step 3, rhvh is active in rhvm
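
As an additional sanity check on the upgraded host (commands are illustrative):

# grep -c spec_ctrl /proc/cpuinfo
# grep -m1 microcode /proc/cpuinfo

A non-zero count and an updated microcode revision confirm that the flag is exposed to the engine.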


Moving status to VERIFIED.

Comment 26 errata-xmlrpc 2019-05-08 12:32:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1053

Comment 27 Daniel Gur 2019-08-28 13:14:06 UTC
sync2jira

Comment 28 Daniel Gur 2019-08-28 13:18:22 UTC
sync2jira