Bug 1595378
| Summary: | hypervisor host non operational after yum update due to missing CPU feature SPEC_CTRL | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Linus <linus+rh> |
| Component: | General | Assignee: | bugs <bugs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | meital avital <mavital> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.2.4 | CC: | bugs, cshao, linus+rh, michal.skrivanek, rbarry |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-28 22:13:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Linus
2018-06-26 19:05:46 UTC
Changing the cluster CPU type from "Intel Skylake Client IBRS Family" to "Intel Skylake Client Family" allowed me to activate host 1 again.

Changing the cluster CPU type also prevented oVirt from switching the next host (test-ovirt-2) into maintenance mode, since it was unable to migrate the VMs running on host test-ovirt-2 to any other host. The reason was that all running VMs were still using CPU type "Intel Skylake Client IBRS Family" and there was no longer any destination host with that CPU type available in the cluster. I had to shut down and start again all VMs that had been started before switching the cluster CPU type.

This could happen with an early 4.2.4 (or late 4.2.3, IIRC) engine. Since you've used the appliance, is it possible it hadn't been updated to the current 4.2.4.x before adding the first host? If that's the case, indeed just use the workaround you mentioned in comment #1.

Hi Michal,

we did not update before adding the first host. As I tried to describe in the section "Steps to Reproduce:", we set up an oVirt hosted engine setup with three hypervisor hosts and glusterfs based on the package versions mentioned in that section. The engine was fully provisioned, all three hosts were deployed/integrated, managed glusterfs volumes were created for data, iso and export purposes, and virtual machines were provisioned and running before we applied the 4.2.4.x and yum updates to the hosted engine VM. After updating the hosted engine VM, we applied yum updates to the first hypervisor host in local maintenance mode, and that host was switched to "Non Operational" status by the engine as described in the ticket.

Yeah, that explains the behavior then. You wouldn’t have hit that if you had updated the engine before adding the first host. We do not release updates to the appliance that often, relying on yum updates of a single baseline.
It is not ideal, but we do not have the capacity to rebuild completely all the time. I would consider it fixed now in the latest 4.2.4.x, if that’s fine with you.

We did deploy the hosted engine VM before the 4.2.4.x updates were available from the ovirt 4.2 repo.

From my point of view, this change in host CPU detection/classification causes a service degradation from a seamless update procedure based on VM live migrations to a pretty disruptive update procedure requiring a change of cluster CPU type and a stop and restart of all VMs running within the cluster.

This issue in the update procedure will hit every oVirt setup deployed before the 4.2.4.x updates were released.

Additionally, according to the provided dmesg output, the Skylake CPUs actually provide the feature SPEC_CTRL (after microcode updates during the CentOS boot procedure), so the CPU is not missing this feature as claimed by the oVirt engine error message.

Additionally, it seems that our CPUs did not provide IBRS ever, neither before the host update nor afterwards. So the classification as an "IBRS" CPU type seems questionable as well.

(In reply to Linus from comment #6)
> We did deploy the hosted engine VM before the 4.2.4.x updates were
> available from the ovirt 4.2 repo.
>
> From my point of view, this change in host CPU detection/classification
> causes a service degradation from a seamless update procedure based on VM
> live migrations to a pretty disruptive update procedure requiring a change
> of cluster CPU type and a stop and restart of all VMs running within the
> cluster.
>
> This issue in the update procedure will hit every oVirt setup deployed
> before the 4.2.4.x updates were released.

Only those using IBRS CPUs, and it is fixed even for those by bug 1582483. The host should stay Operational if you use the up-to-date ovirt-engine version. Perhaps a mistake in the update procedure?
>
> Additionally, according to the provided dmesg output, the Skylake CPUs
> actually provide the feature SPEC_CTRL (after microcode updates during
> CentOS boot procedure), so the CPU is not missing this feature as claimed by
> the oVirt engine error message.

It’s not missing it; it’s just a consequence of a change in how flags are reported on the oVirt side.

> Additionally, it seems that our CPUs did not provide IBRS ever, neither
> before the host update, nor afterwards. So the classification as a "IBRS"
> CPU type seems questionable as well.

They do; that’s the meaning of the spec_ctrl flag. If you care about security, please use the ssbd ones now. If not, you can just as well change that to the base type and avoid all the issues above.

(In reply to Michal Skrivanek from comment #7)
> (In reply to Linus from comment #6)
> > We did deploy the hosted engine VM before the 4.2.4.x updates were
> > available from the ovirt 4.2 repo.
> >
> > From my point of view, this change in host CPU detection/classification
> > causes a service degradation from a seamless update procedure based on VM
> > live migrations to a pretty disruptive update procedure requiring a change
> > of cluster CPU type and a stop and restart of all VMs running within the
> > cluster.
> >
> > This issue in the update procedure will hit every oVirt setup deployed
> > before the 4.2.4.x updates were released.
>
> Only those using IBRS CPUs, and it is fixed even for those by bug 1582483.
> The host should stay Operational if you use the up to date ovirt-engine
> version. Perhaps a mistake in update procedure?

As far as I know, all current Intel CPUs are affected by all Spectre-related bugs, so by "IBRS CPUs" you mean those Intel CPUs for which Intel provides a firmware update that adds features for controlling the impact of the Spectre bugs? That would probably be all Intel servers bought within the last three to four years?
:)

Bug 1582483 describes a workaround to allow scheduling of VMs requiring an IBRS CPU type on hosts that do not provide an IBRS type CPU according to the ovirt/libvirt reporting mechanisms. Should we have been able to find the errata describing this workaround in the release notes of oVirt 4.2.4?

As I already mentioned, we installed the hypervisor hosts and the hosted engine VM before the ovirt 4.2.4.x updates were available from the oVirt 4.2 repo. The IBRS CPU type was set automatically. We updated the ovirt engine VM before updating the first oVirt host. Is there a way to update an existing setup to ovirt 4.2.4 without having the hypervisor host CPU type changed from IBRS to the base type?

>
> >
> > Additionally, according to the provided dmesg output, the Skylake CPUs
> > actually provide the feature SPEC_CTRL (after microcode updates during
> > CentOS boot procedure), so the CPU is not missing this feature as claimed by
> > the oVirt engine error message.
>
> It’s not missing it, it’s just a consequence of a change in how flags are
> reported on oVirt side
>
> > Additionally, it seems that our CPUs did not provide IBRS ever, neither
> > before the host update, nor afterwards. So the classification as a "IBRS"
> > CPU type seems questionable as well.
>
> They do, that’s the meaning of spec_ctrl flag. If you care about security
> please use the ssbd ones now, if not you can as well change that to the base
> type and avoid all the issues above

OK, I did some searching and found that "SPEC_CTRL" is the Linux kernel label for using the MSR toggles of Spectre-related CPU features. IBRS is a CPU mitigation feature to protect privileged code from any speculation influence resulting from user space code, right?
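For anyone who wants to compare what a host reports before and after the yum update, here is a minimal sketch in Python. It only parses `/proc/cpuinfo`-style text and lists which of a set of required flag names are absent. The flag names passed in (`spec_ctrl`, `ibrs`, `ibpb`, `ssbd`), the helper names and the sample text are assumptions for illustration; this bug does not show which names the engine actually compares or how vdsm reports them.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(required, available):
    """Return the required flag names that are not available, sorted."""
    return sorted(set(required) - set(available))


# Hypothetical, shortened sample; a real host lists many more flags.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "flags\t\t: fpu vme sse2 ibrs ibpb stibp\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Fall back to the sample when not running on Linux.
        text = SAMPLE_CPUINFO
    flags = parse_cpu_flags(text)
    # Assumed flag names; adjust to whatever the engine error message mentions.
    for name in ("spec_ctrl", "ibrs", "ibpb", "ssbd"):
        print(name, "present" if name in flags else "absent")
```

Running this on the host before and after the update would show whether only the reported flag name changed, which is what comment #7 suggests, or whether the feature is really absent.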