Bug 2167244
| Summary: | Windows VM with Reenlightenment flag becomes non-migratable after upgrade CNV from 4.10 to 4.11 | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Denys Shchedrivyi <dshchedr> |
| Component: | Virtualization | Assignee: | Antonio Cardace <acardace> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Denys Shchedrivyi <dshchedr> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.11.3 | CC: | acardace, ailan, apinnick, dholler, fdeutsch, mtessun, pelauter, sgott, sradco, ycui |
| Target Milestone: | --- | ||
| Target Release: | 4.10.8 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | virt-operator-container-v4.10.8-9 hco-bundle-registry-container-v4.10.8-37 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-08-08 10:27:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Denys Shchedrivyi
2023-02-06 02:22:28 UTC
This is behaving as expected: if the cluster does not expose the TSC frequency, migration of a re-enlightenment Windows VM is not supported because of changes introduced by QEMU as a safety measure. This will happen on virtualized nodes if the invtsc CPU model feature is not set (PSI clusters).
Restarting VMs is a disruptive and unexpected operation during an OCP upgrade, and it blocks completion of the upgrade of the OCP cluster. How can we identify or warn against this problem BEFORE a customer starts an OCP upgrade?
Did this happen on bare metal nodes or virtualized nodes?
BM node
> This will happen on virtualized nodes if the invtsc CPU model feature is not set (PSI clusters)
Please advise how a customer can diagnose which physical hosts will have this problem and which ones will not.
To clarify concerns, this is indeed a bug caused by breaking changes introduced by QEMU and KubeVirt in the 4.11.1 release.
@pelauter When a reenlightenment VMI is created, it gets a node selector that schedules the VM only on nodes that support the lowest TSC frequency available on the cluster, or on nodes that have the 'cpu-timer.node.kubevirt.io/tsc-scalable' label set to true, since those support TSC frequency scaling.
In practice, these are the nodes on the cluster where Denys found the bug:
name: monster01.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: monster02.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: monster04.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: zeus08.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: zeus10.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'true'
name: zeus11.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2095077000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'true'
The VMI will have a node selector that points to the lowest frequency, which is 1699998000, so this VMI will not be able to run on the first 2 nodes (monster01 and monster02) because they have a TSC frequency of 2099998000 and are non-scalable.
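For anyone who wants to check a cluster by hand, the selection rule described above can be sketched in a few lines of Python. This is only a simplified model of the rule as stated in this comment (lowest TSC frequency on the cluster, or tsc-scalable set to true); it is not KubeVirt's actual scheduling code, and the `eligible_nodes` helper and the hard-coded `NODES` dictionary are made up for illustration, using the label values listed above.

```python
# Simplified model of the rule described in this bug, not KubeVirt's real logic.
TSC_FREQUENCY = "cpu-timer.node.kubevirt.io/tsc-frequency"
TSC_SCALABLE = "cpu-timer.node.kubevirt.io/tsc-scalable"

# Label values copied from the node listing in this bug.
NODES = {
    "monster01.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2099998000", TSC_SCALABLE: "false"},
    "monster02.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2099998000", TSC_SCALABLE: "false"},
    "monster04.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "1699998000", TSC_SCALABLE: "false"},
    "zeus08.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "1699998000", TSC_SCALABLE: "false"},
    "zeus10.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2095078000", TSC_SCALABLE: "true"},
    "zeus11.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2095077000", TSC_SCALABLE: "true"},
}


def eligible_nodes(nodes):
    """Return (lowest frequency, sorted names of nodes the VMI could run on).

    A node qualifies if its TSC frequency equals the lowest frequency in the
    cluster, or if it is labelled as supporting TSC frequency scaling.
    """
    lowest = min(int(labels[TSC_FREQUENCY]) for labels in nodes.values())
    names = sorted(
        name
        for name, labels in nodes.items()
        if int(labels[TSC_FREQUENCY]) == lowest or labels[TSC_SCALABLE] == "true"
    )
    return lowest, names


if __name__ == "__main__":
    lowest, names = eligible_nodes(NODES)
    print(f"lowest TSC frequency: {lowest}")
    for name in names:
        print(f"eligible: {name}")
```

With the labels from this cluster, the sketch reports monster04, zeus08, zeus10 and zeus11 as eligible and leaves out monster01 and monster02, which matches the explanation above.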
With all this said, the Live-Migration condition turning to false is a real bug and will be fixed in 4.10.8.
Verified: a Windows VM with Reenlightenment can be migrated after upgrade from 4.10.8 to 4.11.3 |