Description of problem:
Windows VM was migratable in CNV 4.10.7, but after the upgrade to 4.11.3 it became non-migratable. Here are the events from the VMI:

# VM created and successfully migrated for the first time:
> Normal SuccessfulCreate 49m disruptionbudget-controller Created PodDisruptionBudget kubevirt-disruption-budget-56ncf
> Normal SuccessfulCreate 49m virtualmachine-controller Created virtual machine pod virt-launcher-windows-vm-1675645478-4397972-dsmmn
> Normal Created 48m virt-handler VirtualMachineInstance defined.
> Normal Started 48m virt-handler VirtualMachineInstance started.
> Normal SuccessfulCreate 14m disruptionbudget-controller Created Migration kubevirt-evacuation-7frcx
> Normal SuccessfulUpdate 14m virtualmachine-controller Expanded PodDisruptionBudget kubevirt-disruption-budget-56ncf
> Normal PreparingTarget 11m (x2 over 11m) virt-handler VirtualMachineInstance Migration Target Prepared.
> Normal PreparingTarget 11m virt-handler Migration Target is listening at 10.131.0.119, on ports: 35055,36327
> Normal Migrating 11m (x5 over 11m) virt-handler VirtualMachineInstance is migrating.
> Normal Migrated 11m virt-handler The VirtualMachineInstance migrated to node cnv-qe-infra-07.cnvqe2.lab.eng.rdu2.redhat.com.
> Normal Deleted 11m virt-handler Signaled Deletion
> Normal SuccessfulUpdate 11m disruptionbudget-controller shrank PodDisruptionBudget%!(EXTRA string=kubevirt-disruption-budget-56ncf)

# After CNV was upgraded, this message appeared:
> Warning Migrated 5m32s virt-handler EvictionStrategy is set but vmi is not migratable; HyperV Reenlightenment VMIs cannot migrate when TSC Frequency is not exposed on the cluster: guest timers might be inconsistent
> Warning Migrated 47s (x8 over 5m32s) virt-handler EvictionStrategy is set but vmi is not migratable; HyperV Reenlightenment VMIs cannot migrate when TSC Frequency is not exposed on the cluster: guest timers might be inconsistent

The VM is not migratable anymore.
Version-Release number of selected component (if applicable):
CNV 4.11.3

Steps to Reproduce:
1. Install CNV 4.10.7
2. Create Windows VM with HyperV Reenlightenment flag enabled
3. Upgrade CNV to 4.11.3

Actual results:
VM is not migratable after upgrade

Expected results:
VM should be migratable

Additional info:
Restarting VM after upgrade (`virtctl restart`) helps fix that
This is behaving as expected. If the cluster does not expose the TSC frequency, migration of re-enlightenment Windows VMs is not supported because of changes introduced by QEMU as a safety measure. This will happen on virtualized nodes if the invtsc CPU model feature is not set (PSI clusters).
Restarting VMs is a disruptive and unexpected operation during an OCP upgrade, and it blocks completion of the upgrade of the OCP cluster. How can we identify or warn against this problem BEFORE a customer starts an OCP upgrade?
Did this happen on bare metal nodes or virtualized nodes?
BM node
> This will happen on virtualized nodes if the invtsc CPU model feature is not set (PSI clusters)

Please advise how a customer can diagnose which physical hosts will have this problem and which ones will not.
To clarify concerns, this is indeed a bug caused by breaking changes introduced by QEMU and KubeVirt in the 4.11.1 release.

@pelauter When a reenlightenment VMI is created, it gets a node selector that schedules the VM only on nodes that support the lowest TSC frequency available on the cluster, or on nodes that have the 'cpu-timer.node.kubevirt.io/tsc-scalable' label set to true, since those support TSC frequency scaling.

In practice, on the cluster on which Denys found the bug, these are the nodes:

name: monster01.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: monster02.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: monster04.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: zeus08.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
name: zeus10.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'true'
name: zeus11.lab.eng.tlv2.redhat.com
  cpu-timer.node.kubevirt.io/tsc-frequency: '2095077000'
  cpu-timer.node.kubevirt.io/tsc-scalable: 'true'

The VMI will have a node selector that points to the lowest frequency, which is 1699998000, so this VMI will not be able to run on the first two nodes (monster01 and monster02) because they have a TSC frequency of 2099998000 and are non-scalable.

With all this said, the Live-Migration condition turning to false is a real bug and will be fixed in 4.10.8.
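
For anyone who wants to check their own cluster against the rule described above, here is a minimal Python sketch. It only models the rule as stated in this comment (lowest TSC frequency on the cluster, or tsc-scalable set to true); it is not KubeVirt's actual implementation. The function names `lowest_tsc_frequency`, `eligible_nodes` and `excluded_nodes` are made up for illustration, and the label values are copied from the node list in this comment.

```python
# Illustrative sketch only: models the scheduling rule described in this
# comment, not KubeVirt's real scheduling code. Function names are hypothetical.

TSC_FREQUENCY = "cpu-timer.node.kubevirt.io/tsc-frequency"
TSC_SCALABLE = "cpu-timer.node.kubevirt.io/tsc-scalable"

# Node labels copied from the cluster described in this comment.
NODES = {
    "monster01.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2099998000", TSC_SCALABLE: "false"},
    "monster02.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2099998000", TSC_SCALABLE: "false"},
    "monster04.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "1699998000", TSC_SCALABLE: "false"},
    "zeus08.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "1699998000", TSC_SCALABLE: "false"},
    "zeus10.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2095078000", TSC_SCALABLE: "true"},
    "zeus11.lab.eng.tlv2.redhat.com": {TSC_FREQUENCY: "2095077000", TSC_SCALABLE: "true"},
}


def lowest_tsc_frequency(nodes):
    """Return the lowest TSC frequency (in Hz) exposed by any node."""
    return min(int(labels[TSC_FREQUENCY]) for labels in nodes.values())


def eligible_nodes(nodes):
    """Nodes that run at the lowest frequency or can scale their TSC."""
    target = lowest_tsc_frequency(nodes)
    return sorted(
        name
        for name, labels in nodes.items()
        if int(labels[TSC_FREQUENCY]) == target or labels[TSC_SCALABLE] == "true"
    )


def excluded_nodes(nodes):
    """Nodes on which the reenlightenment VMI cannot be scheduled."""
    allowed = set(eligible_nodes(nodes))
    return sorted(name for name in nodes if name not in allowed)


if __name__ == "__main__":
    print("lowest TSC frequency:", lowest_tsc_frequency(NODES))
    print("eligible:", eligible_nodes(NODES))
    print("excluded:", excluded_nodes(NODES))
```

To try this on another cluster, replace `NODES` with the two `cpu-timer.node.kubevirt.io` labels read from each of its nodes. Nodes that do not carry the tsc-frequency label are not handled by this sketch.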
Verified: a Windows VM with Reenlightenment can be migrated after the upgrade from 4.10.8 to 4.11.3.