Description of problem:
A VM with an HPP storage class (non-migratable) fails to restart on another node during a node drain and gets stuck with the error message:

error when evicting pods/"virt-launcher-vm-rhel88-source-hpp-z5lbw" -n "default" (will retry after 5s): admission webhook "virt-launcher-eviction-interceptor.kubevirt.io" denied the request: VMI vm-rhel88-source-hpp is configured with an eviction strategy but is not live-migratable

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a VM with an HPP storage class
2. Drain the node
3.

Actual results:
The VM fails to restart on another node

Expected results:

Additional info:
*** Bug 2219784 has been marked as a duplicate of this bug. ***
evictionStrategy: LiveMigrateIfPossible is explained in the description of this PR: https://github.com/kubevirt/kubevirt/pull/9798

HCO should adopt evictionStrategy: LiveMigrateIfPossible as the default eviction strategy. QE will test this first and then hand it over to HCO for updating the default value.

Also, directly setting evictionStrategy: LiveMigrateIfPossible as the default eviction strategy is currently not supported via HCO, even though the KV CR supports it. We are using the JSON Patch annotation to update the value in the KV CR.

---

1) Updating the KV CR via the HCO CR using a JSON Patch:

]$ oc annotate --overwrite -n openshift-cnv hyperconverged kubevirt-hyperconverged kubevirt.kubevirt.io/jsonpatch='[{ "op": "add", "path": "/spec/configuration/evictionStrategy", "value": "LiveMigrateIfPossible" }]'

2) KV CR:

]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml | grep evictionStrategy
    evictionStrategy: LiveMigrateIfPossible
Directly setting evictionStrategy: LiveMigrateIfPossible as the default eviction strategy is currently not supported via HCO, even though the KV CR supports it. We need to update the HCO CR so that evictionStrategy: LiveMigrateIfPossible can be set as the default eviction strategy.
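Once HCO supports the field natively, the value would live directly in the HyperConverged CR spec instead of going through the JSON Patch annotation. A minimal sketch, assuming HCO exposes it at /spec/evictionStrategy (mirroring the KV CR field); the apiVersion shown is the usual HCO one but should be checked against the installed CRD:

```yaml
# Sketch: HyperConverged CR with the eviction strategy set natively
# (hypothetical until HCO support lands; no annotation needed).
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  evictionStrategy: LiveMigrateIfPossible
```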
QE has tested with evictionStrategy: LiveMigrateIfPossible, which addresses the problem stated in the description / comment 1:

1) Node drain was successful for a VM with the HPP StorageClass with evictionStrategy: LiveMigrateIfPossible. Because HPP is local storage, the VM stayed in the Scheduling state and got back into the Running state as soon as the node was uncordoned after the drain.
2) A GPU VM restarts on a new node with evictionStrategy: LiveMigrateIfPossible when the node is drained.
3) A normal VM live migrates to a different node with evictionStrategy: LiveMigrateIfPossible when the node is drained.
4) No alerts fire with evictionStrategy: LiveMigrateIfPossible.
5) An alert fires for a non-migratable VM with evictionStrategy: LiveMigrate.
We have to consider that we have exactly the same problem on two different fronts (with slightly different semantics).

On the HCO CR we have /spec/workloadUpdateStrategy/workloadUpdateMethods, where we can set an ordered list (in terms of preference) of methods, choosing between LiveMigrate, Evict and none. Then we have the cluster-level eviction strategy (/spec/evictionStrategy), where we can choose between [None; LiveMigrate; External; LiveMigrateIfPossible(*)] (* in the near future, as the next step).

/spec/evictionStrategy is going to influence what happens on node drains (and this affects the last step of OCP upgrades, or MCO actions in general):
- With LiveMigrate, if the VM cannot be live migrated, the drain will be blocked by the corresponding PDB. This means that during an OCP upgrade the final reboot of a node running a non-migratable VM will eventually be blocked, MCO will report it, and the cluster admin will eventually have to explicitly shut down the unmigratable VM in order to continue.
- With LiveMigrateIfPossible, unmigratable VMs will be automatically evicted, favouring platform processes (upgrades/drains...) over workload stability.

/spec/workloadUpdateStrategy/workloadUpdateMethods instead only influences what happens to virt-launcher pods after a CNV upgrade. When we initially introduced it with https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1348 the default value was {"LiveMigrate", "Evict"}, meaning that VMs should first be live migrated and, if that is not possible, eventually evicted, which is exactly the behaviour of LiveMigrateIfPossible. But then we removed it per https://bugzilla.redhat.com/show_bug.cgi?id=2017394 and later on, when we finally reintroduced it, we chose {"LiveMigrate"} as the only default, to privilege workload stability over platform upgrades by default.
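The two knobs described above sit side by side in the HCO CR spec. An illustrative fragment (the values here are examples to show the shape of the two fields, not the shipped defaults):

```yaml
# Illustrative HyperConverged CR fragment showing both knobs.
spec:
  # Ordered preference list applied to virt-launcher pods after a CNV upgrade.
  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate
  # Cluster-level behaviour on node drains (OCP upgrades / MCO actions).
  evictionStrategy: LiveMigrateIfPossible
```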
So we introduced the OutdatedVirtualMachineInstanceWorkloads alert to track that at least one VMI is running with an outdated virt-launcher, and the kubevirt.io/outdatedLauncherImage label to let the cluster admin easily track it.

Now, setting LiveMigrateIfPossible by default on /spec/evictionStrategy but only {"LiveMigrate"} on /spec/workloadUpdateStrategy/workloadUpdateMethods is going to introduce two different behaviours for OCP upgrades (and node drains / MCO actions in general) versus CNV upgrades, and this can become really confusing in the cluster admin's eyes.

For the sake of consistency I see two options here:
1. Continue privileging workload stability and continuity over platform upgrades, as we did in the past. If not already there, we will probably have to introduce:
   - a new alert (with its runbook) on our side, clearly stating that a non-live-migratable VM is preventing the node drain and that it should be shut down (when possible)
   - a specific label on the VM to let the cluster admin easily identify the problematic objects
2. Amend /spec/workloadUpdateStrategy/workloadUpdateMethods, setting {"LiveMigrate", "Evict"} as the default there, to get consistent behaviour (but we have to properly explain this in the release notes because it's a relevant change).

Personally I still think that we should aim to guarantee (by default!) workload business continuity, and simply let the cluster admin shut down unmigratable VMs only when they think it's a good moment for that. They can still explicitly choose `{"LiveMigrate", "Evict"}` and `LiveMigrateIfPossible` if they think that platform/product upgrades are more important than workload stability.
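Option 2 above would amount to restoring the original ordered default in the HCO CR. A sketch of the amended field ({"LiveMigrate", "Evict"} was the initial default per the PR referenced earlier in this thread):

```yaml
# Option 2 (sketch): restore the ordered default so that CNV upgrades
# behave consistently with LiveMigrateIfPossible on node drains.
spec:
  workloadUpdateStrategy:
    workloadUpdateMethods:
      - LiveMigrate   # try live migration first...
      - Evict         # ...evict only if migration is not possible
```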
Simone, thanks for the ... summary (?) :)

"""
Personally I still think that we should aim to guarantee (by default!) workload business continuity, and simply let the cluster admin shut down unmigratable VMs only when they think it's a good moment for that.
"""

tl;dr: Yes. And support in HCO to set LiveMigrateIfPossible.

More nuanced: I think we are abusing PDBs in KubeVirt. We don't protect the workload; we use them as a signaling mechanism for live migrations. PDBs do not really indicate the criticality of a workload.

Why is this relevant? Because today we have set the expectation that VMs will be protected, and for the mid-term future we must keep this behavior (hence the tl;dr Yes). If we change it, then we need to do it backwards compatibly, without risking workloads. And this mechanism for protecting workloads could be PDBs. But that means we need to remove our general reliance on PDBs for LM signaling, which is covered in https://issues.redhat.com/browse/CNV-28528

Thus I'm seeing the following steps:
1. VMs can block cluster upgrades; admins need to kill VMs if required. Admins must have the ability to change the policy cluster-wide to LiveMigrateIfPossible.
2. Address https://issues.redhat.com/browse/CNV-28528 to not rely on PDBs.
3. After #2, VMs will by default no longer (permanently) block drains, so we need to allow using PDBs for protecting selected workloads.
4. Find a migration path from today's state (all VMs are protected) to the future state (no VM is protected by default).
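For context on the "PDBs as signaling" point: the blocking behaviour works like any ordinary PodDisruptionBudget. A budget that selects a single virt-launcher pod and requires it to stay available allows zero disruptions, so the eviction API rejects the request and `oc adm drain` retries indefinitely. A hedged sketch (the name and selector label here are hypothetical placeholders; the real PDB objects are created and managed by KubeVirt itself, not by the admin):

```yaml
# Illustrative only: with minAvailable: 1 and exactly one matching
# virt-launcher pod, allowed disruptions = 0, so eviction is blocked.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-vmi-disruption-budget   # hypothetical name
  namespace: default
spec:
  minAvailable: 1
  selector:
    matchLabels:
      kubevirt.io/created-by: <vmi-uid>   # placeholder selector
```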
OK, so now we have LiveMigrateIfPossible as an allowed option while LiveMigrate is still the default. Moving this to ON_QA.
Verified on: v4.14.0.rhel9-1259

evictionStrategy LiveMigrateIfPossible can be set through HCO:

[akriti@fedora ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv -o yaml
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep evictionStrategy
    evictionStrategy: LiveMigrateIfPossible

Both the HPP and OCS VMs restarted on another node with evictionStrategy: LiveMigrateIfPossible in HCO:

[akriti@fedora ~]$ oc get vm
NAME                   AGE     STATUS    READY
vm-rhel88-source-hpp   2m45s   Stopped   False
vm1-rhel88-ocs         11m     Stopped   False

[akriti@fedora ~]$ virtctl start vm-rhel88-source-hpp
VM vm-rhel88-source-hpp was scheduled to start
[akriti@fedora ~]$ virtctl start vm1-rhel88-ocs
VM vm1-rhel88-ocs was scheduled to start

[akriti@fedora ~]$ oc get vmi
NAME                   AGE   PHASE     IP             NODENAME                                         READY
vm-fedora-with-pvc     19h   Running   10.130.1.226   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
vm-rhel88-source-hpp   36s   Running   10.129.1.90    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True
vm1-rhel88-ocs         28s   Running   10.129.1.89    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True

[akriti@fedora ~]$ virtctl console vm-rhel88-source-hpp
Successfully connected to vm-rhel88-source-hpp console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel88-source-hpp login: cloud-user
Password:
[cloud-user@vm-rhel88-source-hpp ~]$
[akriti@fedora ~]$

[akriti@fedora ~]$ virtctl console vm1-rhel88-ocs
Successfully connected to vm1-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm1-rhel88-ocs login: cloud-user
Password:
Last login: Wed Jul 19 05:23:21 on ttyS0
[cloud-user@vm1-rhel88-ocs ~]$
[akriti@fedora ~]$

[akriti@fedora ~]$ oc adm drain cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
.
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com drained

[akriti@fedora ~]$ oc adm uncordon cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com uncordoned

[akriti@fedora ~]$ oc get vmi
NAME                   AGE     PHASE     IP             NODENAME                                         READY
vm-fedora-with-pvc     19h     Running   10.130.1.226   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
vm-rhel88-source-hpp   7m52s   Running   10.129.1.97    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True
vm1-rhel88-ocs         7m51s   Running   10.130.0.63    cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True

[akriti@fedora ~]$ virtctl console vm1-rhel88-ocs
Successfully connected to vm1-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm1-rhel88-ocs login: cloud-user
Password:
Last login: Wed Jul 19 05:31:58 on ttyS0
[cloud-user@vm1-rhel88-ocs ~]$
[akriti@fedora ~]$

[akriti@fedora ~]$ virtctl console vm-rhel88-source-hpp
Successfully connected to vm-rhel88-source-hpp console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel88-source-hpp login: cloud-user
Password:
Last login: Wed Jul 19 05:31:18 on ttyS0
[cloud-user@vm-rhel88-source-hpp ~]$
[akriti@fedora ~]$
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817