Description of problem:
A VM is started while the HCO has evictionStrategy: LiveMigrate. If the HCO is then updated to evictionStrategy: None and the node is drained without restarting the VM, the virt-launcher pod is not evicted and the node drain is stuck with the following error:

error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s

Version-Release number of selected component (if applicable):

How reproducible:
100% on a bare-metal cluster

Steps to Reproduce:
1. Start with the HCO evictionStrategy set to LiveMigrate.
2. Create a VM and start it (there is no evictionStrategy field in the VM spec).
3. Edit the HCO to set evictionStrategy: None.
4. Do not restart the VM.
5. Drain the node.

Actual results:
The node drain is stuck while evicting the virt-launcher pod:
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s

Expected results:
The node drain succeeds and the VM is restarted on another node.

Additional info:
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: LiveMigrate
[akriti@fedora ~]$ oc apply -f vm_rhel_ocs.yaml
Warning: kubevirt.io/v1alpha3 is now deprecated and will be removed in a future release.
virtualmachine.kubevirt.io/vm2-rhel88-ocs created
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   35s   Stopped   False
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm vm2-rhel88-ocs -o yaml | grep eviction
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   38s   Running   10.128.0.173   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password:
[cloud-user@vm2-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv -o yaml
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: None
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-n
--------------------
*** The node drain stays stuck here until the VM is stopped. ***
*** If we restart the VM after updating the HCO, the VM restarts and the node is drained: ***
[akriti@fedora ~]$ virtctl stop vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to stop
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   19m   Running   True
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   22s   Running   10.129.0.136   cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password:
Last login: Mon Jul 31 06:43:47 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
.
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com drained
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   84s   Running   10.128.0.203   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password:
Last login: Mon Jul 31 07:01:36 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$
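The drain error comes from the Kubernetes Eviction API, which rejects an eviction when the PodDisruptionBudget covering the pod has no disruptions left (disruptionsAllowed = currentHealthy - desiredHealthy, floored at zero). Below is a minimal Go sketch of that arithmetic. It assumes, without confirmation from this report (the PDB itself is not shown), that the single running virt-launcher pod is covered by a budget with minAvailable: 1. The function names are illustrative, not Kubernetes or KubeVirt APIs.

```go
package main

import "fmt"

// disruptionsAllowed models the PodDisruptionBudget rule for minAvailable:
// the number of pods that may be evicted is currentHealthy - minAvailable,
// never less than zero.
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	if d := currentHealthy - minAvailable; d > 0 {
		return d
	}
	return 0
}

// evictionAllowed reports whether one more eviction would be admitted.
func evictionAllowed(currentHealthy, minAvailable int) bool {
	return disruptionsAllowed(currentHealthy, minAvailable) > 0
}

func main() {
	// Assumed case: one healthy virt-launcher pod, budget minAvailable: 1.
	fmt.Println(evictionAllowed(1, 1)) // false: drain retries every 5s
	// Simplification: a pod with no budget behaves like minAvailable: 0.
	fmt.Println(evictionAllowed(1, 0)) // true
}
```

Under that assumption, the drain cannot make progress until something removes or relaxes the budget, or the pod goes away. In the transcript, stopping the VM is what unblocks it.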
*** Bug 2228027 has been marked as a duplicate of this bug. ***
@akrgupta To verify this, make sure that the eviction strategy the VM was started with is always stored in the VMI in the `.spec.evictionStrategy` field.
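The behavior requested in the verification note (the VMI keeps the strategy it was started with) can be sketched as a defaulting step at VMI creation. This is a minimal Go model, not KubeVirt code. The types are trimmed stand-ins, and `pinEvictionStrategy` is a hypothetical helper name. It only illustrates that copying the cluster-wide default into the VMI spec once decouples the running VMI from later HCO edits.

```go
package main

import "fmt"

// EvictionStrategy is a simplified stand-in for the KubeVirt field type.
type EvictionStrategy string

const (
	EvictionStrategyNone        EvictionStrategy = "None"
	EvictionStrategyLiveMigrate EvictionStrategy = "LiveMigrate"
)

// VMISpec keeps only the field relevant here (hypothetical, trimmed model).
type VMISpec struct {
	EvictionStrategy *EvictionStrategy
}

// pinEvictionStrategy (hypothetical) fills an unset strategy from the
// cluster-wide default. An explicit value in the VM template is left alone.
func pinEvictionStrategy(spec *VMISpec, clusterDefault EvictionStrategy) {
	if spec.EvictionStrategy == nil {
		s := clusterDefault
		spec.EvictionStrategy = &s
	}
}

func main() {
	clusterDefault := EvictionStrategyLiveMigrate
	running := VMISpec{} // VM template has no evictionStrategy, as in the report
	pinEvictionStrategy(&running, clusterDefault)

	clusterDefault = EvictionStrategyNone      // the HCO is edited afterwards
	fmt.Println(*running.EvictionStrategy)     // LiveMigrate: pinned at start

	restarted := VMISpec{} // a restart creates a fresh VMI from the template
	pinEvictionStrategy(&restarted, clusterDefault)
	fmt.Println(*restarted.EvictionStrategy)   // None: picks up the new default
}
```

Because the value is persisted rather than re-read from the cluster config, anything acting on the VMI later (for example, drain handling) sees one consistent strategy for the VMI's lifetime.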
Verified on v4.14.0.rhel9-1709. The VMI has the eviction strategy defined under `.spec.evictionStrategy`, and its value is the same as the one in the HCO when the VM was started. After the HCO is updated, the VM follows the new eviction strategy value only on restart:

[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: None
[akriti@fedora ~]$ oc get vmi vm-rhel88-ocs -o json | jq .spec.evictionStrategy
"None"
[akriti@fedora ~]$ oc get vm vm-rhel88-ocs -o yaml | grep eviction
[akriti@fedora ~]$ virtctl console vm-rhel88-ocs

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel88-ocs login: cloud-user
Password:
[cloud-user@vm-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: LiveMigrate
[akriti@fedora ~]$ oc get vmi
NAME            AGE     PHASE     IP             NODENAME                            READY
vm-rhel88-ocs   6m47s   Running   10.131.0.192   virt-akr-414-jd9ft-worker-0-g6xl6   True
[akriti@fedora ~]$ oc adm drain virt-akr-414-jd9ft-worker-0-g6xl6 --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/virt-akr-414-jd9ft-worker-0-g6xl6 cordoned
.
.
node/virt-akr-414-jd9ft-worker-0-g6xl6 drained
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE       IP   NODENAME                            READY
vm-rhel88-ocs   42s   Scheduled        virt-akr-414-jd9ft-worker-0-qsgrz   False
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE       IP   NODENAME                            READY
vm-rhel88-ocs   48s   Scheduled        virt-akr-414-jd9ft-worker-0-qsgrz   False
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE     IP            NODENAME                            READY
vm-rhel88-ocs   53s   Running   10.129.2.38   virt-akr-414-jd9ft-worker-0-qsgrz   True
[akriti@fedora ~]$ virtctl restart vm-rhel88-ocs
VM vm-rhel88-ocs was scheduled to restart
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE     IP            NODENAME                            READY
vm-rhel88-ocs   52s   Running   10.129.2.39   virt-akr-414-jd9ft-worker-0-qsgrz   True
[akriti@fedora ~]$ oc get vmi vm-rhel88-ocs -o yaml | grep eviction
evictionStrategy: LiveMigrate
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817