Bug 2228036
Summary: | Virt-Launcher Pod Node Drain stuck when HCO evictionStrategy is set "None" and VM is not restarted | ||
---|---|---|---|
Product: | Container Native Virtualization (CNV) | Reporter: | Akriti Gupta <akrgupta> |
Component: | Virtualization | Assignee: | Antonio Cardace <acardace> |
Status: | CLOSED ERRATA | QA Contact: | Kedar Bidarkar <kbidarka> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.14.0 | Flags: | akrgupta: needinfo+ |
Target Milestone: | --- | ||
Target Release: | 4.14.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | v4.14.0.rhel9-1706 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-11-08 14:06:16 UTC | Type: | Bug |
Description
Akriti Gupta
2023-08-01 07:54:35 UTC
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: LiveMigrate
[akriti@fedora ~]$ oc apply -f vm_rhel_ocs.yaml
Warning: kubevirt.io/v1alpha3 is now deprecated and will be removed in a future release.
virtualmachine.kubevirt.io/vm2-rhel88-ocs created
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   35s   Stopped   False
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm vm2-rhel88-ocs -o yaml | grep eviction
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   38s   Running   10.128.0.173   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]
Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket
vm2-rhel88-ocs login: cloud-user
Password:
[cloud-user@vm2-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv -o yaml
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: None
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-n
--------------------
*** The node drain stays stuck here until the VM is stopped.
*** If we restart the VM after updating the HCO, the VM restarts and the node is drained.

[akriti@fedora ~]$ virtctl stop vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to stop
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   19m   Running   True
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   22s   Running   10.129.0.136   cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]
Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket
vm2-rhel88-ocs login: cloud-user
Password:
Last login: Mon Jul 31 06:43:47 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
.
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com drained
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   84s   Running   10.128.0.203   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]
Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket
vm2-rhel88-ocs login: cloud-user
Password:
Last login: Mon Jul 31 07:01:36 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$

*** Bug 2228027 has been marked as a duplicate of this bug.
***

@akrgupta To verify this, just make sure that the eviction strategy the VM was started with is always stored in the VMI in the `.spec.evictionStrategy` field.

Verified on v4.14.0.rhel9-1709.
The VMI had the eviction strategy defined, with the same value that was in the HCO when the VM was started. After updating the HCO, the VM follows the new eviction strategy value only on restart.

[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: None
[akriti@fedora ~]$ oc get vmi vm-rhel88-ocs -o json | jq .spec.evictionStrategy
"None"
[akriti@fedora ~]$ oc get vm vm-rhel88-ocs -o yaml | grep eviction
[akriti@fedora ~]$ virtctl console vm-rhel88-ocs
Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket
vm-rhel88-ocs login: cloud-user
Password:
[cloud-user@vm-rhel88-ocs ~]$
[akriti@fedora ~]$
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
evictionStrategy: LiveMigrate
[akriti@fedora ~]$ oc get vmi
NAME            AGE     PHASE     IP             NODENAME                            READY
vm-rhel88-ocs   6m47s   Running   10.131.0.192   virt-akr-414-jd9ft-worker-0-g6xl6   True
[akriti@fedora ~]$ oc adm drain virt-akr-414-jd9ft-worker-0-g6xl6 --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/virt-akr-414-jd9ft-worker-0-g6xl6 cordoned
.
.
node/virt-akr-414-jd9ft-worker-0-g6xl6 drained
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE       IP   NODENAME                            READY
vm-rhel88-ocs   42s   Scheduled        virt-akr-414-jd9ft-worker-0-qsgrz   False
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE       IP   NODENAME                            READY
vm-rhel88-ocs   48s   Scheduled        virt-akr-414-jd9ft-worker-0-qsgrz   False
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE     IP            NODENAME                            READY
vm-rhel88-ocs   53s   Running   10.129.2.38   virt-akr-414-jd9ft-worker-0-qsgrz   True
[akriti@fedora ~]$ virtctl restart vm-rhel88-ocs
VM vm-rhel88-ocs was scheduled to restart
[akriti@fedora ~]$ oc get vmi
NAME            AGE   PHASE     IP            NODENAME                            READY
vm-rhel88-ocs   52s   Running   10.129.2.39   virt-akr-414-jd9ft-worker-0-qsgrz   True
[akriti@fedora ~]$ oc get vmi vm-rhel88-ocs -o yaml | grep eviction
evictionStrategy: LiveMigrate

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817
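
For anyone repeating the verification described in this bug (checking that a running VMI records the eviction strategy it was started with), the comparison can be sketched as a small shell helper. This is only an illustration, not part of the fix: the function name `classify_eviction_strategy` and its output strings are hypothetical, and the commented usage assumes the HCO exposes the cluster default at `.spec.evictionStrategy` and that `jq` is installed. A VM that sets its own eviction strategy can legitimately differ from the HCO value.

```shell
#!/bin/sh
# Hypothetical helper: compare the eviction strategy recorded on a VMI with
# the current cluster-wide default from the HCO and report the relationship.
classify_eviction_strategy() {
  vmi_strategy="$1"   # .spec.evictionStrategy read from the VMI
  hco_strategy="$2"   # evictionStrategy read from the HCO

  if [ -z "$vmi_strategy" ] || [ "$vmi_strategy" = "null" ]; then
    # jq -r prints "null" when the field is absent
    echo "missing: VMI does not record an eviction strategy"
  elif [ "$vmi_strategy" = "$hco_strategy" ]; then
    echo "in-sync: VMI uses $vmi_strategy"
  else
    echo "stale: VMI keeps $vmi_strategy until restart (cluster default is $hco_strategy)"
  fi
}

# Usage against a live cluster (assumed field paths, VM name taken from this bug):
#   vmi=$(oc get vmi vm-rhel88-ocs -o json | jq -r .spec.evictionStrategy)
#   hco=$(oc get hco kubevirt-hyperconverged -n openshift-cnv -o json | jq -r .spec.evictionStrategy)
#   classify_eviction_strategy "$vmi" "$hco"
```

With the values from the verification run above (VMI `None`, HCO changed to `LiveMigrate`), the helper reports the VMI as stale, which matches the observation that the new value only applies after a restart.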