Bug 2228036 - Virt-Launcher Pod Node Drain stuck when HCO evictionStrategy is set "None" and VM is not restarted
Keywords:
Status: POST
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: Antonio Cardace
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Duplicates: 2228027
Depends On:
Blocks:
Reported: 2023-08-01 07:54 UTC by Akriti Gupta
Modified: 2023-08-11 21:43 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 10255 0 None Merged virt-controller: always store evictionStrategy in Spec 2023-08-16 11:13:01 UTC
Red Hat Issue Tracker CNV-31577 0 None None None 2023-08-01 07:55:54 UTC

Description Akriti Gupta 2023-08-01 07:54:35 UTC
Description of problem: A VM is started while the HCO evictionStrategy is LiveMigrate. When the HCO evictionStrategy is then changed to None and the node is drained without restarting the VM, the virt-launcher pod is not evicted and the node drain is stuck with the following error:

error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s

Version-Release number of selected component (if applicable):


How reproducible:
100% on a bare-metal (bm) cluster

Steps to Reproduce:
1. Start with HCO evictionStrategy: LiveMigrate.
2. Create a VM and start it (VM is running; there is no evictionStrategy field in the VM spec).
3. Edit the HCO and set evictionStrategy: None.
4. Do not restart the VM.
5. Drain the node the VM is running on.
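
The steps above can be sketched as a script. This is only a sketch under assumptions, not the reporter's exact procedure: the reporter used `oc edit` interactively, while the `oc patch` call here assumes the field sits at `spec.evictionStrategy` of the HyperConverged CR. The `run` helper, the `DRY_RUN` switch, and the function names are hypothetical. By default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps; nothing is executed unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands only

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

set_hco_eviction_strategy() {
  # Assumption: the HCO exposes the setting at spec.evictionStrategy.
  run oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=merge \
    -p "{\"spec\":{\"evictionStrategy\":\"$1\"}}"
}

reproduce() {
  local vm="$1" node="$2"
  set_hco_eviction_strategy LiveMigrate   # step 1
  run virtctl start "$vm"                 # step 2: VM spec has no evictionStrategy
  set_hco_eviction_strategy None          # step 3
  # step 4: deliberately no VM restart here
  run oc adm drain "$node" --force=true --ignore-daemonsets=true \
    --delete-emptydir-data=true           # step 5
}
```

Calling `reproduce vm2-rhel88-ocs <node-name>` with `DRY_RUN=0` against a test cluster would be expected to leave the drain retrying, as described under "Actual results".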


Actual results: Node drain is stuck while evicting the virt-launcher pod:
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
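
Since the error refers to a pod disruption budget, one way to narrow this down is to look at the PodDisruptionBudgets in the VM's namespace and at the eviction strategy recorded on the running VMI. The following is a minimal diagnostic sketch, not something taken from the report: it assumes that a PDB created while the strategy was LiveMigrate is what blocks the eviction, and the `run` helper, `DRY_RUN` switch, and function name are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Diagnostic sketch; nothing is executed unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands only

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

inspect_eviction_state() {
  local vmi="$1" ns="$2"
  # List PodDisruptionBudgets that could be blocking the eviction.
  run oc get pdb -n "$ns"
  # Show the strategy stored on the VMI itself (empty if the field is unset).
  run oc get vmi "$vmi" -n "$ns" -o "jsonpath={.spec.evictionStrategy}"
  # Show the cluster-wide setting, as done in the report.
  run oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml
}
```

For the VM in this report the call would be `inspect_eviction_state vm2-rhel88-ocs default`.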


Expected results: Node drain succeeds and the VM is restarted on another node.


Additional info:

Comment 1 Akriti Gupta 2023-08-01 07:58:03 UTC
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
  evictionStrategy: LiveMigrate
[akriti@fedora ~]$ oc apply -f vm_rhel_ocs.yaml 
Warning: kubevirt.io/v1alpha3 is now deprecated and will be removed in a future release.
virtualmachine.kubevirt.io/vm2-rhel88-ocs created
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   35s   Stopped   False
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm vm2-rhel88-ocs -o yaml | grep eviction
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   38s   Running   10.128.0.173   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password: 
[cloud-user@vm2-rhel88-ocs ~]$ [akriti@fedora ~]$
[akriti@fedora ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv -o yaml
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep eviction
  evictionStrategy: None
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-nxl4s" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/virt-launcher-vm2-rhel88-ocs-nxl4s
error when evicting pods/"virt-launcher-vm2-rhel88-ocs-n
---------------------
*** Node drain stays stuck here until the VM is stopped.


*** If the VM is restarted after updating the HCO, the VM restarts and the node is drained:

[akriti@fedora ~]$ virtctl stop vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to stop
[akriti@fedora ~]$ virtctl start vm2-rhel88-ocs
VM vm2-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vm
NAME             AGE   STATUS    READY
vm2-rhel88-ocs   19m   Running   True
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   22s   Running   10.129.0.136   cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]
Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64
Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password: 
Last login: Mon Jul 31 06:43:47 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$ [akriti@fedora ~]$ 
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
.
node/cnv-qe-infra-02.cnvqe3.lab.eng.rdu2.redhat.com drained
[akriti@fedora ~]$ oc get vmi
NAME             AGE   PHASE     IP             NODENAME                                         READY
vm2-rhel88-ocs   84s   Running   10.128.0.203   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm2-rhel88-ocs
Successfully connected to vm2-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel88-ocs login: cloud-user
Password: 
Last login: Mon Jul 31 07:01:36 on ttyS0
[cloud-user@vm2-rhel88-ocs ~]$

Comment 2 Kedar Bidarkar 2023-08-02 12:23:34 UTC
*** Bug 2228027 has been marked as a duplicate of this bug. ***

