Description of problem:
During tests with the descheduler, VMIs were migrated continuously from node to node. At some point some VMs were restarted.

Version-Release number of selected component (if applicable):
CNV 4.9.0
virt-operator v4.9.0-43
hco-operator v4.9.0-52

How reproducible:
Happened in each run (if waiting long enough)

Steps to Reproduce:
1. Deploy the descheduler operator with the LifecycleAndUtilization profile
2. Configure the thresholds to:
   - under utilization: CPU=30, Mem=50, Pods=30
   - above target utilization: CPU=60, Mem=60, Pods=60
3. Create and start enough VMs to not reach the upper bound (in this case, 10 VMs)
4. Drain a node (one that does not run the descheduler pod)
5. Once the VMIs are migrated off this node, uncordon the node

Actual results:
After step #5, the VMIs migrate constantly from node to node; each time a different node is reported as overutilized. After about 12-30 minutes (a different duration each time), VMIs are stopped and their VMs are restarted. During this entire period the nodes are Ready.
This also happened on a cluster with constantly migrating VMIs (a different node was loaded each time) without a node drain.

Expected results:
Even if VMIs are constantly migrated, as long as the nodes are up, the VMs should not be restarted.

Additional info:
Relevant VM in this case: vm-9-1632063772-7819707

++++++++++++++++++++++ VMI stopped +++++++++++++++++++++++++++++++++++++
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Running      10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-5-1632063770-1284065    12m   Running      10.131.0.239   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-4-1632063769-2268672    12m   Running      10.131.0.221   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-1-1632063766-807854     13m   Running      10.131.0.240   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-8-1632063772-0740237    12m   Running      10.131.0.220   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    13m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    13m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    0s    Pending
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    0s    Pending
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    1s    Pending                                                         False
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    1s    Scheduling                                                      False
++++++++++++++++++++++ VMI stopped +++++++++++++++++++++++++++++++++++++

$ oc logs -n openshift-cnv virt-controller-6cb978548b-m7tsg | grep "vm-9-1632063772-7819707"
{"component":"virt-controller","kind":"","level":"info","msg":"creating pdb for VMI ssp-descheduler-test-descheduler/vm-9-1632063772-7819707 to protect migration kubevirt-evacuation-wl7fr","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:561","timestamp":"2021-09-19T15:15:44.687337Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","kind":"","level":"info","msg":"arguments for container-disk \"volumecontainerdisk\": --copy-path /var/run/kubevirt-ephemeral-disks/container-disk-data/ce4d765b-9893-40b8-921e-bde985d1a2ae/disk_0","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"container-disk.go:294","timestamp":"2021-09-19T15:15:44.687641Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","kind":"","level":"info","msg":"Created migration target pod ssp-descheduler-test-descheduler/virt-launcher-vm-9-1632063772-7819707-vt5sl with uuid c192b416-3554-4453-b5d0-04ddecceb1ad for migration kubevirt-evacuation-wl7fr with uuid 3434d4d2-a4bc-4361-b660-46534c56cc49","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:532","timestamp":"2021-09-19T15:15:44.764279Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","level":"info","msg":"reenqueuing VirtualMachineInstance ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vmi.go:230","reason":"patching of vmi conditions and activePods failed: the server rejected our request due to an error in our request","timestamp":"2021-09-19T15:15:44.950192Z"}
{"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:53.738137Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"}
{"component":"virt-controller","kind":"","level":"info","msg":"Dispatching delete event for vmi ssp-descheduler-test-descheduler/vm-9-1632063772-7819707 with phase Succeeded","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:936","timestamp":"2021-09-19T15:15:53.821580Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:53.921576Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","level":"error","msg":"Clobbering existing delete keys: map[ssp-descheduler-test-descheduler/virt-launcher-vm-9-1632063772-7819707-vt5sl:{}]","pos":"expectations.go:327","timestamp":"2021-09-19T15:15:53.964327Z"} {"component":"virt-controller","level":"info","msg":"reenqueuing VirtualMachineInstance ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vmi.go:230","reason":"Operation cannot be fulfilled on virtualmachineinstances.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:15:54.091689Z"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:54.246137Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:16:00.200776Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Starting VM due to runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:605","timestamp":"2021-09-19T15:16:09.865919Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"creating pdb for VMI ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"disruptionbudget.go:512","timestamp":"2021-09-19T15:16:09.935669Z","uid":"b13e536d-77f7-4865-b644-a1ec51d1ad2f"} {"component":"virt-controller","kind":"","level":"info","msg":"Started VM by creating the new virtual machine instance vm-9-1632063772-7819707","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:777","timestamp":"2021-09-19T15:16:09.936094Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"arguments for container-disk \"volumecontainerdisk\": --copy-path /var/run/kubevirt-ephemeral-disks/container-disk-data/b13e536d-77f7-4865-b644-a1ec51d1ad2f/disk_0","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"container-disk.go:294","timestamp":"2021-09-19T15:16:09.960291Z","uid":"b13e536d-77f7-4865-b644-a1ec51d1ad2f"} {"component":"virt-controller","kind":"","level":"error","msg":"Updating the VirtualMachine status 
failed.","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:307","reason":"Operation cannot be fulfilled on virtualmachines.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:16:10.841025Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vm.go:175","reason":"Operation cannot be fulfilled on virtualmachines.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:16:10.841103Z"} +++++++++++++++++++++++ VM ++++++++++++++++++++++++++++++++++++ $ oc describe vm -n ssp-descheduler-test-descheduler vm-9-1632063772-7819707 Name: vm-9-1632063772-7819707 Namespace: ssp-descheduler-test-descheduler Labels: kubevirt.io/vm=vm-9-1632063772-7819707 Annotations: kubemacpool.io/transaction-timestamp: 2021-09-19T15:02:54.773105644Z kubevirt.io/latest-observed-api-version: v1 kubevirt.io/storage-observed-api-version: v1alpha3 API Version: kubevirt.io/v1 Kind: VirtualMachine Metadata: Creation Timestamp: 2021-09-19T15:02:53Z Generation: 2 Managed Fields: API Version: kubevirt.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:labels: .: f:kubevirt.io/vm: f:spec: .: f:template: .: f:metadata: .: f:annotations: f:creationTimestamp: f:labels: f:spec: .: f:domain: .: f:cpu: .: f:cores: f:model: f:devices: .: f:disks: f:interfaces: f:rng: f:machine: .: f:type: f:resources: .: f:requests: .: f:memory: f:evictionStrategy: f:networks: f:terminationGracePeriodSeconds: f:volumes: Manager: OpenAPI-Generator Operation: Update Time: 2021-09-19T15:02:53Z API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: f:kubevirt.io/latest-observed-api-version: f:kubevirt.io/storage-observed-api-version: f:status: .: f:volumeSnapshotStatuses: Manager: Go-http-client Operation: Update Time: 2021-09-19T15:02:54Z API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:status: f:conditions: f:created: f:printableStatus: f:ready: Manager: Go-http-client Operation: Update Subresource: status Time: 2021-09-19T15:16:23Z Resource Version: 14258198 UID: d8e4dc0f-d7ea-4045-9110-e5e4edb61dff Spec: Running: true Template: Metadata: Annotations: descheduler.alpha.kubernetes.io/evict: true Creation Timestamp: <nil> Labels: kubevirt.io/domain: vm-9-1632063772-7819707 kubevirt.io/vm: vm-9-1632063772-7819707 Spec: Domain: Cpu: Cores: 1 Model: Haswell-noTSX Devices: Disks: Disk: Bus: virtio Name: containerdisk Disk: Bus: virtio Name: cloudinitdisk Interfaces: Mac Address: 02:bc:d2:00:02:98 Masquerade: Name: default Rng: Machine: Type: pc-q35-rhel8.1.0 Resources: Requests: Memory: 1Gi Eviction Strategy: LiveMigrate Networks: Name: default Pod: Termination Grace Period Seconds: 30 Volumes: Container Disk: Image: quay.io/openshift-cnv/qe-cnv-tests-fedora:34 Name: containerdisk Cloud Init No Cloud: User Data: #cloud-config chpasswd: expire: false password: password user: fedora ssh_authorized_keys: [ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT root] runcmd: ["sudo sed -i '/^PubkeyAccepted/ s/$/,ssh-rsa/' /etc/crypto-policies/back-ends/opensshserver.config", "sudo sed -i 's/^#\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config", 'sudo systemctl enable sshd', 'sudo systemctl restart sshd'] Name: cloudinitdisk Status: Conditions: Last Probe Time: <nil> Last Transition Time: 2021-09-19T15:16:18Z Status: True Type: Ready Created: true Printable Status: Running Ready: true Volume Snapshot Statuses: Enabled: false Name: containerdisk Reason: Snapshot is not supported for this volumeSource type [containerdisk] Enabled: false Name: cloudinitdisk Reason: Snapshot is not supported for this volumeSource type [cloudinitdisk] Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulCreate 19m virtualmachine-controller Started the virtual machine by creating the new virtual machine instance vm-9-1632063772-7819707 Normal SuccessfulDelete 6m56s virtualmachine-controller Stopped the virtual machine by deleting the virtual machine instance ce4d765b-9893-40b8-921e-bde985d1a2ae Normal SuccessfulCreate 6m40s virtualmachine-controller Started the virtual machine by creating the new virtual machine instance vm-9-1632063772-7819707 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I tried to reproduce this today. Observations:
0) The descheduler pod was running on node-14 [1]; node-13 was drained for testing.
1) The descheduler has been constantly evicting the VMI pods in a loop since node-13 was drained and uncordoned.
2) "vm-15-1632401643-0371675" got restarted, i.e. it failed to live-migrate.
3) From the logs in the comment above:
   a) the descheduler tried to evict vm-2, vm-16, vm-4 and vm-15
   b) as seen in [2], the PDB appears to have kicked in for vm-2, vm-16 and vm-4
   c) but for vm-15 [3] there is no "Cannot evict pod as it would violate the pod's disruption budget." message
   d) it is this vm-15 that got restarted.

[1]:
$ oc get pods -o wide -n openshift-kube-descheduler-operator
NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
cluster-66dc45556d-hz9r6   1/1     Running   0          3h41m   <ip-address>   node-14.redhat.com   <none>           <none>

[2]:
E0923 16:11:42.130745       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-2-1632401447-6913495-hdv64\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-2-1632401447-6913495-hdv64" reason="LowNodeUtilization"
E0923 16:11:42.309071       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-16-1632401657-969292-s4s2k\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-16-1632401657-969292-s4s2k" reason="LowNodeUtilization"
E0923 16:11:43.159743       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-4-1632401477-5879517-rd8c8\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-4-1632401477-5879517-rd8c8" reason="LowNodeUtilization"

[3]:
I0923 16:11:43.677481       1 evictions.go:130] "Evicted pod" pod="ssp-descheduler-test-descheduler/virt-launcher-vm-15-1632401643-0371675-xcjb7" reason="LowNodeUtilization"
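For anyone repeating the check in point c) above, here is a minimal Go sketch (namespace and kubeconfig path are assumptions) that prints the PodDisruptionBudgets in the test namespace together with the disruptions they currently allow; a virt-launcher pod whose PDB reports disruptionsAllowed=0 should produce the "would violate the pod's disruption budget" error seen in [2]. It is roughly what `oc get pdb -n ssp-descheduler-test-descheduler` shows, captured programmatically.

```go
// Sketch: list PDBs in the test namespace and show how many disruptions each
// currently allows. Namespace and kubeconfig handling are assumptions.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pdbs, err := client.PolicyV1().PodDisruptionBudgets("ssp-descheduler-test-descheduler").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pdb := range pdbs.Items {
		// disruptionsAllowed == 0 means an eviction would be rejected (HTTP 429).
		fmt.Printf("%s: desiredHealthy=%d currentHealthy=%d disruptionsAllowed=%d\n",
			pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy, pdb.Status.DisruptionsAllowed)
	}
}
```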
Stu, could this be related to bug #2008511?
Hello Antonio, can you help us understand what is happening, and whether the related behavior should be considered a regression?
Hi all,
Dominik, I will take a look at the logs and see if I can spot anything.
@sbennert I don't think the PDB situation is causing this behaviour: if you look at https://github.com/kubernetes/kubernetes/blob/d7123a65248e25b86018ba8220b671cd483d6797/pkg/registry/core/pod/storage/eviction.go#L256 (the function exits early there), you will see that when k8s finds 2 overlapping PDBs the eviction does not go through and no pod is actually deleted.
Anyway, this has recently been reworked upstream by https://github.com/kubevirt/kubevirt/pull/6297, which you might want to backport to 4.9. As I said, the eviction doesn't go through when there are multiple PDBs, so this should not be the problem, but you can repeat the test with that PR in just to make sure (maybe it is the cause anyway, but for another reason).
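To illustrate the behaviour described above, here is a small Go sketch of the guard in question (not the actual upstream eviction.go code, just the gist): when more than one PDB selects the pod being evicted, the eviction request is refused and the pod is left untouched, which is why overlapping PDBs should not by themselves cause a launcher pod to be deleted.

```go
// Sketch only: approximates the multi-PDB guard discussed above; the real
// logic lives in k8s pkg/registry/core/pod/storage/eviction.go.
package main

import (
	"errors"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
)

// checkEviction mimics the early exit: with more than one matching PDB the
// eviction is refused outright, so no launcher pod can be deleted.
func checkEviction(matchingPDBs []policyv1.PodDisruptionBudget) error {
	if len(matchingPDBs) > 1 {
		return errors.New("pod has more than one PodDisruptionBudget; eviction refused")
	}
	if len(matchingPDBs) == 1 && matchingPDBs[0].Status.DisruptionsAllowed < 1 {
		return errors.New("eviction would violate the pod's disruption budget")
	}
	return nil // eviction may proceed
}

func main() {
	fmt.Println(checkEviction(nil)) // no PDB at all: eviction allowed
}
```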
Given the course of this discussion so far, it appears this is likely a Virtualization (rather than an SSP) bug, so I am re-assigning the component to Virt. Kedar, can we try to create a loop that repeatedly evicts all VMs, to see whether this can be reproduced without the descheduler? Also, can you try to evict all VMs of a node one by one?
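For reference, a minimal sketch of such an eviction loop using client-go (the namespace, label selector, interval and kubeconfig handling are assumptions, not an existing test):

```go
// Sketch of a loop that repeatedly evicts virt-launcher pods via the
// Kubernetes eviction API, to mimic the descheduler without running it.
package main

import (
	"context"
	"fmt"
	"time"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ns := "ssp-descheduler-test-descheduler"

	for {
		pods, err := client.CoreV1().Pods(ns).List(context.TODO(),
			metav1.ListOptions{LabelSelector: "kubevirt.io=virt-launcher"})
		if err != nil {
			panic(err)
		}
		for _, pod := range pods.Items {
			// Create an Eviction (the same API the descheduler uses); a PDB
			// with disruptionsAllowed == 0 should reject this with HTTP 429.
			err := client.CoreV1().Pods(ns).EvictV1(context.TODO(), &policyv1.Eviction{
				ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: ns},
			})
			fmt.Printf("evicting %s: %v\n", pod.Name, err)
		}
		time.Sleep(30 * time.Second)
	}
}
```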
@dholler I've looked at the logs and saw nothing strange in terms of the VMI lifecycle; I just saw that the VMI was signaled to terminate as if the eviction request went through and 'escaped' the PDB.
I think this might be caused by a race between when the migration PDB creation is requested and when it is actually created by the API server and processed by the k8s PDB controller. If the following events happen in this order:
- migration is created
- eviction request is created
- migration-controller requests to create an additional (migration) PDB
- target pod is created
- k8s pdb controller runs and updates the VMI's PDB status doing pdb.Status.CurrentHealthy++
- eviction is processed by k8s, which finds 1 PDB with pdb.Status.CurrentHealthy > pdb.Status.DesiredHealthy
- migration PDB is created by the API server
- k8s pdb controller runs and updates the migration PDB status
then one of the 2 pods of the VMI could actually be killed.
Fortunately, https://github.com/kubevirt/kubevirt/pull/6297 fixes this whole mess and this race is no longer possible, as the migration controller now waits for the k8s PDB controller to process the PDB creation and only afterwards creates the migration target pod.
In short, my guess is that if you backport PR#6297 everything should be fine.
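A rough sketch of the ordering idea described above (this is not the actual PR#6297 code; the helper, its thresholds and the PDB name are illustrative assumptions): before creating the migration target pod, wait until the PDB protecting the VMI has actually been observed and processed by the Kubernetes PDB controller, so an eviction arriving in between cannot slip past a stale PDB status.

```go
// Illustrative sketch of the race fix: only create the migration target pod
// after the PDB status has caught up with its spec. Not real KubeVirt code.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPDBProcessed polls the PDB until its status reflects the latest spec
// (ObservedGeneration caught up) and it is expected to protect both launcher
// pods (source and migration target).
func waitForPDBProcessed(client kubernetes.Interface, ns, name string) error {
	return wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		pdb, err := client.PolicyV1().PodDisruptionBudgets(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		processed := pdb.Status.ObservedGeneration >= pdb.Generation &&
			pdb.Status.DesiredHealthy == 2 // source + migration target pod
		return processed, nil
	})
}

func main() {
	// Hypothetical usage: only after waitForPDBProcessed returns nil would a
	// controller go on to create the target virt-launcher pod.
	fmt.Println("create the target pod only once the PDB status is up to date")
}
```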
I just created https://github.com/kubevirt/kubevirt/pull/6532 to backport the patch to CNV 4.9.
To verify: repeat steps in description
Verified: VMIs remain in the Running state during descheduler eviction actions. OCP 4.9.0-rc.5 CNV 4.9.0-246
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104