Description of problem:
During tests with the descheduler, VMIs were migrated continuously from node to node. At some point some VMs were restarted.

Version-Release number of selected component (if applicable):
CNV 4.9.0
virt-operator v4.9.0-43
hco-operator v4.9.0-52

How reproducible:
Happened in each run (if waiting long enough)

Steps to Reproduce:
1. Deploy the descheduler operator with the LifecycleAndUtilization profile
2. Configure the thresholds to:
   - under utilization: CPU=30, Mem=50, Pods=30
   - above target utilization: CPU=60, Mem=60, Pods=60
3. Create and start enough VMs to not reach the upper bound (in this case, 10 VMs)
4. Drain a node (one that does not run the descheduler pod)
5. Once the VMIs are migrated off this node, uncordon the node

Actual results:
After step #5, the VMIs migrate constantly from node to node; each time a different node is reported as overutilized. After about 12-30 minutes (a different duration each time), VMIs are stopped and their VMs are restarted. During this entire period the nodes are Ready.
This also happened on a cluster with constantly migrating VMIs (a different node was loaded each time) without a node drain.

Expected results:
Even if VMIs are constantly migrated, as long as the nodes are up, the VMs should not be restarted.

Additional info:
Relevant VM in this case: vm-9-1632063772-7819707

++++++++++++++++++++++ VMI stopped +++++++++++++++++++++++++++++++++++++
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Running      10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-5-1632063770-1284065    12m   Running      10.131.0.239   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-4-1632063769-2268672    12m   Running      10.131.0.221   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-1-1632063766-807854     13m   Running      10.131.0.240   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-8-1632063772-0740237    12m   Running      10.131.0.220   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    12m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   12m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    13m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-7-1632063771-4180167    13m   Running      10.129.3.28    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-2-1632063767-553122     13m   Running      10.129.3.30    ruty-490-9f-8jvg4-worker-0-vl8wr   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    13m   Succeeded    10.129.3.27    ruty-490-9f-8jvg4-worker-0-vl8wr   False
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    0s    Pending
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    0s    Pending
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    1s    Pending                                                         False
ssp-descheduler-test-descheduler   vm-10-1632063773-6094723   13m   Running      10.131.0.232   ruty-490-9f-8jvg4-worker-0-77dk5   True
ssp-descheduler-test-descheduler   vm-9-1632063772-7819707    1s    Scheduling                                                      False
++++++++++++++++++++++ VMI stopped +++++++++++++++++++++++++++++++++++++

$ oc logs -n openshift-cnv virt-controller-6cb978548b-m7tsg | grep "vm-9-1632063772-7819707"
{"component":"virt-controller","kind":"","level":"info","msg":"creating pdb for VMI ssp-descheduler-test-descheduler/vm-9-1632063772-7819707 to protect migration kubevirt-evacuation-wl7fr","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:561","timestamp":"2021-09-19T15:15:44.687337Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","kind":"","level":"info","msg":"arguments for container-disk \"volumecontainerdisk\": --copy-path /var/run/kubevirt-ephemeral-disks/container-disk-data/ce4d765b-9893-40b8-921e-bde985d1a2ae/disk_0","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"container-disk.go:294","timestamp":"2021-09-19T15:15:44.687641Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","kind":"","level":"info","msg":"Created migration target pod ssp-descheduler-test-descheduler/virt-launcher-vm-9-1632063772-7819707-vt5sl with uuid c192b416-3554-4453-b5d0-04ddecceb1ad for migration kubevirt-evacuation-wl7fr with uuid 3434d4d2-a4bc-4361-b660-46534c56cc49","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:532","timestamp":"2021-09-19T15:15:44.764279Z","uid":"ce4d765b-9893-40b8-921e-bde985d1a2ae"}
{"component":"virt-controller","level":"info","msg":"reenqueuing VirtualMachineInstance ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vmi.go:230","reason":"patching of vmi conditions and activePods failed: the server rejected our request due to an error in our request","timestamp":"2021-09-19T15:15:44.950192Z"}
{"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:53.738137Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"}
{"component":"virt-controller","kind":"","level":"info","msg":"Dispatching delete event for vmi ssp-descheduler-test-descheduler/vm-9-1632063772-7819707 with phase Succeeded","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:936","timestamp":"2021-09-19T15:15:53.821580Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:53.921576Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","level":"error","msg":"Clobbering existing delete keys: map[ssp-descheduler-test-descheduler/virt-launcher-vm-9-1632063772-7819707-vt5sl:{}]","pos":"expectations.go:327","timestamp":"2021-09-19T15:15:53.964327Z"} {"component":"virt-controller","level":"info","msg":"reenqueuing VirtualMachineInstance ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vmi.go:230","reason":"Operation cannot be fulfilled on virtualmachineinstances.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:15:54.091689Z"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:15:54.246137Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Stopping VM with VMI in phase Succeeded and VM runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:581","timestamp":"2021-09-19T15:16:00.200776Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"Starting VM due to runStrategy: Always","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:605","timestamp":"2021-09-19T15:16:09.865919Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"creating pdb for VMI ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"disruptionbudget.go:512","timestamp":"2021-09-19T15:16:09.935669Z","uid":"b13e536d-77f7-4865-b644-a1ec51d1ad2f"} {"component":"virt-controller","kind":"","level":"info","msg":"Started VM by creating the new virtual machine instance vm-9-1632063772-7819707","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:777","timestamp":"2021-09-19T15:16:09.936094Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","kind":"","level":"info","msg":"arguments for container-disk \"volumecontainerdisk\": --copy-path /var/run/kubevirt-ephemeral-disks/container-disk-data/b13e536d-77f7-4865-b644-a1ec51d1ad2f/disk_0","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"container-disk.go:294","timestamp":"2021-09-19T15:16:09.960291Z","uid":"b13e536d-77f7-4865-b644-a1ec51d1ad2f"} {"component":"virt-controller","kind":"","level":"error","msg":"Updating the VirtualMachine status 
failed.","name":"vm-9-1632063772-7819707","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:307","reason":"Operation cannot be fulfilled on virtualmachines.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:16:10.841025Z","uid":"d8e4dc0f-d7ea-4045-9110-e5e4edb61dff"} {"component":"virt-controller","level":"info","msg":"re-enqueuing VirtualMachine ssp-descheduler-test-descheduler/vm-9-1632063772-7819707","pos":"vm.go:175","reason":"Operation cannot be fulfilled on virtualmachines.kubevirt.io \"vm-9-1632063772-7819707\": the object has been modified; please apply your changes to the latest version and try again","timestamp":"2021-09-19T15:16:10.841103Z"} +++++++++++++++++++++++ VM ++++++++++++++++++++++++++++++++++++ $ oc describe vm -n ssp-descheduler-test-descheduler vm-9-1632063772-7819707 Name: vm-9-1632063772-7819707 Namespace: ssp-descheduler-test-descheduler Labels: kubevirt.io/vm=vm-9-1632063772-7819707 Annotations: kubemacpool.io/transaction-timestamp: 2021-09-19T15:02:54.773105644Z kubevirt.io/latest-observed-api-version: v1 kubevirt.io/storage-observed-api-version: v1alpha3 API Version: kubevirt.io/v1 Kind: VirtualMachine Metadata: Creation Timestamp: 2021-09-19T15:02:53Z Generation: 2 Managed Fields: API Version: kubevirt.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:labels: .: f:kubevirt.io/vm: f:spec: .: f:template: .: f:metadata: .: f:annotations: f:creationTimestamp: f:labels: f:spec: .: f:domain: .: f:cpu: .: f:cores: f:model: f:devices: .: f:disks: f:interfaces: f:rng: f:machine: .: f:type: f:resources: .: f:requests: .: f:memory: f:evictionStrategy: f:networks: f:terminationGracePeriodSeconds: f:volumes: Manager: OpenAPI-Generator Operation: Update Time: 2021-09-19T15:02:53Z API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: f:kubevirt.io/latest-observed-api-version: f:kubevirt.io/storage-observed-api-version: f:status: .: f:volumeSnapshotStatuses: Manager: Go-http-client Operation: Update Time: 2021-09-19T15:02:54Z API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:status: f:conditions: f:created: f:printableStatus: f:ready: Manager: Go-http-client Operation: Update Subresource: status Time: 2021-09-19T15:16:23Z Resource Version: 14258198 UID: d8e4dc0f-d7ea-4045-9110-e5e4edb61dff Spec: Running: true Template: Metadata: Annotations: descheduler.alpha.kubernetes.io/evict: true Creation Timestamp: <nil> Labels: kubevirt.io/domain: vm-9-1632063772-7819707 kubevirt.io/vm: vm-9-1632063772-7819707 Spec: Domain: Cpu: Cores: 1 Model: Haswell-noTSX Devices: Disks: Disk: Bus: virtio Name: containerdisk Disk: Bus: virtio Name: cloudinitdisk Interfaces: Mac Address: 02:bc:d2:00:02:98 Masquerade: Name: default Rng: Machine: Type: pc-q35-rhel8.1.0 Resources: Requests: Memory: 1Gi Eviction Strategy: LiveMigrate Networks: Name: default Pod: Termination Grace Period Seconds: 30 Volumes: Container Disk: Image: quay.io/openshift-cnv/qe-cnv-tests-fedora:34 Name: containerdisk Cloud Init No Cloud: User Data: #cloud-config chpasswd: expire: false password: password user: fedora ssh_authorized_keys: [ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT root] runcmd: ["sudo sed -i '/^PubkeyAccepted/ s/$/,ssh-rsa/' /etc/crypto-policies/back-ends/opensshserver.config", "sudo sed -i 's/^#\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config", 'sudo systemctl enable sshd', 'sudo systemctl restart sshd'] Name: cloudinitdisk Status: Conditions: Last Probe Time: <nil> Last Transition Time: 2021-09-19T15:16:18Z Status: True Type: Ready Created: true Printable Status: Running Ready: true Volume Snapshot Statuses: Enabled: false Name: containerdisk Reason: Snapshot is not supported for this volumeSource type [containerdisk] Enabled: false Name: cloudinitdisk Reason: Snapshot is not supported for this volumeSource type [cloudinitdisk] Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulCreate 19m virtualmachine-controller Started the virtual machine by creating the new virtual machine instance vm-9-1632063772-7819707 Normal SuccessfulDelete 6m56s virtualmachine-controller Stopped the virtual machine by deleting the virtual machine instance ce4d765b-9893-40b8-921e-bde985d1a2ae Normal SuccessfulCreate 6m40s virtualmachine-controller Started the virtual machine by creating the new virtual machine instance vm-9-1632063772-7819707 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I tried to reproduce this today. Observations:
0) The descheduler pod was running on node-14 [1]; node-13 was drained for testing.
1) The descheduler has been constantly evicting the VMI pods in a loop since node-13 was drained and uncordoned.
2) "vm-15-1632401643-0371675" got restarted, i.e. it failed to live-migrate.
3) From the logs in the comment above:
   a) the descheduler tried to evict vm-2, vm-16, vm-4 and vm-15
   b) as seen in [2], the PDB appears to have kicked in for vm-2, vm-16 and vm-4
   c) but for vm-15 [3] there is no "Cannot evict pod as it would violate the pod's disruption budget." message
   d) it is this vm-15 that got restarted.

[1]:
$ oc get pods -o wide -n openshift-kube-descheduler-operator
NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
cluster-66dc45556d-hz9r6   1/1     Running   0          3h41m   <ip-address>   node-14.redhat.com   <none>           <none>

[2]:
E0923 16:11:42.130745       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-2-1632401447-6913495-hdv64\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-2-1632401447-6913495-hdv64" reason="LowNodeUtilization"
E0923 16:11:42.309071       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-16-1632401657-969292-s4s2k\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-16-1632401657-969292-s4s2k" reason="LowNodeUtilization"
E0923 16:11:43.159743       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-4-1632401477-5879517-rd8c8\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-4-1632401477-5879517-rd8c8" reason="LowNodeUtilization"

[3]:
I0923 16:11:43.677481       1 evictions.go:130] "Evicted pod" pod="ssp-descheduler-test-descheduler/virt-launcher-vm-15-1632401643-0371675-xcjb7" reason="LowNodeUtilization"
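For anyone repeating the check in point c) above, here is a minimal Go sketch (namespace and kubeconfig path are assumptions) that prints the PodDisruptionBudgets in the test namespace together with the disruptions they currently allow; a virt-launcher pod whose PDB reports disruptionsAllowed=0 should produce the "would violate the pod's disruption budget" error seen in [2]. It is roughly what `oc get pdb -n ssp-descheduler-test-descheduler` shows, captured programmatically.

```go
// Sketch: list PDBs in the test namespace and show how many disruptions each
// currently allows. Namespace and kubeconfig handling are assumptions.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pdbs, err := client.PolicyV1().PodDisruptionBudgets("ssp-descheduler-test-descheduler").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pdb := range pdbs.Items {
		// disruptionsAllowed == 0 means an eviction would be rejected (HTTP 429).
		fmt.Printf("%s: desiredHealthy=%d currentHealthy=%d disruptionsAllowed=%d\n",
			pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy, pdb.Status.DisruptionsAllowed)
	}
}
```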
Stu, could this be related to bug #2008511?
Hello Antonio, can you help us understand what is happening, and whether the related behavior should be considered a regression?
Hi all,
Dominik, I will take a look at the logs and see if I can spot anything.
@sbennert I don't think the PDB situation is causing this behaviour: if you look at https://github.com/kubernetes/kubernetes/blob/d7123a65248e25b86018ba8220b671cd483d6797/pkg/registry/core/pod/storage/eviction.go#L256 (the function exits early there), you will see that when k8s finds 2 overlapping PDBs the eviction does not go through and no pod is actually deleted.
Anyway, this has recently been reworked upstream by https://github.com/kubevirt/kubevirt/pull/6297, which you might want to backport to 4.9. As I said, the eviction doesn't go through when there are multiple PDBs, so this should not be the problem, but you can repeat the test with that PR in just to make sure (maybe it is the cause anyway, but for another reason).
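To illustrate the behaviour described above, here is a small Go sketch of the guard in question (not the actual upstream eviction.go code, just the gist): when more than one PDB selects the pod being evicted, the eviction request is refused and the pod is left untouched, which is why overlapping PDBs should not by themselves cause a launcher pod to be deleted.

```go
// Sketch only: approximates the multi-PDB guard discussed above; the real
// logic lives in k8s pkg/registry/core/pod/storage/eviction.go.
package main

import (
	"errors"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
)

// checkEviction mimics the early exit: with more than one matching PDB the
// eviction is refused outright, so no launcher pod can be deleted.
func checkEviction(matchingPDBs []policyv1.PodDisruptionBudget) error {
	if len(matchingPDBs) > 1 {
		return errors.New("pod has more than one PodDisruptionBudget; eviction refused")
	}
	if len(matchingPDBs) == 1 && matchingPDBs[0].Status.DisruptionsAllowed < 1 {
		return errors.New("eviction would violate the pod's disruption budget")
	}
	return nil // eviction may proceed
}

func main() {
	fmt.Println(checkEviction(nil)) // no PDB at all: eviction allowed
}
```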
Given the course of this discussion so far, it appears this is likely a Virtualization (rather than an SSP) bug, so I am re-assigning the component to Virt. Kedar, can we try to create a loop that repeatedly evicts all VMs, to see whether this can be reproduced without the descheduler? Also, can you try to evict all VMs of a node one by one?
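For reference, a minimal sketch of such an eviction loop using client-go (the namespace, label selector, interval and kubeconfig handling are assumptions, not an existing test):

```go
// Sketch of a loop that repeatedly evicts virt-launcher pods via the
// Kubernetes eviction API, to mimic the descheduler without running it.
package main

import (
	"context"
	"fmt"
	"time"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ns := "ssp-descheduler-test-descheduler"

	for {
		pods, err := client.CoreV1().Pods(ns).List(context.TODO(),
			metav1.ListOptions{LabelSelector: "kubevirt.io=virt-launcher"})
		if err != nil {
			panic(err)
		}
		for _, pod := range pods.Items {
			// Create an Eviction (the same API the descheduler uses); a PDB
			// with disruptionsAllowed == 0 should reject this with HTTP 429.
			err := client.CoreV1().Pods(ns).EvictV1(context.TODO(), &policyv1.Eviction{
				ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: ns},
			})
			fmt.Printf("evicting %s: %v\n", pod.Name, err)
		}
		time.Sleep(30 * time.Second)
	}
}
```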
@dholler I've looked at the logs and saw nothing strange in terms of the VMI lifecycle; I just saw that the VMI was signaled to terminate as if the eviction request went through and 'escaped' the PDB.
I think this might be caused by a race between when the migration PDB creation is requested and when it is actually created by the API server and processed by the k8s PDB controller. If the following events happen in this order:
- migration is created
- eviction request is created
- migration-controller requests to create an additional (migration) PDB
- target pod is created
- k8s pdb controller runs and updates the VMI's PDB status doing pdb.Status.CurrentHealthy++
- eviction is processed by k8s, which finds 1 PDB with pdb.Status.CurrentHealthy > pdb.Status.DesiredHealthy
- migration PDB is created by the API server
- k8s pdb controller runs and updates the migration PDB status
then one of the 2 pods of the VMI could actually be killed.
Fortunately, https://github.com/kubevirt/kubevirt/pull/6297 fixes this whole mess and this race is no longer possible, as the migration controller now waits for the k8s PDB controller to process the PDB creation and only afterwards creates the migration target pod.
In short, my guess is that if you backport PR#6297 everything should be fine.
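A rough sketch of the ordering idea described above (this is not the actual PR#6297 code; the helper, its thresholds and the PDB name are illustrative assumptions): before creating the migration target pod, wait until the PDB protecting the VMI has actually been observed and processed by the Kubernetes PDB controller, so an eviction arriving in between cannot slip past a stale PDB status.

```go
// Illustrative sketch of the race fix: only create the migration target pod
// after the PDB status has caught up with its spec. Not real KubeVirt code.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPDBProcessed polls the PDB until its status reflects the latest spec
// (ObservedGeneration caught up) and it is expected to protect both launcher
// pods (source and migration target).
func waitForPDBProcessed(client kubernetes.Interface, ns, name string) error {
	return wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		pdb, err := client.PolicyV1().PodDisruptionBudgets(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		processed := pdb.Status.ObservedGeneration >= pdb.Generation &&
			pdb.Status.DesiredHealthy == 2 // source + migration target pod
		return processed, nil
	})
}

func main() {
	// Hypothetical usage: only after waitForPDBProcessed returns nil would a
	// controller go on to create the target virt-launcher pod.
	fmt.Println("create the target pod only once the PDB status is up to date")
}
```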
I just created https://github.com/kubevirt/kubevirt/pull/6532 to backport the patch to CNV 4.9.
To verify: repeat steps in description
Verified: VMIs remain in the Running state during descheduler eviction actions. OCP 4.9.0-rc.5 CNV 4.9.0-246
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104