Bug 2005695
Summary: | With descheduler during multiple VMIs migrations, some VMs are restarted | |
---|---|---|---
Product: | Container Native Virtualization (CNV) | Reporter: | Ruth Netser <rnetser>
Component: | Virtualization | Assignee: | Antonio Cardace <acardace>
Status: | CLOSED ERRATA | QA Contact: | Sarah Bennert <sbennert>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 4.9.0 | CC: | acardace, cnv-qe-bugs, dholler, fdeutsch, kbidarka, ksimon, sbennert, sgott
Target Milestone: | --- | |
Target Release: | 4.9.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | virt-operator-container-v4.9.0-57 hco-bundle-registry-container-v4.9.0-244 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 2012890 (view as bug list) | Environment: |
Last Closed: | 2021-11-02 16:01:09 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2012890 | |
Description Ruth Netser 2021-09-19 15:42:15 UTC
I tried to reproduce this today. Observations:

0) The descheduler pod was running on node-14 [1]; node-13 was drained for testing.
1) The descheduler has been constantly evicting the VMI pods in a loop ever since node-13 was drained and uncordoned.
2) "vm-15-1632401643-0371675" got restarted, i.e. it failed to live-migrate.
3) From the log excerpts below:
   a) The descheduler tried to evict vm-2, vm-16, vm-4 and vm-15.
   b) The PDB appears to have kicked in for vm-2, vm-16 and vm-4 [2].
   c) But for vm-15 [3] there is no "Cannot evict pod as it would violate the pod's disruption budget." message.
   d) It is this vm-15 VM that got restarted.

[1]:

```
$ oc get pods -o wide -n openshift-kube-descheduler-operator
NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
cluster-66dc45556d-hz9r6   1/1     Running   0          3h41m   <ip-address>   node-14.redhat.com   <none>           <none>
```

[2]:

```
E0923 16:11:42.130745       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-2-1632401447-6913495-hdv64\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-2-1632401447-6913495-hdv64" reason="LowNodeUtilization"
E0923 16:11:42.309071       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-16-1632401657-969292-s4s2k\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-16-1632401657-969292-s4s2k" reason="LowNodeUtilization"
E0923 16:11:43.159743       1 evictions.go:121] "Error evicting pod" err="error when evicting pod (ignoring) \"virt-launcher-vm-4-1632401477-5879517-rd8c8\": Cannot evict pod as it would violate the pod's disruption budget." pod="ssp-descheduler-test-descheduler/virt-launcher-vm-4-1632401477-5879517-rd8c8" reason="LowNodeUtilization"
```

[3]:

```
I0923 16:11:43.677481       1 evictions.go:130] "Evicted pod" pod="ssp-descheduler-test-descheduler/virt-launcher-vm-15-1632401643-0371675-xcjb7" reason="LowNodeUtilization"
```

Stu, could this be related to bug #2008511?

Hello Antonio, can you help us understand what is happening, and whether the related behavior should be considered a regression?

Hi all, Dominik, I will take a look at the logs and see if I can spot anything.

@sbennert I don't think the PDB situation is causing this behaviour: if you look at https://github.com/kubernetes/kubernetes/blob/d7123a65248e25b86018ba8220b671cd483d6797/pkg/registry/core/pod/storage/eviction.go#L256 (the function exits early here), you will see that when k8s finds 2 overlapping PDBs the eviction does not go through and no pod is actually deleted. Anyway, this has recently been reworked upstream by https://github.com/kubevirt/kubevirt/pull/6297, which you might want to backport to 4.9; but as I said, the eviction doesn't go through when there are multiple PDBs, so this should not be a problem. You can still repeat the test with that PR in, just to make sure (maybe it's the cause anyway, but for another reason).

Given the course of this discussion so far, it appears this is likely a Virtualization (vice SSP) bug, thus re-assigning the component to Virt.

Kedar, can we try to create a loop to repeatedly evict all VMs, to see if this can be reproduced without the descheduler? Also, can you try to evict all VMs of a node one-by-one (a sketch of such a test follows below)?
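A minimal sketch of what the "evict the VMs of a node one-by-one" test could look like, assuming client-go >= v0.22 (for `PodInterface.EvictV1`), a kubeconfig in the default location, and that virt-launcher pods carry the `kubevirt.io=virt-launcher` label; the namespace is taken from the logs above and the node name is a placeholder:

```go
// Hypothetical test helper, not part of the product: evict the virt-launcher
// pods of one node one at a time through the Eviction API, mimicking the
// descheduler without involving it. Node name, label selector and kubeconfig
// handling are assumptions for this sketch.
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	const (
		namespace = "ssp-descheduler-test-descheduler" // from the descheduler log above
		nodeName  = "node-13.redhat.com"               // placeholder: the node being emptied
	)

	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// List the virt-launcher pods currently scheduled on the node.
	pods, err := client.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
		LabelSelector: "kubevirt.io=virt-launcher",
	})
	if err != nil {
		panic(err)
	}

	// Issue one eviction request per pod. A "disruption budget" error here is
	// the same rejection the descheduler logs; a nil error means the eviction
	// was admitted and KubeVirt is expected to live-migrate the VMI.
	for _, pod := range pods.Items {
		evictErr := client.CoreV1().Pods(namespace).EvictV1(context.TODO(), &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: namespace},
		})
		fmt.Printf("evict %s: %v\n", pod.Name, evictErr)
	}
}
```

The point of the test is to issue one eviction request at a time and watch whether any VMI terminates instead of live-migrating.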
@dholler I've looked at the logs and saw nothing strange in terms of the VMI lifecycle; I just saw that the VMI was signaled to terminate, as if the eviction request went through and 'escaped' the PDB. I think this might be caused by a race between when the migration PDB creation is requested and when it is actually created by the API server and processed by the k8s PDB controller. If the following events happen in this order:

- migration is created
- eviction request is created
- migration-controller requests the creation of an additional (migration) PDB
- target pod is created
- k8s PDB controller runs and updates the VMI's PDB status, doing pdb.Status.CurrentHealthy++
- eviction is processed by k8s, which finds 1 PDB with pdb.Status.CurrentHealthy > pdb.Status.DesiredHealthy
- migration PDB is created by the API server
- k8s PDB controller runs and updates the migration PDB status

then one of the 2 pods of the VMI could actually be killed. (A condensed sketch of the eviction-side check involved appears at the end of this report.) Fortunately, https://github.com/kubevirt/kubevirt/pull/6297 fixes this whole mess and this race is no longer possible, as the migration controller now waits for the k8s PDB controller to process the PDB creation and only afterwards creates the migration target pod. In short, my guess is that if you backport PR#6297 everything should be fine.

I just created https://github.com/kubevirt/kubevirt/pull/6532 to backport the patch to CNV 4.9.

To verify: repeat the steps in the description.

Verified: VMIs remain in a running state during evict actions of the descheduler.
OCP 4.9.0-rc.5
CNV 4.9.0-246

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104
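To make the race described above concrete, here is a condensed, simplified sketch (not the verbatim upstream code) of the per-PDB check k8s applies when admitting an eviction, corresponding to the eviction.go link earlier in the thread. `canEvict` is a hypothetical helper name; the check is expressed via `DisruptionsAllowed`, which is what the disruption controller derives from CurrentHealthy exceeding DesiredHealthy. The PodDisruptionBudget status fields used are real policy/v1 fields.

```go
// Hypothetical, condensed illustration of the eviction admission check in
// pkg/registry/core/pod/storage/eviction.go (not the verbatim upstream code).
package main

import (
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
)

// canEvict (hypothetical helper) mirrors the essential decision: reject when
// more than one PDB covers the pod, when the disruption controller has not yet
// observed the latest PDB spec, or when no further disruptions are allowed.
func canEvict(pdbs []policyv1.PodDisruptionBudget) (bool, string) {
	if len(pdbs) > 1 {
		// Overlapping PDBs (VMI PDB + migration PDB) block the eviction outright.
		return false, "multiple PDBs cover this pod"
	}
	if len(pdbs) == 0 {
		return true, "no PDB, eviction allowed"
	}
	pdb := pdbs[0]
	if pdb.Status.ObservedGeneration < pdb.Generation {
		return false, "PDB not yet processed by the disruption controller"
	}
	if pdb.Status.DisruptionsAllowed <= 0 {
		return false, "would violate the pod's disruption budget"
	}
	return true, "eviction allowed"
}

func main() {
	// In the racy window: the migration PDB is not visible yet, so only the
	// VMI's own PDB is found, and after the target pod became healthy the
	// disruption controller bumped CurrentHealthy and DisruptionsAllowed.
	vmiPDB := policyv1.PodDisruptionBudget{}
	vmiPDB.Generation = 1
	vmiPDB.Status.ObservedGeneration = 1
	vmiPDB.Status.CurrentHealthy = 2
	vmiPDB.Status.DesiredHealthy = 1
	vmiPDB.Status.DisruptionsAllowed = 1

	ok, reason := canEvict([]policyv1.PodDisruptionBudget{vmiPDB})
	fmt.Println(ok, reason) // true "eviction allowed" -> one of the VMI's two pods is deleted
}
```

In that window only the VMI's own PDB is visible and its status already allows one disruption, so the eviction is admitted and one of the VMI's two pods can be deleted; PR#6297 closes the window by not creating the target pod until the migration PDB has been observed by the disruption controller.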