Description of problem:
Unable to delete failed VMIM after VM deleted

Version-Release number of selected component (if applicable):
Bundle v4.10.0-551

How reproducible:
Intermittent.

Steps to Reproduce:
0. With descheduler installed:
1. Create VMs on nodes
2. Drain single node, causing evictions
3. Uncordon same node, causing descheduler evictions
4. Delete VMs

Actual results:
$ oc -n ssp-descheduler-test-descheduler delete vmim kubevirt-evacuation-p2wbc --timeout=60s
virtualmachineinstancemigration.kubevirt.io "kubevirt-evacuation-p2wbc" deleted
error: timed out waiting for the condition on virtualmachineinstancemigrations/kubevirt-evacuation-p2wbc

does not delete, hits timeout.

virt-controller error:
{"component":"virt-controller","kind":"","level":"error","msg":"Updating the VirtualMachine status failed.","name":"vm-1-1642014474-2693942","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:311","reason":"Operation cannot be fulfilled on virtualmachines.
{"component":"virt-controller","kind":"","level":"error","msg":"Unable to migrate vmi because vmi is shutdown.","name":"kubevirt-evacuation-p2wbc","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:394","timestamp":"2022-01-12T19:24:28.323305Z","uid":"154d9a41-5338-44e7-8445-cc66fc18852d"}

Expected results:
VMIM deleted

Additional info:
Failed VMIM:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  annotations:
    kubevirt.io/evacuationMigration: c01-sb410a-5zp8b-worker-0-c85cq
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-01-12T19:24:22Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-01-12T19:24:31Z"
  finalizers:
  - kubevirt.io/migrationJobFinalize
  generateName: kubevirt-evacuation-
  generation: 2
  labels:
    kubevirt.io/vmi-name: vm-3-1642014475-2945707
  name: kubevirt-evacuation-p2wbc
  namespace: ssp-descheduler-test-descheduler
  resourceVersion: "11431506"
  uid: 154d9a41-5338-44e7-8445-cc66fc18852d
spec:
  vmiName: vm-3-1642014475-2945707
status:
  phase: Failed
The reason the VMIM cannot be deleted is that it still has a finalizer. Can you confirm that the VMIM stays in a failed state with a finalizer for a while? It might be that the controller just hasn't had a chance to remove the finalizer yet.
Yes, the only way I was able to get past this was to remove the finalizer after a period of time. I had also run "oc delete" without the timeout flag and left it sitting for well over 5 minutes, and the VMIM was still not deleted.
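For anyone hitting the same state, here is a minimal sketch of that manual workaround: it builds a JSON merge patch that drops the finalizer and prints the corresponding `oc patch` command. The helper name `finalizer_removal_patch` is hypothetical, and the object fields are copied from the VMIM in the description. This bypasses whatever cleanup the controller would normally do before releasing the finalizer, so treat it as a last resort and not as the fix.

```python
import json
import shlex


def finalizer_removal_patch(obj, finalizer):
    """Return a JSON merge patch that drops one finalizer from obj's metadata.

    A merge patch replaces the whole finalizers list, so the patch carries
    the remaining entries, or null when nothing is left.
    """
    current = obj.get("metadata", {}).get("finalizers") or []
    remaining = [f for f in current if f != finalizer]
    return {"metadata": {"finalizers": remaining or None}}


# Fields taken from the failed VMIM shown in the description.
vmim = {
    "metadata": {
        "name": "kubevirt-evacuation-p2wbc",
        "namespace": "ssp-descheduler-test-descheduler",
        "finalizers": ["kubevirt.io/migrationJobFinalize"],
    }
}

patch = finalizer_removal_patch(vmim, "kubevirt.io/migrationJobFinalize")
cmd = [
    "oc", "-n", vmim["metadata"]["namespace"],
    "patch", "vmim", vmim["metadata"]["name"],
    "--type=merge", "-p", json.dumps(patch),
]

# Print the command for review instead of running it.
print(shlex.join(cmd))
```

The script only prints the command so it can be reviewed before being run against the cluster.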
Looking closely, is this basically the same as https://bugzilla.redhat.com/show_bug.cgi?id=1719190? Yes, one is about a VMI that's not yet scheduled and the other is about a VMI that just ceased to exist, but in both cases there's a VMIM that's lacking a VMI.
The main difference here is that there is no pending virt-launcher pod; all resources were deleted except for the VMIM.
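The condition discussed here (a VMIM marked for deletion that still holds a finalizer while its VMI no longer exists) can be checked mechanically. The following is a rough sketch under assumptions: the function name `stuck_migrations` is hypothetical, the input is assumed to be the `items` list from `oc get vmim -o json`, and the set of existing VMI names is gathered separately. The first sample object uses values from the description; the second is an invented healthy migration for contrast.

```python
def stuck_migrations(vmims, existing_vmi_names):
    """Return names of VMIMs that are terminating, still carry a finalizer,
    and reference a VMI that no longer exists."""
    stuck = []
    for vmim in vmims:
        meta = vmim.get("metadata", {})
        terminating = bool(meta.get("deletionTimestamp"))
        has_finalizer = bool(meta.get("finalizers"))
        vmi_missing = vmim.get("spec", {}).get("vmiName") not in existing_vmi_names
        if terminating and has_finalizer and vmi_missing:
            stuck.append(meta.get("name"))
    return stuck


vmims = [
    {   # values from the failed VMIM in the description
        "metadata": {
            "name": "kubevirt-evacuation-p2wbc",
            "deletionTimestamp": "2022-01-12T19:24:31Z",
            "finalizers": ["kubevirt.io/migrationJobFinalize"],
        },
        "spec": {"vmiName": "vm-3-1642014475-2945707"},
        "status": {"phase": "Failed"},
    },
    {   # hypothetical in-flight migration whose VMI still exists
        "metadata": {
            "name": "example-running-migration",
            "finalizers": ["kubevirt.io/migrationJobFinalize"],
        },
        "spec": {"vmiName": "example-vmi"},
        "status": {"phase": "Running"},
    },
]

stuck = stuck_migrations(vmims, existing_vmi_names={"example-vmi"})
print(stuck)
```

Only the first object is reported, since it is the only one that is terminating, holds a finalizer, and points at a VMI that is gone.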
Due to capacity this bug is being moved to 4.12.
*** Bug 2105031 has been marked as a duplicate of this bug. ***
Is this still reproducible on the current build?
To verify, repeat the steps in the description.
verified with v4.12.0-548

[akrgupta@fedora auth]$ oc get vmi
NAME         AGE   PHASE     IP             NODENAME                            READY
vm-example   28s   Running   10.131.0.167   virt-den-412-pqfpv-worker-0-bg7n4   True

[akrgupta@fedora auth]$ oc get nodes
NAME                                STATUS                     ROLES                  AGE   VERSION
virt-den-412-pqfpv-master-0         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-master-1         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-master-2         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-4tgwh   Ready,SchedulingDisabled   worker                 8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-bg7n4   Ready                      worker                 8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-mjg69   Ready,SchedulingDisabled   worker                 8d    v1.24.0+8c7c967

[akrgupta@fedora auth]$ oc get vmim
No resources found in default namespace.

[akrgupta@fedora auth]$ virtctl migrate vm-example
VM vm-example was scheduled to migrate

[akrgupta@fedora auth]$ oc get vmim
NAME                        PHASE        VMI
kubevirt-migrate-vm-xnqff   Scheduling   vm-example

[akrgupta@fedora auth]$ oc delete vm vm-example
virtualmachine.kubevirt.io "vm-example" deleted

[akrgupta@fedora auth]$ oc get vmim
No resources found in default namespace.

vmim object is deleted by deleting the vm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:0408
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days