Bug 2040377 - Unable to delete failed VMIM after VM deleted
Summary: Unable to delete failed VMIM after VM deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Prita Narayan
QA Contact: Akriti Gupta
URL:
Whiteboard:
Duplicates: 2105031
Depends On:
Blocks:
 
Reported: 2022-01-13 15:36 UTC by Sarah Bennert
Modified: 2023-09-18 04:30 UTC (History)
CC List: 5 users

Fixed In Version: hco-bundle-registry-container-v4.12.0-479
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-24 13:36:05 UTC
Target Upstream Version:
Embargoed:
sbennert: needinfo-



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 7599 | 0 | None | Merged | [Live-Migrations / Bugfix]: Allow aborting non-running migrations | 2022-09-09 15:06:40 UTC
Red Hat Issue Tracker | CNV-15835 | 0 | None | None | None | 2022-12-15 08:43:53 UTC

Description Sarah Bennert 2022-01-13 15:36:37 UTC
Description of problem:

Unable to delete failed VMIM after VM deleted


Version-Release number of selected component (if applicable):

Bundle
v4.10.0-551


How reproducible:

Intermittent.


Steps to Reproduce:
0. With descheduler installed:
1. Create VMs on nodes
2. Drain single node, causing evictions
3. Uncordon same node, causing descheduler evictions
4. Delete VMs
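
A rough sketch of steps 2 to 4 as `oc` invocations, built as argument lists in Python so they can be reviewed before being run. The drain flags (`--ignore-daemonsets`, `--delete-emptydir-data`), the use of `--all`, and the function name are assumptions; the report does not say which exact commands were used. Step 1 (creating the VMs) is omitted because the VM manifests are not part of the report.

```python
import shlex


def repro_commands(node, namespace):
    """Build oc commands approximating steps 2-4 (flags are assumptions)."""
    return [
        # Step 2: drain a single node, which evicts the VMIs running on it.
        ["oc", "adm", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Step 3: uncordon the same node so the descheduler can evict again.
        ["oc", "adm", "uncordon", node],
        # Step 4: delete the VMs while migrations may still be in flight.
        ["oc", "-n", namespace, "delete", "vm", "--all"],
    ]


if __name__ == "__main__":
    # Node and namespace names are taken from the report's own output.
    for cmd in repro_commands("c01-sb410a-5zp8b-worker-0-c85cq",
                              "ssp-descheduler-test-descheduler"):
        print(shlex.join(cmd))
```

Since the issue is intermittent, the sequence may need to be repeated several times before a VMIM gets stuck.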



Actual results:

$ oc -n ssp-descheduler-test-descheduler delete vmim kubevirt-evacuation-p2wbc --timeout=60s
virtualmachineinstancemigration.kubevirt.io "kubevirt-evacuation-p2wbc" deleted
error: timed out waiting for the condition on virtualmachineinstancemigrations/kubevirt-evacuation-p2wbc

The VMIM is not deleted; the command hits the timeout.

virt-controller error:

{"component":"virt-controller","kind":"","level":"error","msg":"Updating the VirtualMachine status failed.","name":"vm-1-1642014474-2693942","namespace":"ssp-descheduler-test-descheduler","pos":"vm.go:311","reason":"Operation cannot be fulfilled on virtualmachines.
{"component":"virt-controller","kind":"","level":"error","msg":"Unable to migrate vmi because vmi is shutdown.","name":"kubevirt-evacuation-p2wbc","namespace":"ssp-descheduler-test-descheduler","pos":"migration.go:394","timestamp":"2022-01-12T19:24:28.323305Z","uid":"154d9a41-5338-44e7-8445-cc66fc18852d"}
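
For triage, the virt-controller output can be filtered down to the entries about one migration. The following is a minimal sketch, assuming each log entry is a single JSON object per line; the helper name `errors_for` is hypothetical. The sample line is the complete entry for kubevirt-evacuation-p2wbc from the output above, and entries that do not parse (such as the truncated one) are skipped.

```python
import json

# Complete log entry for the stuck migration, copied from the report.
LOG_LINE = (
    '{"component":"virt-controller","kind":"","level":"error",'
    '"msg":"Unable to migrate vmi because vmi is shutdown.",'
    '"name":"kubevirt-evacuation-p2wbc",'
    '"namespace":"ssp-descheduler-test-descheduler",'
    '"pos":"migration.go:394",'
    '"timestamp":"2022-01-12T19:24:28.323305Z",'
    '"uid":"154d9a41-5338-44e7-8445-cc66fc18852d"}'
)


def errors_for(lines, name):
    """Return (pos, msg) pairs for error-level entries about the named object."""
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("level") == "error" and entry.get("name") == name:
            hits.append((entry.get("pos"), entry.get("msg")))
    return hits


if __name__ == "__main__":
    print(errors_for([LOG_LINE], "kubevirt-evacuation-p2wbc"))
```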



Expected results:

VMIM deleted



Additional info:

Failed VMIM:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  annotations:
    kubevirt.io/evacuationMigration: c01-sb410a-5zp8b-worker-0-c85cq
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-01-12T19:24:22Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-01-12T19:24:31Z"
  finalizers:
  - kubevirt.io/migrationJobFinalize
  generateName: kubevirt-evacuation-
  generation: 2
  labels:
    kubevirt.io/vmi-name: vm-3-1642014475-2945707
  name: kubevirt-evacuation-p2wbc
  namespace: ssp-descheduler-test-descheduler
  resourceVersion: "11431506"
  uid: 154d9a41-5338-44e7-8445-cc66fc18852d
spec:
  vmiName: vm-3-1642014475-2945707
status:
  phase: Failed

Comment 1 sgott 2022-01-19 13:24:24 UTC
The reason the VMIM cannot be deleted is that it still has a finalizer.

Can you confirm that the VMIM stays in a failed state with a finalizer for a while? It might be that the controller just hasn't had a chance to remove the finalizer yet.
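
To make "for a while" measurable: in Kubernetes, an object with a deletionTimestamp is only removed once its finalizers list is empty. A minimal sketch, assuming the VMIM has been loaded into a plain dict (for example from `oc get vmim <name> -o json`); the helper name `terminating_for` is hypothetical.

```python
from datetime import datetime, timezone


def terminating_for(obj, now):
    """Return how long obj has been pending deletion while a finalizer
    is still present, or None if it is not in that state."""
    meta = obj.get("metadata", {})
    stamp = meta.get("deletionTimestamp")
    if not stamp or not meta.get("finalizers"):
        return None
    deleted_at = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return now - deleted_at


# Relevant fields from the failed VMIM in the description.
vmim = {
    "metadata": {
        "name": "kubevirt-evacuation-p2wbc",
        "deletionTimestamp": "2022-01-12T19:24:31Z",
        "finalizers": ["kubevirt.io/migrationJobFinalize"],
    },
    "status": {"phase": "Failed"},
}

if __name__ == "__main__":
    # The "now" value is an arbitrary illustration, not a time from the report.
    now = datetime(2022, 1, 12, 19, 30, 31, tzinfo=timezone.utc)
    print(terminating_for(vmim, now))  # prints 0:06:00
```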

Comment 2 Sarah Bennert 2022-01-19 14:06:11 UTC
Yes, the only way I was able to get past this was to remove the finalizer after a period of time.

I had attempted to run "oc delete" without the timeout flag and left it sitting for well over 5 minutes; the VMIM was still not deleted.
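
The comment does not say how the finalizer was removed. One common way is a JSON merge patch that sets `metadata.finalizers` to null; the sketch below only builds the command so it can be inspected first. The function name is hypothetical, and clearing finalizers by hand skips whatever cleanup the controller would otherwise perform, so this is a workaround rather than a fix.

```python
import json
import shlex


def remove_finalizers_cmd(name, namespace):
    """Build an `oc patch` command that clears all finalizers on a VMIM."""
    # In a JSON merge patch, a null value removes the field.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["oc", "-n", namespace, "patch", "vmim", name,
            "--type=merge", "-p", patch]


if __name__ == "__main__":
    cmd = remove_finalizers_cmd("kubevirt-evacuation-p2wbc",
                                "ssp-descheduler-test-descheduler")
    print(shlex.join(cmd))
```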

Comment 3 sgott 2022-01-28 22:36:11 UTC
Looking closely, is this basically the same as:

https://bugzilla.redhat.com/show_bug.cgi?id=1719190

Yes, one is about a VMI that's not yet scheduled and the other is about a VMI that just ceased to exist, but in both cases there's a VMIM that's lacking a VMI.
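
The common condition, a VMIM whose VMI is missing, can be checked mechanically. A minimal sketch, assuming the VMIM and VMI lists have been loaded as dicts (for example from `oc get vmim,vmi -A -o json`); the function name is hypothetical.

```python
def orphaned_migrations(vmims, vmis):
    """Return names of VMIMs whose spec.vmiName has no VMI in the same namespace."""
    existing = {
        (v["metadata"]["namespace"], v["metadata"]["name"]) for v in vmis
    }
    return [
        m["metadata"]["name"]
        for m in vmims
        if (m["metadata"]["namespace"], m["spec"]["vmiName"]) not in existing
    ]


if __name__ == "__main__":
    ns = "ssp-descheduler-test-descheduler"
    vmims = [{
        "metadata": {"name": "kubevirt-evacuation-p2wbc", "namespace": ns},
        "spec": {"vmiName": "vm-3-1642014475-2945707"},
    }]
    # No VMIs are left after the VMs were deleted.
    print(orphaned_migrations(vmims, []))
```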

Comment 4 Sarah Bennert 2022-02-07 15:42:42 UTC
The main difference here is there is no pending virt-launcher pod, all resources were deleted except for VMIM.

Comment 5 Kedar Bidarkar 2022-05-10 09:57:12 UTC
Due to capacity, this bug is being moved to 4.12.

Comment 6 Roni Kishner 2022-07-13 05:32:42 UTC
*** Bug 2105031 has been marked as a duplicate of this bug. ***

Comment 7 Prita Narayan 2022-08-09 13:14:56 UTC
Is this still reproducible on the current build?

Comment 8 sgott 2022-09-09 15:06:41 UTC
To verify, repeat the steps in the description.

Comment 9 Akriti Gupta 2022-10-10 15:04:09 UTC
Verified with v4.12.0-548.
[akrgupta@fedora auth]$ oc get vmi
NAME         AGE   PHASE     IP             NODENAME                            READY
vm-example   28s   Running   10.131.0.167   virt-den-412-pqfpv-worker-0-bg7n4   True
[akrgupta@fedora auth]$ oc get nodes
NAME                                STATUS                     ROLES                  AGE   VERSION
virt-den-412-pqfpv-master-0         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-master-1         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-master-2         Ready                      control-plane,master   8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-4tgwh   Ready,SchedulingDisabled   worker                 8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-bg7n4   Ready                      worker                 8d    v1.24.0+8c7c967
virt-den-412-pqfpv-worker-0-mjg69   Ready,SchedulingDisabled   worker                 8d    v1.24.0+8c7c967
[akrgupta@fedora auth]$ oc get vmim
No resources found in default namespace.
[akrgupta@fedora auth]$ virtctl migrate vm-example
VM vm-example was scheduled to migrate
[akrgupta@fedora auth]$ oc get vmim
NAME                        PHASE        VMI
kubevirt-migrate-vm-xnqff   Scheduling   vm-example
[akrgupta@fedora auth]$ oc delete vm vm-example
virtualmachine.kubevirt.io "vm-example" deleted
[akrgupta@fedora auth]$ oc get vmim
No resources found in default namespace.

The VMIM object is deleted when the VM is deleted.
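
Because the original failure was intermittent, a repeated check may be more convincing than a single `oc get vmim`. The following is a sketch of a polling helper, assuming a caller-supplied `list_vmims` function (hypothetical) that returns the current VMIM names in the namespace.

```python
import time


def wait_until_gone(list_vmims, timeout=60.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until list_vmims() returns nothing; False if the timeout passes first."""
    deadline = clock() + timeout
    while True:
        if not list_vmims():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated responses: the VMIM is listed twice, then disappears.
    responses = iter([["kubevirt-migrate-vm-xnqff"],
                      ["kubevirt-migrate-vm-xnqff"],
                      []])
    print(wait_until_gone(lambda: next(responses), sleep=lambda s: None))
```

In a real check, `list_vmims` could wrap a call such as `oc get vmim -o name` and split its output into lines.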

Comment 13 errata-xmlrpc 2023-01-24 13:36:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408

Comment 14 Red Hat Bugzilla 2023-09-18 04:30:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

