Bug 1900631 - [CNV 2.4.3] oc vm delete doesn't complete sometimes
Summary: [CNV 2.4.3] oc vm delete doesn't complete sometimes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.4.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.8.1
Assignee: aschuett
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-23 12:57 UTC by Benjamin Schmaus
Modified: 2021-10-01 15:59 UTC (History)

Fixed In Version: hco-bundle-registry-container-v4.8.1-14 virt-operator-container-v4.8.1-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-24 12:48:59 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 5691 | 0 | None | open | allow multiple calls to graceful shutdown in case acpi did not recieve call | 2021-05-27 07:13:52 UTC
Github | kubevirt kubevirt pull 5723 | 0 | None | open | add force stop for VM to virtctl | 2021-05-27 07:13:52 UTC
Red Hat Product Errata | RHSA-2021:3259 | 0 | None | None | None | 2021-08-24 12:49:36 UTC

Description Benjamin Schmaus 2020-11-23 12:57:14 UTC
Description of problem: 
oc vm delete <vmname> may not complete:
- The pod stays in the Terminating state.
- The VMI stays in Running status until the guest VM initiates shutdown.
- Sometimes the VM is deleted but the VMI is not.
- This could be related to a finalizer.

This might be related to BZ1883875, where the guest agent is not running, but that is not always the case.
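To check the finalizer theory on a stuck object, one option is to inspect its metadata. The following is a minimal sketch, not part of the original report: it assumes the object was dumped with something like oc get vmi <name> -o json, and the helper name pending_finalizers and the sample finalizer value are hypothetical.

```python
import json


def pending_finalizers(obj: dict) -> list:
    """Return the finalizers still set on an object that is marked for deletion.

    An object with a deletionTimestamp and a non-empty finalizers list is
    waiting for some controller to remove those finalizers.
    """
    meta = obj.get("metadata", {})
    if "deletionTimestamp" not in meta:
        return []  # not being deleted, so no finalizer is blocking anything
    return list(meta.get("finalizers", []))


# Hypothetical sample shaped like the JSON dump of a single object.
sample = json.loads("""
{
  "kind": "VirtualMachineInstance",
  "metadata": {
    "name": "example-vmi",
    "deletionTimestamp": "2020-11-23T12:57:14Z",
    "finalizers": ["example.com/hypothetical-finalizer"]
  }
}
""")

print(pending_finalizers(sample))
```

A non-empty result only shows that deletion is waiting on a finalizer; it does not say why the owning controller has not removed it.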


Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:
Sporadic - need more details on how to reproduce

Steps to Reproduce:
1. Create many VMs through automation, using the same source VM PV.
2. Delete the VMs.

Actual results:
Some VM deletions sporadically fail to complete.

Expected results:
All vms should be deleted

Additional info:

Comment 1 sgott 2020-11-23 19:09:19 UTC
Is it possible to get some more information? In particular, must-gather output would give us some more context here.

Comment 6 sgott 2020-12-02 13:41:25 UTC
Without any other context, this sounds like it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1896387

That BZ is likely due to an issue in CRI-O which is being addressed in CNV 4.7. Thus I'm deferring this pending a fix in OCP.

Comment 7 sgott 2020-12-16 14:57:44 UTC
To complete the loop here, https://bugzilla.redhat.com/show_bug.cgi?id=1896387#c8 mentioned BZ https://bugzilla.redhat.com/show_bug.cgi?id=1883991, which is what I was referring to in Comment #6.

Comment 8 Benjamin Schmaus 2021-04-21 17:52:53 UTC
I have been able to reproduce this in 4.7.3 with a Windows guest VM. Start a Windows 2019 VM and then try to stop it from the OCP console: the stop seems to hang. If I repeat the same steps but first initiate a shutdown from inside the Windows 2019 VM, and then stop the VM in the OCP console before the guest has finished shutting down, it stops properly.

Given my statement above, do we believe this is still related to CRI-O, as indicated in comment 6?

Comment 15 sgott 2021-04-26 17:51:10 UTC
Ben,

There is a BZ where the reporter created a Windows VM and then deleted it immediately.

https://bugzilla.redhat.com/show_bug.cgi?id=1933043

In some cases, this causes graceful shutdown to fail, at which point the VMI is not deleted until terminationGracePeriodSeconds has elapsed. This is especially noticeable on Windows because the grace period is quite long (to ensure we don't break Windows updates).

Does this appear similar to what you're experiencing?

What were the terminationGracePeriodSeconds values for the VMs that terminate immediately versus those that hang?

Comment 16 Benjamin Schmaus 2021-05-06 15:06:14 UTC
It seems that ephemeral VMs used 60 seconds, while VMs that are created and expected to stay up for a while used 3600 seconds.
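For comparing the grace periods across many VMIs at once, a small sketch like the following may help. It assumes the input is the parsed output of something like oc get vmi -o json and reads spec.terminationGracePeriodSeconds from each item; the function name and the sample VMI names are hypothetical.

```python
from collections import defaultdict


def group_by_grace_period(vmi_list: dict) -> dict:
    """Group VMI names by their spec.terminationGracePeriodSeconds value.

    VMIs that do not set the field are grouped under the key None.
    """
    groups = defaultdict(list)
    for item in vmi_list.get("items", []):
        grace = item.get("spec", {}).get("terminationGracePeriodSeconds")
        groups[grace].append(item["metadata"]["name"])
    return dict(groups)


# Hypothetical sample shaped like a VMI list dump.
sample_list = {
    "items": [
        {"metadata": {"name": "ephemeral-a"},
         "spec": {"terminationGracePeriodSeconds": 60}},
        {"metadata": {"name": "longlived-b"},
         "spec": {"terminationGracePeriodSeconds": 3600}},
        {"metadata": {"name": "ephemeral-c"},
         "spec": {"terminationGracePeriodSeconds": 60}},
    ]
}

for grace, names in group_by_grace_period(sample_list).items():
    print(grace, names)
```

If the VMIs that hang cluster under the 3600-second group, that would be consistent with the failed graceful shutdown described in Comment 15, though it would not prove it.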

Comment 22 zhe peng 2021-08-18 09:32:42 UTC
Verified with build HCO:[v4.8.1-18].

Steps:
1. Create 50 VMs from the same DV source.
2. Start all VMs and wait until all of them are in Running status.
3. Destroy all VMs, then check VM and VMI status.

All VMs and VMIs were deleted.
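The final check in the steps above can be scripted. This is a sketch under assumptions, not the procedure QA actually used: it assumes the test VMs share a name prefix and that the two listings come from something like oc get vm -o json and oc get vmi -o json; the function names and the prefix are hypothetical.

```python
def leftovers(listing: dict, prefix: str) -> list:
    """Return the sorted names in a list dump that start with the given prefix."""
    return sorted(
        item["metadata"]["name"]
        for item in listing.get("items", [])
        if item["metadata"]["name"].startswith(prefix)
    )


def all_deleted(vm_listing: dict, vmi_listing: dict, prefix: str) -> bool:
    """True when neither a VM nor a VMI with the prefix remains."""
    return not leftovers(vm_listing, prefix) and not leftovers(vmi_listing, prefix)


# Hypothetical listings taken after the delete: the VM is gone, one VMI lingers.
vms_after = {"items": []}
vmis_after = {"items": [{"metadata": {"name": "bulk-vm-17"}}]}

print(leftovers(vmis_after, "bulk-vm-"))
print(all_deleted(vms_after, vmis_after, "bulk-vm-"))
```

Checking VMs and VMIs separately matters here because the original report describes cases where the VM is deleted but the VMI is not.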

Comment 29 errata-xmlrpc 2021-08-24 12:48:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3259

