Bug 1900631

Summary:	[CNV 2.4.3] oc vm delete doesn't complete sometimes
Product:	Container Native Virtualization (CNV)	Reporter:	Benjamin Schmaus <bschmaus>
Component:	Virtualization	Assignee:	aschuett <aschuett>
Status:	CLOSED ERRATA	QA Contact:	Israel Pinto <ipinto>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	2.4.3	CC:	bschmaus, cnv-qe-bugs, fdeutsch, kbidarka, sgott, spuranam, zpeng
Target Milestone:	---
Target Release:	4.8.1
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	hco-bundle-registry-container-v4.8.1-14 virt-operator-container-v4.8.1-2	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-08-24 12:48:59 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Benjamin Schmaus 2020-11-23 12:57:14 UTC

Description of problem: 
oc vm delete <vmname> sometimes does not complete:
- the pod stays in Terminating state
- the VMI stays in Running status until the guest VM initiates a shutdown
- sometimes the VM is deleted but the VMI is not
- could be related to a finalizer

Might be related to BZ1883875, where the guest agent is not running, but that is not always the case.
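
A minimal sketch for spotting objects stuck in deletion, like the ones described above. It assumes the VM, VMI, and pod objects have already been exported as JSON (for example with oc get vmi -o json) and loaded into Python dicts. It relies only on the standard Kubernetes metadata fields deletionTimestamp and finalizers. The function name, the sample object, and its finalizer name are hypothetical and only for illustration.

```python
from datetime import datetime, timezone


def stuck_deletions(objects, now=None):
    """List objects whose deletion was requested but has not finished.

    Returns tuples of (kind, name, seconds_pending, finalizers).
    """
    now = now or datetime.now(timezone.utc)
    stuck = []
    for obj in objects:
        meta = obj.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if ts is None:
            continue  # deletion was never requested for this object
        # Kubernetes serializes timestamps as RFC 3339 in UTC,
        # e.g. 2020-11-23T12:57:14Z
        requested = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        requested = requested.replace(tzinfo=timezone.utc)
        pending = (now - requested).total_seconds()
        stuck.append(
            (obj.get("kind"), meta.get("name"), pending, meta.get("finalizers", []))
        )
    return stuck


# Hypothetical sample data, not taken from the affected cluster.
SAMPLE = [
    {
        "kind": "VirtualMachineInstance",
        "metadata": {
            "name": "vm-a",
            "deletionTimestamp": "2020-11-23T12:57:14Z",
            "finalizers": ["example.com/hypothetical-finalizer"],
        },
    },
    {"kind": "VirtualMachineInstance", "metadata": {"name": "vm-b"}},
]
```

An object that still carries finalizers long after its deletionTimestamp is the kind of case the finalizer suspicion above would point to.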


Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:
Sporadic - need more details on how to reproduce

Steps to Reproduce:
1. Create many VMs through automation, all using the same source VM PV
2. Delete the VMs

Actual results:
Only some of the VMs are deleted; other deletions never complete

Expected results:
All vms should be deleted

Additional info:

Comment 1 sgott 2020-11-23 19:09:19 UTC

Is it possible to get some more information? In particular, must-gather output would give us some more context here.

Comment 6 sgott 2020-12-02 13:41:25 UTC

Without any other context, this sounds like it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1896387

That BZ is likely due to an issue in CRI-O which is being addressed in CNV 4.7. Thus I'm deferring this pending a fix in OCP.

Comment 7 sgott 2020-12-16 14:57:44 UTC

To close the loop here, https://bugzilla.redhat.com/show_bug.cgi?id=1896387#c8 mentions BZ https://bugzilla.redhat.com/show_bug.cgi?id=1883991, which is what I was referring to in Comment #6.

Comment 8 Benjamin Schmaus 2021-04-21 17:52:53 UTC

I have been able to reproduce this in 4.7.3 with a Windows guest VM. Start a Windows 2019 VM and then try to stop it from the OCP console: the stop seems to hang. If I repeat the same steps but first initiate a shutdown from inside the Windows 2019 VM, and then stop it from the OCP console before the guest has finished shutting down, it stops properly.

Given my statement above, do we believe this is still related to CRI-O, as indicated in comment 6?

Comment 15 sgott 2021-04-26 17:51:10 UTC

Ben,

There is a BZ where the reporter created a Windows VM and then deleted it immediately:

https://bugzilla.redhat.com/show_bug.cgi?id=1933043

In some cases, this causes graceful shutdown to fail, at which point the VMI is only deleted once terminationGracePeriodSeconds has elapsed. This is especially noticeable on Windows because the grace period is quite long (to ensure we don't break Windows updates).

Does this appear similar to what you're experiencing?

What was terminationGracePeriodSeconds for the VMs that terminate immediately versus those that hang?
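
One way to answer that question is to group the VMIs by their grace period. The sketch below assumes the VMIs were exported as JSON (for example with oc get vmi -o json) and that the value sits at spec.terminationGracePeriodSeconds on each VMI. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def group_by_grace_period(vmis):
    """Map each terminationGracePeriodSeconds value to the VMI names using it.

    VMIs that do not set the field are grouped under the key None.
    """
    groups = defaultdict(list)
    for vmi in vmis:
        grace = vmi.get("spec", {}).get("terminationGracePeriodSeconds")
        groups[grace].append(vmi["metadata"]["name"])
    return dict(groups)


# Hypothetical sample data, not taken from the affected cluster.
SAMPLE_VMIS = [
    {"metadata": {"name": "ephemeral-1"}, "spec": {"terminationGracePeriodSeconds": 60}},
    {"metadata": {"name": "ephemeral-2"}, "spec": {"terminationGracePeriodSeconds": 60}},
    {"metadata": {"name": "win2019-1"}, "spec": {"terminationGracePeriodSeconds": 3600}},
    {"metadata": {"name": "unset-1"}, "spec": {}},
]

GROUPS = group_by_grace_period(SAMPLE_VMIS)
```

If the VMs that hang all fall into the group with the long grace period, that would be consistent with a failed graceful shutdown rather than a delete that never completes.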

Comment 16 Benjamin Schmaus 2021-05-06 15:06:14 UTC

It seems that they used 60 seconds for ephemeral VMs and 3600 seconds for VMs that are created and expected to stay up for a while.

Comment 22 zhe peng 2021-08-18 09:32:42 UTC

Verified with build HCO:[v4.8.1-18]

Steps:
1. Create 50 VMs with the same DV source.
2. Start all VMs and wait until all of them are in Running status.
3. Delete all VMs and check the VM and VMI status.

All VMs and VMIs were deleted.
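
The final check in the steps above can be sketched as a polling loop. This is only an illustration: wait_until_gone is a hypothetical helper, and list_remaining stands for any caller-supplied function that returns the names of VMs and VMIs still present (how it queries the cluster is left open). The timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_until_gone(list_remaining, timeout_s=600, interval_s=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until list_remaining() returns nothing or the timeout expires.

    Returns an empty list on success, otherwise the names still present
    when the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = list_remaining()
        if not remaining:
            return []  # everything was deleted
        if clock() >= deadline:
            return remaining  # these objects never went away
        sleep(interval_s)
```

The timeout should be longer than the largest terminationGracePeriodSeconds in use. Otherwise a VMI that is merely waiting out its grace period would be reported as stuck.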

Comment 29 errata-xmlrpc 2021-08-24 12:48:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3259