1900631 – [CNV 2.4.3] oc vm delete doesn't complete sometimes

Summary: [CNV 2.4.3] oc vm delete doesn't complete sometimes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	2.4.3
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.8.1
Assignee:	aschuett
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-11-23 12:57 UTC by Benjamin Schmaus
Modified:	2021-10-01 15:59 UTC
CC List:	7 users
Fixed In Version:	hco-bundle-registry-container-v4.8.1-14 virt-operator-container-v4.8.1-2
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-24 12:48:59 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt kubevirt pull 5691	None	open	allow multiple calls to graceful shutdown in case acpi did not recieve call	2021-05-27 07:13:52 UTC
Github	kubevirt kubevirt pull 5723	None	open	add force stop for VM to virtctl	2021-05-27 07:13:52 UTC
Red Hat Product Errata	RHSA-2021:3259	None	None	None	2021-08-24 12:49:36 UTC

Description Benjamin Schmaus 2020-11-23 12:57:14 UTC

Description of problem: 
oc vm delete <vmname> may not complete:
- the pod stays in Terminating state
- the VMI stays in Running status until the guest VM initiates shutdown
- sometimes the VM is deleted but the VMI is not
- could be related to a finalizer

Might be related to BZ1883875, where the guest agent is not running, but that is not always the case.
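To check the finalizer theory, one option is to dump the stuck object as JSON and look at its metadata. The sketch below is only an illustration: it relies on the standard Kubernetes metadata fields deletionTimestamp and finalizers, and the sample object and its finalizer name are hypothetical, not taken from this cluster.

```python
import json


def waiting_on_finalizers(resource: dict) -> bool:
    """True if deletion was requested but finalizers are still listed."""
    meta = resource.get("metadata", {})
    deletion_requested = "deletionTimestamp" in meta
    return deletion_requested and bool(meta.get("finalizers"))


# Hypothetical VMI, shaped like the JSON form of a Kubernetes object.
# The finalizer name is a placeholder, not a real KubeVirt finalizer.
sample_vmi = json.loads("""
{
  "kind": "VirtualMachineInstance",
  "metadata": {
    "name": "example-vm",
    "deletionTimestamp": "2020-11-23T12:57:14Z",
    "finalizers": ["example.com/hypothetical-finalizer"]
  }
}
""")

if waiting_on_finalizers(sample_vmi):
    print(sample_vmi["metadata"]["name"], "is waiting on",
          sample_vmi["metadata"]["finalizers"])
```

An object that has a deletionTimestamp but still lists finalizers has been asked to go away and is waiting for a controller to finish its cleanup, which would match a VMI that lingers after the VM is gone.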


Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:
Sporadic - need more details on how to reproduce

Steps to Reproduce:
1. Create many VMs through automation, all using the same source VM PV
2. Delete the VMs

Actual results:
Some VM deletes sporadically fail to complete

Expected results:
All VMs should be deleted

Additional info:

Comment 1 sgott 2020-11-23 19:09:19 UTC

Is it possible to get some more information? In particular, must-gather output would give us some more context here.

Comment 6 sgott 2020-12-02 13:41:25 UTC

Without any other context, this sounds like it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1896387

That BZ is likely due to an issue in CRI-O which is being addressed in CNV 4.7. Thus I'm deferring this pending a fix in OCP.

Comment 7 sgott 2020-12-16 14:57:44 UTC

To close the loop here: https://bugzilla.redhat.com/show_bug.cgi?id=1896387#c8 mentioned BZ https://bugzilla.redhat.com/show_bug.cgi?id=1883991, which is what I was referring to in Comment #6.

Comment 8 Benjamin Schmaus 2021-04-21 17:52:53 UTC

I have been able to reproduce this in 4.7.3 with a Windows guest VM. Start a Windows 2019 VM and then try to stop it from the OCP console: the stop seems to hang. If I repeat the same steps but first start a shutdown from inside the Windows 2019 guest, and then stop the VM in the OCP console before the guest has finished shutting down, it stops properly.

Given my statement above, do we believe this is still related to CRI-O, as indicated in comment 6?

Comment 15 sgott 2021-04-26 17:51:10 UTC

Ben,

There is a BZ where the reporter created a Windows VM and then deleted it immediately.

https://bugzilla.redhat.com/show_bug.cgi?id=1933043

In some cases, this causes graceful shutdown to fail, at which point the VMI is not deleted until terminationGracePeriodSeconds has elapsed. This is especially noticeable on Windows because the grace period is quite long (to ensure we don't break Windows updates).

Does this appear similar to what you're experiencing?

What was terminationGracePeriodSeconds set to for the VMs that terminate immediately versus those that hang?
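For anyone gathering those values, a minimal sketch of pulling the grace period out of exported VM or VMI JSON follows. It assumes the usual KubeVirt layout (spec.template.spec.terminationGracePeriodSeconds on a VirtualMachine, spec.terminationGracePeriodSeconds on a VirtualMachineInstance); the helper name and the sample objects are hypothetical.

```python
from typing import Optional


def grace_period_seconds(obj: dict) -> Optional[int]:
    """Return terminationGracePeriodSeconds, or None if it is not set."""
    spec = obj.get("spec", {})
    if obj.get("kind") == "VirtualMachine":
        # A VirtualMachine wraps the VMI spec in a template.
        spec = spec.get("template", {}).get("spec", {})
    return spec.get("terminationGracePeriodSeconds")


# Hypothetical objects with illustrative values only.
vm = {
    "kind": "VirtualMachine",
    "metadata": {"name": "win2019"},
    "spec": {"template": {"spec": {"terminationGracePeriodSeconds": 3600}}},
}
vmi = {
    "kind": "VirtualMachineInstance",
    "metadata": {"name": "ephemeral-01"},
    "spec": {"terminationGracePeriodSeconds": 60},
}

for obj in (vm, vmi):
    print(obj["metadata"]["name"], grace_period_seconds(obj))
```

Running this over the VMs that stop promptly and the ones that hang would show whether the two groups differ in their configured grace period.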

Comment 16 Benjamin Schmaus 2021-05-06 15:06:14 UTC

It seems they used 60 seconds for ephemeral VMs and 3600 seconds for VMs that are created and expected to stay up for a while.
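As a rough worked example of what those two settings mean for a stuck delete: assuming, per comment 15, that a VMI whose graceful shutdown fails is only removed once the grace period expires, the upper bound on the wait is simply the grace period itself. The helper below is a hypothetical illustration of that arithmetic, not a measurement.

```python
def max_delete_wait_minutes(grace_period_seconds: int) -> float:
    """Upper bound on the wait, assuming removal happens at grace period expiry."""
    return grace_period_seconds / 60


# The two grace periods reported in this comment.
for label, seconds in (("ephemeral", 60), ("long-lived", 3600)):
    print(f"{label}: up to {max_delete_wait_minutes(seconds):.0f} min")
# prints:
# ephemeral: up to 1 min
# long-lived: up to 60 min
```

Under that assumption, a delete on a long-lived VM could look hung for up to an hour, while the same failure on an ephemeral VM would clear within about a minute.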

Comment 22 zhe peng 2021-08-18 09:32:42 UTC

Verified with build HCO v4.8.1-18.

Steps:
1. Create 50 VMs from the same DV source.
2. Start all VMs and wait until they are all in Running status.
3. Destroy all VMs, then check VM and VMI status.

Result: all VMs and VMIs were deleted.
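The final check can be scripted rather than eyeballed. The sketch below assumes the cluster state has been exported as a JSON list (an object with an "items" array, as produced by a combined get with JSON output) and reports any VM or VMI still present. The function name and the sample listing are hypothetical.

```python
def leftover_names(listing: dict,
                   kinds=("VirtualMachine", "VirtualMachineInstance")) -> list:
    """Return 'Kind/name' for every VM or VMI still present in the listing."""
    return sorted(
        f'{item["kind"]}/{item["metadata"]["name"]}'
        for item in listing.get("items", [])
        if item.get("kind") in kinds
    )


# Hypothetical listing taken after the delete step: one VMI was left behind.
listing = {
    "kind": "List",
    "items": [
        {"kind": "VirtualMachineInstance", "metadata": {"name": "vm-07"}},
    ],
}

remaining = leftover_names(listing)
if remaining:
    print("still present:", ", ".join(remaining))
else:
    print("all VMs and VMIs deleted")
```

An empty result corresponds to the outcome reported here, where every VM and VMI was gone after the delete.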

Comment 29 errata-xmlrpc 2021-08-24 12:48:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3259
