Description of problem:

Virt-launcher pods are slow to terminate in some cases that are not yet well understood. This is what is observed:
1. The launcher is notified to shut down gracefully. (Note: it appears we do not try to forcefully shut down the domain after the graceful shutdown.)
2. "gracefully closed notify pipe connection for vmi" is logged after the domain shuts down.
3. Many loops follow with "detected unresponsive virt-launcher command socket" <- this is the main reason we don't clean up.
4. Final clean-up is performed and the pod terminates shortly afterwards.

The most notable recent change in this area was safepath handling, which might be the cause of the clean-up taking a different path.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
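The repeated "detected unresponsive virt-launcher command socket" message is the clearest signal of the stuck state. A quick way to quantify the loop is to count that message in a captured log. This is an illustrative sketch: the file name `handler.log` and the sample JSON lines are made up, only the two quoted log messages come from the report; against a live cluster the log would come from `oc logs` on the relevant virt-handler pod.

```shell
# Build a tiny sample log (hypothetical lines; the quoted messages are real).
cat > handler.log <<'EOF'
{"component":"virt-handler","msg":"gracefully closed notify pipe connection for vmi default/vma"}
{"component":"virt-handler","msg":"detected unresponsive virt-launcher command socket (default/vma)"}
{"component":"virt-handler","msg":"detected unresponsive virt-launcher command socket (default/vma)"}
{"component":"virt-handler","msg":"detected unresponsive virt-launcher command socket (default/vma)"}
EOF

# Count how often the unresponsive-socket loop fired.
grep -c 'detected unresponsive virt-launcher command socket' handler.log   # prints 3 for the sample above
```

A count that keeps growing for minutes after "gracefully closed notify pipe connection" would match the behavior described in step 3.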
We saw this bug in several different network tests, all of which include a Linux bridge, a NAD, and a VM.

Steps to reproduce (for example, from 'test_veth_removed_from_host_after_vm_deleted'):
1. Create a NAD of type "linux bridge": oc create -f br1test_nad.yaml
2. Create a Linux bridge policy (NNCP) on worker 1: oc create -f br1test_nncp.yaml
3. Create a VM (fedora) connected to the NAD: oc create -f vma.yaml
4. Wait for the VM to be Running: oc get VM -w
5. Delete the VM: oc delete vm vma

The virt-launcher pod is stuck in Terminating status for about 8 minutes. In a similar scenario with a bond NNCP, the pods do not behave this way and terminate quickly.
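The steps above can be wrapped in a small helper that times the tear-down, which makes the ~8-minute stall easy to capture in test output. This is a sketch, not the test suite's actual code: the `oc` invocations mirror the report, while the `measure_seconds` helper name and the `kubevirt.io/domain=vma` pod selector are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print how many seconds a command takes to run.
measure_seconds() {
  local start
  start=$(date +%s)
  "$@"
  echo "$(( $(date +%s) - start ))s"
}

# Hypothetical usage against a live cluster, mirroring steps 1-5 above:
#   oc create -f br1test_nad.yaml     # NAD of type "linux bridge"
#   oc create -f br1test_nncp.yaml    # Linux bridge NNCP on worker 1
#   oc create -f vma.yaml             # fedora VM connected to the NAD
#   oc get VM -w                      # wait until the VM is Running
#   oc delete vm vma
#   measure_seconds oc wait pod -l kubevirt.io/domain=vma --for=delete --timeout=600s

measure_seconds sleep 1   # local smoke test of the timer itself
```

In the buggy linux-bridge scenario the final `oc wait` would report roughly 480s; in the bond NNCP scenario it should finish in seconds.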
Deferring to 4.13 due to capacity and lack of clarity about the root cause.
Deferring to 4.14 due to capacity.