Bug 2128808 - [4.9.z] Virt-launcher Pods are slow to terminate
Summary: [4.9.z] Virt-launcher Pods are slow to terminate
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.14.0
Assignee: Itamar Holder
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-09-21 14:37 UTC by lpivarc
Modified: 2023-07-13 11:00 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-13 11:00:06 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | CNV-21398 | 0 | None | None | None | 2022-10-27 08:49:04 UTC

Description lpivarc 2022-09-21 14:37:07 UTC
Description of problem:
Virt-launcher pods are slow to terminate in some cases that are not yet well understood. This is what is observed:
1. The launcher is notified to shut down gracefully. (Note: it appears we do not attempt to forcefully shut down the domain after the graceful shutdown.)
2. "gracefully closed notify pipe connection for vmi" is logged after the domain shuts down.
3. Many loop iterations follow, each logging "detected unresponsive virt-launcher command socket". This is the main reason cleanup does not happen promptly.
4. Final cleanup is eventually performed, and the pod terminates shortly afterward.
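The sequence above can be spotted by counting the characteristic log messages. The sketch below is not an existing tool: the function name `summarize_launcher_shutdown` is hypothetical, and it assumes the two messages quoted above appear verbatim in a saved log file. Which component emits them (virt-handler is assumed here) should be confirmed before relying on the counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool): counts the shutdown-related
# messages quoted in this bug within a saved log file.
summarize_launcher_shutdown() {
  local log_file="$1"
  local closed unresponsive
  # Step 2 marker. grep -c prints 0 and exits 1 when nothing matches,
  # so "|| true" keeps the function safe under "set -e".
  closed=$(grep -c "gracefully closed notify pipe connection for vmi" "$log_file" || true)
  # Step 3 marker: one matching line per detection of the unresponsive command socket.
  unresponsive=$(grep -c "detected unresponsive virt-launcher command socket" "$log_file" || true)
  echo "notify_pipe_closed=${closed}"
  echo "unresponsive_socket=${unresponsive}"
}
```

For example, assuming the default `openshift-cnv` namespace, save the log of the virt-handler pod on the affected node with `oc logs -n openshift-cnv <virt-handler-pod> > handler.log`, then run `summarize_launcher_shutdown handler.log`. A large `unresponsive_socket` count following a single `notify_pipe_closed` would match the pattern described in step 3.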

The most notable recent change in this area was the safepath handling, which might cause cleanup to follow a different code path.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 awax 2022-09-21 23:19:43 UTC
We saw this bug in several different network tests, all of which include a linux_bridge, a NAD (NetworkAttachmentDefinition), and a VM.
Steps to reproduce (for example, from 'test_veth_removed_from_host_after_vm_deleted'):
1. Create a NAD with the type "linux bridge":
oc create -f br1test_nad.yaml

2. Create a linux bridge policy (NNCP) on worker 1:
oc create -f br1test_nncp.yaml

3. Create a VM (fedora) connected to the NAD:
oc create -f vma.yaml

4. Wait for the VM to be Running:
oc get VM -w

5. Delete the VM:
oc delete vm vma

After the deletion, the virt-launcher pod stays in Terminating status for about 8 minutes.
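To measure this delay rather than estimate it, one can poll until the pod is gone. The helper below is a hypothetical sketch: `time_until_gone` and the `POLL_INTERVAL` variable are not part of `oc`. It re-runs any check command until that command fails and prints the elapsed time in whole seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): re-runs a check command until it
# fails, then prints how many seconds that took (bash SECONDS granularity).
time_until_gone() {
  local interval="${POLL_INTERVAL:-5}"   # seconds between checks (assumed default)
  local start=$SECONDS
  # The check succeeds while the resource still exists; stop once it fails.
  while "$@" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo $(( SECONDS - start ))
}
```

For example, run `oc delete vm vma --wait=false` and then `time_until_gone sh -c 'oc get pods -l kubevirt.io/domain=vma -o name | grep -q .'`. The `kubevirt.io/domain` label selector is an assumption about how virt-launcher pods are labeled; the `grep -q .` is needed because `oc get` exits successfully even when no pods match. Running the same measurement for the bond NNCP scenario gives a direct comparison.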

In a similar scenario using a bond NNCP instead, the pods do not show this behavior and terminate quickly.

Comment 8 Antonio Cardace 2022-10-28 12:51:00 UTC
Deferring to 4.13 due to capacity and lack of clarity about the root cause.

Comment 10 Antonio Cardace 2023-03-03 16:47:35 UTC
Deferring to 4.14 due to capacity.

