Bug 1996660

Summary: [4.8] Goroutine count and memory remains high after VMIs are removed
Product: Container Native Virtualization (CNV)
Reporter: Kevin Wiesmueller <kwiesmul>
Component: SSP
Assignee: Kevin Wiesmueller <kwiesmul>
Status: CLOSED ERRATA
QA Contact: Sarah Bennert <sbennert>
Severity: high
Docs Contact:
Priority: high
Version: 4.8.0
CC: cnv-qe-bugs, iholder, rnetser, sbennert
Target Milestone: ---   
Target Release: 4.8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-21 11:08:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kevin Wiesmueller 2021-08-23 12:27:35 UTC
This bug was initially created as a copy of Bug #1996658

I am copying this bug to track the fix for 4.8 as well.


Description of problem:
Originally tracked in https://github.com/kubevirt/kubevirt/issues/6056
When creating VMIs and deleting them, virt-handler does not clean up goroutines properly.
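
For readers unfamiliar with this class of bug, here is a minimal, generic sketch of how per-object goroutines can outlive the objects they serve. It is not the virt-handler code and does not claim to reproduce the actual root cause fixed upstream; `worker`, `runCycle`, and the `live` counter are hypothetical names used only for illustration. Each simulated "VMI" gets one goroutine that exits only when its stop channel is closed. If the delete path never closes the channel, the goroutines accumulate from cycle to cycle.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker stands in for a per-VMI background goroutine (hypothetical).
// It blocks until stop is closed, then decrements the live counter.
func worker(live *int64, stop <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	defer atomic.AddInt64(live, -1) // runs before wg.Done (defers are LIFO)
	<-stop
}

// runCycle "creates" n objects and then "deletes" them.
// With cleanup=false the stop channels are never closed, so every
// worker goroutine stays blocked forever: a leak.
// It returns the number of workers still alive after the cycle.
func runCycle(live *int64, n int, cleanup bool) int64 {
	var wg sync.WaitGroup
	stops := make([]chan struct{}, n)
	for i := range stops {
		stops[i] = make(chan struct{})
		wg.Add(1)
		atomic.AddInt64(live, 1) // counted before the goroutine starts
		go worker(live, stops[i], &wg)
	}
	if cleanup {
		for _, s := range stops {
			close(s)
		}
		wg.Wait() // all workers have exited and been uncounted
	}
	return atomic.LoadInt64(live)
}

func main() {
	var live int64
	fmt.Println(runCycle(&live, 100, true))  // proper cleanup
	fmt.Println(runCycle(&live, 100, false)) // leaking delete path
	fmt.Println(runCycle(&live, 100, false)) // leak grows with each cycle
}
```

With cleanup enabled the live count returns to zero after the cycle. With cleanup disabled it grows by n on every cycle, which matches the general shape of the symptom reported here: a count that stays elevated after all VMIs are gone.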

How reproducible:
Create n VMIs, wait until they are all running, delete all n VMIs, wait until they are all deleted, sleep 5s, and repeat. Runs with 10, 20, 40, 60, 80, 100, 200, and 300 VMIs show an increase in virt-handler memory/CPU usage and goroutine count; however, only CPU usage returns to the expected level once there are 0 VMIs.

Steps to Reproduce:
1. Create 100 VMIs (they do not need to be started); the KubeVirt density test can be used for this
2. Delete 100 VMIs
3. Observe goroutines metric on virt-handler
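
Step 3 can be scripted. The sketch below assumes the metric in question is `go_goroutines`, the gauge exported by the standard Prometheus Go client, and that the virt-handler metrics payload has already been fetched as text; how to reach the endpoint (port, TLS, authentication) is deployment-specific and not covered here. `gaugeValue` is a hypothetical helper, and the sample payload in `main` is made up for illustration, not taken from a real virt-handler.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue returns the value of the first unlabelled sample called name
// in a payload written in the Prometheus text exposition format.
func gaugeValue(metrics, name string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == name {
			return strconv.ParseFloat(fields[1], 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

func main() {
	// Made-up sample payload; a real one would come from the
	// virt-handler metrics endpoint.
	sample := `# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 412
`
	v, err := gaugeValue(sample, "go_goroutines")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("go_goroutines = %.0f\n", v)
}
```

Recording this value before step 1 and again after step 2 gives a baseline and a post-deletion reading to compare.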

Actual results:
Goroutines metric does not go down all the way after deletion.

Expected results:
Goroutines metric should go down all the way after deletion. 

Fixed in https://github.com/kubevirt/kubevirt/pull/6176
Backport for 4.8 is pending until this cherry-pick merges: https://github.com/kubevirt/kubevirt/pull/6227

Comment 3 Sarah Bennert 2021-09-14 23:55:49 UTC
Initial testing [0] was performed using the density test, which does start the VMs.
Verified significant reduction in goroutine leakage.


While testing, I observed a smaller leak and have opened follow-up BZs [1,2,3].

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1996660#c0
[1] 2.6 https://bugzilla.redhat.com/show_bug.cgi?id=2004300
[2] 4.8 https://bugzilla.redhat.com/show_bug.cgi?id=2004295
[3] 4.9 https://bugzilla.redhat.com/show_bug.cgi?id=2004299

Comment 8 errata-xmlrpc 2021-09-21 11:08:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.2 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3598