Bug 2004295

Summary: [4.8] virt-handler goroutine count increases over time
Product: Container Native Virtualization (CNV) Reporter: Sarah Bennert <sbennert>
Component: SSP Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED DUPLICATE QA Contact: Geetika Kapoor <gkapoor>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.8.2 CC: cnv-qe-bugs, dholler
Target Milestone: ---   
Target Release: future   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2004299 2004300 Environment:
Last Closed: 2022-06-08 11:54:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2004299, 2004300    
Attachments:
Description Flags
test VM yaml none

Description Sarah Bennert 2021-09-14 23:39:39 UTC
Tracking for 4.8.z

Description of problem:
  Opened as a follow-up to https://bugzilla.redhat.com/show_bug.cgi?id=1996660
  Observed an additional slower goroutine leak.

How reproducible:
  Very reproducible, but must be run for an extended time to observe the increase.

Steps to Reproduce:
  Allow time for stabilization between each step.
  While observing the goroutines metric on virt-handler:
    1. Create and Start a batch of 100-300 VMs
    2. Delete all VMs in batch
    3. Repeat
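
The observation step above could be automated by sampling the goroutine gauge between cycles. The following is only a minimal sketch: it assumes the metric in question is the standard go_goroutines gauge exposed in Prometheus text format, and the sample text and the helper name parseGoroutines are illustrative, not taken from this report. How the metrics text is fetched from virt-handler is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGoroutines scans Prometheus text-format output and returns the value
// of the go_goroutines gauge. The metric name is an assumption; the report
// only refers to "the goroutines metric".
func parseGoroutines(metrics string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "go_goroutines" {
			return strconv.ParseFloat(fields[1], 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("go_goroutines not found")
}

func main() {
	// Illustrative sample only; the value is made up.
	sample := "# HELP go_goroutines Number of goroutines that currently exist.\n" +
		"# TYPE go_goroutines gauge\n" +
		"go_goroutines 412\n"

	n, err := parseGoroutines(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("goroutines:", n)
}
```

Recording this value once after each delete step has stabilized, and comparing the readings across cycles, would make a slow upward drift visible without watching a dashboard for the whole test period.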

Expected Results:
  Goroutine count does not increase between VM deployments after the initial deployment.

Actual Results:
  Over the course of approximately 48 hours, the goroutine count increased by 14 to 28 per virt-handler.
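
For scale, the figures above work out to roughly 0.29 to 0.58 leaked goroutines per hour per virt-handler. The sketch below just reproduces that arithmetic; the helper name leakRatePerHour is hypothetical, and the calculation assumes the growth was roughly linear over the 48 hours, which the report does not state.

```go
package main

import "fmt"

// leakRatePerHour returns the average goroutine growth per hour,
// assuming the increase was spread evenly over the observation window.
func leakRatePerHour(delta, hours float64) float64 {
	return delta / hours
}

func main() {
	const hours = 48.0 // approximate test duration from the report
	// Lower and upper bounds of the observed increase per virt-handler.
	for _, delta := range []float64{14, 28} {
		fmt.Printf("+%v goroutines over %vh = %.2f per hour\n",
			delta, hours, leakRatePerHour(delta, hours))
	}
}
```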

Comment 1 Andrej Krejcir 2021-10-04 14:36:24 UTC
Because the goroutine leak is in virt-handler, it may be possible to reproduce it in a smaller cluster.

Sarah, do you remember approximately how many VMs were running on each node?
Can you share the VM definition that you used?

Comment 2 Sarah Bennert 2021-10-04 15:15:13 UTC
Created attachment 1829079
test VM yaml

Hi Andrej,

I deployed 100-300 VMs across three worker nodes using the attached VM definition, so there should have been roughly 30-100 VMs per node.

The process should be repeated continuously over the test period, allowing time for the control plane to stabilize between state transitions (start/delete/start/delete/...). On the cluster with three worker nodes, I allowed approximately 10-15 minutes for stabilization between states.

Comment 3 Geetika Kapoor 2022-06-08 11:54:10 UTC

*** This bug has been marked as a duplicate of bug 2004299 ***