Bug 2132473 - Memory usage of virt-operator PODs increased after updating HCO
Summary: Memory usage of virt-operator PODs increased after updating HCO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: lpivarc
QA Contact: Akriti Gupta
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-05 20:48 UTC by Denys Shchedrivyi
Modified: 2023-11-08 14:05 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:03 UTC
Target Upstream Version:
Embargoed:


Attachments
virt-operator pod memory usage (120.68 KB, image/png)
2022-10-05 20:48 UTC, Denys Shchedrivyi
Heap pprof (593.05 KB, image/png)
2022-10-06 20:01 UTC, Jed Lejosne


Links
Github kubevirt/kubevirt pull 9286 (Merged): Fix operator oom (last updated 2023-08-07 09:46:52 UTC)
Red Hat Issue Tracker CNV-21663 (last updated 2023-01-23 13:28:51 UTC)
Red Hat Product Errata RHSA-2023:6817 (last updated 2023-11-08 14:05:28 UTC)

Description Denys Shchedrivyi 2022-10-05 20:48:20 UTC
Created attachment 1916302 [details]
virt-operator pod memory usage

Description of problem:
 There is suspicious behavior of the virt-operator pod: each time I update the KubeVirt configuration through the HCO, the memory usage of virt-operator increases by 15-20 MB.

 Initially the virt-operator pod used ~200 MB, but after updating the HCO several times the usage grew to ~400 MB and never went back down (screenshot is attached).

 The maximum I saw in my tests was 450 MB (with peaks of 570 MB).
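 For reference, a minimal way to watch this trend is to poll the virt-operator pod memory reported by the metrics server; the namespace and label selector below are assumptions for a default CNV deployment and may differ in your cluster:

 # Poll virt-operator memory every 30s (assumes metrics-server / cluster monitoring is available;
 # openshift-cnv and the kubevirt.io=virt-operator label are assumed defaults).
 while true; do
   kubectl top pod -n openshift-cnv -l kubevirt.io=virt-operator --no-headers
   sleep 30
 done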


Version-Release number of selected component (if applicable):
4.12

How reproducible:
100%

Steps to Reproduce:
1. Update KubeVirt through the HCO (for example by updating liveMigrationConfig parameters or by adding some annotations); see the sketch after this list.
2. After updating the HCO there is a peak in memory usage (which is probably expected), but once it goes back down the usage settles 15-20 MB above the initial value.
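
A minimal sketch of step 1, assuming the default HCO CR name and namespace (kubevirt-hyperconverged in openshift-cnv); the annotation key is made up for the test, and the liveMigrationConfig field name should be checked against your HCO API version:

# Hypothetical annotation key; any change that HCO propagates to the KubeVirt CR should do.
kubectl annotate hyperconverged kubevirt-hyperconverged -n openshift-cnv test-bump="$(date +%s)" --overwrite

# Or touch a liveMigrationConfig parameter:
kubectl patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge \
  -p '{"spec":{"liveMigrationConfig":{"parallelMigrationsPerCluster":4}}}'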

Comment 1 sgott 2022-10-06 12:06:23 UTC
Let's assume the worst-case scenario: the virt-operator pod ends up getting killed due to OOM or memory pressure. In that case, leader election will pick the alternate virt-operator and a new one will be respawned. In other words, the cluster will recover gracefully.

Because of this I'm estimating the severity of this BZ to be medium. Please let me know if you have concerns with this rationale.

Comment 2 Denys Shchedrivyi 2022-10-06 13:04:24 UTC
 I think medium is fine; sooner or later it will be OOM-killed and respawned. You can see it in the new screenshot: I ran HCO updates in a loop and the virt-operator memory usage rose to 1.2 GB (it would have gone further had the loop script continued), but the pod was only restarted several hours later.

Comment 4 Jed Lejosne 2022-10-06 18:07:49 UTC
On an upstream KubeVirt setup, updating an annotation on the CR in a loop doesn't seem to repro the issue. HCO must be doing something else to KubeVirt on CR update.

Comment 5 Jed Lejosne 2022-10-06 20:01:14 UTC
Created attachment 1916613 [details]
Heap pprof

Comment 6 Jed Lejosne 2022-10-06 20:01:36 UTC
Scratch the above, changing just an annotation doesn't do much at all. However, changing the CPU model has a strong impact on memory consumption.
This temporarily tripled memory consumption (it went back down afterwards, but stayed much higher than before):

# Toggle the cluster-wide CPU model on the KubeVirt CR to force repeated reconciliations
for i in $(seq 10); do
  kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "replace", "path": "/spec/configuration/cpuModel", "value":"Penryn"}]'
  sleep 10
  kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "remove", "path": "/spec/configuration/cpuModel"}]'
  sleep 10
done

Attached is a heap pprof taken after running that and waiting about 15 minutes.
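
For reference, a heap profile like the attached one can be inspected with standard Go tooling; the file name below is hypothetical, and obtaining the dump in the first place assumes a pprof endpoint or KubeVirt's cluster profiler is available for virt-operator:

# heap.pprof is a hypothetical file name for a heap dump taken from virt-operator.
go tool pprof -top heap.pprof            # list the largest in-use allocations
go tool pprof -png heap.pprof > heap.png # render the allocation call graph (requires graphviz)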

Comment 11 Kedar Bidarkar 2023-03-01 13:58:49 UTC
Raising the severity to High and moving this to CNV 4.14 due to capacity constraints.

Comment 12 Denys Shchedrivyi 2023-08-17 16:34:57 UTC
Verified on CNV-v4.14.0.rhel9-1576. I ran HCO updates in a loop for several hours and observed some memory spikes while the script was running, but once it completed, memory usage returned to its initial state.

Comment 15 errata-xmlrpc 2023-11-08 14:05:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

