Bug 2132473 - Memory usage of virt-operator PODs increased after updating HCO
Summary: Memory usage of virt-operator PODs increased after updating HCO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: lpivarc
QA Contact: Akriti Gupta
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-05 20:48 UTC by Denys Shchedrivyi
Modified: 2023-11-08 14:05 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:03 UTC
Target Upstream Version:
Embargoed:


Attachments
virt-operator pod memory usage (120.68 KB, image/png)
2022-10-05 20:48 UTC, Denys Shchedrivyi
Heap pprof (593.05 KB, image/png)
2022-10-06 20:01 UTC, Jed Lejosne


Links
Github kubevirt/kubevirt pull 9286 (Merged): Fix operator oom (last updated 2023-08-07 09:46:52 UTC)
Red Hat Issue Tracker CNV-21663 (last updated 2023-01-23 13:28:51 UTC)
Red Hat Product Errata RHSA-2023:6817 (last updated 2023-11-08 14:05:28 UTC)

Description Denys Shchedrivyi 2022-10-05 20:48:20 UTC
Created attachment 1916302 [details]
virt-operator pod memory usage

Description of problem:
 There is suspicious behavior of the virt-operator pod: each time I update the KubeVirt configuration through the HCO, the memory usage of virt-operator increases by 15-20 MB.

 Initially the virt-operator pod used ~200 MB, but after updating the HCO several times the usage grew to ~400 MB and never went back down (screenshot is attached).

 The maximum I saw in my tests was 450 MB (with peaks of 570 MB).
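 For reference, a minimal way to watch this trend is to poll the virt-operator pod memory reported by the metrics server; the namespace and label selector below are assumptions for a default CNV deployment and may differ in your cluster:

 # Poll virt-operator memory every 30s (assumes metrics-server / cluster monitoring is available;
 # openshift-cnv and the kubevirt.io=virt-operator label are assumed defaults).
 while true; do
   kubectl top pod -n openshift-cnv -l kubevirt.io=virt-operator --no-headers
   sleep 30
 done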


Version-Release number of selected component (if applicable):
4.12

How reproducible:
100%

Steps to Reproduce:
1. Update KubeVirt through the HCO (for example by updating liveMigrationConfig parameters or by adding some annotations); see the sketch after this list.
2. After updating the HCO there is a peak in memory usage (which is probably expected), but once it goes back down the usage settles 15-20 MB above the initial value.
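
A minimal sketch of step 1, assuming the default HCO CR name and namespace (kubevirt-hyperconverged in openshift-cnv); the annotation key is made up for the test, and the liveMigrationConfig field name should be checked against your HCO API version:

# Hypothetical annotation key; any change that HCO propagates to the KubeVirt CR should do.
kubectl annotate hyperconverged kubevirt-hyperconverged -n openshift-cnv test-bump="$(date +%s)" --overwrite

# Or touch a liveMigrationConfig parameter:
kubectl patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge \
  -p '{"spec":{"liveMigrationConfig":{"parallelMigrationsPerCluster":4}}}'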

Comment 1 sgott 2022-10-06 12:06:23 UTC
Let's assume the worst-case scenario: the virt-operator pod ends up getting killed due to OOM or memory pressure. In that case, leader election will pick the alternate virt-operator and a new one will be respawned. In other words, the cluster will recover gracefully.

Because of this I'm estimating the severity of this BZ to be medium. Please let me know if you have concerns with this rationale.

Comment 2 Denys Shchedrivyi 2022-10-06 13:04:24 UTC
 I think medium is fine; sooner or later it will be OOM-killed and respawned. You can see it in the new screenshot: I ran HCO updates in a loop and the virt-operator memory usage rose to 1.2 GB (it would have gone further had the loop script continued), but the pod was only restarted several hours later.

Comment 4 Jed Lejosne 2022-10-06 18:07:49 UTC
On an upstream KubeVirt setup, updating an annotation on the CR in a loop doesn't seem to repro the issue. HCO must be doing something else to KubeVirt on CR update.

Comment 5 Jed Lejosne 2022-10-06 20:01:14 UTC
Created attachment 1916613 [details]
Heap pprof

Comment 6 Jed Lejosne 2022-10-06 20:01:36 UTC
Scratch the above, changing just an annotation doesn't do much at all. However, changing the CPU model has a strong impact on memory consumption.
This temporarily tripled memory consumption (it went back down afterwards, but stayed much higher than before):

# Toggle the cluster-wide CPU model on the KubeVirt CR to force repeated reconciliations
for i in $(seq 10); do
  kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "replace", "path": "/spec/configuration/cpuModel", "value":"Penryn"}]'
  sleep 10
  kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "remove", "path": "/spec/configuration/cpuModel"}]'
  sleep 10
done

Attached is a heap pprof taken after running that and waiting about 15 minutes.
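
For reference, a heap profile like the attached one can be inspected with standard Go tooling; the file name below is hypothetical, and obtaining the dump in the first place assumes a pprof endpoint or KubeVirt's cluster profiler is available for virt-operator:

# heap.pprof is a hypothetical file name for a heap dump taken from virt-operator.
go tool pprof -top heap.pprof            # list the largest in-use allocations
go tool pprof -png heap.pprof > heap.png # render the allocation call graph (requires graphviz)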

Comment 11 Kedar Bidarkar 2023-03-01 13:58:49 UTC
Raising the severity to High and moving this to CNV 4.14 due to capacity constraints.

Comment 12 Denys Shchedrivyi 2023-08-17 16:34:57 UTC
Verified on CNV-v4.14.0.rhel9-1576. I ran HCO updates in a loop for several hours and observed some memory spikes while the script was running, but once it completed, memory usage returned to its initial state.

Comment 15 errata-xmlrpc 2023-11-08 14:05:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

