Created attachment 1916302 [details]
virt-operator pod memory usage

Description of problem:
There is suspicious behavior of the virt-operator pod: each time the KubeVirt configuration is updated through the HCO, the memory usage of virt-operator increases by 15-20 MB. Initially the virt-operator pod used ~200 MB, but after updating the HCO several times the usage grew to ~400 MB, and it never goes back down (screenshot attached). The maximum I saw in my tests was 450 MB (with peaks of 570 MB).

Version-Release number of selected component (if applicable):
4.12

How reproducible:
100%

Steps to Reproduce:
1. Update KubeVirt through the HCO, for example by updating liveMigrationConfig parameters or by adding some annotations (see the example patch below).
2. After updating the HCO there is a peak in memory usage (which is probably expected), but when it subsides, memory usage is 15-20 MB above the initial value.
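A minimal sketch of the kind of update used in step 1, assuming the default HyperConverged CR name (kubevirt-hyperconverged) and namespace (openshift-cnv) of an OpenShift Virtualization install; the liveMigrationConfig value shown is arbitrary:

# Patch the HCO CR; virt-operator then reconciles the propagated KubeVirt CR change.
# CR name/namespace assume a default install; adjust for your cluster.
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge \
  -p '{"spec":{"liveMigrationConfig":{"parallelMigrationsPerCluster":5}}}'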
Let's assume the worst-case scenario: the virt-operator pod ends up getting killed due to OOM or memory pressure. In that case, leader election will pick the alternate virt-operator replica and a new one will be respawned. In other words, the cluster will recover gracefully. Because of this, I'm estimating the severity of this BZ to be medium. Please let me know if you have concerns with this rationale.
I think medium is fine; sooner or later it will be OOMKilled and respawned. You can see it in the new screenshot: I ran HCO updates in a loop and virt-operator memory usage rose to 1.2 GB (it would have gone further had the loop script continued), but the pod was only restarted several hours later.
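For reference, a simple way to watch the trend while such a loop runs, assuming metrics-server/kubectl top is available and the usual virt-operator pod label (kubevirt.io=virt-operator):

# Sample virt-operator memory usage; repeat (or wrap in watch) to see the trend.
kubectl top pod -n openshift-cnv -l kubevirt.io=virt-operator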
On an upstream KubeVirt setup, updating an annotation on the CR in a loop doesn't seem to repro the issue. HCO must be doing something else to KubeVirt on CR update.
Created attachment 1916613 [details]
Heap pprof
Scratch the above: changing just an annotation doesn't do much at all. However, changing the CPU model has a strong impact on memory consumption. The following temporarily tripled memory consumption (it went back down afterwards, but stayed much higher than before):

for i in `seq 10`; do
    kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "replace", "path": "/spec/configuration/cpuModel", "value":"Penryn"}]'
    sleep 10
    kubectl patch kubevirt kubevirt -n kubevirt --type='json' -p='[{"op": "remove", "path": "/spec/configuration/cpuModel"}]'
    sleep 10
done

Attached a heap pprof taken after running that and waiting about 15 minutes.
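For anyone inspecting the attached profile, a quick way to list the functions holding the most live heap memory, assuming the attachment is saved locally (heap.pprof is a hypothetical filename):

# Rank allocation sites by in-use heap space in the captured profile.
go tool pprof -top -inuse_space heap.pprof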
Raising the severity to High and moving this to CNV 4.14 due to capacity constraints.
Verified on CNV-v4.14.0.rhel9-1576. I ran HCO updates in a loop for several hours and observed some memory spikes while the script was running, but when it completed, memory usage returned to its initial state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817