Bug 1935219

Summary: [CNV-2.5] Set memory and CPU request on hco-operator and hco-webhook deployments
Product: Container Native Virtualization (CNV) Reporter: sgott
Component: Installation Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA QA Contact: Debarati Basu-Nag <dbasunag>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.5.3 CC: cnv-qe-bugs, dbasunag, ipinto, jgil, kbidarka, stirabos
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.9.0-32 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1931519 Environment:
Last Closed: 2021-11-02 15:57:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1931519    
Bug Blocks: 1935217, 1935218    

Description sgott 2021-03-04 14:45:53 UTC
+++ This bug was initially created as a clone of Bug #1931519 +++

This is a clone to track items specifically related to the Installation component

------------

Description of problem:

Most of the deployments and daemonsets in the openshift-cnv namespace don't specify resource requests in their manifests. Only daemonset/kube-cni-linux-bridge-plugin and deployment/kubemacpool-mac-controller-manager have them defined, as follows:


Kind       | Name                               | CPU Req/Limits | Mem Req/Limits
---------- | ---------------------------------- | -------------- | ---------------
daemonset  | kube-cni-linux-bridge-plugin       | 60m/0m         | 30Mi/0Mi
deployment | kubemacpool-mac-controller-manager | 100m/300m      | 300Mi/600Mi
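
For reference, in a deployment's pod template those values translate into a container-level resources stanza of roughly the following shape (shown here with the kubemacpool-mac-controller-manager values from the table above; the container name "manager" is assumed for illustration only):

=================
spec:
  template:
    spec:
      containers:
      - name: manager          # container name assumed for illustration
        resources:
          requests:
            cpu: 100m
            memory: 300Mi
          limits:
            cpu: 300m
            memory: 600Mi
=================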


The following manifests don't define any resource requirements:

Kind       | Name
---------- | ---- 
daemonset  | bridge-marker
daemonset  | nmstate-handler
daemonset  | ovs-cni-amd64
daemonset  | kubevirt-node-labeller
deployment | cdi-uploadproxy
deployment | cdi-apiserver
deployment | nmstate-webhook
deployment | hostpath-provisioner-operator
deployment | virt-api
deployment | virt-controller
deployment | virt-handler
deployment | virt-operator
deployment | virt-template-validator
deployment | vm-import-controller
deployment | vm-import-operator
deployment | cdi-deployment
deployment | cluster-network-addons-operator
deployment | cdi-operator
deployment | kubevirt-ssp-operator
deployment | hco-operator

Version-Release number of selected component (if applicable):
CNV 2.5.3 and onward.

How reproducible:



Steps to Reproduce:
1. Create the CNV namespace
2. Create the CNV OperatorGroup
3. Create the HCO subscription on the stable channel (see the sketch below)
4. Wait for the HCO operator deployment to complete
5. Check for resource requests in the deployed manifests.
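
For steps 1-3, the manifests would look roughly like the sketch below. The object names and the package/source names (kubevirt-hyperconverged, redhat-operators) are assumptions for illustration and may differ depending on the catalog in use; the channel comes from step 3.

=================
# Namespace for CNV / OpenShift Virtualization
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
# OperatorGroup scoped to the openshift-cnv namespace
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group   # name assumed for illustration
  namespace: openshift-cnv
spec:
  targetNamespaces:
  - openshift-cnv
---
# Subscription to the HCO package on the stable channel
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub                 # name assumed for illustration
  namespace: openshift-cnv
spec:
  channel: stable
  name: kubevirt-hyperconverged         # package name, assumed
  source: redhat-operators              # catalog source, assumed
  sourceNamespace: openshift-marketplace
=================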

Actual results:
Only 2 deployed manifests define resource requests, and only 1 defines resource limits (see the table above). 

Expected results:
All deployed manifests define the resource requirements.

Additional info:
N/A

Comment 1 Simone Tiraboschi 2021-03-10 14:47:10 UTC
This is a subset of a larger effort (https://bugzilla.redhat.com/1931519); in this specific bug we are focusing only on setting memory and CPU limits on the hco-operator and hco-webhook deployments.

Comment 2 Nico Schieder 2021-03-22 10:42:48 UTC
While working on this, we found that HCO is watching ConfigMaps (and Services) across the whole cluster, leading to unpredictable memory consumption that depends on the size of the cluster.
To rectify this, we are looking into filtering our caches for those objects.

We will update this issue as soon as we have agreed on how to tackle it.

Comment 3 Simone Tiraboschi 2021-04-19 08:29:43 UTC
We are now waiting for this change in controller-runtime:
https://github.com/kubernetes-sigs/controller-runtime/pull/1435

in order to get predictable memory consumption.
Only then will we be able to actually implement a memory limit.
This is probably not going to happen in the 4.8 timeframe.

Comment 4 Simone Tiraboschi 2021-04-30 12:58:48 UTC
https://github.com/kubernetes-sigs/controller-runtime/pull/1435 has been merged; we can start consuming it as soon as a new release of controller-runtime is available.

Comment 5 Simone Tiraboschi 2021-06-16 08:52:42 UTC
According to the guidelines at https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits, which state:
"
Therefore, cluster components SHOULD NOT be configured with resource limits.
However, cluster components MUST declare resource requests for both CPU and memory.
"

we are going to set resource requests for both CPU and memory but not resource limits.
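
Concretely, this means a requests-only stanza of roughly the following shape on the hco-operator and hco-webhook deployments (the values shown are the hco-operator ones later quoted in Comment 6):

=================
resources:
  requests:
    cpu: 10m
    memory: 96Mi
  # no "limits" stanza, per the convention quoted above
=================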

Comment 6 Debarati Basu-Nag 2021-08-09 15:10:16 UTC
Validated against a 4.9 cluster:

For hco-operator:
=================
        resources:
          requests:
            cpu: 10m
            memory: 96Mi
=================

For hco-webhook:
=================
        resources:
          requests:
            cpu: 5m
            memory: 48Mi
=================

Based on the above results, marking this ticket as verified.

Comment 9 errata-xmlrpc 2021-11-02 15:57:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104