Bug 1935219 - [CNV-2.5] Set memory and CPU request on hco-operator and hco-webhook deployments
Summary: [CNV-2.5] Set memory and CPU request on hco-operator and hco-webhook deployments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.5.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Simone Tiraboschi
QA Contact: Debarati Basu-Nag
URL:
Whiteboard:
Depends On: 1931519
Blocks: 1935217 1935218
 
Reported: 2021-03-04 14:45 UTC by sgott
Modified: 2021-11-02 15:57 UTC
CC List: 6 users

Fixed In Version: hco-bundle-registry-container-v4.9.0-32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1931519
Environment:
Last Closed: 2021-11-02 15:57:28 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt hyperconverged-cluster-operator pull 1335 0 None closed Reduce the memory footprint using cache selectors 2021-06-16 07:24:08 UTC
Github kubevirt hyperconverged-cluster-operator pull 1405 0 None closed Tune resources requests 2021-07-06 15:41:18 UTC
Red Hat Product Errata RHSA-2021:4104 0 None None None 2021-11-02 15:57:41 UTC

Description sgott 2021-03-04 14:45:53 UTC
+++ This bug was initially created as a clone of Bug #1931519 +++

This is a clone to track items specifically related to the Installation component

------------

Description of problem:

Most of the deployments and daemonsets stored in the openshift-cnv namespace don't specify resource requests in their manifests. Only daemonset/kube-cni-linux-bridge-plugin and deployment/kubemacpool-mac-controller-manager have them defined, as follows:


Kind       | Name                               | CPU Req/Limits | Mem Req/Limits
---------- | ---------------------------------- | -------------- | ---------------
daemonset  | kube-cni-linux-bridge-plugin       | 60m/0m         | 30Mi/0Mi
deployment | kubemacpool-mac-controller-manager | 100m/300m      | 300Mi/600Mi
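
For reference, a resources stanza in a container spec looks like the following. This is an illustrative sketch only, reproducing the kubemacpool-mac-controller-manager values from the table above rather than quoting the actual manifest:

=================
# Sketch of a resources stanza matching the table row above (not the real manifest)
resources:
  requests:
    cpu: 100m
    memory: 300Mi
  limits:
    cpu: 300m
    memory: 600Mi
=================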


The following manifests don't define any resource requirements:

Kind       | Name
---------- | ---- 
daemonset  | bridge-marker
daemonset  | nmstate-handler
daemonset  | ovs-cni-amd64
daemonset  | kubevirt-node-labeller
deployment | cdi-uploadproxy
deployment | cdi-apiserver
deployment | nmstate-webhook
deployment | hostpath-provisioner-operator
deployment | virt-api
deployment | virt-controller
deployment | virt-handler
deployment | virt-operator
deployment | virt-template-validator
deployment | vm-import-controller
deployment | vm-import-operator
deployment | cdi-deployment
deployment | cluster-network-addons-operator
deployment | cdi-operator
deployment | kubevirt-ssp-operator
deployment | hco-operator


Version-Release number of selected component (if applicable):
CNV 2.5.3 and onward.

How reproducible:



Steps to Reproduce:
1. Create the CNV namespace
2. Create the CNV Operator Group
3. Create the HCO subscription and deploy from the stable channel (see the example manifests sketched below)
4. Wait for the HCO operator deployment to complete
5. Check for resource requests in the deployed manifests.
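
For steps 1-3, manifests along the following lines can be used. This is a sketch only: the object names, catalog source, and channel shown here are assumptions and may differ per environment and version:

=================
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group   # assumed name
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub   # assumed name
  namespace: openshift-cnv
spec:
  source: redhat-operators            # assumed catalog source
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  channel: stable
=================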

Actual results:
Only 2 of the deployed manifests define resource requests, and only 1 defines resource limits (see the list above).

Expected results:
All deployed manifests define their resource requirements.

Additional info:
N/A

Comment 1 Simone Tiraboschi 2021-03-10 14:47:10 UTC
This is a subset of a larger effort (https://bugzilla.redhat.com/1931519); in this specific bug we are focusing only on setting memory and CPU limits on the hco-operator and hco-webhook deployments.

Comment 2 Nico Schieder 2021-03-22 10:42:48 UTC
While working on this, we found that HCO is watching ConfigMaps (and Services) across the whole cluster, leading to unpredictable memory consumption that depends on the size of the cluster.
To rectify this we are looking into filtering our caches for those objects.

We will update this issue as soon as we have agreed on how to tackle it.

Comment 3 Simone Tiraboschi 2021-04-19 08:29:43 UTC
We are now waiting this change on controller-runtime:
https://github.com/kubernetes-sigs/controller-runtime/pull/1435

to have a predictable memory consumption.
Only at that time we will be able to really implement a memory limit.
This is probably not going to happen in 4.8 timeframe.

Comment 4 Simone Tiraboschi 2021-04-30 12:58:48 UTC
https://github.com/kubernetes-sigs/controller-runtime/pull/1435 got merged; we can start consuming it as soon as a new release of controller-runtime is available.

Comment 5 Simone Tiraboschi 2021-06-16 08:52:42 UTC
According to the https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits guidelines, which state:
"
Therefore, cluster components SHOULD NOT be configured with resource limits.
However, cluster components MUST declare resource requests for both CPU and memory.
"

we are going to set resource requests for both CPU and memory but not resource limits.
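
In practice this means each container spec declares a requests stanza and omits limits entirely. A minimal sketch (the container name here is hypothetical; the values preview what was later verified in comment 6):

=================
      containers:
        - name: hyperconverged-cluster-operator   # hypothetical container name
          resources:
            requests:
              cpu: 10m
              memory: 96Mi
            # no "limits" stanza, per the convention quoted above
=================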

Comment 6 Debarati Basu-Nag 2021-08-09 15:10:16 UTC
Validated against a 4.9 cluster:

For hco-operator:
=================
        resources:
          requests:
            cpu: 10m
            memory: 96Mi
=================

For hco-webhook:
=================
        resources:
          requests:
            cpu: 5m
            memory: 48Mi
=================

Based on the above results, marking this ticket as verified.

Comment 9 errata-xmlrpc 2021-11-02 15:57:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104

