Bug 1938492

Summary:	Marketplace extract container does not request CPU or memory
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	OLM	Assignee:	Kevin Rizza <krizza>
OLM sub component:	OLM	QA Contact:	Salvatore Colangelo <scolange>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	jiazha, jlanford, nhale, tflannag, wking
Version:	4.8
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1952851 (view as bug list)		Environment:
Last Closed:	2021-07-27 22:53:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1952851

Description Clayton Coleman 2021-03-13 22:27:22 UTC

All payload components should request a reasonable minimum CPU and p90 memory usage

https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits

The openshift-marketplace batch job container "extract" does not request CPU or memory. Please follow the recommendations in that doc.

Referenced from the new e2e test which gates components without resource requests and enforces the resource conventions.

  batch/v1/Job/openshift-marketplace/ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2e8839/container/extract does not have a cpu request (rule: "batch/v1/Job/openshift-marketplace/ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2e8839/container/extract/request[cpu]")
  batch/v1/Job/openshift-marketplace/ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2e8839/container/extract does not have a memory request (rule: "batch/v1/Job/openshift-marketplace/ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2e8839/container/extract/request[memory]")
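The check behind these two failures can be approximated as follows. This is only an illustrative sketch, not the actual e2e test code: the function name `missing_requests` is hypothetical, and the manifest is assumed to be a Job already loaded into a Python dict (for example from `oc get job ... -o json`). The message layout mirrors the failure lines above.

```python
def missing_requests(job):
    """Return one message per container resource request that is absent.

    `job` is assumed to be a batch/v1 Job manifest loaded as a dict.
    """
    meta = job.get("metadata", {})
    prefix = "{}/{}/{}/{}".format(
        job["apiVersion"], job["kind"], meta.get("namespace"), meta.get("name")
    )
    # For a Job, containers live under the pod template.
    containers = job["spec"]["template"]["spec"].get("containers", [])
    problems = []
    for container in containers:
        requests = (container.get("resources") or {}).get("requests") or {}
        path = "{}/container/{}".format(prefix, container["name"])
        for resource in ("cpu", "memory"):
            if resource not in requests:
                problems.append(
                    '{} does not have a {} request (rule: "{}/request[{}]")'.format(
                        path, resource, path, resource
                    )
                )
    return problems
```

A container with no `resources` stanza yields two messages, one for cpu and one for memory; a container with both requests set yields none.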

Comment 1 Joe Lanford 2021-03-15 13:59:14 UTC

From the CONVENTIONS doc:

> The memory request of cluster components should be set to a value 10% higher than their 90th percentile actual consumption over a standard end-to-end suite run.
> The CPU request of cluster components is based on the following formula and lower/upper bound rules:

> floor(baseline_request / baseline_actual * component_actual)
>
> Then, these rules for lower and upper bounds should be applied:
>
>    The CPU request should never be lower than 5m. Setting a 5m limit avoids extreme ratio calculations when the node is stressed, while still representing the noise of a mostly idle workload.
>    If the computed value is more than 100m, use the lower of the computed value and 200% of the usage of the component in an idle cluster. This cap means components that require bursts of CPU time may be throttled on busy hosts, but they are more likely to be schedulable in the first place.

Since the openshift-marketplace's "extract" batch job is part of a control plane component, we will use etcd as a baseline to compute its CPU resource requests.

Both CPU and memory request formulas use numbers based on the end-to-end parallel conformance test job. After running the tests, use the Prometheus instance in the cluster to query the kube_pod_resource_request and kube_pod_resource_limit metrics and find numbers for the Pod(s) for the component being tuned.
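The formulas quoted above can be worked through with a small sketch. All inputs here are hypothetical placeholders, not numbers measured for this bug; the function names are made up for illustration, and rounding the memory request up to a whole MiB is an assumption the CONVENTIONS text does not state.

```python
import math


def memory_request_mib(p90_usage_mib):
    """Memory request: 10% above p90 usage, in whole MiB.

    Integer arithmetic avoids float rounding; rounding up is an assumption.
    """
    return (p90_usage_mib * 110 + 99) // 100


def cpu_request_millicores(baseline_request_m, baseline_actual_m,
                           component_actual_m, component_idle_m):
    """CPU request in millicores using the quoted formula and bounds.

    baseline_*  : request and measured usage of the baseline component
    component_* : measured usage of the component during the e2e run
                  and in an idle cluster
    """
    computed = math.floor(baseline_request_m / baseline_actual_m * component_actual_m)
    if computed > 100:
        # Upper-bound rule: the lower of the computed value and 200% of idle usage.
        computed = min(computed, math.floor(2 * component_idle_m))
    # Lower-bound rule: never below 5m.
    return max(computed, 5)
```

With a hypothetical baseline of 300m requested against 600m actually used, a component that uses 20m during the run would get `cpu_request_millicores(300, 600, 20, 5) == 10`, and one that uses only 4m would be raised to the 5m floor.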

Comment 3 Clayton Coleman 2021-03-25 22:16:33 UTC

Moving to high severity, payload workloads may not run without requests. 

This may not be deferred from 4.8

Comment 4 W. Trevor King 2021-04-02 21:40:19 UTC

> This may not be deferred from 4.8

That means blocker+.

Comment 8 W. Trevor King 2021-04-16 02:55:58 UTC

Pulling back to POST briefly while I update the test suite. This will not affect verifying the marketplace fix, so feel free to go ahead with that. Will be back to ON_QA shortly.

Comment 10 Salvatore Colangelo 2021-04-20 16:00:31 UTC

[scolange@scolange go]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-18-101412   True        False         26h     Cluster version is 4.8.0-0.nightly-2021-04-18-101412


1. Install an operator in a general namespace

[scolange@scolange go]$ oc -n scolange get sub
NAME                             PACKAGE                          SOURCE                CHANNEL
couchbase-enterprise-certified   couchbase-enterprise-certified   certified-operators   stable
[scolange@scolange go]$ oc -n scolange get csv
NAME                        DISPLAY              VERSION   REPLACES                    PHASE
couchbase-operator.v2.1.0   Couchbase Operator   2.1.0     couchbase-operator.v2.0.2   Succeeded
[scolange@scolange go]$ oc -n scolange get ip
NAME            CSV                         APPROVAL    APPROVED
install-4vgjl   couchbase-operator.v2.1.0   Automatic   true

2. Verify the jobs in the origin namespace (in this case, openshift-marketplace)

[scolange@scolange go]$ oc -n openshift-marketplace get jobs
NAME                                                              COMPLETIONS   DURATION   AGE
5c410a08445875ef0dd1a81b992b068f3a86bd2f5a79c433ad9e0bc4d62ef09   1/1           25s        19m
 

3. Verify that the spec.template.spec.containers[].resources.requests fields are set in the job

[scolange@scolange go]$ oc -n openshift-marketplace get jobs 5c410a08445875ef0dd1a81b992b068f3a86bd2f5a79c433ad9e0bc4d62ef09 -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2021-04-20T15:38:07Z"
  labels:
    controller-uid: 52bcb04d-571c-408b-9b5d-69f2466e7806
    job-name: 5c410a08445875ef0dd1a81b992b068f3a86bd2f5a79c433ad9e0bc4d62ef09
  managedFields:
  - apiVersion: batch/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"dc09c497-b0c5-4f82-a70b-31fc03ce774a"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
....
...
....

        resources:
          requests:
            cpu: 10m
            memory: 50Mi
....

LGTM
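The manual check above could also be scripted. The sketch below is a simplified illustration, not a full Kubernetes quantity parser: it only handles the "m" suffix for CPU and the Ki/Mi/Gi suffixes for memory, and the helper names are hypothetical. The 5m default floor comes from the CONVENTIONS text quoted in comment 1.

```python
from fractions import Fraction

# Only the binary suffixes needed for this illustration.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity):
    """Convert a CPU quantity such as "10m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Fraction(quantity) * 1000)


def memory_bytes(quantity):
    """Convert a memory quantity such as "50Mi" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain byte count


def requests_meet_floor(requests, min_cpu_m=5):
    """True if both requests are present, memory is positive, and CPU >= floor."""
    return (
        "cpu" in requests
        and "memory" in requests
        and cpu_millicores(requests["cpu"]) >= min_cpu_m
        and memory_bytes(requests["memory"]) > 0
    )
```

For the values shown in the job output, `requests_meet_floor({"cpu": "10m", "memory": "50Mi"})` evaluates to `True`.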

Comment 13 errata-xmlrpc 2021-07-27 22:53:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438