All payload components should set resource requests for a reasonable minimum CPU and for their p90 memory usage: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits

The packageserver container does not. Please follow the recommendations in that doc.

Referenced from the new e2e test, which gates components without resource requests and enforces the resource conventions.
Taking some quick notes...

From the CONVENTIONS doc:

The memory request of cluster components should be set to a value 10% higher than their 90th percentile actual consumption over a standard end-to-end suite run.

The CPU request of cluster components is based on the following formula and lower/upper bound rules:

> floor(baseline_request / baseline_actual * component_actual)
>
> Then, these rules for lower and upper bounds should be applied:
>
> The CPU request should never be lower than 5m. Setting a 5m limit avoids extreme ratio calculations when the node is stressed, while still representing the noise of a mostly idle workload.
> If the computed value is more than 100m, use the lower of the computed value and 200% of the usage of the component in an idle cluster. This cap means components that require bursts of CPU time may be throttled on busy hosts, but they are more likely to be schedulable in the first place.

Since packageserver is a control plane component, we will use etcd as a baseline to compute its CPU resource requests.

Both CPU and memory request formulas use numbers based on the end-to-end parallel conformance test job. After running the tests, use the Prometheus instance in the cluster to query the kube_pod_resource_request and kube_pod_resource_limit metrics and find numbers for the Pod(s) for the component being tuned.
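To make the arithmetic in the quoted rules concrete, here is a rough Python sketch. The function names, the use of whole millicores and whole MiB, the rounding up of the memory value, and the order in which the two bounds are applied (cap first, then the 5m floor) are my assumptions, not something the CONVENTIONS doc spells out. The input numbers in the example are made up for illustration and are not measurements from any cluster.

```python
def cpu_request_millicores(baseline_request, baseline_actual,
                           component_actual, component_idle):
    """Sketch of the CPU request rule; all arguments are whole millicores.

    baseline_* describe the baseline component (etcd for control plane
    components), component_actual is the tuned component's usage during the
    e2e run, component_idle is its usage in an idle cluster.
    """
    # floor(baseline_request / baseline_actual * component_actual),
    # done in integer math so the floor is exact.
    computed = (baseline_request * component_actual) // baseline_actual

    # Upper bound: above 100m, take the lower of the computed value and
    # 200% of the idle usage.
    if computed > 100:
        computed = min(computed, 2 * component_idle)

    # Lower bound: never request less than 5m (applied last, by assumption).
    return max(computed, 5)


def memory_request_mib(p90_mib):
    """10% above the p90 consumption, rounded up to a whole MiB (assumed)."""
    return (p90_mib * 11 + 9) // 10


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(cpu_request_millicores(300, 600, 24, 5))  # 12
    print(memory_request_mib(45))                   # 50
```

The real inputs would come from the Prometheus queries described above, taken after a parallel conformance run.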
Looking at the manifest for the packageserver deployment, it already declares both cpu and memory resource requests.

Downstream repo: https://github.com/openshift/operator-framework-olm/blob/3440fa2c16fc6a744e2e9bbb1352a0b4731cdd6f/manifests/0000_50_olm_15-packageserver.clusterserviceversion.yaml#L129-L132
Upstream repo: https://github.com/operator-framework/operator-lifecycle-manager/blob/03233fd51c2e8986b6ed1975e501d5991cbe6f9b/manifests/0000_50_olm_15-packageserver.clusterserviceversion.yaml#L125-L128

Inspecting a 4.8.0 nightly cluster also shows that the packageserver containers have those requests specified:

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-22-061234   True        False         21m     Cluster version is 4.8.0-0.nightly-2021-04-22-061234

$ oc -n openshift-operator-lifecycle-manager get deployment packageserver
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
packageserver   2/2     2            2           47m

$ oc -n openshift-operator-lifecycle-manager get deployment packageserver -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - command:
        - /bin/package-server
        - -v=4
        - --secure-port
        - "5443"
        - --global-namespace
        - openshift-marketplace
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e27561a75453fcec25e1995e672f657e3802811424365408006c23a4bfb66be
        imagePullPolicy: IfNotPresent
        ...
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
        ...

$ oc -n openshift-operator-lifecycle-manager get pods packageserver-94cbcf856-4xc7j -o yaml
apiVersion: v1
kind: Pod
metadata:
  name: packageserver-94cbcf856-4xc7j
  namespace: openshift-operator-lifecycle-manager
  ...
spec:
  containers:
  - command:
    - /bin/package-server
    - -v=4
    - --secure-port
    - "5443"
    - --global-namespace
    - openshift-marketplace
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e27561a75453fcec25e1995e672f657e3802811424365408006c23a4bfb66be
    imagePullPolicy: IfNotPresent
    ...
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    ...
```

Is there a failing instance of the e2e test "[sig-arch] Managed cluster should set requests but not limits" so we can look at the output and see why it fails for the packageserver container?
https://github.com/openshift/origin/blob/master/test/extended/operators/resources.go#L18-L30

Failing that, can we confirm whether that run deploys packageserver from the same manifests I've linked above?
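For reference, the kind of check the test name implies ("should set requests but not limits") can be sketched in a few lines of Python. This is not the logic from resources.go in openshift/origin, which I have not reproduced here; the function name `resource_violations` and the rule that both cpu and memory requests must be present are assumptions for illustration. The sample pod spec mirrors the requests shown in the output above.

```python
def resource_violations(pod_spec):
    """Return human-readable problems for each container in a pod spec dict.

    A container is flagged if it lacks a cpu or memory request, or if it
    sets any limit. This is an assumed reading of the test name only.
    """
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}

        for resource in ("cpu", "memory"):
            if resource not in requests:
                violations.append(f"{name}: no {resource} request")
        for resource in sorted(limits):
            violations.append(f"{name}: sets a {resource} limit")
    return violations


# Pod spec fragment matching the packageserver output above.
sample_pod_spec = {
    "containers": [
        {
            "name": "packageserver",
            "resources": {"requests": {"cpu": "10m", "memory": "50Mi"}},
        }
    ]
}

if __name__ == "__main__":
    print(resource_violations(sample_pod_spec))  # []
```

To run it against a live cluster, the output of `oc -n openshift-operator-lifecycle-manager get deployment packageserver -o json` can be loaded with `json.load` and the value at `["spec"]["template"]["spec"]` passed in as `pod_spec`.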
verify:

zhaoxia@xzha-mac bug-1946838 % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-29-063720   True        False         27m     Cluster version is 4.8.0-0.nightly-2021-04-29-063720

zhaoxia@xzha-mac bug-1946838 % oc adm release info registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-04-29-063720 --commits|grep operator-lifecycle-manager
  operator-lifecycle-manager   https://github.com/openshift/operator-framework-olm   1751d4a123c7966987f3a57190d4e8068c047a47

zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get deployment packageserver -o=jsonpath="{..containers[0].resources.requests.cpu}"
10m%
zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get deployment packageserver -o=jsonpath="{..containers[0].resources.requests.memory}"
50Mi%

zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get pod
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6d68b579dd-5vhcg   1/1     Running   0          49m
olm-operator-78bbb49d48-wrmzd       1/1     Running   0          49m
packageserver-6b8df7ff98-26sf5      1/1     Running   0          46m
packageserver-6b8df7ff98-szphv      1/1     Running   0          46m

zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get pod packageserver-6b8df7ff98-26sf5 -o yaml | grep requests -A 2
      requests:
        cpu: 10m
        memory: 50Mi
zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get pod packageserver-6b8df7ff98-szphv -o yaml | grep requests -A 2
      requests:
        cpu: 10m
        memory: 50Mi

LGTM, verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438