Bug 1938466 - packageserver deployment sets neither CPU or memory request on the packageserver container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Haseeb Tariq
QA Contact: xzha
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-13 17:36 UTC by Clayton Coleman
Modified: 2021-07-27 22:53 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:53:17 UTC
Target Upstream Version:
Embargoed:
htariq: needinfo-



Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 26106 0 None open Bug 1938466: test/extended/operators/resources: Packageserver already sets requests 2021-04-26 19:21:39 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:53:36 UTC

Description Clayton Coleman 2021-03-13 17:36:36 UTC
All payload components should request a reasonable minimum CPU and p90 memory usage

https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits

The packageserver container does not. Please follow the recommendations in that doc.

Referenced from the new e2e test which gates components without resource requests and enforces the resource conventions.

Comment 1 Joe Lanford 2021-03-15 13:55:48 UTC
Taking some quick notes...

From the CONVENTIONS doc:

The memory request of cluster components should be set to a value 10% higher than their 90th percentile actual consumption over a standard end-to-end suite run.
The CPU request of cluster components is based on the following formula and lower/upper bound rules:

> floor(baseline_request / baseline_actual * component_actual)
>
> Then, these rules for lower and upper bounds should be applied:
>
>    The CPU request should never be lower than 5m. Setting a 5m limit avoids extreme ratio calculations when the node is stressed, while still representing the noise of a mostly idle workload.
>    If the computed value is more than 100m, use the lower of the computed value and 200% of the usage of the component in an idle cluster. This cap means components that require bursts of CPU time may be throttled on busy hosts, but they are more likely to be schedulable in the first place.

Since packageserver is a control plane component, we will use etcd as a baseline to compute its CPU resource requests.

Both CPU and memory request formulas use numbers based on the end-to-end parallel conformance test job. After running the tests, use the Prometheus instance in the cluster to query the kube_pod_resource_request and kube_pod_resource_limit metrics and find numbers for the Pod(s) for the component being tuned.
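
The request rules quoted above can be sketched in Python as follows. This is only an illustration of the arithmetic, not tooling from OLM or from the conventions doc: the function names and every sample number are hypothetical, CPU values are in millicores, memory values are in MiB, and rounding the memory request up to a whole MiB is an assumption.

```python
import math


def cpu_request_millicores(baseline_request, baseline_actual,
                           component_actual, component_idle):
    """Apply the quoted CPU formula, then the upper and lower bound rules.

    All arguments are millicores. `baseline_*` describe the baseline
    component (etcd for control plane components), `component_actual` is the
    tuned component's usage during the e2e run, and `component_idle` is its
    usage in an idle cluster.
    """
    # floor(baseline_request / baseline_actual * component_actual)
    computed = math.floor(baseline_request * component_actual / baseline_actual)
    if computed > 100:
        # Cap at 200% of idle usage when the computed value exceeds 100m.
        computed = min(computed, math.floor(2 * component_idle))
    # The request should never be lower than 5m.
    return max(computed, 5)


def memory_request_mib(p90_usage_mib):
    """Return a request 10% above the 90th percentile usage (rounded up)."""
    return math.ceil(p90_usage_mib * 110 / 100)


# Hypothetical measurements, not taken from a real cluster.
print(cpu_request_millicores(300, 600, 30, 10))   # 15
print(cpu_request_millicores(300, 600, 4, 1))     # 5 (lower bound applies)
print(cpu_request_millicores(300, 600, 400, 60))  # 120 (capped at 2x idle)
print(memory_request_mib(40))                     # 44
```

The inputs would come from the Prometheus queries described above; the sketch takes them as plain numbers so the bound rules are easy to follow.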

Comment 8 Haseeb Tariq 2021-04-22 21:29:14 UTC
Looking at the manifest for the packageserver deployment, it seems like it already does declare both cpu and memory resource requests.

Downstream repo: https://github.com/openshift/operator-framework-olm/blob/3440fa2c16fc6a744e2e9bbb1352a0b4731cdd6f/manifests/0000_50_olm_15-packageserver.clusterserviceversion.yaml#L129-L132
Upstream repo: https://github.com/operator-framework/operator-lifecycle-manager/blob/03233fd51c2e8986b6ed1975e501d5991cbe6f9b/manifests/0000_50_olm_15-packageserver.clusterserviceversion.yaml#L125-L128

And inspecting a 4.8.0 nightly cluster also shows that the packageserver containers have those requests specified:

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-22-061234   True        False         21m     Cluster version is 4.8.0-0.nightly-2021-04-22-061234

$ oc -n openshift-operator-lifecycle-manager get deployment packageserver
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
packageserver   2/2     2            2           47m

$ oc -n openshift-operator-lifecycle-manager get deployment packageserver -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - command:
        - /bin/package-server
        - -v=4
        - --secure-port
        - "5443"
        - --global-namespace
        - openshift-marketplace
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e27561a75453fcec25e1995e672f657e3802811424365408006c23a4bfb66be
        imagePullPolicy: IfNotPresent
        ...
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
        ...


$ oc -n openshift-operator-lifecycle-manager get pods packageserver-94cbcf856-4xc7j -o yaml
apiVersion: v1
kind: Pod
metadata:
  name: packageserver-94cbcf856-4xc7j
  namespace: openshift-operator-lifecycle-manager
  ...
spec:
  containers:
  - command:
    - /bin/package-server
    - -v=4
    - --secure-port
    - "5443"
    - --global-namespace
    - openshift-marketplace
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e27561a75453fcec25e1995e672f657e3802811424365408006c23a4bfb66be
    imagePullPolicy: IfNotPresent
    ...
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
  ...
``` 

Is there an instance of the e2e test "[sig-arch] Managed cluster should set requests but not limits" that's failing so we can look at the output to see why it's failing for the packageserver container?
https://github.com/openshift/origin/blob/master/test/extended/operators/resources.go#L18-L30
Alternatively, can we check whether the failing run deploys packageserver from the same manifests I linked above?
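
For anyone who wants to repeat the manual inspection above on other workloads, a minimal Python sketch of a "requests but not limits" check follows. It is not the logic of the e2e test in openshift/origin, which is written in Go and may differ; the function name is hypothetical, and the input is assumed to be the dictionary obtained by parsing `oc get deployment ... -o json`. The sample values mirror the output shown above, with `packageserver` assumed as the container name.

```python
def containers_violating_convention(workload):
    """Return messages for containers that lack cpu/memory requests or set limits."""
    pod_spec = workload["spec"]["template"]["spec"]
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        for key in ("cpu", "memory"):
            if key not in requests:
                problems.append(f"{name}: missing {key} request")
        if resources.get("limits"):
            problems.append(f"{name}: sets limits")
    return problems


# Trimmed-down deployment using the values from the cluster output above.
sample_deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "packageserver",  # assumed container name
                        "resources": {"requests": {"cpu": "10m", "memory": "50Mi"}},
                    }
                ]
            }
        }
    }
}

print(containers_violating_convention(sample_deployment))  # []
```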

Comment 10 xzha 2021-04-29 07:49:59 UTC
verify:

zhaoxia@xzha-mac bug-1946838 % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-29-063720   True        False         27m     Cluster version is 4.8.0-0.nightly-2021-04-29-063720
zhaoxia@xzha-mac bug-1946838 %  oc adm release info  registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-04-29-063720 --commits|grep operator-lifecycle-manager
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         1751d4a123c7966987f3a57190d4e8068c047a47


zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get deployment packageserver -o=jsonpath="{..containers[0].resources.requests.cpu}"
10m%
zhaoxia@xzha-mac bug-1946838 % oc -n openshift-operator-lifecycle-manager get deployment packageserver -o=jsonpath="{..containers[0].resources.requests.memory}"
50Mi%

zhaoxia@xzha-mac bug-1946838 % oc  -n openshift-operator-lifecycle-manager get pod
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6d68b579dd-5vhcg   1/1     Running   0          49m
olm-operator-78bbb49d48-wrmzd       1/1     Running   0          49m
packageserver-6b8df7ff98-26sf5      1/1     Running   0          46m
packageserver-6b8df7ff98-szphv      1/1     Running   0          46m
zhaoxia@xzha-mac bug-1946838 % oc  -n openshift-operator-lifecycle-manager get pod packageserver-6b8df7ff98-26sf5  -o yaml | grep requests -A 2
      requests:
        cpu: 10m
        memory: 50Mi
zhaoxia@xzha-mac bug-1946838 % oc  -n openshift-operator-lifecycle-manager get pod packageserver-6b8df7ff98-szphv -o yaml | grep requests -A 2
      requests:
        cpu: 10m
        memory: 50Mi

LGTM, verified.
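
The quantities reported above (`10m` CPU, `50Mi` memory) can also be compared numerically against the 5m CPU floor quoted in Comment 1. The Python sketch below is a simplified illustration, not part of the verification that was actually run: the helper names are hypothetical, and the parser handles only plain numbers, the `m` CPU suffix, and the `Ki`/`Mi`/`Gi` memory suffixes rather than the full Kubernetes quantity grammar.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
MIN_CPU_MILLICORES = 5  # lower bound from the conventions quoted in Comment 1


def cpu_millicores(quantity):
    """Convert a CPU quantity such as "10m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity):
    """Convert a memory quantity such as "50Mi" to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def meets_cpu_floor(quantity):
    """Check a CPU request against the 5m minimum."""
    return cpu_millicores(quantity) >= MIN_CPU_MILLICORES


print(cpu_millicores("10m"))   # 10
print(memory_bytes("50Mi"))    # 52428800
print(meets_cpu_floor("10m"))  # True
```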

Comment 13 errata-xmlrpc 2021-07-27 22:53:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

