Bug 1760608
| Summary: | OLM pods have no resource limits set | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Christoph Blecker <cblecker> |
| Component: | OLM | Assignee: | Evan Cordell <ecordell> |
| OLM sub component: | OLM | QA Contact: | Salvatore Colangelo <scolange> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | agreene, bandrade, dageoffr, dsover, ecordell, nhale, nmalik, tbuskey |
| Version: | 4.4 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1780755 (view as bug list) | Environment: | |
| Last Closed: | 2020-01-30 14:06:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1780755 | | |
Description
Christoph Blecker
2019-10-10 22:21:28 UTC
Moving to 4.3. Will consider a backport for both 4.1 and 4.2 after this is delivered to master.

I am going to gather some data from a running CI cluster on the OLM operator pod's memory and CPU usage as more operators and CRs are introduced into the cluster. Once that data is available I will share it with members of the OLM team and come up with a summary metric that we can apply as limits when deploying OLM.

The PR hasn't merged yet.

https://github.com/operator-framework/operator-lifecycle-manager/pull/1142 has merged, but that happened around the time the release branches were cut (I think). So moving this to 4.4 and will clone it to 4.3 (and update the PR link).
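As a rough sketch of the change being discussed here, limits could be applied to the two OLM deployments with `oc set resources`. The request values below match the shipped manifests; the limit values are hypothetical placeholders, not the figures the OLM team eventually chose, and the cluster-version operator will revert manual edits like this, so the real change has to land in the OLM release manifests.

```
# Hypothetical limits on the OLM deployments (placeholder values, for experimentation only).
oc -n openshift-operator-lifecycle-manager set resources deployment/olm-operator \
    --requests=cpu=10m,memory=160Mi --limits=cpu=200m,memory=200Mi
oc -n openshift-operator-lifecycle-manager set resources deployment/catalog-operator \
    --requests=cpu=10m,memory=80Mi --limits=cpu=200m,memory=200Mi
```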
Hi, no limits are shown in the pods, as described in https://github.com/operator-framework/operator-lifecycle-manager/pull/1179:

```
[scolange@scolange ~]$ oc project openshift-operator-lifecycle-manager
Now using project "openshift-operator-lifecycle-manager" on server "https://api.juzhao-44.qe.devcluster.openshift.com:6443".
[scolange@scolange ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-20-210709   True        False         15h     Cluster version is 4.4.0-0.nightly-2019-12-20-210709
[scolange@scolange ~]$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6b4898f6f5-95pzx   1/1     Running   0          16h
olm-operator-6fb5fffb9-fvvgk        1/1     Running   0          16h
packageserver-6d5658d454-f659k      1/1     Running   0          16h
packageserver-6d5658d454-qkw5h      1/1     Running   0          16h
[scolange@scolange ~]$ oc get catalog-operator-6b4898f6f5-95pzx -o yaml
error: the server doesn't have a resource type "catalog-operator-6b4898f6f5-95pzx"
[scolange@scolange ~]$ oc get pod catalog-operator-6b4898f6f5-95pzx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.17"
          ],
          "dns": {},
          "default-route": [
              "10.129.0.1"
          ]
      }]
  creationTimestamp: "2019-12-24T00:53:54Z"
  generateName: catalog-operator-6b4898f6f5-
  labels:
    app: catalog-operator
    pod-template-hash: 6b4898f6f5
  name: catalog-operator-6b4898f6f5-95pzx
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: catalog-operator-6b4898f6f5
    uid: abf073f4-9cea-4d8a-bf89-345bf2fc75a9
  resourceVersion: "4720"
  selfLink: /api/v1/namespaces/openshift-operator-lifecycle-manager/pods/catalog-operator-6b4898f6f5-95pzx
  uid: f2e34439-dfb6-4af4-8025-a7288b9a881b
spec:
  containers:
  - args:
    - -namespace
    - openshift-marketplace
    - -configmapServerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cecac84c3cdc369e130ceb8107de07cd7c00399d28cfb681a3968e09d9094be0
    - -writeStatusName
    - operator-lifecycle-manager-catalog
    - -tls-cert
    - /var/run/secrets/serving-cert/tls.crt
    - -tls-key
    - /var/run/secrets/serving-cert/tls.key
    command:
    - /bin/catalog
    env:
    - name: RELEASE_VERSION
      value: 4.4.0-0.nightly-2019-12-20-210709
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: catalog-operator
    ports:
    - containerPort: 8080
      protocol: TCP
    - containerPort: 8081
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 80Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/serving-cert
      name: serving-cert
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: olm-operator-serviceaccount-token-4h84j
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-163-60.us-east-2.compute.internal
  nodeSelector:
    beta.kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: serving-cert
    secret:
      defaultMode: 420
      secretName: catalog-operator-serving-cert
  - name: olm-operator-serviceaccount-token-4h84j
    secret:
      defaultMode: 420
      secretName: olm-operator-serviceaccount-token-4h84j
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:55:36Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:57:23Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:57:23Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:55:36Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://c2850d92cecbdb66a2aae1b010c0eb1305fdb7e4f489fbe5701d33a8917df082
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    lastState: {}
    name: catalog-operator
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2019-12-24T00:57:16Z"
  hostIP: 10.0.163.60
  phase: Running
  podIP: 10.129.0.17
  podIPs:
  - ip: 10.129.0.17
  qosClass: Burstable
  startTime: "2019-12-24T00:55:36Z"
[scolange@scolange ~]$ oc get pod olm-operator-6fb5fffb9-fvvgk -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.0.16"
          ],
          "dns": {},
          "default-route": [
              "10.129.0.1"
          ]
      }]
  creationTimestamp: "2019-12-24T00:53:54Z"
  generateName: olm-operator-6fb5fffb9-
  labels:
    app: olm-operator
    pod-template-hash: 6fb5fffb9
  name: olm-operator-6fb5fffb9-fvvgk
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: olm-operator-6fb5fffb9
    uid: 7b19acc6-c772-40ec-9105-b426901260b2
  resourceVersion: "4707"
  selfLink: /api/v1/namespaces/openshift-operator-lifecycle-manager/pods/olm-operator-6fb5fffb9-fvvgk
  uid: c85e49b9-8a9a-4685-a5a4-3d54f769f58c
spec:
  containers:
  - args:
    - -namespace
    - $(OPERATOR_NAMESPACE)
    - -writeStatusName
    - operator-lifecycle-manager
    - -writePackageServerStatusName
    - operator-lifecycle-manager-packageserver
    - -tls-cert
    - /var/run/secrets/serving-cert/tls.crt
    - -tls-key
    - /var/run/secrets/serving-cert/tls.key
    command:
    - /bin/olm
    env:
    - name: RELEASE_VERSION
      value: 4.4.0-0.nightly-2019-12-20-210709
    - name: OPERATOR_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: OPERATOR_NAME
      value: olm-operator
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: olm-operator
    ports:
    - containerPort: 8080
      protocol: TCP
    - containerPort: 8081
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 160Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/serving-cert
      name: serving-cert
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: olm-operator-serviceaccount-token-4h84j
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-163-60.us-east-2.compute.internal
  nodeSelector:
    beta.kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: serving-cert
    secret:
      defaultMode: 420
      secretName: olm-operator-serving-cert
  - name: olm-operator-serviceaccount-token-4h84j
    secret:
      defaultMode: 420
      secretName: olm-operator-serviceaccount-token-4h84j
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:55:36Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:57:21Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:57:21Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-12-24T00:55:36Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://bb157d1858a7a5098c5742347e9f2912a648637791cd2aa18d7732154c81624d
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ba931b3266d8bef0b61dfd64383ef4b5d36a50ec3091f2ca59fd2e17609aa60
    lastState: {}
    name: olm-operator
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2019-12-24T00:57:16Z"
  hostIP: 10.0.163.60
  phase: Running
  podIP: 10.129.0.16
  podIPs:
  - ip: 10.129.0.16
  qosClass: Burstable
  startTime: "2019-12-24T00:55:36Z"
```

Expected result:

```
resources:
  limits:
    cpu: 200m
    memory: 200Mi

    cpu: 400m
    memory: 400Mi
```

Actual result:

```
resources:
  requests:
    cpu: 10m
    memory: 160Mi
```

*** Bug 1780755 has been marked as a duplicate of this bug. ***

We attempted to set some reasonable limits here, but during scale testing we found that even those limits could be hit and cause problems. We reverted the change to align with the other OpenShift cluster operators, which do not set limits.
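For reference, the check QA performed above (inspecting each pod spec for a limits block) can be reduced to a one-line query; this is only a sketch, and while no limits are set the limits column simply prints empty for every pod.

```
# List each OLM pod with any resource limits set on its containers.
oc -n openshift-operator-lifecycle-manager get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits}{"\n"}{end}'
```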