Bug 1740937
| Summary: | Pods for marketplace CatalogSource and CatalogSourceConfig consuming large amounts of memory |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Naveen Malik <nmalik> |
| Component: | OLM |
| Assignee: | Evan Cordell <ecordell> |
| OLM sub component: | OperatorHub |
| QA Contact: | Fan Jia <jfan> |
| Status: | CLOSED ERRATA |
| Docs Contact: | |
| Severity: | high |
| Priority: | high |
| CC: | bandrade, cblecker, chuo, jeder, scolange |
| Version: | 4.1.z |
| Target Milestone: | --- |
| Target Release: | 4.1.z |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| : | 1746197 (view as bug list) |
| Environment: | |
| Last Closed: | 2019-09-25 07:27:53 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1746197 |
| Bug Blocks: | |
| Attachments: | |
|
Description
Naveen Malik, 2019-08-13 22:00:07 UTC
Cluster: cblecker-4x
Env: stage
Created: 6/5/2019 3:43:09 PM
Current version: 4.1.9
History from ClusterVersion for the latest version:

```yaml
- completionTime: "2019-08-07T17:51:15Z"
  image: quay.io/openshift-release-dev/ocp-release@sha256:27fd24c705d1107cc73cb7dda8257fe97900e130b68afc314d0ef0e31bcf9b8e
  startedTime: "2019-08-07T17:12:00Z"
  state: Completed
  verified: true
  version: 4.1.9
```
Query in screenshots: `container_memory_rss{namespace="openshift-marketplace",container_name!="",container_name!="POD",container_name!="marketplace-operator"}`
Screenshots:
cblecker-4x-2w.png - last 2 weeks of query
cblecker-4x-upgrade-to-stable.png - from when the 4.1.9 upgrade completed until memory stabilized
This shows the containers were growing post-upgrade, were restarted a few times, and eventually stabilized at a reasonable memory consumption.
Created attachment 1603556 [details]
cblecker-4x: last 2 weeks of metric
Created attachment 1603557 [details]
cblecker-4x: from when the 4.1.9 upgrade completed until memory stabilized
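The RSS query used above works by excluding three `container_name` label values: the empty name (the pod-level cgroup series), the "POD" pause container, and the marketplace-operator itself, leaving only the catalog source containers. As a rough local illustration (the sample data below is invented, not taken from either cluster), the same filter can be mimicked with awk:

```shell
# Sample "container,memory" pairs standing in for the metric series;
# the names and values here are hypothetical.
cat > /tmp/marketplace-series.csv <<'EOF'
certified-operators,120Mi
POD,1Mi
,0Mi
marketplace-operator,14Mi
community-operators,300Mi
EOF

# Keep only real workload containers, mirroring the PromQL label
# filters container_name!="", container_name!="POD",
# container_name!="marketplace-operator".
awk -F, '$1 != "" && $1 != "POD" && $1 != "marketplace-operator"' /tmp/marketplace-series.csv
```

Only the certified-operators and community-operators rows survive the filter, which is why only the catalog source pods show up as lines in the screenshots.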
Cluster: example with operators installed
Env: production
Created: 2019-07-16T22:03:14Z
Current version: 4.1.9
History from ClusterVersion for the latest version:

```yaml
- completionTime: "2019-08-08T20:48:40Z"
  image: quay.io/openshift-release-dev/ocp-release@sha256:27fd24c705d1107cc73cb7dda8257fe97900e130b68afc314d0ef0e31bcf9b8e
  startedTime: "2019-08-08T14:36:40Z"
  state: Completed
  verified: true
  version: 4.1.9
```
Query in screenshots: `container_memory_rss{namespace="openshift-marketplace",container_name!="",container_name!="POD",container_name!="marketplace-operator"}`
Screenshots:
cluster-with-operators-2w.png - last 2 weeks of query
This shows the containers are growing post-upgrade. In addition, the cluster was used to install operators via OperatorHub late last week. Each of the lines is a pod in the openshift-marketplace namespace:
- certified-operators
- community-operators
- installed-redhat-openshift-logging
- installed-openshift-operators
- redhat-operators
Created attachment 1603558 [details]
cluster-with-operators: last 2 weeks of query
Hi Naveen, thanks for reporting this issue. We have a cluster that has been running for one day, but we don't see this issue there. We will keep an eye on it, thanks!

```
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.11    True        False         16h     Cluster version is 4.1.11
mac:~ jianzhang$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-68.us-east-2.compute.internal    Ready    master   19h   v1.13.4+d81afa6ba
ip-10-0-141-168.us-east-2.compute.internal   Ready    worker   19h   v1.13.4+d81afa6ba
ip-10-0-153-224.us-east-2.compute.internal   Ready    worker   19h   v1.13.4+d81afa6ba
ip-10-0-155-205.us-east-2.compute.internal   Ready    master   19h   v1.13.4+d81afa6ba
ip-10-0-164-116.us-east-2.compute.internal   Ready    worker   19h   v1.13.4+d81afa6ba
ip-10-0-174-123.us-east-2.compute.internal   Ready    master   19h   v1.13.4+d81afa6ba
mac:~ jianzhang$ oc adm top pods
NAME                                    CPU(cores)   MEMORY(bytes)
certified-operators-6bcdc96b-lzvd9      2m           22Mi
community-operators-655bb9cd-h9fn7      2m           68Mi
marketplace-operator-7df66dbf67-d7829   2m           14Mi
redhat-operators-7c4b9f9f6f-b978p       3m           40Mi
```

We think the grpc library backports we performed fix this in 4.1.15. Moving to MODIFIED for that reason.

Still seeing this issue on a cluster upgraded to 4.1.15.

Created attachment 1616712 [details]
example catalog-operator log from 4.1.15 cluster with this problem
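To catch this kind of growth without waiting for a graph, the `oc adm top pods` output shown in the comments can be filtered for pods above a memory threshold. A minimal sketch follows; the 100Mi threshold is arbitrary and the sample input is invented (one pod is deliberately inflated to show the flagging behavior):

```shell
# Hypothetical check over `oc adm top pods -n openshift-marketplace`-style
# output. THRESHOLD_MI and the heredoc data are assumptions for illustration.
THRESHOLD_MI=100
cat > /tmp/top-pods.txt <<'EOF'
NAME                                    CPU(cores)   MEMORY(bytes)
certified-operators-6bcdc96b-lzvd9      2m           22Mi
community-operators-655bb9cd-h9fn7      2m           680Mi
marketplace-operator-7df66dbf67-d7829   2m           14Mi
redhat-operators-7c4b9f9f6f-b978p       3m           40Mi
EOF

# Skip the header row, strip the Mi suffix, and print any pod whose
# reported memory exceeds the threshold.
awk -v t="$THRESHOLD_MI" 'NR > 1 { mem = $3; sub(/Mi$/, "", mem); if (mem + 0 > t) print $1, $3 }' /tmp/top-pods.txt
```

On a real cluster the input would come from `oc adm top pods -n openshift-marketplace` instead of the heredoc.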
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2820