Bug 2126626
| Summary: | ocs-operator pods getting OOMKilled failing the ocs-consumer installations on the respective cluster | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Gobinda Das <godas> | |
| Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elena Bondarenko <ebondare> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.12 | CC: | aeyal, dbindra, hnallurv, kramdoss, lgangava, mparida, muagarwa, nberry, nigoyal, ocs-bugs, odf-bz-bot, sostapov, ykukreja | |
| Target Milestone: | --- | |||
| Target Release: | ODF 4.12.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause:
The ocs-operator relies on the default "AllNamespaces" cache in controller-runtime, which syncs the Kubernetes resources from all namespaces into the cache when the operator first starts running.
Consequence:
The initial informer cache sync is so large that it causes a sudden spike in the operator's memory usage. This spike is directly proportional to the number of resources present in the underlying Kubernetes/OpenShift cluster.
The memory limits configured for ocs-operator usually absorb the spike, though only by a narrow margin. In a few situations the spike exceeds the configured limits, which causes OOMKilled failures for the ocs-operator pods.
Fix:
Rather than the default "AllNamespaces" cache, we specify a cache that syncs only the resources and custom resources in the operator's own namespace.
Result:
This greatly reduces the operator's memory usage spike and helps avoid OOMKilled failures for the ocs-operator pod.
|
Story Points: | --- | |
| Clone Of: | 2121329 | |||
| : | 2161650 (view as bug list) | Environment: | ||
| Last Closed: | 2023-02-08 14:06:28 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2121329, 2131662, 2161650 | |||
|
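The effect described in the Doc Text can be illustrated with a toy model. The sketch below is not the operator's code and does not use controller-runtime. `Object`, `SyncCache`, and the `openshift-storage` namespace name are hypothetical stand-ins, chosen only to show why a namespace-scoped initial sync holds fewer objects than an all-namespaces sync.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for a Kubernetes resource.
type Object struct {
	Namespace string
	Name      string
}

// AllNamespaces models the default cache scope: the empty string
// matches objects in every namespace.
const AllNamespaces = ""

// SyncCache models an initial cache sync. It copies every object
// whose namespace matches scope, so the size of the result (and
// therefore the memory held) grows with the number of matching objects.
func SyncCache(cluster []Object, scope string) []Object {
	cached := make([]Object, 0)
	for _, obj := range cluster {
		if scope == AllNamespaces || obj.Namespace == scope {
			cached = append(cached, obj)
		}
	}
	return cached
}

func main() {
	// Assumed example cluster: two objects in the operator's namespace,
	// three in unrelated namespaces.
	cluster := []Object{
		{Namespace: "openshift-storage", Name: "storagecluster"},
		{Namespace: "openshift-storage", Name: "rook-config"},
		{Namespace: "default", Name: "app-config"},
		{Namespace: "kube-system", Name: "coredns"},
		{Namespace: "monitoring", Name: "alert-rules"},
	}

	fmt.Println(len(SyncCache(cluster, AllNamespaces)))       // prints 5
	fmt.Println(len(SyncCache(cluster, "openshift-storage"))) // prints 2
}
```

In this model the all-namespaces sync grows with the whole cluster, while the scoped sync grows only with the operator's own namespace. That is the proportionality the Consequence and Fix paragraphs describe.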
Description
Gobinda Das
2022-09-14 07:41:31 UTC
*** Bug 2131662 has been marked as a duplicate of this bug. ***