Bug 1856990
| Summary: | OLM continuously creating and listing installplans causes kube-apiserver memory consumption to grow until OOM | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | rvanderp |
| Component: | OLM | Assignee: | Evan Cordell <ecordell> |
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | agabriel, andcosta, aos-bugs, asonmez, assingh, bshirren, cpassare, etamir, farandac, jcoscia, joboyer, jolee, kzona, mfojtik, oarribas, openshift-bugs-escalate, palonsor, palshure, pamoedom, ratamir, rcyriac, rgregory, rhowe, rkshirsa, sttts, vdinh, vwalek |
| Version: | 4.4 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-24 03:56:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
rvanderp
2020-07-14 21:50:42 UTC
Hi rvanderp, is the missing SA the reason the installPlan keeps being created? We (OCS) are trying to figure out how to reproduce this issue, and this might be the clue we are looking for.

Hi Raz - the lib-bucket-provisioner pod was crash-looping on the missing SA. It appeared that a new installplan was being created after each crash. I couldn't logically piece together why that would occur, other than perhaps a new installplan gets created when the pod is restarted; I wanted to review the source for that. There may have been other missing resources, but that was the only one I could find.

We created the missing SA, which resolved that specific error, but they then hit other problems (which didn't really shock me, as we didn't have time to make sure the account had the right role bindings, RBAC, etc.). At that point we decided to remove the installplans to give the API server some breathing room, and the cluster has been stable since then. I reproduced a similar issue on my own cluster by just installing 4.4.1 and letting it sit for a few hours.

*** Bug 1857676 has been marked as a duplicate of this bug. ***
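
For anyone needing the same workaround, a minimal cleanup sketch using the Python `kubernetes` client is below. The namespace (`openshift-storage`) and the ServiceAccount name (`lib-bucket-provisioner`) are assumptions, not confirmed in this bug, so adjust them to your environment; the group/version/plural used for InstallPlans (`operators.coreos.com/v1alpha1`, `installplans`) are the standard OLM API coordinates.

```python
# Hedged sketch: recreate a missing ServiceAccount and prune accumulated
# InstallPlans to give the kube-apiserver some breathing room.
# NAMESPACE and SA_NAME are assumptions, not taken from this bug report.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

NAMESPACE = "openshift-storage"     # assumed namespace of the affected operator
SA_NAME = "lib-bucket-provisioner"  # assumed name of the missing ServiceAccount

config.load_kube_config()

# Recreate the ServiceAccount the crash-looping pod was missing.
core = client.CoreV1Api()
try:
    core.create_namespaced_service_account(
        namespace=NAMESPACE,
        body=client.V1ServiceAccount(metadata=client.V1ObjectMeta(name=SA_NAME)),
    )
except ApiException as exc:
    if exc.status != 409:  # 409 means it already exists; anything else is a real error
        raise

# List and delete the accumulated InstallPlans in the namespace.
custom = client.CustomObjectsApi()
plans = custom.list_namespaced_custom_object(
    group="operators.coreos.com",
    version="v1alpha1",
    namespace=NAMESPACE,
    plural="installplans",
)
for plan in plans.get("items", []):
    custom.delete_namespaced_custom_object(
        group="operators.coreos.com",
        version="v1alpha1",
        namespace=NAMESPACE,
        plural="installplans",
        name=plan["metadata"]["name"],
        body=client.V1DeleteOptions(),
    )
```

Note that deleting the InstallPlans is only a mitigation, as described in the comment above; the underlying cause of OLM continuously creating them is tracked in the bug this report was closed as a duplicate of.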