Bug 1717244
Summary: | Cloud credential operator pod OOMKilled | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Justin Pierce <jupierce>
Component: | Master | Assignee: | Michal Fojtik <mfojtik>
Status: | CLOSED DUPLICATE | QA Contact: | Xingxing Xia <xxia>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.1.0 | CC: | aos-bugs, dgoodwin, jokerman, mmccomas
Target Milestone: | --- | Keywords: | DeliveryBlocker, OpsBlocker
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-06-20 19:26:33 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Created attachment 1582752 [details]
kubelet log for wedged cloud-credential-operator pod
- https://bugzilla.redhat.com/show_bug.cgi?id=1711402 is tracking the fix for the memory limits for this pod.
- https://bugzilla.redhat.com/show_bug.cgi?id=1722604 was discovered as the likely root cause of the excessive memory use.
- https://bugzilla.redhat.com/show_bug.cgi?id=1701326#c7 looks like it is tracking the kubelet error: https://bugzilla.redhat.com/attachment.cgi?id=1582752
- I've not seen the hostname changing issue on 4.1.2, so I'm assuming this has been fixed.

*** This bug has been marked as a duplicate of bug 1722604 ***

I think the OOM will persist simply because of the number of namespaces, due to https://bugzilla.redhat.com/show_bug.cgi?id=1723892, which is proposed for backport to 4.1.z.
Created attachment 1577379 [details]
cluster listings

Description of problem:
After a long run period (>30 days), the cloud credentials operator on a starter cluster exited with OOMKilled.

Version-Release number of selected component (if applicable):
4.1.0-rc.4

How reproducible:
Unknown

Steps to Reproduce:
1. Cluster with 928 namespaces
2. Allow the cluster to run for > 30 days
3.

Actual results:
Pod OOMKilled and sitting in ContainerCreating.

Expected results:
A 3.x starter cluster could have > 15k projects (but I expect run time is a factor here).

Additional info:
see attached listings
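
For anyone trying to confirm the same state on another cluster (a container last terminated with OOMKilled while the pod is stuck in ContainerCreating), here is a minimal sketch that inspects pod JSON as returned by `oc get pod <pod> -n <namespace> -o json`. It relies only on the standard `status.containerStatuses` fields of the Kubernetes Pod API. The function name `oom_killed_containers`, the container name `manager`, and the sample document are hypothetical and are not taken from the attached listings.

```python
import json


def oom_killed_containers(pod):
    """Return names of containers whose current or last state is
    'terminated' with reason 'OOMKilled'."""
    names = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # 'state' is the current state; 'lastState' is the previous one,
        # which is where the OOM kill shows up once a restart is attempted.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break
    return names


# Hypothetical pod status shaped like the reported symptom:
# waiting in ContainerCreating, previously OOMKilled.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "manager",
        "state": {"waiting": {"reason": "ContainerCreating"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(oom_killed_containers(sample))  # prints ['manager']
```

In practice the JSON would be read from the `oc` output (for example via `json.load(sys.stdin)`) rather than from an inline sample.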