Bug 1717244

Summary: Cloud credential operator pod OOMKilled
Product: OpenShift Container Platform Reporter: Justin Pierce <jupierce>
Component: Master Assignee: Michal Fojtik <mfojtik>
Status: CLOSED DUPLICATE QA Contact: Xingxing Xia <xxia>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: aos-bugs, dgoodwin, jokerman, mmccomas
Target Milestone: --- Keywords: DeliveryBlocker, OpsBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-20 19:26:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
cluster listings none
kubelet log for wedged cloud-credential-operator pod none

Description Justin Pierce 2019-06-05 01:15:01 UTC
Created attachment 1577379
cluster listings

Description of problem:
After running for a long period (more than 30 days), the cloud-credential-operator pod on a starter cluster was OOMKilled.

Version-Release number of selected component (if applicable):
4.1.0-rc.4 

How reproducible:
Unknown

Steps to Reproduce:
1. Cluster with 928 namespaces
2. Allow the cluster to run for > 30 days

Actual results:
The pod was OOMKilled and is now sitting in ContainerCreating.

Expected results:
The operator should not be OOMKilled at this scale. A 3.x starter cluster could have more than 15k projects (though I expect run time is a factor here).

Additional info:
see attached listings

Comment 4 Justin Pierce 2019-06-20 16:40:08 UTC
Created attachment 1582752
kubelet log for wedged cloud-credential-operator pod

Comment 5 Justin Pierce 2019-06-20 19:26:33 UTC
- https://bugzilla.redhat.com/show_bug.cgi?id=1711402 is tracking the fix for the memory limits for this pod. 
- https://bugzilla.redhat.com/show_bug.cgi?id=1722604 was discovered as the likely root cause of the excessive memory use.
- https://bugzilla.redhat.com/show_bug.cgi?id=1701326#c7 looks like it is tracking the kubelet error: https://bugzilla.redhat.com/attachment.cgi?id=1582752
- I've not seen the hostname changing issue on 4.1.2, so I'm assuming this has been fixed.

*** This bug has been marked as a duplicate of bug 1722604 ***

Comment 6 Devan Goodwin 2019-06-25 16:06:40 UTC
I think the OOM will persist purely because of the number of namespaces, due to https://bugzilla.redhat.com/show_bug.cgi?id=1723892, which is proposed for backport to 4.1.z.