Bug 1717244 - Cloud credential operator pod OOMKilled
Summary: Cloud credential operator pod OOMKilled
Keywords:
Status: CLOSED DUPLICATE of bug 1722604
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michal Fojtik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-05 01:15 UTC by Justin Pierce
Modified: 2019-06-25 16:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-20 19:26:33 UTC
Target Upstream Version:
Embargoed:


Attachments
cluster listings (11.99 KB, text/plain)
2019-06-05 01:15 UTC, Justin Pierce
kubelet log for wedged cloud-credential-operator pod (67.16 KB, text/plain)
2019-06-20 16:40 UTC, Justin Pierce

Description Justin Pierce 2019-06-05 01:15:01 UTC
Created attachment 1577379
cluster listings

Description of problem:
After a long run period (>30 days), the cloud credential operator pod on a starter cluster was OOMKilled.

Version-Release number of selected component (if applicable):
4.1.0-rc.4 

How reproducible:
Unknown

Steps to Reproduce:
1. Cluster with 928 namespaces
2. Allow the cluster to run for > 30 days

Actual results:
Pod OOMKilled and sitting in ContainerCreating.

Expected results:
The operator should not be OOMKilled at this scale. A 3.x starter cluster could have > 15k projects (but I expect run time is a factor here).

Additional info:
see attached listings
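
For anyone triaging a similar report, a minimal sketch of how the OOMKilled state could be picked out of a pod listing. It assumes the listing is JSON shaped like `oc get pods -o json` output; the function name, the pod name, and the container name "manager" are hypothetical and used only for illustration.

```python
import json


def find_oomkilled(pod_list):
    """Return (namespace, pod, container) for each container whose current
    or previous termination reason is OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # A restarted container reports the kill under lastState;
            # a container that has not restarted yet reports it under state.
            for key in ("state", "lastState"):
                terminated = (cs.get(key) or {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((meta.get("namespace"), meta.get("name"), cs.get("name")))
                    break
    return hits


# Hypothetical sample data, not taken from the attached listings.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"namespace": "openshift-cloud-credential-operator",
                   "name": "cloud-credential-operator-example"},
      "status": {"containerStatuses": [
        {"name": "manager",
         "state": {"waiting": {"reason": "ContainerCreating"}},
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
      ]}
    },
    {
      "metadata": {"namespace": "default", "name": "healthy-example"},
      "status": {"containerStatuses": [
        {"name": "app", "state": {"running": {}}, "lastState": {}}
      ]}
    }
  ]
}
""")

for namespace, pod, container in find_oomkilled(sample):
    print(f"{namespace}/{pod} container={container} OOMKilled")
```

Against a live cluster, the same function could be fed `json.load(sys.stdin)` with the output of `oc get pods --all-namespaces -o json` piped in.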

Comment 4 Justin Pierce 2019-06-20 16:40:08 UTC
Created attachment 1582752
kubelet log for wedged cloud-credential-operator pod

Comment 5 Justin Pierce 2019-06-20 19:26:33 UTC
- https://bugzilla.redhat.com/show_bug.cgi?id=1711402 is tracking the fix for the memory limits for this pod. 
- https://bugzilla.redhat.com/show_bug.cgi?id=1722604 was discovered as the likely root cause of the excessive memory use.
- https://bugzilla.redhat.com/show_bug.cgi?id=1701326#c7 looks like it is tracking the kubelet error: https://bugzilla.redhat.com/attachment.cgi?id=1582752
- I've not seen the hostname changing issue on 4.1.2, so I'm assuming this has been fixed.

*** This bug has been marked as a duplicate of bug 1722604 ***

Comment 6 Devan Goodwin 2019-06-25 16:06:40 UTC
I think the OOM will persist simply because of the number of namespaces, due to https://bugzilla.redhat.com/show_bug.cgi?id=1723892, which is proposed for backport to 4.1.z.

