Bug 1830141

Summary: openshift-apiserver pods getting oomkilled
Product: OpenShift Container Platform Reporter: Rob Gregory <rgregory>
Component: Node Assignee: Peter Hunt <pehunt>
Status: CLOSED ERRATA QA Contact: Sunil Choudhary <schoudha>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.3.0 CC: aos-bugs, jcrumple, jokerman, openshift-bugs-escalate, pehunt
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Incorrect handling of cgroup teardown on container removal.
Consequence: A correctly exiting pod was occasionally labelled as OOM killed.
Fix: Drop the conmon monitor that incorrectly handled cgroup teardown.
Result: Container cgroup teardown no longer incorrectly reports that the pod was OOM killed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 13:47:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rob Gregory 2020-04-30 22:15:55 UTC
Description of problem:

The openshift-apiserver pod is reported as OOM killed and ends up in CrashLoopBackOff. This seems similar to an open BZ for 4.4:
https://bugzilla.redhat.com/show_bug.cgi?id=1809593

Version-Release number of selected component (if applicable):
OpenShift 4.3

How reproducible:
Occurs periodically in customer's environment

Actual results:
[core@cp-2prqk-master-0 ~]$ oc get pods -A | grep -v Running | grep -v Completed
NAMESPACE                                               NAME                                                              READY   STATUS             RESTARTS   AGE
openshift-apiserver                                     apiserver-24dz5                                                   0/1     CrashLoopBackOff   1608       6d1h
openshift-operator-lifecycle-manager                    packageserver-5866bccf76-g4jft                                    1/1     Terminating        0          3m44s
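
To check which containers the runtime actually reported as OOM killed, the termination reason in the pod status can be inspected. Below is a minimal sketch, assuming the input is the JSON produced by `oc get pods -A -o json`; the helper name `find_oom_killed` is hypothetical. Note that it only shows what was reported: per the Doc Text of this bug, a correctly exiting container could be labelled OOMKilled, so a hit here does not prove memory was exhausted.

```python
def find_oom_killed(pod_list):
    """Return (namespace, pod, container, restarts) tuples for containers
    whose current or last terminated state has the reason "OOMKilled".

    pod_list is the parsed JSON of a pod list, for example from:
        oc get pods -A -o json
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # "state" is the current state, "lastState" the previous one;
            # a crash-looping container usually carries the reason in lastState.
            for key in ("state", "lastState"):
                terminated = (cs.get(key) or {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((
                        meta.get("namespace"),
                        meta.get("name"),
                        cs.get("name"),
                        cs.get("restartCount", 0),
                    ))
                    break  # report each container at most once
    return hits
```

One way to use it is to load the pod list with `json.load(sys.stdin)` in a small script fed by `oc get pods -A -o json`, then print each returned tuple.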



Expected results:
apiserver pod stability

Comment 3 Peter Hunt 2020-05-01 16:38:53 UTC
Which specific 4.3 version is this happening on? A fix is coming in 4.3.18, assuming this is the same issue as the one in 4.4, which I believe to be the case.

Comment 4 Rob Gregory 2020-05-06 18:31:36 UTC
This occurrence was on 4.3.12. I'm sure they would be willing to test the backport in 4.3.18, if not update to 4.4.3.

Comment 9 errata-xmlrpc 2020-05-20 13:47:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2129