Bug 1799079

Summary: revision-pruner pod OOMKilled during installation
Product: OpenShift Container Platform
Reporter: Andy McCrae <amccrae>
Component: kube-apiserver
Assignee: Mike Dame <mdame>
Status: CLOSED WONTFIX
QA Contact: Ke Wang <kewang>
Severity: low
Priority: low
Version: 4.2.z
Target Release: 4.2.z
Hardware: s390x
OS: Unspecified
CC: aos-bugs, cbaus, danili, kewang, mdame, mfojtik, openshift-bugzilla-robot
Type: Bug
Last Closed: 2020-07-13 16:49:09 UTC
Bug Depends On: 1848583

Description Andy McCrae 2020-02-06 15:27:04 UTC
Description of problem:

When deploying 4.2.16 on the mini clusters, some of the revision-pruner containers are OOMKilled during installation. They do restart successfully.

Version-Release number of selected component (if applicable):
4.2.18/4.2.16

How reproducible:
Every time I've done an install, at least one or two pods have been OOMKilled.

Steps to Reproduce:
1. Run through the regular installation approach.
2. oc get pods --all-namespaces=true | grep -i oom

Actual results:

# oc get pods --all-namespaces=true | grep -i oom
openshift-kube-apiserver                                revision-pruner-3-master-2.test.example.com                       0/1     OOMKilled   0          
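
Whether a container was OOM-killed can also be read from its termination state. A minimal check, using the pod name above (these are standard Kubernetes pod status fields; since the pod shows 0 restarts, the reason sits under state rather than lastState):

# oc get pod revision-pruner-3-master-2.test.example.com -n openshift-kube-apiserver -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'

For a pod in this condition the command prints OOMKilled; after a restart, the same reason moves to lastState.terminated.reason.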

Expected results:

No pods are OOMKilled.

Additional info:
The logs from the failed pod are as follows:
# oc logs -f revision-pruner-3-master-2.test.example.com -n openshift-kube-apiserver
I0206 11:12:58.379630       1 cmd.go:38] &{<nil> true {false} prune true map[max-eligible-revision:0xc0006f5d60 protected-revisions:0xc0006f5e00 resource-dir:0xc0006f5ea0 static-pod-name:0xc0006f5f40 v:0xc0006f45a0] [0xc0006f45a0 0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f5f40] [] map[alsologtostderr:0xc0006f4000 help:0xc000704320 log-backtrace-at:0xc0006f40a0 log-dir:0xc0006f4140 log-file:0xc0006f41e0 log-file-max-size:0xc0006f4280 log-flush-frequency:0xc0000de280 logtostderr:0xc0006f4320 max-eligible-revision:0xc0006f5d60 protected-revisions:0xc0006f5e00 resource-dir:0xc0006f5ea0 skip-headers:0xc0006f43c0 skip-log-headers:0xc0006f4460 static-pod-name:0xc0006f5f40 stderrthreshold:0xc0006f4500 v:0xc0006f45a0 vmodule:0xc0006f4640] [0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f5f40 0xc0006f4000 0xc0006f40a0 0xc0006f4140 0xc0006f41e0 0xc0006f4280 0xc0000de280 0xc0006f4320 0xc0006f43c0 0xc0006f4460 0xc0006f4500 0xc0006f45a0 0xc0006f4640 0xc000704320] [0xc0006f4000 0xc000704320 0xc0006f40a0 0xc0006f4140 0xc0006f41e0 0xc0006f4280 0xc0000de280 0xc0006f4320 0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f43c0 0xc0006f4460 0xc0006f5f40 0xc0006f4500 0xc0006f45a0 0xc0006f4640] map[104:0xc000704320 118:0xc0006f45a0] [] -1 0 0xc0006f2cc0 true <nil> []}
I0206 11:12:58.379942       1 cmd.go:39] (*prune.PruneOptions)(0xc000097100)({
 MaxEligibleRevision: (int) 3,
 ProtectedRevisions: ([]int) (len=3 cap=3) {
  (int) 1,
  (int) 2,
  (int) 3
 },
 ResourceDir: (string) (len=36) "/etc/kubernetes/static-pod-resources",
 StaticPodName: (string) (len=18) "kube-apiserver-pod"
})
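
An OOMKill on a one-shot pod like this typically means the container either exceeded a memory limit or was chosen by the kernel OOM killer while the node was under memory pressure. The requests/limits actually set on the pruner container can be inspected directly; a sketch, again using the pod name from this report:

# oc get pod revision-pruner-3-master-2.test.example.com -n openshift-kube-apiserver -o jsonpath='{.spec.containers[0].resources}'

An empty resources stanza leaves the pod at BestEffort QoS and makes it a preferred OOM-killer victim, which would be consistent with the eventual fix adding explicit resource requests to the pruner pod.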

Comment 1 Andy McCrae 2020-02-06 15:28:24 UTC
The installations still succeed, so I'm setting this as low priority.

Comment 3 Carvel Baus 2020-06-05 14:48:37 UTC
Assigning to kube-apiserver for further triage. I don't believe this is arch-specific.

Comment 4 Mike Dame 2020-06-18 14:50:01 UTC
*** Bug 1848584 has been marked as a duplicate of this bug. ***

Comment 7 Ke Wang 2020-07-13 07:32:39 UTC
- Checked whether the bug-related PR was already merged into the 4.2 branch.
$ cd library-go

$ git checkout -b 4.2 remotes/origin/release-4.2
Updating files: 100% (13716/13716), done.
Branch '4.2' set up to track remote branch 'release-4.2' from 'origin'.
Switched to a new branch '4.2'

$ git pull

$ git log -1
commit c27670c64634202911c0f9b3f946bf4658d27d71 (HEAD -> 4.2, origin/release-4.2)
Merge: d58edcb9 2cfb6b95
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Tue Jun 30 21:06:53 2020 -0400

    Merge pull request #822 from damemi/4.2-backport-pruner-requests

The bug-related PR (#822) was merged into library-go on Jun 30.
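
To see what the backport actually changed, the merge can be diffed against its first parent (plain git, using the commit hash from the log above):

$ git diff --stat c27670c64634202911c0f9b3f946bf4658d27d71^1 c27670c64634202911c0f9b3f946bf4658d27d71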

- Checked whether PR #822 has been bumped into the kube-apiserver-operator component of the latest payload:
$ cd cluster-kube-apiserver-operator/

$ git pull

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2020-07-12-134436 | grep kube-apiserver
  cluster-kube-apiserver-operator               https://github.com/openshift/cluster-kube-apiserver-operator               925d5210ffb332e4da75459df7a66380a135bc3a

$ git log --date local --pretty="%h %an %cd - %s" 925d521 | grep '#822'

$ git log --date local --pretty="%h %an %cd - %s" 925d521 -1
925d5210 OpenShift Merge Robot Thu May 14 11:12:00 2020 - Merge pull request #724 from openshift-cherrypick-robot/cherry-pick-629-to-release-4.2

The last merged PR in the operator dates to May 14, which is earlier than the date PR #822 was merged into library-go, so PR #822 still needs to be bumped into the kube-apiserver-operator component.
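
The same conclusion can be reached without comparing dates: ask git for the last commit that touched the operator's vendored copy of library-go (a sketch, assuming the usual vendor/ layout on the 4.2 branch):

$ git log --date local --pretty="%h %an %cd - %s" -1 -- vendor/github.com/openshift/library-go

If that commit predates the library-go merge of PR #822, the fix has not been vendored into the operator yet.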

Comment 8 Mike Dame 2020-07-13 14:08:00 UTC
@Ke Wang thank you for checking, you are right that this still needs to be bumped in the operator. I opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/900 to do that and am switching this back to assigned so the bug bot will pick it up. Thanks!
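
For context, such a bump is typically just a re-vendor of library-go in the operator repository. A rough sketch of the steps, assuming the 4.2 branch uses go modules (the exact tooling on that branch may differ):

$ # fetch the tip of library-go's release-4.2 branch and refresh vendor/
$ go get github.com/openshift/library-go@release-4.2
$ go mod vendor
$ git add go.mod go.sum vendor/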