Description of problem:
When deploying 4.2.16 on the mini clusters, some of the revision-pruner containers are OOMKilled during installation. They do restart successfully.

Version-Release number of selected component (if applicable):
4.2.18/4.2.16

How reproducible:
Every time I've done an install I've had at least 1 or 2.

Steps to Reproduce:
1. Run through the regular installation approach.
2. oc get pods --all-namespaces=true | grep -i oom

Actual results:
# oc get pods --all-namespaces=true | grep -i oom
openshift-kube-apiserver   revision-pruner-3-master-2.test.example.com   0/1   OOMKilled   0

Expected results:
No pods are OOMKilled.

Additional info:
The logs from the failed pod are as follows:

# oc logs -f revision-pruner-3-master-2.test.example.com -n openshift-kube-apiserver
I0206 11:12:58.379630 1 cmd.go:38] &{<nil> true {false} prune true map[max-eligible-revision:0xc0006f5d60 protected-revisions:0xc0006f5e00 resource-dir:0xc0006f5ea0 static-pod-name:0xc0006f5f40 v:0xc0006f45a0] [0xc0006f45a0 0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f5f40] [] map[alsologtostderr:0xc0006f4000 help:0xc000704320 log-backtrace-at:0xc0006f40a0 log-dir:0xc0006f4140 log-file:0xc0006f41e0 log-file-max-size:0xc0006f4280 log-flush-frequency:0xc0000de280 logtostderr:0xc0006f4320 max-eligible-revision:0xc0006f5d60 protected-revisions:0xc0006f5e00 resource-dir:0xc0006f5ea0 skip-headers:0xc0006f43c0 skip-log-headers:0xc0006f4460 static-pod-name:0xc0006f5f40 stderrthreshold:0xc0006f4500 v:0xc0006f45a0 vmodule:0xc0006f4640] [0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f5f40 0xc0006f4000 0xc0006f40a0 0xc0006f4140 0xc0006f41e0 0xc0006f4280 0xc0000de280 0xc0006f4320 0xc0006f43c0 0xc0006f4460 0xc0006f4500 0xc0006f45a0 0xc0006f4640 0xc000704320] [0xc0006f4000 0xc000704320 0xc0006f40a0 0xc0006f4140 0xc0006f41e0 0xc0006f4280 0xc0000de280 0xc0006f4320 0xc0006f5d60 0xc0006f5e00 0xc0006f5ea0 0xc0006f43c0 0xc0006f4460 0xc0006f5f40 0xc0006f4500 0xc0006f45a0 0xc0006f4640] map[104:0xc000704320
118:0xc0006f45a0] [] -1 0 0xc0006f2cc0 true <nil> []}
I0206 11:12:58.379942 1 cmd.go:39] (*prune.PruneOptions)(0xc000097100)({ MaxEligibleRevision: (int) 3, ProtectedRevisions: ([]int) (len=3 cap=3) { (int) 1, (int) 2, (int) 3 }, ResourceDir: (string) (len=36) "/etc/kubernetes/static-pod-resources", StaticPodName: (string) (len=18) "kube-apiserver-pod" })
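For anyone reproducing step 2 without eyeballing the output, here is a small hedged sketch. The `count_oomkilled` helper and the sample lines are illustrative only (not part of `oc`); on a live cluster you would pipe `oc get pods --all-namespaces=true` into it instead of the sample text.

```shell
# Hypothetical helper: count OOMKilled entries in `oc get pods` output
# read from stdin. grep -c prints the number of matching lines.
count_oomkilled() {
  grep -c 'OOMKilled'
}

# Abridged sample modeled on the report above; on a real cluster:
#   oc get pods --all-namespaces=true | count_oomkilled
sample='openshift-kube-apiserver   revision-pruner-3-master-2.test.example.com   0/1   OOMKilled   0
openshift-etcd             etcd-master-0                                  1/1   Running     0'

printf '%s\n' "$sample" | count_oomkilled   # prints 1
```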
The installations still succeed, so setting this as low prio.
Assigning to kube-apiserver for further triage. We don't believe this is arch-specific.
*** Bug 1848584 has been marked as a duplicate of this bug. ***
- Checked whether the bug-related PR was already merged into the 4.2 branch:

$ cd library-go
$ git checkout -b 4.2 remotes/origin/release-4.2
Updating files: 100% (13716/13716), done.
Branch '4.2' set up to track remote branch 'release-4.2' from 'origin'.
Switched to a new branch '4.2'
$ git pull
$ git log -1
commit c27670c64634202911c0f9b3f946bf4658d27d71 (HEAD -> 4.2, origin/release-4.2)
Merge: d58edcb9 2cfb6b95
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date: Tue Jun 30 21:06:53 2020 -0400

    Merge pull request #822 from damemi/4.2-backport-pruner-requests

The bug-related PR was merged in on Jun 30.

- Checked whether PR #822 has been bumped into the kube-apiserver-operator component of the latest payload:

$ cd cluster-kube-apiserver-operator/
$ git pull
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2020-07-12-134436 | grep kube-apiserver
cluster-kube-apiserver-operator  https://github.com/openshift/cluster-kube-apiserver-operator  925d5210ffb332e4da75459df7a66380a135bc3a
$ git log --date local --pretty="%h %an %cd - %s" 925d521 | grep '#822'
$ git log --date local --pretty="%h %an %cd - %s" 925d521 -1
925d5210 OpenShift Merge Robot Thu May 14 11:12:00 2020 - Merge pull request #724 from openshift-cherrypick-robot/cherry-pick-629-to-release-4.2

The last merged PR in the operator is dated May 14, earlier than the Jun 30 merge of PR #822, so PR #822 still needs to be bumped into the kube-apiserver-operator component.
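As an aside, the grep-the-log check above can also be expressed with `git merge-base --is-ancestor`, which asks git directly whether one commit is contained in another. A minimal sketch in a throwaway repo, assuming the technique rather than the real repos (the two commits below are fabricated stand-ins; in practice the IDs would be the #822 merge commit and the payload's operator commit 925d5210):

```shell
# Build a disposable repo with two empty commits so the ancestry check
# can run anywhere; no network access or real checkout is needed.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'stand-in for the PR #822 merge commit'
fix=$(git rev-parse HEAD)
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'stand-in for the payload operator commit'
payload=$(git rev-parse HEAD)

# Exit status 0 means the fix commit is an ancestor of (i.e. contained
# in) the payload commit; non-zero means the fix still needs a bump.
if git merge-base --is-ancestor "$fix" "$payload"; then
  echo 'fix is contained in the payload commit'
else
  echo 'fix missing: needs a bump'
fi
```

This avoids relying on commit dates, which can mislead when cherry-picks preserve the original author date.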
@Ke Wang thank you for checking; you are right that this still needs to be bumped in the operator. I opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/900 to do that, and am switching this back to ASSIGNED so the bug bot will pick it up. Thanks!