Description of problem:
The upgrade operator pod goes into CrashLoopBackOff after deployment, with an OOMKilled error. The memory limit defined for the manager container is too low (30M):
https://github.com/openshift-kni/cluster-group-upgrades-operator/blob/main/config/manager/manager.yaml#L51

Version-Release number of selected component (if applicable):
4.10

How reproducible:
100% in my env

Steps to Reproduce:
1. Clone the upgrade operator repo and follow the instructions to build and deploy it: https://github.com/openshift-kni/cluster-group-upgrades-operator#how-to-deploy
2. Check the deployed resources.

Actual results:
The pod is in CrashLoopBackOff due to OOMKilled:

      terminated:
        containerID: cri-o://83ef8df7d68111459978511ce7f2b02d669ec50b5baff98ef6435655700cf77e
        exitCode: 137
        finishedAt: "2022-01-06T19:08:04Z"
        reason: OOMKilled
        startedAt: "2022-01-06T19:07:40Z"
    name: manager
    ready: false
    restartCount: 9
    started: false

Expected results:
The deployment succeeds.

Additional info:
Workaround: bump up the memory limit via "oc edit deployments.apps -n openshift-cluster-group-upgrades", then delete the old pods and replicas.
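For anyone triaging a similar report, the OOMKilled condition can also be confirmed from the pod's JSON rather than by reading the status by eye. The following is only a minimal sketch, assuming the pod has been dumped with something like "oc get pod <pod> -n openshift-cluster-group-upgrades -o json". The function name oom_killed_containers is hypothetical, and the sample data simply mirrors the values in the report above (placing the terminated block under lastState is an assumption, since the report does not show its parent key).

```python
def oom_killed_containers(pod):
    """Return names of containers whose current or last state is an
    OOMKilled termination.

    `pod` is a dict shaped like the JSON output of `oc get pod -o json`.
    """
    names = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # A crash-looping container usually reports the kill under
        # lastState, but check the current state as well.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(cs.get("name"))
                break
    return names


# Sample data mirroring the status in this report (assumed to sit under
# lastState; the report does not show the parent key).
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "manager",
                "ready": False,
                "restartCount": 9,
                "started": False,
                "lastState": {
                    "terminated": {
                        "exitCode": 137,
                        "reason": "OOMKilled",
                        "startedAt": "2022-01-06T19:07:40Z",
                        "finishedAt": "2022-01-06T19:08:04Z",
                    }
                },
            }
        ]
    }
}

print(oom_killed_containers(sample_pod))  # ['manager']
```

Exit code 137 together with reason OOMKilled indicates the container was killed for exceeding its memory limit, which is consistent with the low limit in manager.yaml.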
Verified with the latest TALO. The TALO pod now has only a resource request, without a limit:

    resources:
      requests:
        cpu: 100m
        memory: 20Mi
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056