Bug 2079082

Summary: failing collect-profiles CronJob causes nodes to reach OutOfPods status
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: OLM
Assignee: Per da Silva <pegoncal>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: achernet, agreene, bzhai, bzvonar, cback, ealcaniz, jaimelm, jiazha, krizza, openshift-bugs-escalate, tflannag
Version: 4.10
Target Milestone: ---
Target Release: 4.9.z
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-12 20:40:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2071941    
Bug Blocks:    

Comment 4 Jian Zhang 2022-04-29 02:29:47 UTC
1. Create an OCP 4.9 cluster that contains the fix PR.
mac:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2022-04-28-234507|grep olm
W0429 09:22:30.786095   35573 helpers.go:151] Defaulting of registry auth file to "${HOME}/.docker/config.json" is deprecated. The default will be switched to podman config locations in the future version.
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         b1110c6d14e549babb2bd7d6dea4ff6402c617a2
  operator-registry                              https://github.com/openshift/operator-framework-olm                         b1110c6d14e549babb2bd7d6dea4ff6402c617a2
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2022-04-28-234507   True        False         2m16s   Cluster version is 4.9.0-0.nightly-2022-04-28-234507
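
Note: the pods checked in step 3 are created by the collect-profiles CronJob in the openshift-operator-lifecycle-manager namespace. A quick sanity check that the CronJob is present (an illustrative command, not part of the original verification output):

  oc get cronjob collect-profiles -n openshift-operator-lifecycle-manager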

2. Cordon the worker nodes that the collect-profiles job pods are running on.
mac:~ jianzhang$ oc adm cordon ci-ln-6lxd7kk-72292-sb7rh-worker-a-mns47 ci-ln-6lxd7kk-72292-sb7rh-worker-b-mvvtr ci-ln-6lxd7kk-72292-sb7rh-worker-c-w89fk
node/ci-ln-6lxd7kk-72292-sb7rh-worker-a-mns47 cordoned
node/ci-ln-6lxd7kk-72292-sb7rh-worker-b-mvvtr cordoned
node/ci-ln-6lxd7kk-72292-sb7rh-worker-c-w89fk cordoned
mac:~ jianzhang$ oc get nodes
NAME                                       STATUS                     ROLES    AGE   VERSION
ci-ln-6lxd7kk-72292-sb7rh-master-0         Ready                      master   18m   v1.22.8+c02bd9d
ci-ln-6lxd7kk-72292-sb7rh-master-1         Ready                      master   18m   v1.22.8+c02bd9d
ci-ln-6lxd7kk-72292-sb7rh-master-2         Ready                      master   18m   v1.22.8+c02bd9d
ci-ln-6lxd7kk-72292-sb7rh-worker-a-mns47   Ready,SchedulingDisabled   worker   10m   v1.22.8+c02bd9d
ci-ln-6lxd7kk-72292-sb7rh-worker-b-mvvtr   Ready,SchedulingDisabled   worker   10m   v1.22.8+c02bd9d
ci-ln-6lxd7kk-72292-sb7rh-worker-c-w89fk   Ready,SchedulingDisabled   worker   10m   v1.22.8+c02bd9d
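
Note: after the check, scheduling can be restored with the matching uncordon command, e.g.:

  oc adm uncordon ci-ln-6lxd7kk-72292-sb7rh-worker-a-mns47 ci-ln-6lxd7kk-72292-sb7rh-worker-b-mvvtr ci-ln-6lxd7kk-72292-sb7rh-worker-c-w89fk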

3. Check whether more collect-profiles pods are generated.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-9666ccff-grxwt          1/1     Running     0          57m
collect-profiles-27519945--1-zmrhj       0/1     Completed   0          43m
collect-profiles-27519975--1-w92lc       0/1     Pending     0          13m
...
As shown above, only one pod is Pending after the job has run twice. LGTM, verifying it.
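
Note: whether pending runs accumulate is governed by the CronJob's standard Kubernetes spec fields (concurrencyPolicy, successfulJobsHistoryLimit, failedJobsHistoryLimit); the exact change that produced this behavior is tracked in bug 2071941. A sketch of how to inspect those fields, assuming the default resource name:

  oc get cronjob collect-profiles -n openshift-operator-lifecycle-manager \
    -o jsonpath='{.spec.concurrencyPolicy}{"\n"}{.spec.successfulJobsHistoryLimit}{"\n"}{.spec.failedJobsHistoryLimit}{"\n"}'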

Comment 6 Per da Silva 2022-05-10 12:12:22 UTC
Hey, this has been merged. Should it move to POST?

Comment 8 errata-xmlrpc 2022-05-12 20:40:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.32 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1694