Bug 2071941

Summary: Failed collect-profiles cronjob leads nodes to reach OutOfpods status
Product: OpenShift Container Platform Reporter: Per da Silva <pegoncal>
Component: OLM Assignee: Per da Silva <pegoncal>
OLM sub component: OLM QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: achernet, agreene, bzhai, cback, ealcaniz, jiazha, krizza, openshift-bugs-escalate, tflannag
Version: 4.10 Flags: cback: needinfo? (pegoncal)
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2055861 Environment:
Last Closed: 2022-04-25 19:51:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 2055861    
Bug Blocks: 2079082    

Comment 5 Jian Zhang 2022-04-14 04:22:41 UTC
1. Create an OCP 4.10 cluster that contains the fix PR.
mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-04-13-214142 -a .dockerconfigjson --commits|grep olm
  operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         1cb0c9a578ffcc6d471b483ab34b627430677f09
  operator-registry                              https://github.com/openshift/operator-framework-olm                         1cb0c9a578ffcc6d471b483ab34b627430677f09

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-04-13-214142   True        False         96s     Cluster version is 4.10.0-0.nightly-2022-04-13-214142

2. Cordon the worker nodes that the collect-profiles job pods are running on.
mac:~ jianzhang$ oc adm cordon ci-ln-tpy5pqb-72292-8ww7d-worker-a-vp62w  ci-ln-tpy5pqb-72292-8ww7d-worker-b-gdjg9  ci-ln-tpy5pqb-72292-8ww7d-worker-c-fjd7l 
node/ci-ln-tpy5pqb-72292-8ww7d-worker-a-vp62w cordoned
node/ci-ln-tpy5pqb-72292-8ww7d-worker-b-gdjg9 cordoned
node/ci-ln-tpy5pqb-72292-8ww7d-worker-c-fjd7l cordoned
mac:~ jianzhang$ oc get nodes
NAME                                       STATUS                     ROLES    AGE   VERSION
ci-ln-tpy5pqb-72292-8ww7d-master-0         Ready                      master   21m   v1.23.5+9ce5071
ci-ln-tpy5pqb-72292-8ww7d-master-1         Ready                      master   21m   v1.23.5+9ce5071
ci-ln-tpy5pqb-72292-8ww7d-master-2         Ready                      master   21m   v1.23.5+9ce5071
ci-ln-tpy5pqb-72292-8ww7d-worker-a-vp62w   Ready,SchedulingDisabled   worker   11m   v1.23.5+9ce5071
ci-ln-tpy5pqb-72292-8ww7d-worker-b-gdjg9   Ready,SchedulingDisabled   worker   11m   v1.23.5+9ce5071
ci-ln-tpy5pqb-72292-8ww7d-worker-c-fjd7l   Ready,SchedulingDisabled   worker   11m   v1.23.5+9ce5071
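
To confirm that no worker was missed before moving on, the node listing can be filtered for workers that are still schedulable. This is only a sketch: the helper name `uncordoned_workers` is hypothetical, and it assumes the default column layout of `oc get nodes` shown above (NAME, STATUS, ROLES, ...).

```shell
# Hypothetical helper (not part of oc): reads `oc get nodes` output on stdin
# and prints the names of worker nodes that are NOT marked SchedulingDisabled.
# Assumes the default columns: $1 = NAME, $2 = STATUS, $3 = ROLES.
uncordoned_workers() {
  awk '$3 ~ /(^|,)worker(,|$)/ && $2 !~ /SchedulingDisabled/ { print $1 }'
}
```

Usage, assuming a logged-in session: `oc get nodes --no-headers | uncordoned_workers`. Empty output would mean every worker is cordoned.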

3. Check whether more collect-profiles pods are generated.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                      READY   STATUS      RESTARTS      AGE
catalog-operator-68558fff4b-rf79c         1/1     Running     0             61m
collect-profiles-27498450-wvj57           0/1     Completed   0             51m
collect-profiles-27498495-9qq9n           0/1     Pending     0             6m32s
olm-operator-5c6f5df9f6-dzw4p             1/1     Running     0             61m
..

As shown above, only one pod is pending after the job ran twice. LGTM, marking as verified.
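
The check in step 3 could also be scripted rather than read by eye. The following is a minimal sketch, not part of the original verification: the helper name `count_pending_collect_profiles` is hypothetical, and it assumes the default `oc get pods` column layout shown above (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper (not part of oc): reads `oc get pods` output on stdin
# and prints how many collect-profiles pods are in the Pending state.
# Assumes the default columns: $1 = NAME, $3 = STATUS.
count_pending_collect_profiles() {
  awk '$1 ~ /^collect-profiles-/ && $3 == "Pending" { n++ } END { print n + 0 }'
}
```

Usage, assuming a logged-in session: `oc get pods -n openshift-operator-lifecycle-manager --no-headers | count_pending_collect_profiles`. With the fix in place, the count should stay at 1 while the workers remain cordoned, instead of growing with each scheduled run.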

Comment 9 errata-xmlrpc 2022-04-25 19:51:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1431