test: operator conditions special-resource-operator is failing frequently in CI, see search results:

https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=operator+conditions+special-resource-operator

level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=fatal msg=failed to initialize the cluster: Cluster operator special-resource-operator is still updating

This fails on all platforms: https://sippy.ci.openshift.org/testdetails?release=4.7&test=operator+conditions+special-resource-operator

4.7 promotions are completely blocked: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.7.0-0.nightly
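For anyone triaging: the installer is waiting for every ClusterOperator to settle at Available=True, Progressing=False, Degraded=False, and special-resource-operator never gets there. A quick way to dump the conditions it is stuck on against a live cluster (a sketch using standard `oc` JSONPath, not necessarily the exact query the CI test runs):

  # Print each condition type with its status and message for the SRO ClusterOperator
  oc get clusteroperator special-resource-operator \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.message}){"\n"}{end}'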
I think this is the PR that addresses this issue: https://github.com/openshift/ocp-build-data/pull/786
Attempted to deploy SRO from https://github.com/openshift-psap/special-resource-operator.git on an entitled OCP 4.7 cluster with SPECIALRESOURCE=nvidia-gpu make, after deploying the Node Feature Discovery Operator from OperatorHub and adding a new machineset for a node of the g4dn.xlarge instance type with a GPU resource. The nvidia-gpu-driver-container-rhel8-bltsm pod is stuck in CrashLoopBackOff and keeps restarting, but I can still see "special resource operator" among the other cluster operators, and its version says "0.0.1-snapshot":

[root@ip-172-31-45-145 special-resource-operator]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h25m
baremetal                                  4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
cloud-credential                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
cluster-autoscaler                         4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
config-operator                            4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
console                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
dns                                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
etcd                                       4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
image-registry                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
ingress                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
insights                                   4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-apiserver                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-controller-manager                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-scheduler                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h23m
machine-api                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
machine-approver                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
machine-config                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
marketplace                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
monitoring                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
network                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
node-tuning                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-apiserver                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-controller-manager               4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-samples                          4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
service-ca                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
special-resource-operator                  0.0.1-snapshot                      True        False         False      2m59s
storage                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h

[root@ip-172-31-45-145 special-resource-operator]# oc get pods -n nvidia-gpu
NAME                                      READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build           0/1     Completed          0          27m
nvidia-gpu-driver-container-rhel8-bltsm   0/1     CrashLoopBackOff   6          27m

[root@ip-172-31-45-145 special-resource-operator]# oc describe co special-resource-operator
Name:         special-resource-operator
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-01-26T06:10:10Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-01-26T06:10:10Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:extension:
        f:versions:
    Manager:         manager
    Operation:       Update
    Time:            2021-01-26T06:10:23Z
  Resource Version:  289793
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/special-resource-operator
  UID:               fc95853f-7937-4e5f-9e5b-a828a54abf1a
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Reconciling nvidia-gpu
    Reason:                Reconciling
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Reconciling nvidia-gpu
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Special Resource Operator reconciling special resources
    Reason:                Reconciled
    Status:                False
    Type:                  Degraded
  Extension:  <nil>
  Versions:
    Name:     operator
    Version:  0.0.1-snapshot
Events:       <none>
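To see why the driver container is crash-looping, the usual next step is to pull its logs and events (generic `oc` troubleshooting commands; the pod name below is from this run and will differ on other clusters):

  oc logs -n nvidia-gpu nvidia-gpu-driver-container-rhel8-bltsm
  oc logs -n nvidia-gpu nvidia-gpu-driver-container-rhel8-bltsm --previous   # previous attempt, if the container exits too fast to inspect
  oc describe pod -n nvidia-gpu nvidia-gpu-driver-container-rhel8-bltsm      # events surface image-pull, entitlement, or scheduling failures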
While the SRO was reconciling, verified that the operator's conditions and statuses (the AVAILABLE, PROGRESSING, and SINCE fields) were updating in the `oc get co` output:

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h48m
baremetal                                  4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
cloud-credential                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
cluster-autoscaler                         4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
config-operator                            4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
console                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
dns                                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
etcd                                       4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
image-registry                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
ingress                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
insights                                   4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-apiserver                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-controller-manager                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-scheduler                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-22-134922   True        False         False      16h
machine-api                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
machine-approver                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
machine-config                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
marketplace                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
monitoring                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
network                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
node-tuning                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-apiserver                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-controller-manager               4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-samples                          4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
service-ca                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
special-resource-operator                  0.0.1-snapshot                      True        False         False      20s
storage                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h

# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      True        False         False      26s
# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      True        False         False      3m18s
# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      False       True          False      31s

# oc get pods -n nvidia-gpu
NAME                                      READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build           0/1     Completed          0          9h
nvidia-gpu-driver-container-rhel8-bltsm   0/1     CrashLoopBackOff   83         9h

# oc get pods -n driver-container-base
NAME                    READY   STATUS      RESTARTS   AGE
driver-container-base   0/1     Completed   0          9h

# oc get pods -n openshift-special-resource-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
special-resource-controller-manager-757fc45f5-4w9gh   2/2     Running   0          9h
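One way to observe the flapping shown above without re-running the grep by hand (a sketch; `-w` streams each status change as the operator reconciles):

  oc get clusteroperator special-resource-operator -w   # AVAILABLE/PROGRESSING flip and SINCE resets on each transition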
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633