Bug 1905133
| Summary: | operator conditions special-resource-operator | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Seth Jennings <sjenning> |
| Component: | Special Resource Operator | Assignee: | Zvonko Kosic <zkosic> |
| Status: | CLOSED ERRATA | QA Contact: | Walid A. <wabouham> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.7 | CC: | aos-bugs, jluhrsen, lszaszki, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | operator conditions special-resource-operator |
| Last Closed: | 2021-02-24 15:40:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Seth Jennings, 2020-12-07 15:58:01 UTC)
I think this is the PR that addresses this issue: https://github.com/openshift/ocp-build-data/pull/786

Attempted to deploy SRO from https://github.com/openshift-psap/special-resource-operator.git on an entitled OCP 4.7 cluster with `SPECIALRESOURCE=nvidia-gpu make`, after deploying the Node Feature Discovery Operator from OperatorHub and adding a new MachineSet for a g4dn.xlarge instance with a GPU. The nvidia-gpu-driver-container-rhel8-bltsm pod is stuck in CrashLoopBackOff and keeps restarting, but I can still see special-resource-operator listed alongside the other cluster operators, with version "0.0.1-snapshot":

```
[root@ip-172-31-45-145 special-resource-operator]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h25m
baremetal                                  4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
cloud-credential                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
cluster-autoscaler                         4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
config-operator                            4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
console                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
dns                                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
etcd                                       4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
image-registry                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
ingress                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
insights                                   4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-apiserver                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-controller-manager                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-scheduler                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h23m
machine-api                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
machine-approver                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
machine-config                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
marketplace                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
monitoring                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
network                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
node-tuning                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-apiserver                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-controller-manager               4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
openshift-samples                          4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
service-ca                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h
special-resource-operator                  0.0.1-snapshot                      True        False         False      2m59s
storage                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      14h

[root@ip-172-31-45-145 special-resource-operator]# oc get pods -n nvidia-gpu
NAME                                      READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build           0/1     Completed          0          27m
nvidia-gpu-driver-container-rhel8-bltsm   0/1     CrashLoopBackOff   6          27m
```
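The full `oc describe` dump of the ClusterOperator follows. For checking just the conditions, a jsonpath query is quicker; a minimal sketch, assuming only the standard config.openshift.io/v1 ClusterOperator status schema shown in this report:

```
# Print each ClusterOperator condition as TYPE=STATUS (REASON), one per line.
oc get clusteroperator special-resource-operator \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
```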
```
[root@ip-172-31-45-145 special-resource-operator]# oc describe co special-resource-operator
Name:         special-resource-operator
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-01-26T06:10:10Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-01-26T06:10:10Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:extension:
        f:versions:
    Manager:         manager
    Operation:       Update
    Time:            2021-01-26T06:10:23Z
  Resource Version:  289793
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/special-resource-operator
  UID:               fc95853f-7937-4e5f-9e5b-a828a54abf1a
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Reconciling nvidia-gpu
    Reason:                Reconciling
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Reconciling nvidia-gpu
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2021-01-26T06:43:21Z
    Message:               Special Resource Operator reconciling special resources
    Reason:                Reconciled
    Status:                False
    Type:                  Degraded
  Extension:  <nil>
  Versions:
    Name:     operator
    Version:  0.0.1-snapshot
Events:  <none>
```
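Since the operator reports Available=False with Progressing=True while a reconcile is in flight, scripted verification can simply block until it settles; a minimal sketch with `oc wait`, assuming the condition types shown in the describe output above:

```
# Block until the SRO ClusterOperator reports Available=True again
# (gives up after 10 minutes if the reconcile never completes).
oc wait clusteroperator/special-resource-operator \
  --for=condition=Available=True --timeout=10m
```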
While the SRO was reconciling, verified that the AVAILABLE, PROGRESSING, and SINCE fields for this operator were updating in the `oc get co` output:

```
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      7h48m
baremetal                                  4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
cloud-credential                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
cluster-autoscaler                         4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
config-operator                            4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
console                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
dns                                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
etcd                                       4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
image-registry                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
ingress                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
insights                                   4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-apiserver                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-controller-manager                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-scheduler                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-22-134922   True        False         False      16h
machine-api                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
machine-approver                           4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
machine-config                             4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
marketplace                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
monitoring                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
network                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
node-tuning                                4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-apiserver                        4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-controller-manager               4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
openshift-samples                          4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
service-ca                                 4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h
special-resource-operator                  0.0.1-snapshot                      True        False         False      20s
storage                                    4.7.0-0.nightly-2021-01-22-134922   True        False         False      23h

# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      True        False         False      26s
# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      True        False         False      3m18s
# oc get co | grep special
special-resource-operator                  0.0.1-snapshot                      False       True          False      31s

# oc get pods -n nvidia-gpu
NAME                                      READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build           0/1     Completed          0          9h
nvidia-gpu-driver-container-rhel8-bltsm   0/1     CrashLoopBackOff   83         9h

# oc get pods -n driver-container-base
NAME                    READY   STATUS      RESTARTS   AGE
driver-container-base   0/1     Completed   0          9h

# oc get pods -n openshift-special-resource-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
special-resource-controller-manager-757fc45f5-4w9gh   2/2     Running   0          9h
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
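For anyone re-verifying after moving to the errata build, the operator's reported version can be pulled directly; a minimal sketch (0.0.1-snapshot was the value observed above; what the fixed build reports may differ):

```
# Print the version the SRO ClusterOperator reports for its "operator" operand.
oc get clusteroperator special-resource-operator \
  -o jsonpath='{.status.versions[?(@.name=="operator")].version}{"\n"}'
```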