Bug 1883363
Summary: | OCP 4.6: Failed to deploy Special Resource Operator (SRO) successfully from OperatorHub on entitled cluster | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Walid A. <wabouham>
Component: | Special Resource Operator | Assignee: | Brett Thurber <bthurber>
Status: | CLOSED ERRATA | QA Contact: | Walid A. <wabouham>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 4.6 | CC: | aos-bugs, carangog, dagray, mifiedle
Target Milestone: | --- | |
Target Release: | 4.6.0 | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | N/A | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2021-10-13 07:30:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Walid A. 2020-09-28 22:57:47 UTC
The current workaround is to deploy from source using the branch `sro-list`. The operator is still not creating the nvidia-gpu device plugin and driver containers when deployed from OperatorHub with the nvidia-gpu special resource created from the console. This was tested on OCP 4.6.5:

```
# oc get pods -n nvidia-gpu
NAME                                         READY   STATUS    RESTARTS   AGE
nfd-operator-67876f6f5-6msc8                 1/1     Running   0          11m
special-resource-operator-5fd68b4586-hhg2w   1/1     Running   0          11m
```

The operator log shows a metrics Service creation failure before the controller starts:

```
# oc logs -n nvidia-gpu special-resource-operator-5fd68b4586-hhg2w
{"level":"info","ts":1606341000.667781,"logger":"cmd","msg":"Go Version: go1.13.8"}
{"level":"info","ts":1606341000.6678047,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1606341000.6678097,"logger":"cmd","msg":"Version of operator-sdk: v0.10.0"}
{"level":"info","ts":1606341000.6682265,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1606341000.7991717,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1606341000.8066115,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1606341000.9134064,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1606341000.9153435,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"specialresource","source":"kind source: /, Kind="}
[... 11 further identical "Starting EventSource" lines elided ...]
{"level":"info","ts":1606341001.0426922,"logger":"cmd","msg":"Could not create metrics Service","error":"failed to create or get service for metrics: services \"special-resource-operator-metrics\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1606341001.0624459,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1606341001.1626635,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"specialresource"}
{"level":"info","ts":1606341001.262836,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"specialresource","worker count":1}
```
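The "cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on" message in the log above is the standard Kubernetes RBAC error raised when a service account creates an object whose ownerReference sets `blockOwnerDeletion` without `update` permission on the owner's `finalizers` subresource. A hypothetical sketch of the kind of rule that typically resolves it (the role name and the `apps`/`deployments` owner resource are assumptions, not taken from this bug):

```yaml
# Hypothetical ClusterRole fragment: grants update on the owner's
# finalizers subresource so blockOwnerDeletion can be set on an
# ownerReference pointing at a Deployment.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: special-resource-operator-finalizers   # assumed name
rules:
- apiGroups: ["apps"]
  resources: ["deployments/finalizers"]
  verbs: ["update"]
```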
The log then shows the reconcile starting but no device plugin or driver containers being created:

```
{"level":"info","ts":1606341106.9086497,"logger":"specialresource","msg":"Reconciling SpecialResource","Namespace":"nvidia-gpu","Name":"nvidia-gpu"}
{"level":"info","ts":1606341106.9086807,"logger":"specialresource","msg":"Looking for Hardware Configuration ConfigMaps with label specialresource.openshift.io/config: true"}
```

@wabouham the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1919581 should also work on 4.6. Please verify.

Verified on 4.6.0-0.nightly-2021-04-17-182039 that we can deploy SRO from OperatorHub and create the simple-kmod special resource successfully. The NFD operator was deployed before deploying SRO.

```
# oc debug node/<worker_node>
.
.
sh-4.4# chroot /host
sh-4.4# lsmod | grep kmod
simple_procfs_kmod 16384 0
simple_kmod 16384 0
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...
```

```
# oc get pods -n driver-container-base
NAME                                    READY   STATUS      RESTARTS   AGE
driver-container-base-ee060ca2c5056b7   0/1     Completed   0          20m

# oc get pods -n simple-kmod
NAME                                                 READY   STATUS      RESTARTS   AGE
simple-kmod-driver-build-ee060ca2c5056b7-1-build     0/1     Completed   0          16m
simple-kmod-driver-container-ee060ca2c5056b7-74fhf   1/1     Running     0          16m
simple-kmod-driver-container-ee060ca2c5056b7-g4rvp   1/1     Running     0          16m
simple-kmod-driver-container-ee060ca2c5056b7-k54w2   1/1     Running     0          16m

# oc get pods -n openshift-operators
NAME                                                   READY   STATUS    RESTARTS   AGE
nfd-master-9nc78                                       1/1     Running   0          101m
nfd-master-fvndg                                       1/1     Running   0          101m
nfd-master-ws94z                                       1/1     Running   0          101m
nfd-operator-576d77d47f-rkhjd                          1/1     Running   0          103m
nfd-worker-bjcgw                                       1/1     Running   0          101m
nfd-worker-hqfmn                                       1/1     Running   0          101m
nfd-worker-j4thh                                       1/1     Running   0          101m
special-resource-controller-manager-765fbc7f54-mbhsw   2/2     Running   0          21m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6.47 bug fix update) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3737
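The per-node check used in the verification comment above (running `lsmod` on a worker and looking for `simple_kmod`) can be scripted. A minimal sketch, assuming you have already captured `lsmod` output; on a live cluster you would obtain it per node with `oc debug node/<node> -- chroot /host lsmod`, which is not runnable here, so a captured sample is parsed instead. The helper name `module_loaded` is ours, not from the bug:

```shell
#!/bin/sh
# Return 0 if module $2 appears in the lsmod output passed as $1.
module_loaded() {
  printf '%s\n' "$1" | awk -v m="$2" '$1 == m { found=1 } END { exit !found }'
}

# Sample lsmod output, matching the modules seen in the verification above.
# On a cluster: sample=$(oc debug node/<worker_node> -- chroot /host lsmod)
sample='Module                  Size  Used by
simple_procfs_kmod     16384  0
simple_kmod            16384  0'

if module_loaded "$sample" simple_kmod; then
  echo "simple_kmod is loaded"
else
  echo "simple_kmod is NOT loaded"
fi
```

Looping this over `oc get nodes -l node-role.kubernetes.io/worker -o name` would confirm the driver container loaded the module on every worker.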