Bug 1835446

Summary: Special resource operator gpu-driver-container pod error related to elfutils-libelf-devel
Product: OpenShift Container Platform
Reporter: Paige Rubendall <prubenda>
Component: Special Resource Operator
Assignee: Zvonko Kosic <zkosic>
Status: CLOSED NOTABUG
QA Contact: Walid A. <wabouham>
Severity: unspecified
Priority: unspecified
Version: 4.5
CC: aos-bugs, ematysek, mifiedle
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-05-26 18:29:30 UTC
Type: Bug

Description Paige Rubendall 2020-05-13 19:34:53 UTC
Created attachment 1688185
This is the output of oc logs for the NVIDIA GPU driver container.

Description of problem:
The GPU driver container pod fails with an error; it appears to be unable to find the elfutils-libelf-devel.x86_64 package.

Version-Release number of selected component (if applicable): 4.5

How reproducible: 100%

Steps to Reproduce:
1. Deploy an IPI cluster on RHCOS.
2. Deploy the Special Resource Operator (SRO) from the GitHub master branch.

Actual results:

  Type     Reason          Age                    From                                                 Message
  ----     ------          ----                   ----                                                 -------
  Normal   Scheduled       7m1s                   default-scheduler                                    Successfully assigned nvidia-gpu/nvidia-gpu-driver-container-rhel8-tzbr9 to ip-10-0-148-114.us-east-2.compute.internal
  Normal   AddedInterface  7m                     multus                                               Add eth0 []
  Normal   Started         6m8s (x4 over 6m59s)   kubelet, ip-10-0-148-114.us-east-2.compute.internal  Started container nvidia-gpu-driver-container-rhel8
  Normal   Pulling         5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Pulling image "image-registry.openshift-image-registry.svc:5000/nvidia-gpu/nvidia-gpu-driver-container:v4.18.0-147.8.1.el8_1.x86_64"
  Normal   Pulled          5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Successfully pulled image "image-registry.openshift-image-registry.svc:5000/nvidia-gpu/nvidia-gpu-driver-container:v4.18.0-147.8.1.el8_1.x86_64"
  Normal   Created         5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Created container nvidia-gpu-driver-container-rhel8
  Warning  BackOff         111s (x23 over 6m54s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Back-off restarting failed container

$ oc get pods
NAME                                         READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build              0/1     Completed          0          14m
nvidia-gpu-driver-container-rhel8-tzbr9      0/1     CrashLoopBackOff   6          6m43s
special-resource-operator-76b658c584-lxzwr   1/1     Running            0          14m
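A minimal Python sketch of the triage step shown above, assuming the plain-text oc get pods output has been captured as a string. It reads the header to locate the STATUS and RESTARTS columns and lists pods whose status is neither Running nor Completed. The function name find_unhealthy_pods and the set of "healthy" statuses are hypothetical choices for illustration, not part of oc or SRO.

```python
from typing import List, Tuple

# Assumption for this sketch: pods in these STATUS values are treated as healthy.
HEALTHY_STATUSES = {"Running", "Completed"}


def find_unhealthy_pods(oc_get_pods_output: str) -> List[Tuple[str, str, int]]:
    """Return (name, status, restarts) for every pod whose STATUS is not healthy.

    Assumes the whitespace-separated default columns of `oc get pods`
    (NAME READY STATUS RESTARTS AGE), as captured in this report.
    """
    lines = [line for line in oc_get_pods_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    status_idx = header.index("STATUS")
    restarts_idx = header.index("RESTARTS")
    unhealthy = []
    for line in lines[1:]:
        fields = line.split()
        name, status = fields[0], fields[status_idx]
        restarts = int(fields[restarts_idx])
        if status not in HEALTHY_STATUSES:
            unhealthy.append((name, status, restarts))
    return unhealthy


# Usage with the output captured in this report.
SAMPLE = """
NAME                                         READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build              0/1     Completed          0          14m
nvidia-gpu-driver-container-rhel8-tzbr9      0/1     CrashLoopBackOff   6          6m43s
special-resource-operator-76b658c584-lxzwr   1/1     Running            0          14m
"""
print(find_unhealthy_pods(SAMPLE))
# [('nvidia-gpu-driver-container-rhel8-tzbr9', 'CrashLoopBackOff', 6)]
```

Reading column positions from the header rather than hard-coding them keeps the sketch working if the column order differs, though it still assumes no column value contains spaces before RESTARTS.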

Expected results:
The container runs successfully.

Additional info:

The output of oc logs nvidia-gpu-driver-container-rhel8-tzbr9 contains the following error message:
"Error: Unable to find a match: elfutils-libelf-devel.x86_64"
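A minimal Python sketch for pulling the missing package names out of captured driver container logs, assuming the log contains dnf error lines of the form quoted above. The function name missing_packages is hypothetical, and the assumption that one error line may list several space-separated package names is noted in the code.

```python
import re
from typing import List

# Matches the dnf error quoted in this report; package names follow the colon.
UNABLE_TO_MATCH = re.compile(r"Error: Unable to find a match:\s*(.+)")


def missing_packages(log_text: str) -> List[str]:
    """Return package names from 'Unable to find a match' errors, in first-seen order, without duplicates."""
    found: List[str] = []
    for line in log_text.splitlines():
        match = UNABLE_TO_MATCH.search(line)
        if match is None:
            continue
        # Assumption: a single error line may list several space-separated names.
        # Strip stray double quotes in case the line was copied with quoting.
        for pkg in match.group(1).split():
            pkg = pkg.strip('"')
            if pkg and pkg not in found:
                found.append(pkg)
    return found


log = "Error: Unable to find a match: elfutils-libelf-devel.x86_64"
print(missing_packages(log))
# ['elfutils-libelf-devel.x86_64']
```

In this report the root cause turned out to be a missing cluster entitlement (Comment 1), so a match here is worth treating as a prompt to check entitlement before debugging the driver container itself.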

Comment 1 Zvonko Kosic 2020-05-26 18:29:30 UTC
The cluster is not entitled; please entitle the cluster and try again.
This is not a bug.