Bug 1835446 - Special resource operator gpu-driver-container pod error related to elfutils-libelf-devel
Summary: Special resource operator gpu-driver-container pod error related to elfutils-libelf-devel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Special Resource Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Zvonko Kosic
QA Contact: Walid A.
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-13 19:34 UTC by Paige Rubendall
Modified: 2020-05-26 18:29 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 18:29:30 UTC
Target Upstream Version:


Attachments
This is the output of oc logs for the nvidia gpu driver container (3.30 KB, text/plain)
2020-05-13 19:34 UTC, Paige Rubendall

Description Paige Rubendall 2020-05-13 19:34:53 UTC
Created attachment 1688185
This is the output of oc logs for the nvidia gpu driver container

Description of problem:
The GPU driver container pod hits an error; it appears to be unable to find the elfutils-libelf-devel.x86_64 package.

Version-Release number of selected component (if applicable): 4.5


How reproducible: 100%


Steps to Reproduce:
1. Deploy an IPI cluster on RHCOS
2. Deploy SRO from GitHub master (see the sketch below)
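
For anyone trying to reproduce: a minimal deployment sketch, assuming the openshift-psap/special-resource-operator repository and a default "make deploy" target (both are assumptions on my part, not confirmed in this report):

$ git clone https://github.com/openshift-psap/special-resource-operator.git
$ cd special-resource-operator
$ make deploy    # assumed target that applies the operator manifests to the cluster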


Actual results:

Events:
  Type     Reason          Age                    From                                                 Message
  ----     ------          ----                   ----                                                 -------
  Normal   Scheduled       7m1s                   default-scheduler                                    Successfully assigned nvidia-gpu/nvidia-gpu-driver-container-rhel8-tzbr9 to ip-10-0-148-114.us-east-2.compute.internal
  Normal   AddedInterface  7m                     multus                                               Add eth0 [10.129.4.22/23]
  Normal   Started         6m8s (x4 over 6m59s)   kubelet, ip-10-0-148-114.us-east-2.compute.internal  Started container nvidia-gpu-driver-container-rhel8
  Normal   Pulling         5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Pulling image "image-registry.openshift-image-registry.svc:5000/nvidia-gpu/nvidia-gpu-driver-container:v4.18.0-147.8.1.el8_1.x86_64"
  Normal   Pulled          5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Successfully pulled image "image-registry.openshift-image-registry.svc:5000/nvidia-gpu/nvidia-gpu-driver-container:v4.18.0-147.8.1.el8_1.x86_64"
  Normal   Created         5m18s (x5 over 6m59s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Created container nvidia-gpu-driver-container-rhel8
  Warning  BackOff         111s (x23 over 6m54s)  kubelet, ip-10-0-148-114.us-east-2.compute.internal  Back-off restarting failed container


$ oc get pods
NAME                                         READY   STATUS             RESTARTS   AGE
nvidia-gpu-driver-build-1-build              0/1     Completed          0          14m
nvidia-gpu-driver-container-rhel8-tzbr9      0/1     CrashLoopBackOff   6          6m43s
special-resource-operator-76b658c584-lxzwr   1/1     Running            0          14m


Expected results:
Container running successfully


Additional info:

Running $ oc logs nvidia-gpu-driver-container-rhel8-tzbr9 shows the following error message:
"Error: Unable to find a match: elfutils-libelf-devel.x86_64"

Comment 1 Zvonko Kosic 2020-05-26 18:29:30 UTC
The cluster is not entitled; please entitle the cluster and try again.
This is not a bug.
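
For anyone hitting the same symptom: the driver container can only reach the entitled RHEL 8 repositories once entitlement certificates are available on the nodes (typically propagated via a MachineConfig). A quick smoke test, assuming CRI-O mounts the host entitlement certificates into pods (the pod name and image choice here are mine, not from this report):

$ oc run entitlement-check --image=registry.access.redhat.com/ubi8:latest \
    --restart=Never --attach --rm -- dnf search elfutils-libelf-devel

On an entitled cluster the search should list the package; on a non-entitled cluster dnf only sees the UBI repositories and finds no match.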

