Bug 1862229 - NVIDIA GPU Operator fails to install on Openshift version 4.4.11 in AWS (via operator hub)
Keywords:
Status: CLOSED DUPLICATE of bug 1853726
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Special Resource Operator
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Zvonko Kosic
QA Contact: Walid A.
URL:
Whiteboard:
Duplicates: 1867854 1886059
Depends On:
Blocks:
Reported: 2020-07-30 19:28 UTC by Diane Feddema
Modified: 2020-10-20 11:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-20 11:12:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5232901 0 None None None 2020-10-09 13:29:38 UTC

Description Diane Feddema 2020-07-30 19:28:02 UTC
Description of problem:
Installing the NVIDIA GPU Operator from OperatorHub fails on OpenShift 4.4.11 on AWS.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create an OCP 4.4.11 cluster on AWS.
2. Follow the NVIDIA GPU Operator instructions in the NVIDIA online documentation:
https://docs.nvidia.com/datacenter/kubernetes/openshift-on-gpu-install-guide/index.html
  a) Make sure entitlements are working properly.
  b) Create a new project, "gpu-operator-resources".
  c) Install the NVIDIA GPU Operator: from the side menu, select Operators > OperatorHub, then search for the NVIDIA GPU Operator.
  d) Click Install.

3. View the driver pod log with "oc logs -f nvidia-driver-daemonset-XXXXX". It ends with:
"Error: Unable to find a match: kernel-headers-4.18.0-147.20.1.el8_1.x86_64 kernel-devel-4.18.0-147.20.1.el8_1.x86_64"

Actual results:

$ oc logs -f nvidia-driver-daemonset-vrx5c
+ set -eu
+ RUN_DIR=/run/nvidia
+ PID_FILE=/run/nvidia/nvidia-driver.pid
+ DRIVER_VERSION=440.64.00
+ KERNEL_UPDATE_HOOK=/run/kernel/postinst.d/update-nvidia-driver
+ '[' 1 -eq 0 ']'
+ command=init
+ shift
+ case "${command}" in
++ getopt -l accept-license -o a --
+ options=' --'
+ '[' 0 -ne 0 ']'
+ eval set -- ' --'
++ set -- --
+ ACCEPT_LICENSE=
++ uname -r
+ KERNEL_VERSION=4.18.0-147.20.1.el8_1.x86_64
+ PRIVATE_KEY=
+ PACKAGE_TAG=
+ for opt in ${options}
+ case "$opt" in
+ shift
+ break
+ '[' 0 -ne 0 ']'
+ init

========== NVIDIA Software Installer ==========

+ echo -e '\n========== NVIDIA Software Installer ==========\n'
Starting installation of NVIDIA driver version 440.64.00 for Linux kernel version 4.18.0-147.20.1.el8_1.x86_64

+ echo -e 'Starting installation of NVIDIA driver version 440.64.00 for Linux kernel version 4.18.0-147.20.1.el8_1.x86_64\n'
+ exec
+ flock -n 3
+ echo 138704
+ trap 'echo '\''Caught signal'\''; exit 1' HUP INT QUIT PIPE TERM
+ trap _shutdown EXIT
+ _unload_driver
+ rmmod_args=()
+ local rmmod_args
+ local nvidia_deps=0
+ local nvidia_refs=0
+ local nvidia_uvm_refs=0
+ local nvidia_modeset_refs=0
+ echo 'Stopping NVIDIA persistence daemon...'
Stopping NVIDIA persistence daemon...
+ '[' -f /var/run/nvidia-persistenced/nvidia-persistenced.pid ']'
Unloading NVIDIA driver kernel modules...
+ echo 'Unloading NVIDIA driver kernel modules...'
+ '[' -f /sys/module/nvidia_modeset/refcnt ']'
+ '[' -f /sys/module/nvidia_uvm/refcnt ']'
+ '[' -f /sys/module/nvidia/refcnt ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ return 0
+ _unmount_rootfs
Unmounting NVIDIA driver rootfs...
+ echo 'Unmounting NVIDIA driver rootfs...'
+ findmnt -r -o TARGET
+ grep /run/nvidia/driver
+ _kernel_requires_package
+ local proc_mount_arg=
Checking NVIDIA driver packages...
+ echo 'Checking NVIDIA driver packages...'
+ [[ ! -d /usr/src/nvidia-440.64.00/kernel ]]
+ cd /usr/src/nvidia-440.64.00/kernel
+ proc_mount_arg='--proc-mount-point /lib/modules/4.18.0-147.20.1.el8_1.x86_64/proc'
++ ls -d -1 'precompiled/**'
+ return 0
+ _update_package_cache
Updating the package cache...
+ '[' '' '!=' builtin ']'
+ echo 'Updating the package cache...'
+ yum -q makecache
+ _install_prerequisites
++ mktemp -d
+ local tmp_dir=/tmp/tmp.u1TOrlvdnA
+ trap 'rm -rf /tmp/tmp.u1TOrlvdnA' EXIT
+ cd /tmp/tmp.u1TOrlvdnA
+ dnf install -q -y elfutils-libelf.x86_64 elfutils-libelf-devel.x86_64
+ rm -rf /lib/modules/4.18.0-147.20.1.el8_1.x86_64
+ mkdir -p /lib/modules/4.18.0-147.20.1.el8_1.x86_64/proc
Installing Linux kernel headers...
+ echo 'Installing Linux kernel headers...'
+ dnf -q -y install kernel-headers-4.18.0-147.20.1.el8_1.x86_64 kernel-devel-4.18.0-147.20.1.el8_1.x86_64
Error: Unable to find a match: kernel-headers-4.18.0-147.20.1.el8_1.x86_64 kernel-devel-4.18.0-147.20.1.el8_1.x86_64
++ rm -rf /tmp/tmp.u1TOrlvdnA
+ _shutdown
+ _unload_driver
+ rmmod_args=()
+ local rmmod_args
Stopping NVIDIA persistence daemon...
+ local nvidia_deps=0
+ local nvidia_refs=0
+ local nvidia_uvm_refs=0
+ local nvidia_modeset_refs=0
+ echo 'Stopping NVIDIA persistence daemon...'
+ '[' -f /var/run/nvidia-persistenced/nvidia-persistenced.pid ']'
Unloading NVIDIA driver kernel modules...
+ echo 'Unloading NVIDIA driver kernel modules...'
+ '[' -f /sys/module/nvidia_modeset/refcnt ']'
+ '[' -f /sys/module/nvidia_uvm/refcnt ']'
+ '[' -f /sys/module/nvidia/refcnt ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ return 0
+ _unmount_rootfs
+ echo 'Unmounting NVIDIA driver rootfs...'
Unmounting NVIDIA driver rootfs...
+ findmnt -r -o TARGET
+ grep /run/nvidia/driver
+ rm -f /run/nvidia/nvidia-driver.pid /run/kernel/postinst.d/update-nvidia-driver
+ return 0

Expected results:
The NVIDIA GPU Operator installs successfully when "Install" is clicked in OperatorHub.


Additional info:
$ oc logs -f cluster-entitled-build-pod
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS   34 MB/s |  20 MB     00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStre  17 MB/s |  19 MB     00:01
Red Hat Universal Base Image 8 (RPMs) - BaseOS  4.7 MB/s | 767 kB     00:00
Red Hat Universal Base Image 8 (RPMs) - AppStre  22 MB/s | 3.9 MB     00:00
Red Hat Universal Base Image 8 (RPMs) - CodeRea  56 kB/s |  11 kB     00:00
====================== Name Exactly Matched: kernel-devel ======================
kernel-devel-4.18.0-80.1.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.4.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.7.1.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.11.1.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.11.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.7.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.0.3.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.8.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.0.2.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.3.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.5.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.1.2.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.6.3.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.13.2.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.14.3.el8_2.x86_64 : Development package for building kernel modules to match the kernel
Kyles-Mac-mini:infra-nodes kylebader$
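
The kernel-devel listing above has no build for 4.18.0-147.20.1.el8_1, the kernel that "uname -r" reports in the driver log. The following is a minimal Python sketch of that comparison, with the version strings copied from the two logs in this report. The helper names are hypothetical, and the sketch assumes the driver container sees the same repositories as cluster-entitled-build-pod, which this report does not confirm.

```python
# Kernel reported by `uname -r` in the nvidia-driver-daemonset log above.
RUNNING_KERNEL = "4.18.0-147.20.1.el8_1.x86_64"

# kernel-devel builds copied from the cluster-entitled-build-pod log above.
AVAILABLE_KERNEL_DEVEL = """
kernel-devel-4.18.0-80.1.2.el8_0.x86_64
kernel-devel-4.18.0-80.el8.x86_64
kernel-devel-4.18.0-80.4.2.el8_0.x86_64
kernel-devel-4.18.0-80.7.1.el8_0.x86_64
kernel-devel-4.18.0-80.11.1.el8_0.x86_64
kernel-devel-4.18.0-147.el8.x86_64
kernel-devel-4.18.0-80.11.2.el8_0.x86_64
kernel-devel-4.18.0-80.7.2.el8_0.x86_64
kernel-devel-4.18.0-147.0.3.el8_1.x86_64
kernel-devel-4.18.0-147.8.1.el8_1.x86_64
kernel-devel-4.18.0-147.0.2.el8_1.x86_64
kernel-devel-4.18.0-147.3.1.el8_1.x86_64
kernel-devel-4.18.0-147.5.1.el8_1.x86_64
kernel-devel-4.18.0-193.el8.x86_64
kernel-devel-4.18.0-193.1.2.el8_2.x86_64
kernel-devel-4.18.0-193.6.3.el8_2.x86_64
kernel-devel-4.18.0-193.13.2.el8_2.x86_64
kernel-devel-4.18.0-193.14.3.el8_2.x86_64
""".split()


def has_matching_devel(running_kernel, available):
    """Return True if a kernel-devel package exists for exactly this kernel."""
    return f"kernel-devel-{running_kernel}" in available


def same_base_builds(running_kernel, available):
    """List kernel-devel builds that share the running kernel's base release."""
    version, release = running_kernel.split("-", 1)  # "4.18.0", "147.20.1.el8_1.x86_64"
    base = f"kernel-devel-{version}-{release.split('.')[0]}."  # "kernel-devel-4.18.0-147."
    return [pkg for pkg in available if pkg.startswith(base)]


print(has_matching_devel(RUNNING_KERNEL, AVAILABLE_KERNEL_DEVEL))  # prints False
for pkg in same_base_builds(RUNNING_KERNEL, AVAILABLE_KERNEL_DEVEL):
    print(pkg)  # the 4.18.0-147 builds that are listed; none is 147.20.1
```

Under that assumption, the exact-match request issued by "dnf -q -y install" in the driver log cannot be satisfied, which is consistent with the "Unable to find a match" error.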

Comment 2 Carlos Eduardo Arango Gutierrez 2020-09-10 01:43:06 UTC
*** Bug 1867854 has been marked as a duplicate of this bug. ***

Comment 3 Zvonko Kosic 2020-10-09 11:05:39 UTC
*** Bug 1886059 has been marked as a duplicate of this bug. ***

Comment 7 Zvonko Kosic 2020-10-20 11:12:08 UTC

*** This bug has been marked as a duplicate of bug 1853726 ***

