Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1886059

Summary: Nvidia GPU operator fails to build kernel module on 4.4
Product: OpenShift Container Platform
Component: Special Resource Operator
Version: 4.4
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: 4.7.0
Reporter: Sean Pryor <spryor>
Assignee: Zvonko Kosic <zkosic>
QA Contact: Walid A. <wabouham>
CC: aos-bugs, spryor
Doc Type: If docs needed, set a value
Last Closed: 2020-10-09 11:05:39 UTC
Type: Bug

Description Sean Pryor 2020-10-07 15:02:50 UTC
Description of problem:
On OpenShift 4.4.17 and 4.4.26, the Nvidia GPU operator container fails to build the driver because the kernel-headers and kernel-devel packages matching the node's kernel cannot be found:

In the logs:
+ dnf -q -y install kernel-headers-4.18.0-193.23.1.el8_2.x86_64 kernel-devel-4.18.0-193.23.1.el8_2.x86_64
Error: Unable to find a match: kernel-headers-4.18.0-193.23.1.el8_2.x86_64 kernel-devel-4.18.0-193.23.1.el8_2.x86_64

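The log then lists the kernel-devel versions that are available; the newest is 4.18.0-193.19.1.el8_2, which is older than the requested 4.18.0-193.23.1.el8_2: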
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - AppStre  16 MB/s |  19 MB     00:01
Red Hat Enterprise Linux 8 for x86_64 - BaseOS   17 MB/s |  21 MB     00:01
Red Hat Universal Base Image 8 (RPMs) - BaseOS  3.4 MB/s | 769 kB     00:00
Red Hat Universal Base Image 8 (RPMs) - AppStre  15 MB/s | 4.0 MB     00:00
Red Hat Universal Base Image 8 (RPMs) - CodeRea  88 kB/s |  12 kB     00:00
====================== Name Exactly Matched: kernel-devel ======================
kernel-devel-4.18.0-80.1.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.4.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.7.1.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.11.1.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.11.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-80.7.2.el8_0.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.0.3.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.8.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.0.2.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.3.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-147.5.1.el8_1.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.el8.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.1.2.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.6.3.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.13.2.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.14.3.el8_2.x86_64 : Development package for building kernel modules to match the kernel
kernel-devel-4.18.0-193.19.1.el8_2.x86_64 : Development package for building kernel modules to match the kernel
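
The mismatch shown in the log can be checked mechanically. What follows is a minimal sketch, not part of the GPU operator: `has_kernel_devel` is a hypothetical helper that reads a package listing on stdin and succeeds only if a kernel-devel build for the given kernel release is listed. It assumes the "name-version.arch : summary" line format seen in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of the GPU operator).
# Reads a package listing on stdin and returns 0 if it contains a
# kernel-devel package for the kernel release given as $1.
# $1 is a release string as printed by `uname -r`,
# for example 4.18.0-193.23.1.el8_2.x86_64.
has_kernel_devel() {
    local release="$1"
    # -F treats the pattern as a fixed string, so the dots in the
    # version are literal. The trailing space assumes the
    # "name-version.arch : summary" format and stops a partial
    # version string from matching a longer one.
    grep -qF -- "kernel-devel-${release} "
}

# Possible use on a system with dnf and enabled repositories:
#   dnf -q search --showduplicates kernel-devel | has_kernel_devel "$(uname -r)"
```

With the listing above, the check would fail for 4.18.0-193.23.1.el8_2.x86_64 and succeed for 4.18.0-193.19.1.el8_2.x86_64.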

Version-Release number of selected component (if applicable):
4.4.17 and 4.4.26; other versions may also be affected

How reproducible:
Always

Steps to Reproduce:
1. Deploy an OpenShift cluster running 4.4.17 or 4.4.26
2. Install the Nvidia GPU operator
3. Check the logs of the crash-looping containers
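
Step 3 can be sketched as follows. This is an illustration under stated assumptions: `crashlooping_pods` is a hypothetical helper, and the `gpu-operator-resources` namespace is an assumption that may differ on a given cluster. The helper expects the default column layout of `oc get pods --no-headers` (NAME READY STATUS RESTARTS AGE).

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration only). Reads `oc get pods
# --no-headers` output on stdin and prints the names of pods whose
# STATUS column (third field) is CrashLoopBackOff.
crashlooping_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Possible use against a live cluster. The namespace is an assumption;
# adjust it to wherever the GPU operator pods run:
#   NS=gpu-operator-resources
#   for pod in $(oc get pods -n "$NS" --no-headers | crashlooping_pods); do
#       oc logs -n "$NS" "$pod"
#   done
```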

Actual results:
GPU operator containers fail to build the driver, rendering GPUs unusable

Expected results:
GPU operator should build the driver successfully

Additional info:

Comment 1 Sean Pryor 2020-10-07 15:03:29 UTC
Likely a relative of https://bugzilla.redhat.com/show_bug.cgi?id=1862229

Comment 2 Zvonko Kosic 2020-10-09 11:05:39 UTC

*** This bug has been marked as a duplicate of bug 1862229 ***