Bug 1824956 - Kickstart Install Fails When Trying to Install Packages from NVIDIA Repository
Summary: Kickstart Install Fails When Trying to Install Packages from NVIDIA Repository
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: anaconda
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Anaconda Maintenance Team
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-16 18:03 UTC by Jamie Nguyen (Nvidia)
Modified: 2020-09-30 22:32 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-20 15:23:38 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Logs from my reproducer (181.34 KB, application/gzip)
2020-04-17 15:43 UTC, Jamie Nguyen (Nvidia)

Description Jamie Nguyen (Nvidia) 2020-04-16 18:03:16 UTC
Description of problem (from IBM):
Our unattended network disk installation of RH81 and CUDA 11.0.0_445.43-1 is failing.
Power9 ppc64le

We used the local /var/cuda-repo-rhel8-11-0-local repository on the install server from
cuda-repo-rhel8-11-0-local-11.0.0_445.43-1.ppc64le.rpm

Our kickstart file has %package stanza entries:
%packages
<list of required rpms>
opencl-filesystem
opencl-headers
@nvidia-driver:latest-dkms
%end

The install recognizes the package is a module but fails with:
File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 286, in run
threading.Thread.run(self)
pyanaconda.payload.PayloadError: Payload error - DNF installation has ended up abruptly: /bin/sh is needed by cuda-drivers-445.35-1.ppc64le
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/pyanaconda/payload/dnfpayload.py", line 283, in do_transaction
base.do_transaction(display=display)
File "/usr/lib/python3.6/site-packages/dnf/base.py", line 842, in do_transaction
raise dnf.exceptions.TransactionCheckError(msg)
dnf.exceptions.TransactionCheckError: /bin/sh is needed by cuda-drivers-445.35-1.ppc64le

We get the same failure when installing with nvidia-driver as an rpm.

Once the OS has booted, we can install the module from the same setup, so the repository itself appears to be good.

Our customers requested that the unattended network installation result in the nvidia-driver being loaded and
configured after install is complete. We were able to do this on RH76 with CUDA10.2 and earlier
versions of CUDA and OS’s.
In previous versions, install of the NVIDIA driver required a reboot after installation for the driver to load.

How should we install the nvidia drivers during kickstart install?


Version-Release number of selected component (if applicable):
RedHat 8.1, kernel 4.18.0-147.5.1
cuda-11.0.0-1
Driver 445.43-1


How reproducible:


Steps to Reproduce:
1. Make sure CUDA repository is available.  In this case, the customer was using a repository that includes CUDA11.0+445 
2. Perform a kickstart install
3. The kickstart file should have a stanza like the following to install NVIDIA packages:

%packages
<list of required rpms>
opencl-filesystem
opencl-headers
@nvidia-driver:latest-dkms
%end

Actual results:
The install fails with the following backtrace:

File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 286, in run
threading.Thread.run(self)
pyanaconda.payload.PayloadError: Payload error - DNF installation has ended up abruptly: /bin/sh is needed by cuda-drivers-445.35-1.ppc64le
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/pyanaconda/payload/dnfpayload.py", line 283, in do_transaction
base.do_transaction(display=display)
File "/usr/lib/python3.6/site-packages/dnf/base.py", line 842, in do_transaction
raise dnf.exceptions.TransactionCheckError(msg)
dnf.exceptions.TransactionCheckError: /bin/sh is needed by cuda-drivers-445.35-1.ppc64le


Expected results:
We would expect the install to succeed.


Additional info:
I'm in the process of uploading cuda-repo-rhel8-11-0-local-11.0.0_445.43-1.ppc64le.rpm.  I'll share a link to this when it is available.

Comment 1 Jamie Nguyen (Nvidia) 2020-04-16 18:15:28 UTC
Here's a link to the repo IBM was using.  Hopefully it helps reproduce this: https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_

Comment 2 Jan Stodola 2020-04-16 18:28:22 UTC
Jamie,
unfortunately I'm not able to access the google drive link:

"You need permission"

Could you please attach the installation logs? You can find them as /tmp/*log in the installation environment and you can copy them from the machine using scp or copy them to a USB flash drive.

Comment 3 Jamie Nguyen (Nvidia) 2020-04-16 18:54:53 UTC
Hi Jan,

Can you give it another try?  I just tweaked the sharing settings:
https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_

Additionally, a colleague of mine wanted me to add the following comment:
"I'm fairly confident this is the reason it is failing, as kickstart does not have support for this functionality (dnf module install <package>:<stream>).

Additionally, I had found this article:
https://communityblog.fedoraproject.org/fedora-modularity-whats-the-problem/"

Lastly: I will ask our customer for the installation logs.  Thanks for your patience.

Comment 5 Jiri Konecny 2020-04-17 07:51:43 UTC
(In reply to Jamie Nguyen (Nvidia) from comment #3)
> Hi Jan,
> 
> Can you give it another try?  I just tweaked the sharing settings:
> https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_
> 
> Additionally, a colleague of mine wanted me to add the following comment:
> "I'm fairly confident this is the reason it is failing, as kickstart does
> not have support for this functionality (dnf module install
> <package>:<stream>).

In the bug description:

> We get the same failure when installing with nvidia-driver as an rpm.

From this I understand that this also happens when modularity is not involved. Please correct me if I misunderstood something.

Comment 6 Jiri Konecny 2020-04-17 07:59:27 UTC
(In reply to Jan Stodola from comment #4)
> I can now access the google drive file.
> 
> Kickstart supports modules and streams, see:
> "Specifying profiles of module streams" in
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/
> html/performing_an_advanced_rhel_installation/kickstart-script-file-format-
> reference_installing-rhel-as-an-experienced-user#package-selection-
> section_package-selection-in-kickstart
> and
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/
> html/performing_an_advanced_rhel_installation/kickstart-commands-and-options-
> reference_installing-rhel-as-an-experienced-user#module_kickstart-commands-
> for-system-configuration
> 
> Since there was a traceback during the installation, all logs should be
> included in a single file /tmp/anaconda-tb-* - it's enough to attach this
> file.

Please also attach the dnf logs, which can be found in /tmp/*.log. Ideally, attach all the logs from /tmp/.

Comment 7 Timm Bäder 2020-04-17 09:48:00 UTC
Since https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-script-file-format-reference_installing-rhel-as-an-experienced-user lists a few possibilities of installing module stream profiles, I guess Kickstart does support this (search for "specifying profiles of module streams").

This page also says:

> It is also possible to enable module streams using the module Kickstart command and then install packages contained in the module stream by naming them directly.

Does this work as a possible workaround or is this what you meant when you said you have the same problem when installing the driver as an rpm?

Comment 8 Timm Bäder 2020-04-17 11:52:17 UTC
I don't have any experience with Kickstart but as an additional note, the modularity metadata in the linked repository still contains the following problematic line, which is fixed with a later version of the genmodules.py script that I sent to Nvidia:

            - cuda-0:drivers-445.43-1.ppc64le

The format here is wrong. This might be completely unrelated of course and DNF handles this fine in my experience.
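
As a rough illustration of the format issue: modulemd artifact entries are normally written as name-epoch:version-release.arch, so the epoch belongs after the full package name. The sketch below is only a sanity check written for this note, not part of genmodules.py; parse_nevra and its regular expression are hypothetical, and it assumes version and release contain no '-' or ':'.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for "name-epoch:version-release.arch".
# Assumption: version and release contain no '-' or ':', arch contains no '.'.
_NEVRA = re.compile(
    r"(?P<name>.+)-(?P<epoch>\d+):"
    r"(?P<version>[^-:]+)-(?P<release>[^-:]+)\.(?P<arch>[^.-]+)"
)


def parse_nevra(artifact: str) -> Optional[Dict[str, str]]:
    """Split an artifact string into its parts, or return None if malformed."""
    match = _NEVRA.fullmatch(artifact.strip())
    return match.groupdict() if match else None
```

With this check, "cuda-drivers-0:450.16-1.x86_64" parses cleanly, while "cuda-0:drivers-445.43-1.ppc64le" is rejected because the epoch splits the package name.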

Comment 9 Jamie Nguyen (Nvidia) 2020-04-17 14:38:05 UTC
The description comes from a customer (IBM actually) -- so I will ask them for logs and clarification.

As for this quote, "We get the same failure when installing with nvidia-driver as an rpm", I believe this refers to the following stanza in their kickstart file:

%packages
<list of require rpms>
opencl-filesystem
opencl-headers
@nvidia-driver:latest-dkms
%end

vs.

%packages
<list of require rpms>
opencl-filesystem
opencl-headers
nvidia-driver
%end

Reading between the lines, I think what they're trying to say is that the failure occurs in both cases.  I will need to clarify this point with the customer as well though.

Comment 10 Jamie Nguyen (Nvidia) 2020-04-17 15:43:59 UTC
Created attachment 1679689 [details]
Logs from my reproducer

Good news, I was able to reproduce this using VMs and have attached the logs.

Timm: thanks for the tip on the modularity metadata.  Let me try a version with a fix for this to see if it helps.

Comment 11 Jamie Nguyen (Nvidia) 2020-04-17 16:35:55 UTC
Using a newer repo with the correct modularity metadata, I still run into the same problem:

            - cuda-drivers-0:450.16-1.x86_64

So the modularity metadata format does appear to be unrelated to this failure.

(In reply to Timm Bäder from comment #8)
> I don't have any experience with Kickstart but as an additional note, the
> modularity metadata in the linked repository still contains the following
> problematic line, which is fixed with a later version of the genmodules.py
> script that I sent to Nvidia:
> 
>             - cuda-0:drivers-445.43-1.ppc64le
> 
> The format here is wrong. This might be completely unrelated of course and
> DNF handles this fine in my experience.

Comment 12 Jamie Nguyen (Nvidia) 2020-04-17 23:57:02 UTC
It seems like the cuda-drivers package is breaking some packaging rules.  Its %pretrans scriptlet tries to use /bin/sh, which fails in the install environment.  On this topic, the packaging guide [1] says: "Note that the %pretrans scriptlet will, in the particular case of system installation, run before anything at all has been installed. This implies that it cannot have any dependencies at all. For this reason, %pretrans is best avoided, but if used it MUST (by necessity) be written in Lua. See http://rpm.org/user_doc/lua.html for more information."


To test this out further, I created the following package:

  Name:      cuda-drivers
  Version:   450.16
  Release:   1.dirty.el8
  Packager:  NVIDIA CORPORATION <dgx-dev>
  Vendor:    NVIDIA CORPORATION
  License:   NVIDIA
  Summary:   Meta-package to install additional packages

  %description
  Jamie's test package

  %pretrans

  %files

  %changelog

The %pretrans script does nothing, and even this reproduces the pyanaconda backtrace, because a scriptlet declared without -p still defaults to the /bin/sh interpreter.  Conversely, when I create this:

  Name:      cuda-drivers
  Version:   450.16
  Release:   1.dirty.el8
  Packager:  NVIDIA CORPORATION <dgx-dev>
  Vendor:    NVIDIA CORPORATION
  License:   NVIDIA
  Summary:   Meta-package to install additional packages

  %description
  Jamie's test package

  %pretrans -p <lua>

  %files

  %changelog

The install proceeds as expected.

[1]: https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans
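
The difference between the two test packages comes down to the interpreter declared on the %pretrans line. A minimal sketch of a check for that, assuming a plain-text spec file with no macro indirection; find_shell_pretrans is a hypothetical helper written for illustration, not an rpm or rpmlint feature:

```python
from typing import List


def find_shell_pretrans(spec_text: str) -> List[int]:
    """Return 1-based line numbers of %pretrans sections not declared with -p <lua>."""
    hits: List[int] = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0] != "%pretrans":
            continue
        # A Lua scriptlet carries the option pair "-p <lua>" on the section line.
        uses_lua = any(
            opt == "-p" and prog == "<lua>"
            for opt, prog in zip(tokens, tokens[1:])
        )
        if not uses_lua:
            hits.append(lineno)
    return hits
```

Run against the first spec in this comment it would flag the %pretrans line; against the second it reports nothing. It is a heuristic only and does not inspect the built RPM.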

Comment 13 Jiri Konecny 2020-04-20 07:51:55 UTC
Thanks a lot for your investigation and debugging. If I understand correctly, this means that your package has to be updated to fix this issue.
Is that right? Is there something we can help you with?

Comment 14 Jamie Nguyen (Nvidia) 2020-04-20 15:23:38 UTC
Jiri,

That's correct, we'll have to fix this in our packaging.  I don't believe there's anything that needs to be investigated from the Red Hat side, so I'll close this.  Thanks!

Comment 15 Jamie Nguyen (Nvidia) 2020-09-30 22:32:32 UTC
Seems like this is a popular bug!

There is a ks profile available now that can be used for kickstart installations.  That is documented here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#kickstart-installation

In short -- when using kickstart, one can specify the following:

  @nvidia-driver:latest-dkms/ks

Hope this update is useful for others.
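
For readers unfamiliar with the syntax, the entry above follows the @module:stream/profile form used in the %packages section. A small sketch that splits such an entry into its parts; ModuleSpec and parse_module_spec are hypothetical names used for illustration, and the function makes no attempt to tell a module apart from a package group when no stream is given:

```python
from typing import NamedTuple, Optional


class ModuleSpec(NamedTuple):
    name: str
    stream: Optional[str]
    profile: Optional[str]


def parse_module_spec(entry: str) -> ModuleSpec:
    """Split '@module:stream/profile' into its three parts."""
    entry = entry.strip()
    if not entry.startswith("@"):
        raise ValueError("expected an entry starting with '@': %r" % entry)
    # The profile, if any, follows the first '/'; the stream follows the first ':'.
    body, _, profile = entry[1:].partition("/")
    name, _, stream = body.partition(":")
    return ModuleSpec(name, stream or None, profile or None)
```

Here "@nvidia-driver:latest-dkms/ks" yields the module nvidia-driver, the stream latest-dkms, and the profile ks.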

