Bug 1824956
Summary: | Kickstart Install Fails When Trying to Install Packages from NVIDIA Repository
---|---
Product: | Red Hat Enterprise Linux 8
Component: | anaconda
Version: | 8.1
Status: | CLOSED NOTABUG
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Target Milestone: | rc
Target Release: | 8.0
Type: | Bug
Reporter: | Jamie Nguyen (Nvidia) <janguyen>
Assignee: | Anaconda Maintenance Team <anaconda-maint-list>
QA Contact: | Release Test Team <release-test-team-automation>
CC: | abeausol, janguyen, jkonecny, jstodola, sroza, tbaeder, triegel
Last Closed: | 2020-04-20 15:23:38 UTC

Description
Jamie Nguyen (Nvidia)
2020-04-16 18:03:16 UTC
Here's a link to the repo IBM was using. Hopefully it helps reproduce this:
https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_

Jamie, unfortunately I'm not able to access the google drive link: "You need permission"

Could you please attach the installation logs? You can find them as /tmp/*log in the installation environment and you can copy them from the machine using scp or copy them to a USB flash drive.

Hi Jan,

Can you give it another try? I just tweaked the sharing settings:
https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_

Additionally, a colleague of mine wanted me to add the following comment: "I'm fairly confident this is the reason it is failing, as kickstart does not have support for this functionality (dnf module install <package>:<stream>). Additionally, I had found this article: https://communityblog.fedoraproject.org/fedora-modularity-whats-the-problem/"

Lastly: I will ask our customer for the installation logs. Thanks for your patience.

I can now access the google drive file.

Kickstart supports modules and streams, see "Specifying profiles of module streams" in
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-script-file-format-reference_installing-rhel-as-an-experienced-user#package-selection-section_package-selection-in-kickstart
and
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#module_kickstart-commands-for-system-configuration

Since there was a traceback during the installation, all logs should be included in a single file /tmp/anaconda-tb-* - it's enough to attach this file.

(In reply to Jamie Nguyen (Nvidia) from comment #3)
> Hi Jan,
>
> Can you give it another try? I just tweaked the sharing settings:
> https://drive.google.com/open?id=1cYdM2Pem3XxNJD3Q1vUpRzrDpBKDF7i_
>
> Additionally, a colleague of mine wanted me to add the following comment:
> "I'm fairly confident this is the reason it is failing, as kickstart does
> not have support for this functionality (dnf module install
> <package>:<stream>).

In the bug description:
> We get the same failure when installing with nvidia-driver as an rpm.

From this I understand that this is happening also when modularity is not involved. Please correct me if I misunderstood something.

(In reply to Jan Stodola from comment #4)
> I can now access the google drive file.
>
> Kickstart supports modules and streams, see:
> "Specifying profiles of module streams" in
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-script-file-format-reference_installing-rhel-as-an-experienced-user#package-selection-section_package-selection-in-kickstart
> and
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#module_kickstart-commands-for-system-configuration
>
> Since there was a traceback during the installation, all logs should be
> included in a single file /tmp/anaconda-tb-* - it's enough to attach this
> file.

Please attach also dnf logs which can be found in /tmp/*.log. The ideal would be to attach all the logs from /tmp/.

Since https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/performing_an_advanced_rhel_installation/kickstart-script-file-format-reference_installing-rhel-as-an-experienced-user lists a few possibilities of installing module stream profiles, I guess Kickstart does support this (search for "specifying profiles of module streams").
This page also says:
> It is also possible to enable module streams using the module Kickstart command and then install packages contained in the module stream by naming them directly.

Does this work as a possible workaround or is this what you meant when you said you have the same problem when installing the driver as an rpm?

I don't have any experience with Kickstart but as an additional note, the modularity metadata in the linked repository still contains the following problematic line, which is fixed with a later version of the genmodules.py script that I sent to Nvidia:

- cuda-0:drivers-445.43-1.ppc64le

The format here is wrong. This might be completely unrelated of course and DNF handles this fine in my experience.

The description comes from a customer (IBM actually) -- so I will ask them for logs and clarification.

As for this quote, "We get the same failure when installing with nvidia-driver as an rpm", I believe this refers to the following stanza in their kickstart file:

%packages
<list of require rpms>
opencl-filesystem
opencl-headers
@nvidia-driver:latest-dkms
%end

vs.

%packages
<list of require rpms>
opencl-filesystem
opencl-headers
nvidia-driver
%end

Reading in between the lines, I think what they're trying to say is that the failure occurs in both cases. I will need to clarify this point with the customer as well though.

Created attachment 1679689 [details]
Logs from my reproducer
Good news, I was able to reproduce this using VMs and have attached the logs.
Timm: thanks for the tip on the modularity metadata. Let me try a version with a fix for this to see if it helps.
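
For anyone checking their own modularity metadata, the two artifact strings quoted in this thread differ in where the epoch sits: the problematic one puts "0:" in the middle of the package name, the later one puts it in front of the version. Below is a minimal Python sketch of that check. It assumes artifact entries are meant to follow the name-epoch:version-release.arch layout with an explicit epoch; `Nevra` and `parse_nevra` are hypothetical names for illustration and are not part of DNF, anaconda, or genmodules.py.

```python
from typing import NamedTuple, Optional


class Nevra(NamedTuple):
    """Hypothetical container for the parts of one artifact entry."""
    name: str
    epoch: str
    version: str
    release: str
    arch: str


def parse_nevra(text: str) -> Optional[Nevra]:
    """Split 'name-epoch:version-release.arch'; return None if malformed.

    Assumption of this sketch: the epoch is always written explicitly.
    """
    # The architecture follows the last dot.
    rest, dot, arch = text.rpartition(".")
    if not dot or not rest or not arch:
        return None
    # The release follows the last remaining hyphen.
    rest, dash, release = rest.rpartition("-")
    if not dash or not release:
        return None
    # What is left is 'name-epoch:version'; the name may contain hyphens.
    name, dash, epoch_version = rest.rpartition("-")
    if not dash or not name:
        return None
    epoch, colon, version = epoch_version.partition(":")
    if not colon or not epoch.isdigit() or not version:
        return None
    # An epoch marker inside the name means it was placed too early.
    if ":" in name:
        return None
    return Nevra(name, epoch, version, release, arch)
```

With this sketch, `parse_nevra("cuda-0:drivers-445.43-1.ppc64le")` is rejected because no epoch precedes the version, while `parse_nevra("cuda-drivers-0:450.16-1.x86_64")` splits cleanly. It only checks the string layout and says nothing about whether DNF or anaconda accept a given entry.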
Using a newer repo with the correct modularity metadata, I still run into the same problem:

- cuda-drivers-0:450.16-1.x86_64

So it doesn't appear to be related.

(In reply to Timm Bäder from comment #8)
> I don't have any experience with Kickstart but as an additional note, the
> modularity metadata in the linked repository still contains the following
> problematic line, which is fixed with a later version of the genmodules.py
> script that I sent to Nvidia:
>
> - cuda-0:drivers-445.43-1.ppc64le
>
> The format here is wrong. This might be completely unrelated of course and
> DNF handles this fine in my experience.

It seems like the cuda-drivers package is breaking some packaging rules. It's trying to use /bin/sh in its %pretrans scriptlet, which fails in the install environment. On this topic, the packaging guide [1] says:

"Note that the %pretrans scriptlet will, in the particular case of system installation, run before anything at all has been installed. This implies that it cannot have any dependencies at all. For this reason, %pretrans is best avoided, but if used it MUST (by necessity) be written in Lua. See http://rpm.org/user_doc/lua.html for more information."

To test this out further, I created the following package:

Name: cuda-drivers
Version: 450.16
Release: 1.dirty.el8
Packager: NVIDIA CORPORATION <dgx-dev>
Vendor: NVIDIA CORPORATION
License: NVIDIA
Summary: Meta-package to install additional packages

%description
Jamie's test package

%pretrans

%files

%changelog

The %pretrans script does nothing -- and even this reproduces the pyanaconda backtrace. Conversely, when I create this:

Name: cuda-drivers
Version: 450.16
Release: 1.dirty.el8
Packager: NVIDIA CORPORATION <dgx-dev>
Vendor: NVIDIA CORPORATION
License: NVIDIA
Summary: Meta-package to install additional packages

%description
Jamie's test package

%pretrans -p <lua>

%files

%changelog

The install proceeds as expected.
[1]: https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans

Thanks a lot for your investigation and debugging. If I understand it correctly that means that your package has to be updated to fix this issue. Do I understand it correctly? Is there something we can help you with?

Jiri,

That's correct, we'll have to fix this in our packaging. I don't believe there's anything that needs to be investigated from the Red Hat side, so I'll close this. Thanks!

Seems like this is a popular bug! There is a ks profile available now that can be used for kickstart installations. That is documented here:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#kickstart-installation

In short -- when using kickstart, one can specify the following:

@nvidia-driver:latest-dkms/ks

Hope this update is useful for others.
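
For packagers who want to catch the %pretrans problem found in this thread before shipping, here is a minimal Python sketch that scans spec file text for %pretrans headers that do not select the Lua interpreter with "-p <lua>". It is a plain text check under the assumption that the scriptlet header sits on a single line; it does not evaluate macros or conditionals. `non_lua_pretrans` is a hypothetical helper name, not an rpm or rpmlint API.

```python
import re
from typing import List

# "-p <lua>" on the scriptlet header selects RPM's embedded Lua interpreter.
_LUA_FLAG = re.compile(r"-p\s*<lua>")


def non_lua_pretrans(spec_text: str) -> List[str]:
    """Return every %pretrans header line that lacks '-p <lua>'."""
    offenders = []
    for line in spec_text.splitlines():
        tokens = line.split()
        # Only look at lines whose first token is exactly %pretrans.
        if not tokens or tokens[0] != "%pretrans":
            continue
        if not _LUA_FLAG.search(line):
            offenders.append(line.strip())
    return offenders


# The two test specs from this thread, reduced to the relevant sections.
SHELL_SPEC = """\
Name: cuda-drivers
Version: 450.16
%pretrans
%files
%changelog
"""

LUA_SPEC = """\
Name: cuda-drivers
Version: 450.16
%pretrans -p <lua>
%files
%changelog
"""
```

Calling `non_lua_pretrans(SHELL_SPEC)` reports the bare `%pretrans` header, and `non_lua_pretrans(LUA_SPEC)` reports nothing. A check like this could run in a packaging CI step, though it is only a text heuristic.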