Bug 2320997

Summary: Loading of MPI module with %{_mpich_load} and %{_openmpi_load} is broken in F40 and F39
Product: [Fedora] Fedora Reporter: Sandro <gui1ty>
Component: Lmod Assignee: Orion Poplawski <orion>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 40 CC: fedora, johannes.lips, manisandro, orion, xavier.delaruelle
Target Milestone: --- Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Lmod-8.7.53-1.fc40 Lmod-8.7.53-1.fc39 Lmod-8.7.53-1.el9 Lmod-8.7.53-1.el8 Lmod-8.7.53-1.fc41 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-11-01 02:43:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sandro 2024-10-22 14:15:18 UTC
The execution of those macros kills the build. It seems due to a non-zero exit status when sourcing environment-modules. Though, there is no specific error printed.
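
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual Fedora scripts: `fake-modules.sh` and `build.sh` are hypothetical stand-ins, and the one assumption carried over from the real setup is that rpmbuild runs build scriptlets with `sh -e`, so a failing command inside a sourced file ends the script without printing anything of its own.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/profile.d/modules.sh: it ends with a failing command.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fake-modules.sh" <<'EOF'
# pretend initialisation whose last command returns non-zero
false
EOF

# Hypothetical build script: errexit is enabled, as in rpmbuild's scriptlet shell.
cat > "$tmpdir/build.sh" <<'EOF'
set -e
. "$1"
echo "build continues"
EOF

# Run the build script and capture both its output and its exit status.
status=0
output=$(sh "$tmpdir/build.sh" "$tmpdir/fake-modules.sh" 2>&1) || status=$?
echo "exit status: $status"
echo "output: '$output'"
rm -rf "$tmpdir"
```

The script stops at the sourced file, so "build continues" is never printed and the captured output stays empty, which matches the symptom of a dead build with no specific error.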

This seems to be the same issue we saw a while ago [1], and it boils down to which package is selected to provide `environment(modules)`. If the honor falls upon Lmod, the build breaks. If it happens to be `environment-modules`, everything works.

Last time this was hotfixed by having rpm-mpi-hooks depend on environment-modules instead of environment(modules) [2]. However, that fix was only applied to rawhide at the time. Ultimately, I think whatever package provides environment(modules) should also be usable in our build environments. In other words, I consider the dependency change a hack rather than a solution.

[1] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/SL4IL4SWBYIUZVGXNHTX6CRYDHPIXWMO/#7DPVV7PRXTR73BSXHQXLD6R5LANFIZUV
[2] https://src.fedoraproject.org/rpms/rpm-mpi-hooks/c/90f8dfbf925a815820cd1bcdd469c8827a757b19?branch=rawhide

Reproducible: Always

Steps to Reproduce:
1. fedpkg clone python-lfpy
2. cd python-lfpy
3. fedpkg --release f40 mockbuild --no-cleanup-after
Actual Results:  
Build fails if Lmod is selected as provider of environment(modules).

Expected Results:  
Build succeeds regardless of selected environment(modules) provider.

Both packages, Lmod and environment-modules, provide /etc/profile.d/modules.sh. The version from the latter appears to work while the former doesn't. Or I'm missing something and the issue is manifesting somewhere else entirely.
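
To see which package ended up supplying the capability and the file in a given build environment, the rpm database can be queried. This is a small sketch, assuming it is run from a shell inside the buildroot in question; the function name `show_modules_provider` is made up for illustration, while `rpm -q --whatprovides` and `rpm -qf` are standard rpm queries.

```shell
#!/bin/sh
# Print the installed provider(s) of environment(modules) and the owner(s) of modules.sh.
show_modules_provider() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available; nothing to query"
        return 0
    fi
    echo "Provider(s) of environment(modules):"
    # Prints a "no package provides" message if nothing installed provides it.
    rpm -q --whatprovides 'environment(modules)' || true
    echo "Owner(s) of /etc/profile.d/modules.sh:"
    # Reports an error on stderr if the file does not exist or is unowned.
    rpm -qf /etc/profile.d/modules.sh || true
}

show_modules_provider
```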

For F40/F39 there's a workaround by explicitly specifying environment-modules as a build dependency. However, the reverse, specifying Lmod to get a breaking build in F41/rawhide, does not work. It will install both packages and apparently environment-modules gets to install environment-modules.

I'm not sure that's desirable either. Maybe two providers of the same capability should conflict with each other?

Comment 1 Sandro 2024-10-22 14:16:59 UTC
cc'ing the maintainers of environment-modules and rpm-mpi-hooks for awareness

Comment 2 Sandro 2024-10-22 14:25:13 UTC
I've been a bit hasty and sloppy. Let me clarify...

(In reply to Sandro from comment #0)
> The execution of those macros kills the build. It seems due to a non-zero
> exit status when sourcing environment-modules.

when sourcing /etc/profile.d/modules.sh

> For F40/F39 there's a workaround by explicitly specifying
> environment-modules as a build dependency. However, the reverse, specifying
> Lmod to get a breaking build in F41/rawhide, does not work. It will install
> both packages and apparently environment-modules gets to install
> environment-modules.

gets to install /etc/profile.d/modules.sh

Let me know if anything else is unclear.

Comment 3 hannes 2024-10-22 20:58:10 UTC
Quite a few packages are affected by this issue:
https://koschei.fedoraproject.org/affected-by/Lmod?epoch1=0&version1=8.7.37&release1=1.fc40&epoch2=0&version2=8.7.48&release2=1.fc40&collection=f40

I recently came across this issue when trying to update gretl in f40; the build failed:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2572608

Comment 4 Cristian Le 2024-10-23 00:38:41 UTC
I was working on a cp2k update for unrelated reasons and made a change to remove the `source /etc/profile.d/modules.sh` line, so that only the plain `module load` commands run. The build seems to have detected the MPI environments just fine. Not sure why it is working, though. I will re-run the builds for F40 and check the build logs again.

Comment 5 Orion Poplawski 2024-10-23 14:00:47 UTC
I think this is fixed in later Lmod releases. Starting some tests now...

Comment 6 Orion Poplawski 2024-10-23 14:05:40 UTC
Confirmed that 8.7.53 is working.  FYI - for future debugging, this helps:

export LMOD_SH_DBG_ON=1
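
One way to combine that setting with a look at the exit status is to source the profile script in a child shell with tracing and errexit enabled. This is only a sketch: `probe_profile_script` is a hypothetical helper, the default path is the one named in this report, and what exactly `LMOD_SH_DBG_ON=1` changes in Lmod's output is not spelled out here, so treat the extra trace detail as an assumption.

```shell
#!/bin/sh
# Source a profile script in a child shell (errexit + xtrace) and report its exit status.
probe_profile_script() {
    script=${1:-/etc/profile.d/modules.sh}
    if [ ! -r "$script" ]; then
        echo "skip: $script not readable"
        return 0
    fi
    rc=0
    # The variable is set only for the child shell; the trace goes to stderr.
    LMOD_SH_DBG_ON=1 sh -e -x -c '. "$1"' probe "$script" || rc=$?
    echo "exit status of sourcing $script: $rc"
}

probe_profile_script "$@"
```

Because the child shell runs with `-e`, a non-zero status reported here corresponds to the point where a build scriptlet would have stopped.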

Comment 7 Fedora Update System 2024-10-23 14:09:07 UTC
FEDORA-2024-14d553a254 (Lmod-8.7.53-1.fc39) has been submitted as an update to Fedora 39.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-14d553a254

Comment 8 Fedora Update System 2024-10-23 14:09:08 UTC
FEDORA-EPEL-2024-7ca00cd70b (Lmod-8.7.53-1.el9) has been submitted as an update to Fedora EPEL 9.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-7ca00cd70b

Comment 9 Fedora Update System 2024-10-23 14:09:08 UTC
FEDORA-2024-b883c27c18 (Lmod-8.7.53-1.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-b883c27c18

Comment 10 Fedora Update System 2024-10-23 14:09:09 UTC
FEDORA-2024-602fe3db71 (Lmod-8.7.53-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-602fe3db71

Comment 11 Fedora Update System 2024-10-23 14:09:10 UTC
FEDORA-EPEL-2024-14060f4306 (Lmod-8.7.53-1.el8) has been submitted as an update to Fedora EPEL 8.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-14060f4306

Comment 12 Orion Poplawski 2024-10-23 14:11:21 UTC
I've also submitted buildroot overrides for the Fedora releases.

Comment 13 Sandro 2024-10-23 19:53:47 UTC
First of all, thanks for the fix. Purely out of curiosity, do you happen to know, or have a pointer to, what got fixed where?

When investigating the issue, I also tried running the first two commands that %{_mpich_load} expands to inside the mock chroot. Nothing blew up. I just did so again with `export LMOD_SH_DBG_ON=1` (still on Lmod-8.7.49). It didn't give me any additional output.
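
For anyone repeating this experiment, `rpm --eval` can print what a macro expands to before the commands are run by hand. This is a sketch, assuming the package that defines the macro is installed where it runs (otherwise rpm prints the macro name back unexpanded); `show_macro` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the expansion of an rpm macro; defaults to the MPICH load macro from this report.
show_macro() {
    macro=$1
    [ -n "$macro" ] || macro='%{_mpich_load}'
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available; cannot expand $macro"
        return 0
    fi
    rpm --eval "$macro"
}

show_macro '%{_mpich_load}'
show_macro '%{_openmpi_load}'
```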

Anyway, I can confirm I'm able to build again (in Koji with the override) using Lmod as the environment(modules) provider.

Comment 14 Fedora Update System 2024-10-24 01:57:29 UTC
FEDORA-2024-602fe3db71 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-602fe3db71`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-602fe3db71

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2024-10-24 02:08:22 UTC
FEDORA-EPEL-2024-7ca00cd70b has been pushed to the Fedora EPEL 9 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-7ca00cd70b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Fedora Update System 2024-10-24 02:14:31 UTC
FEDORA-EPEL-2024-14060f4306 has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-14060f4306

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2024-10-24 02:19:09 UTC
FEDORA-2024-14d553a254 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-14d553a254`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-14d553a254

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2024-10-24 02:23:26 UTC
FEDORA-2024-b883c27c18 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-b883c27c18`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-b883c27c18

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2024-11-01 02:43:59 UTC
FEDORA-2024-b883c27c18 (Lmod-8.7.53-1.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 20 Fedora Update System 2024-11-01 03:17:23 UTC
FEDORA-2024-14d553a254 (Lmod-8.7.53-1.fc39) has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2024-11-01 03:17:28 UTC
FEDORA-EPEL-2024-7ca00cd70b (Lmod-8.7.53-1.el9) has been pushed to the Fedora EPEL 9 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 22 Fedora Update System 2024-11-01 03:36:32 UTC
FEDORA-EPEL-2024-14060f4306 (Lmod-8.7.53-1.el8) has been pushed to the Fedora EPEL 8 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 23 Fedora Update System 2024-11-01 03:42:04 UTC
FEDORA-2024-602fe3db71 (Lmod-8.7.53-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.