Bug 1839571 - building MPI packages can fail due to oversubscription
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
Reported: 2020-05-24 19:11 UTC by david08741
Modified: 2024-03-12 12:28 UTC (History)

Fixed In Version: openmpi-4.0.4-0.2.rc1.fc33
Last Closed: 2020-05-24 22:56:53 UTC
Type: Bug

Description david08741 2020-05-24 19:11:27 UTC
Description of problem:
(Re)building MPI packages can fail with openmpi, because by default openmpi refuses to run when oversubscribed, that is, when more processes are requested than there are slots available. This can be avoided by exporting the environment variable:
export OMPI_MCA_rmaps_base_oversubscribe=yes

Typically this is required in the %check section. Sometimes a build works fine on koji but fails on copr, because copr has fewer cores available; the recent sundials update is an example.
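
As an illustration of the per-package workaround described above, here is a minimal shell sketch of what a %check script could do. The TEST_RANKS variable, the needs_oversubscribe helper and the commented-out test command are hypothetical names made up for this example; only the OMPI_MCA_rmaps_base_oversubscribe variable comes from this report. The sketch exports the variable only when the requested rank count exceeds the core count reported by nproc, which is roughly the koji versus copr difference.

```shell
#!/bin/sh
# Sketch only: TEST_RANKS, needs_oversubscribe and the test command
# below are hypothetical and not taken from any real spec file.

# Succeeds (exit status 0) when more ranks are requested than cores exist.
needs_oversubscribe() {
    ranks=$1
    cores=$2
    [ "$ranks" -gt "$cores" ]
}

# Number of MPI ranks the test suite wants to start (assumed default: 4).
TEST_RANKS=${TEST_RANKS:-4}
# Number of cores available on this builder.
AVAILABLE_CORES=$(nproc)

if needs_oversubscribe "$TEST_RANKS" "$AVAILABLE_CORES"; then
    # Let openmpi start more ranks than there are slots.
    export OMPI_MCA_rmaps_base_oversubscribe=yes
fi

echo "ranks=$TEST_RANKS cores=$AVAILABLE_CORES oversubscribe=${OMPI_MCA_rmaps_base_oversubscribe:-unset}"
# mpirun -np "$TEST_RANKS" ./my_test   # hypothetical test binary
```

Exporting the variable unconditionally, as the description suggests, is simpler; the conditional form is shown only to make explicit when the setting matters.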

Rather than ensuring that every package exports this in its %check section, openmpi could set this flag in %_openmpi_load.
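
To make the proposal concrete, the following is a rough shell sketch of the idea, not the actual Fedora macro. The function openmpi_load_sketch is a hypothetical stand-in for the shell that %_openmpi_load expands to, and keeping a value that the packager already exported is an assumption of this sketch rather than something stated in this report.

```shell
#!/bin/sh
# Sketch only: openmpi_load_sketch stands in for the shell that
# %_openmpi_load might expand to. It is not the actual Fedora macro.
openmpi_load_sketch() {
    # (The real macro would set up the openmpi environment here.)

    # Default to allowing oversubscription, but keep any value the
    # packager exported beforehand (an assumption of this sketch).
    : "${OMPI_MCA_rmaps_base_oversubscribe:=yes}"
    export OMPI_MCA_rmaps_base_oversubscribe
}

openmpi_load_sketch
echo "OMPI_MCA_rmaps_base_oversubscribe=$OMPI_MCA_rmaps_base_oversubscribe"
```

With a default set centrally like this, individual packages would no longer need their own export in %check, while a package that wants strict behaviour could still set the variable to another value before loading.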

Version-Release number of selected component (if applicable):
current rawhide

How reproducible:
Always, if %check oversubscribes.

Additional info:
Normally it is good not to oversubscribe, but here we only want to check that the package works, and oversubscription is generally accepted for testing.

devel thread that inspired this:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/M4EZN5P6SYNLSL5NQZM6EG32TQVEUCJN/

Comment 1 Orion Poplawski 2020-05-24 22:56:53 UTC
I like it.  Let's give it a try.  Thanks.

Comment 2 Cristian Le 2024-03-12 12:28:19 UTC
I think this deserves more follow-up. I got another recommendation, to set `OMPI_MCA_hwloc_base_binding_policy=none` [1]. In that case I made sure not to oversubscribe, but the OpenMPI runner is still significantly slower than the MPICH one.

I am also asking around for feedback on how to write the spec file so that it is legible, easy to maintain, and uses the native macros [2].

[1]: https://github.com/cp2k/cp2k/pull/3268#discussion_r1521234892 
[2]: https://pagure.io/packaging-committee/issue/1320#comment-899983

