Bug 2142304 - openmpi: Bad address on f38 i686 koji
Summary: openmpi: Bad address on f38 i686 koji
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: 38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2137308 2137389 2152521 2155197 2171588
 
Reported: 2022-11-12 21:21 UTC by marcindulak
Modified: 2023-02-21 04:43 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:




Links
GitHub open-mpi/ompi issue 11065: "mmap failure on i686 during vader init" (status: open; last updated 2023-02-02 15:18:18 UTC)

Description marcindulak 2022-11-12 21:21:05 UTC
Description of problem:

mpiexec prints a "Bad address" error, seemingly only on the i686 arch:
https://koji.fedoraproject.org/koji/taskinfo?taskID=94103358.
Adding "--mca btl self,tcp --mca btl_tcp_if_include 127.0.0.1/24" to the mpiexec arguments makes the error disappear, presumably because restricting the btl list to "self,tcp" keeps the shared-memory (vader) BTL, whose mmap call fails, out of the picture (the upstream issue linked above, "mmap failure on i686 during vader init", points the same way).

Version-Release number of selected component (if applicable):

openmpi-4.1.4-7.fc38

How reproducible:

Steps to Reproduce:
1. create libgomp-test.spec

Name:			libgomp-test
Version:		1.0.0
Release:		1%{?dist}
Summary:		libgomp test

License:		GPLv3+

BuildRequires:		openssh-clients
BuildRequires:		openmpi-devel
BuildRequires:		libgomp
BuildRequires:		gcc
BuildRequires:		strace
BuildRequires:		hostname
BuildRequires:		time

%description

%check

export TIMEOUT_OPTS='--preserve-status --kill-after 10 60'

%{_openmpi_load}
# https://github.com/mikaem/mpi-examples/blob/master/helloworld.cpp
cat <<EOF > hello.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    printf("Hello world! from rank %d"
           " out of %d processors\n",
           world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
EOF
mpicc -fopenmp hello.c -o hello
timeout ${TIMEOUT_OPTS} time -- env -i PATH=$MPI_BIN:/bin OMP_NUM_THREADS=1 mpiexec --mca btl self,tcp --mca btl_tcp_if_include 127.0.0.1/24 --allow-run-as-root -np 2 ./hello
timeout ${TIMEOUT_OPTS} time -- env -i PATH=$MPI_BIN:/bin OMP_NUM_THREADS=1 mpiexec --mca btl_tcp_if_include 127.0.0.1/24 --allow-run-as-root -np 2 ./hello
timeout ${TIMEOUT_OPTS} time -- env -i PATH=$MPI_BIN:/bin OMP_NUM_THREADS=1 mpiexec --allow-run-as-root -np 2 ./hello

%{_openmpi_unload}

2. rpmbuild -bs libgomp-test.spec
3. koji build --nowait --scratch f38 ~/rpmbuild/SRPMS/libgomp-test-1.0.0-1.fc36.src.rpm
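
To limit the scratch build to the affected arch, koji's --arch-override option (accepted for scratch builds) can be used; a sketch of step 3 adjusted accordingly:

```
koji build --nowait --scratch --arch-override=i686 f38 ~/rpmbuild/SRPMS/libgomp-test-1.0.0-1.fc36.src.rpm
```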

Actual results:

```
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.
  Local host:  d279f57c8afa4f999af7064835646a20
  System call: mmap(2) 
  Error:       Bad address (errno 14)
Hello world! from rank 0 out of 2 processors
Hello world! from rank 1 out of 2 processors
```

Expected results:
```
Hello world! from rank 0 out of 2 processors
Hello world! from rank 1 out of 2 processors
```

Additional info:

Despite the "Bad address" error, the hello-world MPI program above succeeds.
On the other hand:

1) elk i686 https://koji.fedoraproject.org/koji/taskinfo?taskID=94086999 prints "libgomp: Thread creation failed: Bad address" and exits with status 1 when running tests. The tests themselves pass despite this, as they produce the expected output files.

2) gpaw i686 https://koji.fedoraproject.org/koji/taskinfo?taskID=94085529 prints "Bad address" and fails with "Fatal Python error: Segmentation fault" when running tests. Adding "--mca btl_tcp_if_include 127.0.0.1/24" to the mpiexec arguments allows the tests to pass (a short note on errno 14 follows after this list).
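
For context, "Bad address" is the strerror(3) text for errno 14 (EFAULT) on Linux, which the kernel returns when a system call is handed an invalid userspace pointer. A minimal C sketch, unrelated to the Open MPI code paths and only illustrating how the same message arises, that deliberately provokes EFAULT:

```
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Pass a deliberately invalid buffer pointer to write(2);
       the kernel rejects it with EFAULT ("Bad address", errno 14). */
    if (write(STDOUT_FILENO, (const void *)0x1, 1) == -1)
        fprintf(stderr, "write: %s (errno %d)\n", strerror(errno), errno);
    return 0;
}
```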

Comment 1 marcindulak 2022-12-16 20:06:10 UTC
Another case of "Segmentation fault - invalid memory reference" appears in bug #2152521.

Maybe the openmpi spec itself should include a %check section that performs some basic functionality tests? A sketch of what that could look like follows below.
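
A minimal sketch of such a %check, reusing the %{_openmpi_load}/%{_openmpi_unload} macros and the hello.c reproducer from the description (hypothetical; the openmpi spec does not currently carry this):

```
%check
%{_openmpi_load}
# hello.c: the MPI hello-world source from the reproducer above
mpicc hello.c -o hello
# Run on 2 ranks; on i686 this currently triggers the mmap/EFAULT warning
mpiexec --allow-run-as-root -np 2 ./hello
%{_openmpi_unload}
```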

Comment 2 Dominik 'Rathann' Mierzejewski 2022-12-20 11:24:28 UTC
I'm seeing this as well in the elpa test suite.

Comment 3 Ben Cotton 2023-02-07 14:58:55 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 4 Orion Poplawski 2023-02-21 04:43:30 UTC
Just a heads-up that openmpi is dropping support for 32-bit architectures with 5.0, so it is unlikely that this bug will be fixed any time soon, and support will disappear completely once 5.0 is released and packaged for Fedora.

