Bug 1984013 - Segfault after MPI_Finalize
Summary: Segfault after MPI_Finalize
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: openmpi
Version: 8.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: beta
Target Release: ---
Assignee: Nobody
QA Contact: Infiniband QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-20 12:23 UTC by Paulo Andrade
Modified: 2023-06-30 02:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:



Description Paulo Andrade 2021-07-20 12:23:02 UTC
Steps to reproduce problem:

"""
$ cat > test.cc<<EOF
#include <thread>
#include <mpi.h>

void thread_mpi() {
    int mpi_threads_provided;
    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_SINGLE, &mpi_threads_provided);
    MPI_Finalize();
}

int main() {
    std::thread thread = std::thread(thread_mpi);
    thread.join();
    return 0;
}
EOF

$ module load mpi/openmpi-x86_64
$ g++ $(pkg-config ompi-cxx --cflags)  $(pkg-config ompi-cxx  --libs) test.cc -o test
"""

  The user runs MPI from a Python environment; the program above is a reduced reproducer.

  Workarounds exist, but the problem should still be fixed in MPI.

  The crash happens in /lib64/libpmix.so.2.

  The user is not using environment-modules, but when it is in use, one workaround is to link explicitly against pmix, for example:

"""
# dnf install -y pmix-devel
$ g++ $(pkg-config ompi-cxx --cflags)  $(pkg-config ompi-cxx  --libs) $(pkg-config --libs pmix) test.cc -o test
"""

  Because the user is running Python scripts, the suggested workaround was to load a pmix Python module, which appears to have worked, but again this appears to just hide the problem.
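
  For a process that cannot be relinked, the same idea might be expressed by pinning the library at startup. The sketch below is an untested assumption, not something verified against this bug: pin_library is a hypothetical helper, and it assumes that keeping libpmix.so.2 mapped for the life of the process is what makes the explicit-link workaround effective. RTLD_NODELETE is a glibc dlopen flag that keeps the object loaded even after dlclose.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper (not part of openmpi or pmix): load a shared object
// and ask the dynamic loader never to unload it again.
// Returns true if the library was found and pinned.
bool pin_library(const char *soname) {
    // RTLD_NOW:      resolve all symbols immediately.
    // RTLD_GLOBAL:   make its symbols visible to libraries loaded later.
    // RTLD_NODELETE: keep the object mapped even if it is dlclose()d.
    void *handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE);
    if (handle == nullptr) {
        std::fprintf(stderr, "pin_library: %s\n", dlerror());
        return false;
    }
    // The handle is intentionally never closed.
    return true;
}
```

  In the reproducer this would be called as pin_library("libpmix.so.2") at the top of thread_mpi(), before MPI_Init_thread, with -ldl added to the link line on RHEL 8.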

  An untested workaround that might work is this pseudo-patch:

"""
-int pmix_tsd_keys_destruct()
+__attribute__((destructor)) int pmix_tsd_keys_destruct()
"""

in pmix_source/src/threads/thread.c, but this is not guaranteed to help, as the dependency chain that leads to loading the pmix library is complex.
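
  To illustrate what the pseudo-patch is meant to achieve: a POSIX thread-specific data key can carry a destructor callback that runs at thread exit, and pthread_key_delete stops that callback from being invoked afterwards. The sketch below shows only that general behavior, using hypothetical demo_* names. It is not pmix code, and whether a stale callback of this kind is the actual cause of the crash is an assumption that has not been confirmed.

```cpp
#include <pthread.h>
#include <atomic>
#include <thread>

// All demo_* names are hypothetical; none of this is taken from pmix.
static pthread_key_t demo_key;
static std::atomic<int> demo_cleanups{0};

// Per-thread callback: invoked at thread exit for each thread that
// stored a non-null value under demo_key.
static void demo_value_destructor(void *value) {
    delete static_cast<int *>(value);
    ++demo_cleanups;
}

// Create the key, as a library would during initialization.
int demo_keys_construct() {
    return pthread_key_create(&demo_key, demo_value_destructor);
}

// Delete the key; the callback is no longer invoked at thread exit.
int demo_keys_destruct() {
    return pthread_key_delete(demo_key);
}

// Run one thread that stores a value under the key and return how many
// times the callback fired, or -1 if the key could not be created.
int demo_run_thread(bool delete_key_before_exit) {
    demo_cleanups = 0;
    if (demo_keys_construct() != 0) {
        return -1;
    }
    std::thread worker([delete_key_before_exit] {
        int *value = new int(42);
        pthread_setspecific(demo_key, value);
        if (delete_key_before_exit) {
            demo_keys_destruct();
            delete value;  // the callback will not run, so free it here
        }
    });
    worker.join();
    if (!delete_key_before_exit) {
        demo_keys_destruct();
    }
    return demo_cleanups.load();
}
```

  Here demo_run_thread(false) lets the callback fire at thread exit, while demo_run_thread(true) deletes the key first so the callback is skipped. Marking the delete function with __attribute__((destructor)), as in the pseudo-patch, asks GCC to run it automatically when the containing shared object is unloaded or the process exits.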

Comment 1 Honggang LI 2021-07-20 14:44:41 UTC
Is this a RHEL 8 specific issue? Can we reproduce it with upstream code?

Comment 2 Paulo Andrade 2021-07-20 18:29:18 UTC
  The issue does not happen in recent Fedora.

  As far as we could test, it is specific to RHEL 8 with RHEL 8 supported packages.

Comment 4 Vikramsingh Patil 2023-06-30 02:54:11 UTC
Hello Team,

Is there any update regarding this bug?

Thanks in advance

