Bug 1984013

Summary: Segfault after MPI_Finalize
Product: Red Hat Enterprise Linux 8
Reporter: Paulo Andrade <pandrade>
Component: openmpi
Assignee: Nobody <nobody>
Status: NEW
QA Contact: Infiniband QE <infiniband-qe>
Severity: medium
Priority: unspecified
Version: 8.4
CC: rdma-dev-team, vikpatil
Target Milestone: beta   
Target Release: ---   
Hardware: All   
OS: Linux   
Type: Bug

Description Paulo Andrade 2021-07-20 12:23:02 UTC
Steps to reproduce problem:

"""
$ cat > test.cc<<EOF
#include <thread>
#include <mpi.h>

void thread_mpi() {
    int mpi_threads_provided;
    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_SINGLE, &mpi_threads_provided);
    MPI_Finalize();
}

int main() {
    std::thread thread = std::thread(thread_mpi);
    thread.join();
    return 0;
}
EOF

$ module load mpi/openmpi-x86_64
$ g++ $(pkg-config ompi-cxx --cflags)  $(pkg-config ompi-cxx  --libs) test.cc -o test
"""

  The user hit this in a Python environment; the code above is a reduced reproducer.

  There are some workarounds, but the problem should still be fixed in MPI itself.

  The crash happens in /lib64/libpmix.so.2.

  The user is not using environment-modules, but when it is in use, one workaround is to link explicitly against pmix, for example:

"""
# dnf install -y pmix-devel
$ g++ $(pkg-config ompi-cxx --cflags)  $(pkg-config ompi-cxx  --libs) $(pkg-config --libs pmix) test.cc -o test
"""
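
  To see whether the explicit link changes which libraries are actually mapped into the process, one option is to list the loaded shared objects from inside the program. The following is only a diagnostic sketch, assuming glibc's dl_iterate_phdr() from <link.h>; the helper name shared_object_loaded() is made up for this example and is not part of openmpi or pmix.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cstddef>
#include <cstring>

namespace {

// State passed through dl_iterate_phdr's opaque pointer.
struct Search {
    const char *needle;
    bool found;
};

// Called once per loaded object; a non-zero return stops the iteration.
int match_object(struct dl_phdr_info *info, std::size_t, void *data) {
    Search *search = static_cast<Search *>(data);
    if (info->dlpi_name != nullptr &&
        std::strstr(info->dlpi_name, search->needle) != nullptr) {
        search->found = true;
        return 1;
    }
    return 0;
}

}  // namespace

// Hypothetical helper: true if any currently loaded shared object has
// name_fragment somewhere in its path.
bool shared_object_loaded(const char *name_fragment) {
    Search search{name_fragment, false};
    dl_iterate_phdr(match_object, &search);
    return search.found;
}
```

  Calling shared_object_loaded("libpmix.so") before MPI_Init_thread() and again after MPI_Finalize(), in both the plain build and the build linked against pmix, would show whether the library stays mapped in each case. This only narrows down the load and unload order; it does not by itself confirm the cause of the crash.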

  Because the user runs Python scripts, the suggested workaround was to load a pmix Python module, which appears to have worked; but again, this appears to just hide the problem.

  An untested workaround that might help is this pseudo-patch:

"""
-int pmix_tsd_keys_destruct()
+__attribute__((destructor)) int pmix_tsd_keys_destruct()
"""

in pmix_source/src/threads/thread.c

This is not guaranteed to work, though, because the dependency chain that leads to loading the pmix library is complex.
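
For reference, the pseudo-patch combines two mechanisms: GCC's __attribute__((destructor)), which runs a function when the shared object is unloaded or the process exits, and pthread thread-specific data (TSD) keys, whose per-value destructors run at thread exit until the key is deleted. The following is a minimal sketch of how they interact, not pmix code. Every demo_* name is hypothetical, and it is an assumption (not confirmed in this report) that the crash involves a TSD destructor that is still registered after the library is gone.

```cpp
#include <pthread.h>
#include <atomic>
#include <thread>

// Hypothetical stand-ins for a library's TSD bookkeeping (not pmix code).
static pthread_key_t demo_key;
static bool demo_key_live = false;
static std::atomic<int> demo_destructor_calls{0};

// Per-value destructor: the pthread runtime calls this at thread exit for
// every thread that still holds a non-null value under a live key.
static void demo_value_destructor(void *value) {
    ++demo_destructor_calls;
    delete static_cast<int *>(value);
}

int demo_keys_create() {
    int rc = pthread_key_create(&demo_key, demo_value_destructor);
    demo_key_live = (rc == 0);
    return rc;
}

// Similar in spirit to the pseudo-patch: with the destructor attribute this
// also runs automatically at unload or exit, and deleting the key ensures
// no later thread exit calls back into demo_value_destructor.
__attribute__((destructor)) void demo_keys_destruct() {
    if (demo_key_live) {
        pthread_key_delete(demo_key);
        demo_key_live = false;
    }
}

// Runs one worker thread that stores a value under the key and reports how
// often the per-value destructor ran for it (-1 if key creation failed).
int destructor_calls_for_thread(bool delete_key_before_exit) {
    demo_destructor_calls = 0;
    if (demo_keys_create() != 0) return -1;
    std::thread worker([delete_key_before_exit] {
        int *value = new int(42);
        pthread_setspecific(demo_key, value);
        if (delete_key_before_exit) {
            // pthread_key_delete never runs destructors, so free by hand.
            delete value;
            demo_keys_destruct();
        }
    });
    worker.join();
    int calls = demo_destructor_calls;
    demo_keys_destruct();  // no-op if the key was already deleted
    return calls;
}
```

With the key still live at thread exit, the per-value destructor runs once; once the key has been deleted, thread exit no longer calls it. If the destructor's code lived in a library that had already been unloaded while the key was still registered, that call would jump to unmapped memory, which is the kind of failure the pseudo-patch tries to rule out.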

Comment 1 Honggang LI 2021-07-20 14:44:41 UTC
Is this a RHEL 8 specific issue? Can we reproduce it with upstream code?

Comment 2 Paulo Andrade 2021-07-20 18:29:18 UTC
  The issue does not happen in recent Fedora.

  As far as we could test, it is specific to RHEL 8, using RHEL 8 supported packages.

Comment 4 Vikramsingh Patil 2023-06-30 02:54:11 UTC
Hello Team,

Is there any update on this bug?

Thanks in advance