Description of problem:
mpiexec hangs on f38:
https://koji.fedoraproject.org/koji/taskinfo?taskID=93958805
and works on f37:
https://koji.fedoraproject.org/koji/taskinfo?taskID=93958801

I experience the same type of hang in #2137389, and there mpich does not hang.

I'm unable to reproduce the issue in a local openmpi instance of dockerhub's fedora:38@sha256:c7dfa518d9db440fb02362c0f9b014c0e1b8e04bc0f6bf540d1d5ac2ecb43453.

Version-Release number of selected component (if applicable):
4.1.4-5.fc38

How reproducible:
in koji

Steps to Reproduce:
1. Save this as openmpi-test.spec

# https://github.com/open-mpi/ompi/issues/10324#issuecomment-1136363475
# https://github.com/open-mpi/ompi/issues/6850
# https://www.mail-archive.com/users@lists.open-mpi.org//msg26012.html
Name: openmpi-test
Version: 1.0.0
Release: 1%{?dist}
Summary: openmpi test
License: GPLv3+

BuildRequires: openssh-clients
BuildRequires: openmpi-devel
BuildRequires: gcc
BuildRequires: strace
BuildRequires: hostname
BuildRequires: time

%description

%check
export TIMEOUT_OPTS='--preserve-status --kill-after 10 60'
export OMP_NUM_THREADS=1
%{_openmpi_load}
timeout ${TIMEOUT_OPTS} time strace -f -e execve -- env -i PATH=$MPI_BIN:/bin mpiexec --mca btl self,tcp --mca btl_tcp_if_include 127.0.0.1/24 --mca plm_base_verbose 10 --allow-run-as-root -np 2 hostname

# https://github.com/mikaem/mpi-examples/blob/master/helloworld.cpp
cat <<EOF > hello.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    printf("Hello world! from rank %d"
           " out of %d processors\n",
           world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
EOF
mpicc hello.c -o hello
timeout ${TIMEOUT_OPTS} time strace -f -e execve -- env -i PATH=$MPI_BIN:/bin mpiexec --mca btl self,tcp --mca btl_tcp_if_include 127.0.0.1/24 --mca plm_base_verbose 10 --allow-run-as-root -np 2 ./hello
%{_openmpi_unload}

2. rpmbuild -bs openmpi-test.spec
3. koji build --nowait --scratch f38 ~/rpmbuild/SRPMS/openmpi-test-1.0.0-1.fc36.src.rpm

Actual results:
hang

Expected results:
Hello world! from rank 1 out of 2 processors
Hello world! from rank 0 out of 2 processors

Additional info:
It's a strange issue, could be openmpi and/or koji problem.
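For anyone reading the build logs: with the TIMEOUT_OPTS used in the spec, a hang does not show up as timeout's usual status 124, because --preserve-status makes timeout report the status of the killed command instead. The following is only a rough sketch of how that could be made explicit. The helper name run_with_deadline is hypothetical and not part of the spec, and status 143 is only a hint of a hang, since a command terminated by SIGTERM for any other reason would report the same value.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original spec): run a command with the
# same timeout options as TIMEOUT_OPTS and describe how it ended.
run_with_deadline() {
    deadline=$1
    shift
    # After $deadline seconds timeout sends SIGTERM, and SIGKILL 10 seconds
    # later if the command is still alive.
    timeout --preserve-status --kill-after 10 "$deadline" "$@"
    status=$?
    case $status in
        143) echo "terminated by SIGTERM (128+15), possibly a hang" ;;
        137) echo "terminated by SIGKILL (128+9), possibly a hang" ;;
        *)   echo "exited with status $status" ;;
    esac
    return "$status"
}
```

In %check this could wrap the existing command, for example run_with_deadline 60 mpiexec -np 2 ./hello, so that a timed-out run is labelled in the log rather than only failing the build.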
Does anyone know what change triggered this?
I suspect this was caused by enabling IPv6 support in openmpi. I've reverted that and am building it now.
Nope, that didn't appear to help. I've filed https://github.com/open-mpi/ompi/issues/11055 upstream.
Has anyone found any failures in koschei? I haven't yet. Why would that be?
Turned out to be a libfabric issue. Should be fixed now.