Created attachment 1561224
Full log from Copr

After the symptoms described in bz1705296 I've rebuilt mpich and openmpi, but now mpi4py no longer builds. That is mpi4py-3.0.1-4.fc31:

======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -116 != -1

======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -124 != -1

----------------------------------------------------------------------
Ran 1100 tests in 3.549s

FAILED (failures=4, skipped=61)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9253,1],0]
  Exit code:    1
--------------------------------------------------------------------------
error: Bad exit status from /var/tmp/rpm-tmp.VaRIOu (%check)

Full log attached.
I opened an upstream issue: https://bitbucket.org/mpi4py/mpi4py/issues/124/test-failure-with-openmpi-401
There is a new failure after Python 3.8.0a4:

src/mpi4py.MPI.c:314:11: error: too few arguments to function ‘PyCode_New’
  314 |   PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
      |   ^~~~~~~~~~

This means that the sources need to be recythonized.
Adding this to %prep seems to help:

# Remove precythonized C sources
rm $(grep -rl '/\* Generated by Cython')

Building in Copr to see if the previous failure is still there.
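For anyone who wants to check which files that rm line would touch before deleting anything, here is a minimal Python sketch of the same idea. The function names and the `head_bytes` parameter are made up for illustration. It also assumes the "Generated by Cython" marker sits near the top of each generated file, so unlike the grep it only looks at the first few kilobytes.

```python
import os

# Marker comment that the grep in %prep looks for.
CYTHON_MARKER = b"/* Generated by Cython"


def find_cython_generated(root, head_bytes=4096):
    """Return sorted paths under root whose first head_bytes contain the marker.

    Hypothetical helper: assumes the marker is near the start of the file.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(head_bytes)
            except OSError:
                continue  # unreadable file, skip it
            if CYTHON_MARKER in head:
                matches.append(path)
    return sorted(matches)


def remove_cython_generated(root):
    """Delete the files found above and return the list of removed paths."""
    removed = find_cython_generated(root)
    for path in removed:
        os.remove(path)
    return removed
```

Calling `find_cython_generated(".")` in the unpacked source tree lists the candidates without deleting anything. One difference from the shell one-liner: if nothing matches, `rm` is called with no operands and fails, while the sketch just returns an empty list.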
Recythonizing the sources leads to:

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
[41f0acf557e440989184fec990a11425:4660 :0:4660] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ff83a90b948)
==== backtrace ====
 0 /lib64/libucs.so.0(+0x194a3) [0x7ff83a8934a3]
 1 /lib64/libucs.so.0(+0x1965a) [0x7ff83a89365a]
 2 /lib64/libuct.so.0(+0x1b72b) [0x7ff83aa4172b]
 3 /lib64/ld-linux-x86-64.so.2(+0xfe4a) [0x7ff83da9fe4a]
 4 /lib64/ld-linux-x86-64.so.2(+0xff51) [0x7ff83da9ff51]
 5 /lib64/ld-linux-x86-64.so.2(+0x13eae) [0x7ff83daa3eae]
 6 /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
 7 /lib64/ld-linux-x86-64.so.2(+0x1372e) [0x7ff83daa372e]
 8 /lib64/libdl.so.2(+0x239c) [0x7ff83d53739c]
 9 /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
10 /lib64/libc.so.6(_dl_catch_error+0x33) [0x7ff83d9ff293]
11 /lib64/libdl.so.2(+0x2b09) [0x7ff83d537b09]
12 /lib64/libdl.so.2(dlopen+0x4a) [0x7ff83d53742a]
13 /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6ead7) [0x7ff83cb23ad7]
14 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x1f4) [0x7ff83cb01524]
15 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35b) [0x7ff83cb004eb]
16 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7ff83cb0bdfe]
17 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x256) [0x7ff83cb0c2e6]
18 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x14) [0x7ff83cb0c344]
19 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x695) [0x7ff83cc76795]
20 /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Init_thread+0x99) [0x7ff83cca6bf9]
21 /builddir/build/BUILDROOT/mpi4py-3.0.1-2.fc31.x86_64/usr/lib64/python3.8/site-packages/openmpi/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x329bc) [0x7ff83cd849bc]
22 /lib64/libpython3.8.so.1.0(PyModule_ExecDef+0x77) [0x7ff83d724b27]
23 /lib64/libpython3.8.so.1.0(+0x1c7b93) [0x7ff83d724b93]
24 /lib64/libpython3.8.so.1.0(_PyMethodDef_RawFastCallDict+0x350) [0x7ff83d67f9e0]
25 /lib64/libpython3.8.so.1.0(_PyCFunction_FastCallDict+0x23) [0x7ff83d67fa93]
26 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x640d) [0x7ff83d6e82ad]
27 /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
28 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0x196) [0x7ff83d6ac346]
29 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
30 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x57ea) [0x7ff83d6e768a]
31 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
32 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
33 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xd7d) [0x7ff83d6e2c1d]
34 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
35 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
36 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
37 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
38 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
39 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
40 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallDict+0x11a) [0x7ff83d66f44a]
41 /lib64/libpython3.8.so.1.0(+0x121787) [0x7ff83d67e787]
42 /lib64/libpython3.8.so.1.0(_PyObject_CallMethodIdObjArgs+0xb9) [0x7ff83d6a65d9]
43 /lib64/libpython3.8.so.1.0(PyImport_ImportModuleLevelObject+0x26b) [0x7ff83d67263b]
44 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3219) [0x7ff83d6e50b9]
45 /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
46 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
47 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
48 /lib64/libpython3.8.so.1.0(+0x1da7df) [0x7ff83d7377df]
49 /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
50 /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
51 /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
52 /lib64/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x39) [0x7ff83d66f329]
53 /lib64/libpython3.8.so.1.0(PyEval_EvalCode+0x1b) [0x7ff83d6ff84b]
54 /lib64/libpython3.8.so.1.0(+0x20ee30) [0x7ff83d76be30]
55 /lib64/libpython3.8.so.1.0(PyRun_FileExFlags+0x97) [0x7ff83d76c3b7]
56 /lib64/libpython3.8.so.1.0(PyRun_SimpleFileExFlags+0x19a) [0x7ff83d7736da]
57 /lib64/libpython3.8.so.1.0(_Py_RunMain+0x353) [0x7ff83d774d13]
58 /lib64/libpython3.8.so.1.0(+0x217eb6) [0x7ff83d774eb6]
59 /lib64/libpython3.8.so.1.0(_Py_UnixMain+0x35) [0x7ff83d774f55]
60 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7ff83d8eb193]
===================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node 41f0acf557e440989184fec990a11425 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
The segfault is a current issue with openmpi 4/UCX that has yet to be resolved.
Orion, do you happen to have some pointers for that segfault?
I'm hoping that it's been resolved with the latest openmpi build - can you try another build?
OK. Rebuilding updated openmpi first.
mpi4py builds.