Bug 1705301 - mpi4py FTBFS with Python 3.8
Summary: mpi4py FTBFS with Python 3.8
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mpi4py
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Thomas Spura
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON38 1705296
Reported: 2019-05-01 23:52 UTC by Miro Hrončok
Modified: 2019-06-03 15:47 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-06-03 15:47:07 UTC
Type: Bug
Embargoed:


Attachments
Full log from Copr (342.27 KB, text/plain)
2019-05-01 23:52 UTC, Miro Hrončok

Description Miro Hrončok 2019-05-01 23:52:00 UTC
Created attachment 1561224
Full log from Copr

After the symptoms described in bz1705296 I rebuilt mpich and openmpi, but now mpi4py (mpi4py-3.0.1-4.fc31) no longer builds. The test suite fails in %check:


======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -116 != -1

======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -124 != -1

----------------------------------------------------------------------
Ran 1100 tests in 3.549s

FAILED (failures=4, skipped=61)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9253,1],0]
  Exit code:    1
--------------------------------------------------------------------------
error: Bad exit status from /var/tmp/rpm-tmp.VaRIOu (%check)

Full log attached.

Comment 1 Zbigniew Jędrzejewski-Szmek 2019-05-02 18:32:33 UTC
I opened an upstream issue: https://bitbucket.org/mpi4py/mpi4py/issues/124/test-failure-with-openmpi-401

Comment 2 Miro Hrončok 2019-05-27 10:20:02 UTC
There is a new failure after 3.8.0a4:

src/mpi4py.MPI.c:314:11: error: too few arguments to function ‘PyCode_New’
  314 |           PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
      |           ^~~~~~~~~~

This means that the pre-generated C sources shipped in the tarball are stale and need to be recythonized with a current Cython.
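For context, a minimal Python sketch of what presumably changed: Python 3.8 code objects carry a positional-only argument count (PEP 570), and C code generated by an older Cython still calls PyCode_New with the previous, shorter argument list. The link between the two is my reading of the error, not something the log states. The helper name `has_posonly_count` is made up for illustration.

```python
import sys


def has_posonly_count():
    """Return True if code objects expose co_posonlyargcount (Python 3.8+)."""
    # Any function's code object will do; use this function's own.
    code = has_posonly_count.__code__
    return hasattr(code, "co_posonlyargcount")


if __name__ == "__main__":
    # Prints the interpreter's (major, minor) version and whether the field exists.
    print(sys.version_info[:2], has_posonly_count())
```

If this reports True for the interpreter used in the build, C sources generated before that field existed are a likely candidate for the "too few arguments" error.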

Comment 3 Miro Hrončok 2019-05-27 10:23:57 UTC
Adding this to %prep seems to help:

# Remove precythonized C sources
rm $(grep -rl '/\* Generated by Cython')

Building in Copr to see if the previous failure is still there.
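A more cautious variant of the removal above, as a Python sketch: it only lists the files carrying the Cython marker so they can be reviewed before anything is deleted. The function name `find_cythonized`, the restriction to `*.c` files, and checking only the first line are my assumptions. The marker string is the one the grep looks for.

```python
from pathlib import Path

# Same marker the grep in %prep searches for.
MARKER = b"/* Generated by Cython"


def find_cythonized(root="."):
    """Return sorted paths of *.c files under root whose first line has the marker."""
    hits = []
    for path in Path(root).rglob("*.c"):
        if not path.is_file():
            continue
        # Assumption: Cython writes its banner on the first line of the file.
        with path.open("rb") as fh:
            first_line = fh.readline()
        if MARKER in first_line:
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_cythonized():
        print(path)  # review the list; delete with path.unlink() if it looks right
```

Unlike the grep one-liner, this skips files where the marker only appears later in the file, so it may find fewer files than `grep -rl` does.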

Comment 4 Miro Hrončok 2019-05-27 10:39:50 UTC
Recythonizing the sources leads to:

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
[41f0acf557e440989184fec990a11425:4660 :0:4660] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ff83a90b948)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x194a3) [0x7ff83a8934a3]
    1  /lib64/libucs.so.0(+0x1965a) [0x7ff83a89365a]
    2  /lib64/libuct.so.0(+0x1b72b) [0x7ff83aa4172b]
    3  /lib64/ld-linux-x86-64.so.2(+0xfe4a) [0x7ff83da9fe4a]
    4  /lib64/ld-linux-x86-64.so.2(+0xff51) [0x7ff83da9ff51]
    5  /lib64/ld-linux-x86-64.so.2(+0x13eae) [0x7ff83daa3eae]
    6  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
    7  /lib64/ld-linux-x86-64.so.2(+0x1372e) [0x7ff83daa372e]
    8  /lib64/libdl.so.2(+0x239c) [0x7ff83d53739c]
    9  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
   10  /lib64/libc.so.6(_dl_catch_error+0x33) [0x7ff83d9ff293]
   11  /lib64/libdl.so.2(+0x2b09) [0x7ff83d537b09]
   12  /lib64/libdl.so.2(dlopen+0x4a) [0x7ff83d53742a]
   13  /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6ead7) [0x7ff83cb23ad7]
   14  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x1f4) [0x7ff83cb01524]
   15  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35b) [0x7ff83cb004eb]
   16  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7ff83cb0bdfe]
   17  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x256) [0x7ff83cb0c2e6]
   18  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x14) [0x7ff83cb0c344]
   19  /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x695) [0x7ff83cc76795]
   20  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Init_thread+0x99) [0x7ff83cca6bf9]
   21  /builddir/build/BUILDROOT/mpi4py-3.0.1-2.fc31.x86_64/usr/lib64/python3.8/site-packages/openmpi/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x329bc) [0x7ff83cd849bc]
   22  /lib64/libpython3.8.so.1.0(PyModule_ExecDef+0x77) [0x7ff83d724b27]
   23  /lib64/libpython3.8.so.1.0(+0x1c7b93) [0x7ff83d724b93]
   24  /lib64/libpython3.8.so.1.0(_PyMethodDef_RawFastCallDict+0x350) [0x7ff83d67f9e0]
   25  /lib64/libpython3.8.so.1.0(_PyCFunction_FastCallDict+0x23) [0x7ff83d67fa93]
   26  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x640d) [0x7ff83d6e82ad]
   27  /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
   28  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0x196) [0x7ff83d6ac346]
   29  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   30  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x57ea) [0x7ff83d6e768a]
   31  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   32  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   33  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xd7d) [0x7ff83d6e2c1d]
   34  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   35  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   36  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   37  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   38  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   39  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   40  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallDict+0x11a) [0x7ff83d66f44a]
   41  /lib64/libpython3.8.so.1.0(+0x121787) [0x7ff83d67e787]
   42  /lib64/libpython3.8.so.1.0(_PyObject_CallMethodIdObjArgs+0xb9) [0x7ff83d6a65d9]
   43  /lib64/libpython3.8.so.1.0(PyImport_ImportModuleLevelObject+0x26b) [0x7ff83d67263b]
   44  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3219) [0x7ff83d6e50b9]
   45  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   46  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   47  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   48  /lib64/libpython3.8.so.1.0(+0x1da7df) [0x7ff83d7377df]
   49  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   50  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   51  /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
   52  /lib64/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x39) [0x7ff83d66f329]
   53  /lib64/libpython3.8.so.1.0(PyEval_EvalCode+0x1b) [0x7ff83d6ff84b]
   54  /lib64/libpython3.8.so.1.0(+0x20ee30) [0x7ff83d76be30]
   55  /lib64/libpython3.8.so.1.0(PyRun_FileExFlags+0x97) [0x7ff83d76c3b7]
   56  /lib64/libpython3.8.so.1.0(PyRun_SimpleFileExFlags+0x19a) [0x7ff83d7736da]
   57  /lib64/libpython3.8.so.1.0(_Py_RunMain+0x353) [0x7ff83d774d13]
   58  /lib64/libpython3.8.so.1.0(+0x217eb6) [0x7ff83d774eb6]
   59  /lib64/libpython3.8.so.1.0(_Py_UnixMain+0x35) [0x7ff83d774f55]
   60  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7ff83d8eb193]
===================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node 41f0acf557e440989184fec990a11425 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Comment 5 Orion Poplawski 2019-05-27 13:31:49 UTC
The segfault is a known issue with openmpi 4 and UCX that has yet to be resolved.

Comment 6 Miro Hrončok 2019-06-03 11:46:30 UTC
Orion, do you happen to have some pointers for that segfault?

Comment 7 Orion Poplawski 2019-06-03 14:30:10 UTC
I'm hoping that it's been resolved with the latest openmpi build - can you try another build?

Comment 8 Miro Hrončok 2019-06-03 14:45:21 UTC
OK. Rebuilding updated openmpi first.

Comment 9 Miro Hrončok 2019-06-03 15:47:07 UTC
mpi4py builds.

