Bug 1728060

Summary: mpi4py fails to build in rawhide
Product: Fedora
Reporter: Miro Hrončok <mhroncok>
Component: mpi4py
Assignee: Thomas Spura <tomspur>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: rawhide
CC: dakingun, python-sig, tomspur, zbyszek
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: mpi4py-3.0.2-2.fc31
Last Closed: 2019-07-31 05:48:32 UTC
Type: Bug
Bug Blocks: 1686977    

Description Miro Hrončok 2019-07-08 23:14:36 UTC
mpi4py fails to build with Python 3.8.0b1.


======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: 47 != -1

======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -69 != -1

----------------------------------------------------------------------
Ran 1102 tests in 7.053s

FAILED (failures=4, skipped=46)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[14054,1],0]
  Exit code:    1
--------------------------------------------------------------------------
[1562606212.394756] [22b764337d874def8754761c8bb283ea:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1562606212.540543] [22b764337d874def8754761c8bb283ea:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1562606213.398255] [22b764337d874def8754761c8bb283ea:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xb80) for ucp_am_bufs failed: Operation not permitted, please check shared memory limits by 'ipcs -l'

This might actually be a Copr problem; I'm not sure. Let me know if you cannot reproduce it outside of mock.

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.8/fedora-rawhide-x86_64/00964785-mpi4py/

For all our attempts to build mpi4py with Python 3.8, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.8/package/mpi4py/

Testing and mass rebuilds of packages are happening in Copr. You can follow these instructions to test locally in mock whether your package builds with Python 3.8:
https://copr.fedorainfracloud.org/coprs/g/python/python3.8/

Let us know here if you have any questions.

Comment 1 Miro Hrončok 2019-07-30 15:02:19 UTC
Zbyszek, would you be able to help here?

Comment 2 Zbigniew Jędrzejewski-Szmek 2019-07-30 17:25:26 UTC
It fails the same way in regular rawhide on amd64. No idea why.
I'll update mpich to the latest version; maybe that will help.

Comment 3 Zbigniew Jędrzejewski-Szmek 2019-07-30 18:09:20 UTC
[1564354515.215559] [08dfc006c2a24ed0bf7d9276d6077ef3:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1564354515.363010] [08dfc006c2a24ed0bf7d9276d6077ef3:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1564354516.244026] [08dfc006c2a24ed0bf7d9276d6077ef3:4889 :0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xb80) for ucp_am_bufs failed: Operation not permitted, please check shared memory limits by 'ipcs -l'

This might be the cause, but I get the same failure on my machine, and the limits there seem very high:
$  ipcs -l

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
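The UCX messages in the description and in this comment can be unpacked with a little arithmetic. The sketch below uses hypothetical helpers (`decode_shmget_flags`, `max_seg_size_bytes`) that are not part of mpi4py, UCX, or this thread. It assumes the Linux values of `IPC_CREAT` (`0o1000`), `IPC_EXCL` (`0o2000`) and `SHM_HUGETLB` (`0o4000`) from `<sys/ipc.h>` and `<sys/shm.h>`, and the util-linux `ipcs -l` output format shown above. It decodes `flags=0xfb0` as `SHM_HUGETLB | IPC_EXCL | IPC_CREAT` with mode `0660`, `flags=0xb80` as `SHM_HUGETLB | IPC_CREAT` with mode `0600`, and converts the `max seg size` limit to bytes for comparison with the 2097152-byte (2 MiB) request.

```python
import re

# Assumption: Linux values of these constants (octal, as written in
# <sys/ipc.h> and <sys/shm.h>); other platforms may differ.
IPC_CREAT = 0o1000
IPC_EXCL = 0o2000
SHM_HUGETLB = 0o4000

# Insertion order decides the order in which flag names are reported.
FLAG_NAMES = {
    SHM_HUGETLB: "SHM_HUGETLB",
    IPC_EXCL: "IPC_EXCL",
    IPC_CREAT: "IPC_CREAT",
}


def decode_shmget_flags(flags):
    """Split a shmget() flags word into named bits, permission mode, and leftovers."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    mode = flags & 0o777  # the low nine bits are the permission mode
    unknown = flags & ~(0o777 | IPC_CREAT | IPC_EXCL | SHM_HUGETLB)
    return names, mode, unknown


def max_seg_size_bytes(ipcs_output):
    """Return the 'max seg size' limit from `ipcs -l` text, converted from KiB to bytes."""
    match = re.search(r"max seg size \(kbytes\) = (\d+)", ipcs_output)
    if match is None:
        raise ValueError("no 'max seg size' line found")
    return int(match.group(1)) * 1024


if __name__ == "__main__":
    request = 2097152  # segment size from the UCX messages
    print(request // 1024, "KiB")  # 2 MiB expressed in the unit ipcs uses
    for flags in (0xfb0, 0xb80):
        names, mode, unknown = decode_shmget_flags(flags)
        print(hex(flags), "|".join(names), oct(mode), hex(unknown))
```

With the limits above, the 2 MiB request is many orders of magnitude below `max seg size`, so a size limit is an unlikely cause. The shmget(2) man page lists a different source of `EPERM`: `SHM_HUGETLB` was requested by a caller that lacks `CAP_IPC_LOCK` and is not in the `sysctl_hugetlb_shm_group` group. That would fit both mock and a normal desktop machine, but it is only an interpretation; nothing in this bug establishes whether these messages are related to the RMA test failures.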

Comment 4 Zbigniew Jędrzejewski-Szmek 2019-07-30 18:16:04 UTC
python3-mpich-3.1.1-1.fc31.x86_64 makes no difference ;(

Comment 6 Zbigniew Jędrzejewski-Szmek 2019-07-31 05:48:32 UTC
I made the build pass by ignoring the test failures. I don't think we gain much by keeping the package in the FTBFS state. Maybe upstream will know how to fix this.