Bug 1791973 - mpi4py fails to build with Python 3.9: INTERNAL ERROR: invalid error code fffffffe
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mpi4py
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Thomas Spura
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON39
Reported: 2020-01-16 19:18 UTC by Miro Hrončok
Modified: 2020-05-26 08:44 UTC (History)

Fixed In Version: mpi4py-3.0.3-3.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 08:44:42 UTC
Type: Bug
Embargoed:


Description Miro Hrončok 2020-01-16 19:18:38 UTC
mpi4py fails to build with Python 3.9.0a2.

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir -e spawn
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............: 
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................: 
MPID_nem_init(324).................: 
MPID_nem_tcp_init(175).............: 
MPID_nem_tcp_get_business_card(401): 
MPID_nem_tcp_init(373).............: gethostbyname failed, 91d9a357206d437a81bf35ae5faa2198 (errno 0)


I have no idea how to read this, sorry.
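
One reading aid for the log above: the two "invalid error code" lines appear to report the same value, because fffffffe is -2 when interpreted as a 32-bit two's-complement integer. The line that names an actual cause is the last one in the error stack (gethostbyname failed). A minimal sketch of the conversion, where the helper name `as_signed_32` is hypothetical and not part of MPICH or mpi4py:

```python
def as_signed_32(hex_code: str) -> int:
    """Interpret a hex string as a 32-bit two's-complement integer."""
    value = int(hex_code, 16) & 0xFFFFFFFF
    # If the sign bit (bit 31) is set, the value is negative.
    return value - (1 << 32) if value & 0x80000000 else value


print(as_signed_32("fffffffe"))  # -2
```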

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.9/fedora-rawhide-x86_64/01148603-mpi4py/

For all our attempts to build mpi4py with Python 3.9, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/mpi4py/

Testing and the mass rebuild of packages are happening in Copr. You can follow these instructions to test locally in mock whether your package builds with Python 3.9:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/

Let us know here if you have any questions.

Python 3.9 will be included in Fedora 33. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.9.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon.
We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.

Comment 1 Ben Cotton 2020-02-11 17:34:20 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 2 Thomas Spura 2020-03-01 21:36:47 UTC
The latest two builds suggest a different error:
+ PYTHONPATH=/builddir/build/BUILDROOT/mpi4py-3.0.3-1.fc33.x86_64/usr/lib64/python3.9/site-packages/mpich
+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir -e spawn
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.LFhbsv (%check)
    extra tokens at the end of %endif directive in line 216:  %endif # mpich disable
    Bad exit status from /var/tmp/rpm-tmp.LFhbsv (%check)
Child return code was: 1
EXCEPTION: [Error()]
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/mockbuild/trace_decorator.py", line 93, in trace
    result = func(*args, **kw)
  File "/usr/lib/python3.7/site-packages/mockbuild/util.py", line 739, in do_with_status
    raise exception.Error("Command failed: \n # %s\n%s" % (command, output), child.returncode)
mockbuild.exception.Error: Command failed: 

I fixed that one and am waiting for the next build in your Copr. The original failure seems to be a deep error within MPI itself, and I would report it upstream once the next build fails in your Copr.
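
For reference, the "extra tokens at the end of %endif directive" message points at a spec line where `%endif` is followed by more text (here a trailing `# mpich disable` comment). A rough sketch of how such lines could be located in a spec file follows. The helper `find_trailing_endif_tokens` is hypothetical, it is not an rpm tool, and it only does a simple text match rather than parsing the spec:

```python
import re

# %endif followed by anything other than whitespace on the same line.
TRAILING_ENDIF = re.compile(r"^\s*%endif\b\s*(\S.*)$")


def find_trailing_endif_tokens(spec_text):
    """Return (line_number, trailing_text) for each %endif with extra tokens."""
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        match = TRAILING_ENDIF.match(line)
        if match:
            hits.append((lineno, match.group(1).strip()))
    return hits
```

Running it on the text of the spec file should list each offending line so the trailing text can be moved to its own comment line.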

Comment 3 Miro Hrončok 2020-03-01 21:53:48 UTC
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/build/1278451/

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir -e spawn
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............: 
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................: 
MPID_nem_init(324).................: 
MPID_nem_tcp_init(175).............: 
MPID_nem_tcp_get_business_card(404): 
MPID_nem_tcp_init(375).............: gethostbyname failed, f50d07e28e6449c4a6b7324ce3437a80 (errno 0)


RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.BVYMhv (%check)

Comment 4 Ankur Sinha (FranciscoD) 2020-05-12 09:34:01 UTC
MPID_nem_tcp_init(375).............: gethostbyname failed, f50d07e28e6449c4a6b7324ce3437a80 (errno 0)

*could* be because it's expecting an FQDN for the hostname (like sendmail). Not sure how to set it here to test, though.
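
One way to probe this hypothesis inside the build environment might be to check whether the machine's own hostname resolves at all. The sketch below uses only the Python standard library. The helper name `check_hostname_resolves` is hypothetical, and the check only approximates what MPICH does internally, so a passing result would not rule out other causes:

```python
import socket


def check_hostname_resolves(hostname=None):
    """Return (hostname, address), with address None if resolution fails."""
    name = hostname if hostname is not None else socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return name, None


if __name__ == "__main__":
    name, address = check_hostname_resolves()
    if address is None:
        print(f"hostname {name!r} does not resolve")
    else:
        print(f"hostname {name!r} resolves to {address}")
```

If the hostname does not resolve, that would be consistent with the "gethostbyname failed" line in the error stack, where the name looks like a randomly generated build-host identifier.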

Comment 5 Miro Hrončok 2020-05-25 13:02:02 UTC
This comment is mass-posted to all bugs blocking the Python 3.9 tracker; sorry if it is not 100% relevant. When in doubt, please ask.


The Python 3.9 rebuild is in progress in a Koji side tag.

If you fix this bug, please don't rebuild the package in regular rawhide, but do it in the side tag with:

    $ fedpkg build --target=f33-python

The rebuild is progressing slowly and it is possible this package won't have all the required build dependencies yet. If that's the case, please just leave the fix committed and pushed and we will eventually rebuild it for you.

You are not asked to go and try to rebuild all the missing dependencies yourself. If you know there is a bootstrap loop in the dependencies, let me know and we can untangle it together.

If you want to test your fix or reproduce the failure, you can still use the Copr repo mentioned in the initial comment of this bug: https://copr.fedorainfracloud.org/coprs/g/python/python3.9/

