bout++ fails to build with Python 3.10.0a4.

======= FAILURES ========
----- test-multigrid_laplace -----
rm: cannot remove 'data/BOUT.dmp.*.nc': No such file or directory
(It is likely that a timeout occured)
======= 1 failed in 929.92 seconds ========
make: *** [makefile:49: check-integrated-tests] Error 1

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.10/fedora-rawhide-x86_64/01868967-bout++/

For all our attempts to build bout++ with Python 3.10, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/package/bout++/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.10:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/

Let us know here if you have any questions.

Python 3.10 will be included in Fedora 35. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.10. A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon. We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.
IIRC this should only happen in Copr and not Koji. A workaround is to enable network access. See https://bugzilla.redhat.com/show_bug.cgi?id=1793612#c1 for details.
I don't think it is that simple; the MPI issue is, I think, fixed, at least on rawhide. The test should not be particularly slow either, normally 20 to 30 secs, so well below the 600 sec timeout. I will try to investigate this, and thus keep the bug open.
I am tempted to say the issue is that copr does not have enough cores. Even though the test only uses 3 threads, that might be sufficient to trigger the timeout. On an old 2-core system the test finishes in about 4 seconds if it is using 1 thread, but with 3 threads it takes over 4 minutes. I am not sure what copr is using, but I think it is also using old CPUs and very few cores (1?) - in which case it might well take more than 10 minutes. On a decent 64-core system the single-thread version takes 1.3 seconds, and 1.0 seconds with 3 threads.

If this keeps being an issue, I can disable the test on copr, or whenever there is only one core available.

The underlying issue is that MPI is optimized to be fast on non-oversubscribed systems. While in the real world MPI should never be run oversubscribed, this is common for testing, in which case the "idle" MPI processes are busy waiting on the others (see the sketch below) ...
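To illustrate the oversubscription effect, here is a minimal stand-alone sketch (not the actual multigrid_laplace test): the ranks pass a token around a ring, so at any moment every rank but one is blocked in MPI_Recv, and with the default busy-polling progress engine those "idle" ranks each burn a full core while they wait.

// ring.cpp - minimal sketch of MPI busy-waiting under oversubscription
#include <mpi.h>
#include <chrono>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) std::fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int next = (rank + 1) % size;
    const int prev = (rank + size - 1) % size;
    int token = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 10000; ++i) {
        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            // Every rank not holding the token sits in MPI_Recv, which by
            // default polls for progress (busy-waits) instead of sleeping -
            // harmless with one core per rank, ruinous with fewer cores.
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        }
    }
    double dt = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    if (rank == 0)
        std::printf("%d ranks: %.3f s for 10000 ring trips\n", size, dt);
    MPI_Finalize();
    return 0;
}

Compiled with mpicxx and launched with e.g. Open MPI's "mpirun -np 3 --oversubscribe ./ring", the loop that finishes in a fraction of a second with one core per rank can take orders of magnitude longer on a single core, because the waiting ranks spin instead of sleeping. Open MPI can be asked to yield the CPU when idle (the mpi_yield_when_idle MCA parameter), which trades some latency for fairness under oversubscription - whether that is acceptable for the real test is a separate question.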
Any explanation why it works with network enabled?
Pure luck, I guess ... The timeout is 600 seconds; in the run with network enabled it took:

test-multigrid_laplace ✓ 588.655 s

so it only just made it. In that case increasing the timeout might be the easiest solution ...
I have increased the timeout from 10m to 15m; I think that should fix the issue.