Description of problem:
I tried to build elpa on a host with 24 CPU cores. The build seemed to go fine, but after 6 hours mock gave up on the tests because of a timeout. I doubled the timeout, and after 12 hours mock gave up again.

I checked the elpa spec and found elpa-rpm.patch, which makes a very big change to CPU core usage by applying a change similar to this one to several tests:

diff -up mpich/Makefile.am.r mpich/Makefile.am
--- mpich/Makefile.am.r 2015-03-17 16:05:37.000000000 +0100
+++ mpich/Makefile.am 2015-03-20 10:59:37.967517516 +0100
@@ -204,47 +204,47 @@ check_SCRIPTS = \
 TESTS = $(check_SCRIPTS)
 elpa1_test_real.sh:
-	echo 'mpiexec -n 2 ./elpa1_test_real@SUFFIX@ $$TEST_FLAGS' > elpa1_test_real.sh
+	echo 'mpiexec -n `getconf _NPROCESSORS_ONLN` ./elpa1_test_real@SUFFIX@ $$TEST_FLAGS' > elpa1_test_real.sh
 	chmod +x elpa1_test_real.sh

After removing all changes to Makefile.am, the build succeeded on the first try. I saw from the changelog that there have been similar problems with different CPU architectures before. I see that the Fedora build system only uses 4 CPU cores for x86_64. I even tried to build in a VM with only 8 CPU cores, but the timeout after 12 hours still happens and the build doesn't finish.

I'd strongly suggest removing this unnecessary change from elpa-rpm.patch so that the package really builds on a multi-core machine. This unnecessary optimization prevents that now.

I tested the build on epel7, epel6/x86_64 and epel6/i686, and I see the same timeout problem on all of those builds.

Tested versions:
2015.02.002-4.el7
2015.02.002-4.el6
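To make the effect of the quoted patch concrete, the following is a minimal sketch that only prints the mpiexec command lines involved, without running any test. The `pick_ranks` helper and the cap of 4 are hypothetical illustrations that do not come from elpa-rpm.patch; they show one possible middle ground between a fixed `-n 2` and using every online CPU.

```shell
#!/bin/sh
# Hypothetical helper (not part of elpa-rpm.patch): choose how many MPI
# ranks a test should start.
#   $1 = number of online CPUs
#   $2 = upper bound chosen by the packager (assumed value, see below)
pick_ranks() {
    online=$1
    cap=$2
    if [ "$online" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$online"
    fi
}

# This is the value the patched Makefile.am passes to "mpiexec -n".
online=$(getconf _NPROCESSORS_ONLN)

# Print the command lines instead of executing them, for comparison.
echo "before patch: mpiexec -n 2 ./elpa1_test_real@SUFFIX@"
echo "after patch:  mpiexec -n $online ./elpa1_test_real@SUFFIX@"
echo "capped at 4:  mpiexec -n $(pick_ranks "$online" 4) ./elpa1_test_real@SUFFIX@"
```

On the 24-core host described above, the "after patch" line would request 24 ranks per test, while the capped variant would request 4.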
There might be a bug in the openmpi packages present in epel buildroots. The tests are not timing out on Fedora rawhide and 22. I observed this with openmpi earlier (bug 1144408), but it fixed itself with recent OpenMPI packages. Are you seeing the timeouts for mpich tests as well? Please disable openmpi tests and retry.
Also, massive parallelization is kind of the raison d'être for this library, so if running the testsuite doesn't scale, then it's a bug that needs to be fixed, not worked around by decreasing the number of processes running.
It's not about the testsuite failing. It's about the testsuite taking a ridiculous amount of time to run, which doesn't make any sense, and build systems timing out. With -n 2 the package build takes around 3 hours to complete, and 90% of that time is taken by the test suite. With -n 8 it doesn't complete in 12h, which was my absolute maximum timeout for the build. This is not a traditional FTBFS problem because there is no failure in the build. The packager's modifications for the "intended" behaviour of the test suite are now the reason for the timeout, not the software itself.
You haven't answered my question. Maybe you missed it, so let me repeat: Are you seeing the timeouts for mpich tests as well? Please disable openmpi tests and retry.
I did answer, but you didn't read it. There was only the build system timeout. No test timeouts. When I initially raised the build timeout from 6h to 12h, the tests just got a little further. Yes, this could be a problem in other components, but this packaging change is the obvious trigger for the bad behaviour.
You still haven't answered my question, so I'm repeating it for the last time: is this timeout issue occurring for both openmpi and mpich or only openmpi? If it's only openmpi, then I'll implement a workaround for openmpi tests, but I don't see any reason not to use the full capacity of the build system for the testsuite. This is especially important for ARM builds. Also, I experienced these timeouts on my own machine which has 4 cores, so this belies your claim that it only happens with >4 CPU cores.
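One way to tell a hanging test apart from a slow build is to put a per-test time limit around each test script, so that a stall is reported against a specific test rather than as a build system timeout. The sketch below assumes GNU coreutils `timeout` is available in the build environment; the `run_with_limit` helper and the limit values are hypothetical and not part of the elpa packaging.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a time limit in seconds and
# report whether it finished or was killed by the limit.
# GNU coreutils "timeout" exits with status 124 when the limit is hit.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*"
    else
        echo "exit $status: $*"
    fi
    return "$status"
}

# Harmless demonstration with a command that finishes immediately.
run_with_limit 5 true

# Assumed usage inside one MPI flavor's build directory:
#   run_with_limit 600 ./elpa1_test_real.sh
```

Running the same wrapper once in the mpich build directory and once in the openmpi one would show whether the stall is specific to one MPI implementation.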
Created attachment 1029158 [details]
Mock build logs without modifications and without ncpus hack

Log file names should indicate the build environment and the timeout options given to mock.
elpa-2015.11.001-5.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-7a1372d77b
openmx-3.8.1-9.el7 elpa-2015.11.001-5.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9161cc56d2
elpa-2015.11.001-5.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-7a1372d77b
elpa-2015.11.001-6.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-7a1372d77b
elpa-2015.11.001-6.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-7a1372d77b
elpa-2015.11.001-6.el7, openmx-3.8.1-9.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-9161cc56d2
elpa-2015.11.001-6.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
elpa-2015.11.001-6.el7, openmx-3.8.1-9.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.