Description of problem:
Currently --enable-optimizations is only set for x86* builds. The comment in the specfile indicates that builds take too long on other architectures. With ARM hardware now running at 2-3 GHz this is obviously no longer the case. A build on my 2 GHz ARM machine takes about 1.5 hours. The build time can be shortened further by limiting the number of tests used to generate the GCC profile.

Actual results:
Python is not built fully optimized for architectures other than x86.

Expected results:
Python is built with full optimization for all possible platforms.

Additional info:
Comparing my build with PGO enabled to the default Fedora build on recent aarch64 Rawhide, I see my pgbench run drop from 5.5-5.6s down to about 5s. This 12% improvement seems to justify the additional build time, considering how heavily the Fedora distribution uses Python.
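For reference, the improvement quoted above can be expressed in two ways. The following is a minimal sketch that uses only the timings reported in this comment; the function names are illustrative and not part of any tool mentioned here.

```python
def time_reduction(before_s, after_s):
    """Fraction of wall-clock time saved by the faster build."""
    return (before_s - after_s) / before_s


def speedup(before_s, after_s):
    """How much more work per unit time the faster build does."""
    return before_s / after_s - 1.0


# Timings from the report: 5.5-5.6s with the default build, about 5s with PGO.
for before in (5.5, 5.6):
    print(
        f"{before}s -> 5.0s: "
        f"{time_reduction(before, 5.0):.1%} less time, "
        f"{speedup(before, 5.0):.1%} faster"
    )
```

Measured as time saved, the change is roughly 9-11%; the 12% figure matches the upper end of the range when expressed as a speedup (5.6s to 5.0s).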
I have tested a build that runs a restricted list of profiled tests, and the performance difference from the full 416-test list is almost nothing. For slow builds this may be an option.

# Invoke the build
make EXTRA_CFLAGS="$CFLAGS $MoreCFlags" %{?_smp_mflags} PROFILE_TASK='-m test.regrtest --pgo \
    test_array \
    test_base64 \
    test_binascii \
    test_binhex \
    test_binop \
    test_bytes \
    test_c_locale_coercion \
    test_class \
    test_cmath \
    test_codecs \
    test_compile \
    test_complex \
    test_csv \
    test_decimal \
    test_dict \
    test_float \
    test_fstring \
    test_hashlib \
    test_io \
    test_iter \
    test_json \
    test_long \
    test_math \
    test_memoryview \
    test_pickle \
    test_re \
    test_set \
    test_slice \
    test_struct \
    test_threading \
    test_time \
    test_traceback \
    test_unicode \
    ' \
I like the proposed solution of running only a limited number of tests on slower architectures. For reference, I've submitted 5 Koji scratchbuilds that run the current (full test suite) PGO on all architectures:

diff --git a/python3.spec b/python3.spec
index 7910dd3..3ff1972 100644
--- a/python3.spec
+++ b/python3.spec
@@ -41,13 +41,7 @@ License: Python
 %bcond_without rpmwheels
 
 # Expensive optimizations (mainly, profile-guided optimizations)
-%ifarch %{ix86} x86_64
 %bcond_without optimizations
-%else
-# On some architectures, the optimized build takes tens of hours, possibly
-# longer than Koji's 24-hour timeout. Disable optimizations here.
-%bcond_with optimizations
-%endif
 
 # Run the test suite in %%check
 %bcond_without tests

https://koji.fedoraproject.org/koji/taskinfo?taskID=37034556
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034578
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034586
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034625
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034638

Let's see how long it takes on the various arches.
Yes, I'd expect training PGO on a restricted set of tests would work -- the main thing to optimize is the common interpreter code. Thanks for filing the bug! Once it works, it would then be nice to get the list of "fast tests for slow architectures" marked upstream.
Here are the avg (sum/len) times from 5 builds with the current (full test suite) PGO:

armv7hl: 4:39:22
i686: 1:08:43
x86_64: 0:57:56
aarch64: 2:58:03
ppc64le: 1:50:32
s390x: 0:57:56

I suppose we can run the current thing as is on s390x right now. I will try the suggested restricted list of profiled tests soon as well.
The following scratchbuilds have full test suite PGO on i686, x86_64 and s390x, and partial PGO (comment #1) on the rest (that is armv7hl, aarch64, ppc64le):

https://koji.fedoraproject.org/koji/taskinfo?taskID=37037949
https://koji.fedoraproject.org/koji/taskinfo?taskID=37037956
https://koji.fedoraproject.org/koji/taskinfo?taskID=37037988
https://koji.fedoraproject.org/koji/taskinfo?taskID=37037995
https://koji.fedoraproject.org/koji/taskinfo?taskID=37038002
armv7hl: 1:37:47
aarch64: 1:13:23
ppc64le: 0:38:36

That is doable; however, I would prefer to keep it around an hour, so the armv7hl time bothers me. Also note that the aarch64 build at https://koji.fedoraproject.org/koji/taskinfo?taskID=37037995 is still running, but that might be some random lag.
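To put the two sets of numbers side by side, here is a small sketch that converts the H:MM:SS times reported in this thread to seconds and computes how much shorter the restricted-PGO builds were. The helper name `to_seconds` is illustrative; the times are copied from the comments above, and the aarch64 figure may shift once the build that was still running finishes.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Average build times reported in this thread.
full_pgo = {"armv7hl": "4:39:22", "aarch64": "2:58:03", "ppc64le": "1:50:32"}
restricted_pgo = {"armv7hl": "1:37:47", "aarch64": "1:13:23", "ppc64le": "0:38:36"}

# Ratio of full-PGO build time to restricted-PGO build time, per architecture.
ratios = {
    arch: to_seconds(full_pgo[arch]) / to_seconds(restricted_pgo[arch])
    for arch in full_pgo
}

for arch, ratio in ratios.items():
    print(f"{arch}: full PGO build takes {ratio:.2f}x as long as restricted PGO")
```

On these figures the restricted list cuts the build time by a factor of roughly 2.4 to 2.9 on the three slower architectures.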
Out of curiosity, I have fired up a build that only does test_array to see how much time we can gain by further reducing the tests that we run.
(In reply to Miro Hrončok from comment #7)
> Out of curiosity, I have fired up a build that only does test_array to see
> how much time we can gain by further reducing the tests that we run.

https://koji.fedoraproject.org/koji/taskinfo?taskID=37040092
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040099
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040107
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040115
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040134
The master branch of Python upstream was modified recently to only run a subset of tests in the "profiling task" (the step used to train the compiler):
https://bugs.python.org/issue36044

Extract of the new https://github.com/python/cpython/blob/master/Lib/test/libregrtest/pgo.py file:

PGO_TESTS = [
    'test_array',
    'test_base64',
    'test_binascii',
    'test_binop',
    'test_bisect',
    'test_bytes',
    'test_bz2',
    'test_cmath',
    'test_codecs',
    'test_collections',
    'test_complex',
    'test_dataclasses',
    'test_datetime',
    'test_decimal',
    'test_difflib',
    'test_embed',
    'test_float',
    'test_fstring',
    'test_functools',
    'test_generators',
    'test_hashlib',
    'test_heapq',
    'test_int',
    'test_itertools',
    'test_json',
    'test_long',
    'test_lzma',
    'test_math',
    'test_memoryview',
    'test_operator',
    'test_ordered_dict',
    'test_pickle',
    'test_pprint',
    'test_re',
    'test_set',
    'test_sqlite',
    'test_statistics',
    'test_struct',
    'test_tabnanny',
    'test_time',
    'test_unicode',
    'test_xml_etree',
    'test_xml_etree_c',
]

We can backport this change from master to Python 3.6 and use this list of tests for PGO on all platforms.
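On branches that do not carry the upstream change, one possible way to use such a list is to pass it explicitly through PROFILE_TASK, as in the make invocation in comment #1. The sketch below only builds that argument string; the helper names and the shortened `PGO_SUBSET` list are illustrative assumptions, not part of CPython or the spec file.

```python
import shlex

# Illustrative subset only: the first few entries of the list quoted above.
PGO_SUBSET = ["test_array", "test_base64", "test_binascii"]


def profile_task(tests):
    """Build the PROFILE_TASK value in the form used in comment #1."""
    return " ".join(["-m", "test.regrtest", "--pgo", *tests])


def make_argument(tests):
    """Return a shell-quoted PROFILE_TASK=... argument for make."""
    return "PROFILE_TASK=" + shlex.quote(profile_task(tests))


print(make_argument(PGO_SUBSET))
# prints: PROFILE_TASK='-m test.regrtest --pgo test_array test_base64 test_binascii'
```

Keeping the list in one place like this would make it easier to compare a manually provided list against the upstream one, though whether to generate the argument or write it by hand in the spec file is a packaging choice.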
(In reply to Miro Hrončok from comment #7)
> Out of curiosity, I have fired up a build that only does test_array to see
> how much time we can gain by further reducing the tests that we run.

The average time for running just test_array was 1:46:11. Based on this very non-empirical observation, making the list any shorter than the one provided is not helpful. If we want PGO on armv7hl, we need to accept a build that takes closer to 2 hours. I think that is reasonable.

(In reply to Victor Stinner from comment #9)
> The master branch of Python upstream was modified recently to only run a
> subset of tests in the "profiling task" (step used to train the compiler):
> https://bugs.python.org/issue36044

This seems to have been backported to 3.8, so we can turn on PGO on rawhide. For f31 and lower, we can provide the list manually.
Here are the 3.8 rawhide (f32-python side tag) scratch builds with:

diff --git a/python3.spec b/python3.spec
index ab617df..9cb550d 100644
--- a/python3.spec
+++ b/python3.spec
@@ -52,13 +52,7 @@ License: Python
 %bcond_without rpmwheels
 
 # Expensive optimizations (mainly, profile-guided optimizations)
-%ifarch %{ix86} x86_64
 %bcond_without optimizations
-%else
-# On some architectures, the optimized build takes tens of hours, possibly
-# longer than Koji's 24-hour timeout. Disable optimizations here.
-%bcond_with optimizations
-%endif
 
 # Run the test suite in %%check
 %bcond_without tests

https://koji.fedoraproject.org/koji/taskinfo?taskID=37044448
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044493
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044526
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044540
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044549
The average total build time (all arches) was 1:55:31. Not great, not terrible.
Sounds like we need faster build machines. Are the ARMv7 builds done on bare metal or in VMs on the aarch64 systems?
> Are the ARMv7 builds done on bare metal or VMs on the aarch64 systems

AFAIK everything is bare metal, but I'm not 100% sure. See also https://kojipkgs.fedoraproject.org/work/tasks/4449/37044449/hw_info.log
https://src.fedoraproject.org/rpms/python3/pull-request/130
This has been fixed in rawhide. If you want this in an earlier Fedora version as well, please submit a Pull Request.