Bug 1741015 - --enable-optimizations is not set for architectures other than x86*
Summary: --enable-optimizations is not set for architectures other than x86*
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: python3
Version: rawhide
Hardware: arm
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Miro Hrončok
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-14 05:19 UTC by Jon Nettleton
Modified: 2019-09-03 13:20 UTC
11 users

Fixed In Version: python3-3.8.0~b3-4.fc32
Clone Of:
Environment:
Last Closed: 2019-09-03 13:20:00 UTC
Type: Bug
Embargoed:



Description Jon Nettleton 2019-08-14 05:19:03 UTC
Description of problem:
Currently --enable-optimizations is only set for x86* builds. The comment in the specfile says that optimized builds take too long on other architectures. With ARM hardware now running at 2-3 GHz, this is no longer the case: a build on my 2 GHz ARM machine takes about 1.5 hours. The build time can be shortened further by limiting the number of tests used to generate the GCC profile.


Actual results:
Python is not being built fully optimized for architectures other than x86

Expected results:
Python is built with full optimization for all possible platforms.

Additional info:
Comparing my build with PGO enabled to the default Fedora build on recent aarch64 Rawhide, my pgbench run drops from 5.5-5.6 s to about 5 s. This 12% improvement seems to justify the additional build time, considering how much the Fedora distribution uses Python.
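The 12% figure depends on how the change is measured. A minimal Python sketch (the helper names are illustrative, not from any tool) that compares the quoted timings both as a reduction in run time and as a speedup:

```python
# Compare two benchmark wall-clock times in two common ways.
# Inputs are the figures quoted above (5.5-5.6 s default build,
# about 5.0 s PGO build); substitute your own measurements.


def time_reduction(before: float, after: float) -> float:
    """Fraction by which the run time shrank: (before - after) / before."""
    return (before - after) / before


def speedup(before: float, after: float) -> float:
    """Throughput gain: before / after - 1."""
    return before / after - 1


if __name__ == "__main__":
    for before in (5.5, 5.6):
        print(
            f"{before} s -> 5.0 s: "
            f"{time_reduction(before, 5.0):.1%} less time, "
            f"{speedup(before, 5.0):.1%} faster"
        )
```

With these inputs, 5.6 s to 5.0 s is a 12% speedup (before/after - 1) but about a 10.7% reduction in run time; 5.5 s to 5.0 s gives 10% and about 9.1%.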

Comment 1 Jon Nettleton 2019-08-14 07:52:39 UTC
I have tested a build that uses a restricted list of profiled tests, and the performance difference from the full 416-test list is almost nothing. For slow builds this may be an option.

  # Invoke the build
  make EXTRA_CFLAGS="$CFLAGS $MoreCFlags" %{?_smp_mflags} PROFILE_TASK='-m test.regrtest --pgo \
      test_array \
      test_base64 \
      test_binascii \
      test_binhex \
      test_binop \
      test_bytes \
      test_c_locale_coercion \
      test_class \
      test_cmath \
      test_codecs \
      test_compile \
      test_complex \
      test_csv \
      test_decimal \
      test_dict \
      test_float \
      test_fstring \
      test_hashlib \
      test_io \
      test_iter \
      test_json \
      test_long \
      test_math \
      test_memoryview \
      test_pickle \
      test_re \
      test_set \
      test_slice \
      test_struct \
      test_threading \
      test_time \
      test_traceback \
      test_unicode \
  ' \
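
PROFILE_TASK is the CPython Makefile variable holding the arguments passed to the newly built interpreter during the PGO training run, so restricting the training set amounts to overriding it on the make command line. A minimal sketch, assuming a hypothetical helper that assembles that override from a Python list (EXTRA_CFLAGS and the RPM macros are left out):

```python
import shlex

# Hypothetical subset for illustration; the full restricted list is above.
RESTRICTED_TESTS = ["test_array", "test_base64", "test_binascii", "test_json"]


def profile_task(tests):
    """Arguments for the PGO training run: regrtest in --pgo mode on `tests`."""
    return " ".join(["-m", "test.regrtest", "--pgo", *tests])


def make_command(tests, jobs=4):
    """argv for make, with PROFILE_TASK overridden as a single argument."""
    return ["make", f"-j{jobs}", f"PROFILE_TASK={profile_task(tests)}"]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes the argv so it can be pasted into a shell.
    print(shlex.join(make_command(RESTRICTED_TESTS)))
```

Passing `make_command(...)` straight to `subprocess.run()` avoids shell quoting entirely.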

Comment 2 Miro Hrončok 2019-08-14 08:48:35 UTC
I like the proposed solution of only running a limited number of tests on slower architectures.

For reference, I've submitted 5 Koji scratch builds that do the current (full test suite) PGO on all architectures:

diff --git a/python3.spec b/python3.spec
index 7910dd3..3ff1972 100644
--- a/python3.spec
+++ b/python3.spec
@@ -41,13 +41,7 @@ License: Python
 %bcond_without rpmwheels
 
 # Expensive optimizations (mainly, profile-guided optimizations)
-%ifarch %{ix86} x86_64
 %bcond_without optimizations
-%else
-# On some architectures, the optimized build takes tens of hours, possibly
-# longer than Koji's 24-hour timeout. Disable optimizations here.
-%bcond_with optimizations
-%endif
 
 # Run the test suite in %%check
 %bcond_without tests

https://koji.fedoraproject.org/koji/taskinfo?taskID=37034556
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034578
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034586
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034625
https://koji.fedoraproject.org/koji/taskinfo?taskID=37034638


Let's see what time it takes on various arches.

Comment 3 Petr Viktorin (pviktori) 2019-08-14 08:54:58 UTC
Yes, I'd expect training PGO on a restricted set of tests would work -- the main thing to optimize is the common interpreter code.
Thanks for filing the bug!

Once it works, it would then be nice to get the list of "fast tests for slow architectures" marked upstream.

Comment 4 Miro Hrončok 2019-08-14 14:04:22 UTC
Here are the average (sum/len) times from 5 builds with the current (full test suite) PGO:

armv7hl: 4:39:22
i686:    1:08:43
x86_64:  0:57:56
aarch64: 2:58:03
ppc64le: 1:50:32
s390x:   0:57:56

I suppose we can enable the current setup as-is on s390x right now.

Will try the suggested restricted list of profiled tests as well soon.
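The "avg (sum/len)" figures above can be reproduced from per-build durations. A minimal sketch using only the standard library; the per-build durations in the usage line are made up, since only the averages are given here:

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'h:mm:ss' duration such as '4:39:22'."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a duration as 'h:mm:ss', rounded to whole seconds."""
    total = round(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def average(durations):
    """sum/len over a list of 'h:mm:ss' strings."""
    parsed = [parse_hms(d) for d in durations]
    return format_hms(sum(parsed, timedelta()) / len(parsed))


if __name__ == "__main__":
    # Made-up per-build durations, for illustration only.
    print(average(["4:30:00", "4:50:00"]))  # 4:40:00
```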

Comment 6 Miro Hrončok 2019-08-14 16:45:46 UTC
armv7hl:  1:37:47
aarch64:  1:13:23
ppc64le:  0:38:36

That is doable; however, I would prefer to keep it at around an hour, so the armv7hl time bothers me.

Also note that the aarch64 build in https://koji.fedoraproject.org/koji/taskinfo?taskID=37037995 is still running, but that might be some random lag.
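Assuming the timings above come from builds using the restricted test list from Comment 1 (this comment does not say so explicitly), a short sketch comparing them with the full-suite averages from Comment 4:

```python
# Average build times (h:mm:ss) copied from Comment 4 (full suite) and
# Comment 6 (ASSUMED here to be the restricted list from Comment 1).
FULL_SUITE = {"armv7hl": "4:39:22", "aarch64": "2:58:03", "ppc64le": "1:50:32"}
RESTRICTED = {"armv7hl": "1:37:47", "aarch64": "1:13:23", "ppc64le": "0:38:36"}


def to_seconds(text: str) -> int:
    """Convert 'h:mm:ss' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fraction_of_full(arch: str) -> float:
    """Restricted-list build time as a fraction of the full-suite build time."""
    return to_seconds(RESTRICTED[arch]) / to_seconds(FULL_SUITE[arch])


if __name__ == "__main__":
    for arch in FULL_SUITE:
        print(f"{arch}: {fraction_of_full(arch):.0%} of the full-suite time")
```

On these numbers the restricted list brings the three slow architectures down to roughly 35-41% of the full-suite build time.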

Comment 7 Miro Hrončok 2019-08-14 17:25:10 UTC
Out of curiosity, I have fired up a build that only does test_array to see how much time we can gain by further reducing the tests that we run.

Comment 8 Miro Hrončok 2019-08-14 17:53:59 UTC
(In reply to Miro Hrončok from comment #7)
> Out of curiosity, I have fired up a build that only does test_array to see
> how much time we can gain by further reducing the tests that we run.

https://koji.fedoraproject.org/koji/taskinfo?taskID=37040092
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040099
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040107
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040115
https://koji.fedoraproject.org/koji/taskinfo?taskID=37040134

Comment 9 Victor Stinner 2019-08-14 23:21:47 UTC
The master branch of Python upstream was modified recently to only run a subset of tests in the "profiling task" (step used to train the compiler): https://bugs.python.org/issue36044

An extract from the new https://github.com/python/cpython/blob/master/Lib/test/libregrtest/pgo.py file:

PGO_TESTS = [
    'test_array',
    'test_base64',
    'test_binascii',
    'test_binop',
    'test_bisect',
    'test_bytes',
    'test_bz2',
    'test_cmath',
    'test_codecs',
    'test_collections',
    'test_complex',
    'test_dataclasses',
    'test_datetime',
    'test_decimal',
    'test_difflib',
    'test_embed',
    'test_float',
    'test_fstring',
    'test_functools',
    'test_generators',
    'test_hashlib',
    'test_heapq',
    'test_int',
    'test_itertools',
    'test_json',
    'test_long',
    'test_lzma',
    'test_math',
    'test_memoryview',
    'test_operator',
    'test_ordered_dict',
    'test_pickle',
    'test_pprint',
    'test_re',
    'test_set',
    'test_sqlite',
    'test_statistics',
    'test_struct',
    'test_tabnanny',
    'test_time',
    'test_unicode',
    'test_xml_etree',
    'test_xml_etree_c',
]

We can backport this change from master to Python 3.6 and use this list of tests for PGO for all platforms.
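To see how the restricted list from Comment 1 relates to upstream's PGO_TESTS, a quick set comparison (test names copied from the two comments as quoted):

```python
# Restricted list from Comment 1.
COMMENT_1_TESTS = set("""
    test_array test_base64 test_binascii test_binhex test_binop test_bytes
    test_c_locale_coercion test_class test_cmath test_codecs test_compile
    test_complex test_csv test_decimal test_dict test_float test_fstring
    test_hashlib test_io test_iter test_json test_long test_math
    test_memoryview test_pickle test_re test_set test_slice test_struct
    test_threading test_time test_traceback test_unicode
""".split())

# PGO_TESTS as quoted from upstream pgo.py in Comment 9.
UPSTREAM_PGO_TESTS = set("""
    test_array test_base64 test_binascii test_binop test_bisect test_bytes
    test_bz2 test_cmath test_codecs test_collections test_complex
    test_dataclasses test_datetime test_decimal test_difflib test_embed
    test_float test_fstring test_functools test_generators test_hashlib
    test_heapq test_int test_itertools test_json test_long test_lzma
    test_math test_memoryview test_operator test_ordered_dict test_pickle
    test_pprint test_re test_set test_sqlite test_statistics test_struct
    test_tabnanny test_time test_unicode test_xml_etree test_xml_etree_c
""".split())

shared = COMMENT_1_TESTS & UPSTREAM_PGO_TESTS
only_comment_1 = COMMENT_1_TESTS - UPSTREAM_PGO_TESTS
only_upstream = UPSTREAM_PGO_TESTS - COMMENT_1_TESTS

if __name__ == "__main__":
    print(f"shared ({len(shared)}):", " ".join(sorted(shared)))
    print(f"only in Comment 1 ({len(only_comment_1)}):", " ".join(sorted(only_comment_1)))
    print(f"only upstream ({len(only_upstream)}):", " ".join(sorted(only_upstream)))
```

With the lists as quoted, 22 tests are shared, 11 appear only in Comment 1 (for example test_io, test_threading, test_traceback), and 21 appear only upstream (for example test_datetime, test_sqlite, test_xml_etree).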

Comment 10 Miro Hrončok 2019-08-14 23:34:17 UTC
(In reply to Miro Hrončok from comment #7)
> Out of curiosity, I have fired up a build that only does test_array to see
> how much time we can gain by further reducing the tests that we run.

The average time for running just test_array was 1:46:11.

Based on this very non-empirical observation, making the list any shorter than the one provided is not helpful. If we want PGO on armv7hl, we need to accept a build that takes closer to 2 hours.
I think it is reasonable.

(In reply to Victor Stinner from comment #9)
> The master branch of Python upstream was modified recently to only run a
> subset of tests in the "profiling task" (step used to train the compiler):
> https://bugs.python.org/issue36044

This seems to have been backported to 3.8, so we can turn on PGO on rawhide.

For f31 and lower, we can provide the list manually.

Comment 11 Miro Hrončok 2019-08-14 23:51:18 UTC
Here are the 3.8 rawhide (f32-python side tag) scratch builds with:

diff --git a/python3.spec b/python3.spec
index ab617df..9cb550d 100644
--- a/python3.spec
+++ b/python3.spec
@@ -52,13 +52,7 @@ License: Python
 %bcond_without rpmwheels
 
 # Expensive optimizations (mainly, profile-guided optimizations)
-%ifarch %{ix86} x86_64
 %bcond_without optimizations
-%else
-# On some architectures, the optimized build takes tens of hours, possibly
-# longer than Koji's 24-hour timeout. Disable optimizations here.
-%bcond_with optimizations
-%endif
 
 # Run the test suite in %%check
 %bcond_without tests


https://koji.fedoraproject.org/koji/taskinfo?taskID=37044448
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044493
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044526
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044540
https://koji.fedoraproject.org/koji/taskinfo?taskID=37044549

Comment 12 Miro Hrončok 2019-08-15 08:05:34 UTC
The average total build time (all arches) was 1:55:31. Not great, not terrible.

Comment 13 Jon Nettleton 2019-08-15 10:52:47 UTC
Sounds like we need faster build machines. Are the ARMv7 builds done on bare metal or in VMs on the aarch64 systems?

Comment 14 Miro Hrončok 2019-08-15 10:54:43 UTC
> Are the ARMv7 builds done on bare metal or in VMs on the aarch64 systems?

AFAIK everything is bare metal, but I'm not 100% sure.


See also https://kojipkgs.fedoraproject.org/work/tasks/4449/37044449/hw_info.log

Comment 16 Miro Hrončok 2019-09-03 13:20:00 UTC
This has been fixed in rawhide. If you want this in an earlier Fedora version as well, please submit a Pull Request.

