Bug 566770 - turn on computed_goto
Summary: turn on computed_goto
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: python3
Version: rawhide
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Dave Malcolm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 622060
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-02-19 16:44 UTC by Toshio Ernie Kuratomi
Modified: 2010-08-20 21:58 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 622060 (view as bug list)
Environment:
Last Closed: 2010-08-19 19:04:44 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
preprocessed output from gcc (482.19 KB, text/plain)
2010-08-06 22:22 UTC, Toshio Ernie Kuratomi
no flags Details

Description Toshio Ernie Kuratomi 2010-02-19 16:44:27 UTC
Description of problem:

Python 3 adds a --with-computed-gotos=yes configure option that tells python to use the computed goto gcc extension in its inner loop.  On modern CPUs with branch prediction, this can be a major speedup.  (PyCon talk mentions 17% speedup).

We should turn this on for our python3 package with something like this:

%configure --enable-ipv6 --with-wide-unicode --enable-shared --with-system-ffi --with-computed-gotos=yes

Comment 1 Andrew McNabb 2010-02-19 17:00:25 UTC
Are there any downsides to this?  How well tested is it?  Overall, it sounds like a great idea.

Comment 2 Andrew McNabb 2010-02-19 17:01:35 UTC
dmalcolm, if you like the idea of this, I'd be happy to make the change.  It would be an easy change to make, and it would give me some practice with package maintenance.  Let me know what you think.

Comment 3 Toshio Ernie Kuratomi 2010-02-19 22:45:23 UTC
dmalcolm mentions that someone mentioned concerns about seg faults.  This is all I found when googling: http://bugs.python.org/issue6603

Segfault on x86_64 when --with-computed-goto=yes and --with-tsc is applied.  It has a patch attached which is added on several branches.  We'll want to make sure that it's been applied to the tarball we're using or apply the patch when we enable this.

Comment 4 Andrew McNabb 2010-02-20 00:07:41 UTC
Toshio, thanks for the information.  I would be happy to look into this some more and try to get everything to work.

David, do you have any thoughts?

Comment 5 Dave Malcolm 2010-02-20 03:59:59 UTC
(I'm at PyCon and, alas, having lots of difficulty getting connectivity).

Ideally with performance tuning, we should do before/after benchmarking and compare with/without changing the flag.  For the systemtap static probe work, I was able to use the unladen swallow benchmarking suite; some notes on doing this are here:
https://bugzilla.redhat.com/show_bug.cgi?id=545179#c12 onwards

Unfortunately, the us-bmarks code is for Python2 not Python3, some of the bmarks may be portable with 2to3, but not all.

If you do try turning on the option, please ensure that the test suite doesn't regress!

Comment 6 Bug Zapper 2010-07-30 10:52:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Toshio Ernie Kuratomi 2010-08-06 21:20:51 UTC
Setting to rawhide again -- we'll apply this for the F15 cycle starting now and see how it goes.  I've verified that the patch from http://bugs.python.org/issue6603

has been applied in the tarball we're using.  I didn't take benchmarks but we can do that anytime during the F15 cycle by running benchmarks, turning off computed-goto, rerunning benchmarks.

There's a %global at the top of the spec to toggle this on and off in the build.  Just change:

%global with_computed_gotos yes
==>
%global with_computed_gotos no

I've kicked off a build with computed-gotos yes.  I' need to compare test suite output to the last build without computed-gotos when it completes.

https://koji.fedoraproject.org/koji/buildinfo?buildID=181256
http://koji.fedoraproject.org/koji/taskinfo?taskID=2386306

Comment 8 Dave Malcolm 2010-08-06 21:35:56 UTC
Builds failed with internal compiler errors:

on x86_64:
http://koji.fedoraproject.org/koji/getfile?taskID=2386310&name=build.log
gcc -pthread -c -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv    -I. -IInclude -I/builddir/build/BUILD/Python-3.1.2/Include -I/usr/lib64/libffi-3.0.9/include   -fPIC -DPy_BUILD_CORE -o Python/ceval.o /builddir/build/BUILD/Python-3.1.2/Python/ceval.c
  [snip parallel tasks]
/builddir/build/BUILD/Python-3.1.2/Python/ceval.c: In function 'PyEval_EvalFrameEx':
/builddir/build/BUILD/Python-3.1.2/Python/ceval.c:3004:1: internal compiler error: in save_call_clobbered_regs, at caller-save.c:911
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccQpsjeF.out file, please attach this to your bugreport.
make: *** [Python/ceval.o] Error 1


on i386:
http://koji.fedoraproject.org/koji/getfile?taskID=2386311&name=build.log
gcc -pthread -c -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -D_GNU_SOURCE -fPIC -fwrapv    -I. -IInclude -I/builddir/build/BUILD/Python-3.1.2/Include -I/usr/lib/libffi-3.0.9/include   -fPIC -DPy_BUILD_CORE -o Python/ceval.o /builddir/build/BUILD/Python-3.1.2/Python/ceval.c
  [snip parallel tasks]
/builddir/build/BUILD/Python-3.1.2/Python/ceval.c: In function 'PyEval_EvalFrameEx':
/builddir/build/BUILD/Python-3.1.2/Python/ceval.c:3004:1: internal compiler error: in save_call_clobbered_regs, at caller-save.c:911

Comment 9 Toshio Ernie Kuratomi 2010-08-06 22:22:45 UTC
Created attachment 437273 [details]
preprocessed output from gcc

Here's the output taken from the buildroot

Comment 10 Toshio Ernie Kuratomi 2010-08-19 19:04:44 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=2412372

Statistically, we're getting the same number of failed unittests with the two runs through the test suite with the new code as with the old code:

(Old)
  14 BAD
 299 GOOD
  23 SKIPPED
 336 total

   6 BAD
 307 GOOD
  23 SKIPPED
 336 total


(New)
  14 BAD
 299 GOOD
  23 SKIPPED
 336 total

   6 BAD
 307 GOOD
  23 SKIPPED
 336 total

Looks good for now.

Comment 11 Toshio Ernie Kuratomi 2010-08-20 21:58:46 UTC
One further note, the upstream RFE mentions that gcc sometimes optimizes away the jumps unless you add -fno-gcse but that might be behaviour that was fixed in gcc-4.x.  I performed the suggested test to verify that we're not optimizing that away:

1) compile ceval.c with -S to generate an assembler file
2) Check that the assembler file has around 85 to 90 jmp's

http://bugs.python.org/issue4753#msg78660

gcc -pthread -c -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv    -I. -IInclude -I/builddir/build/BUILD/Python-3.1.2/Include -I/usr/lib64/libffi-3.0.9/include   -fPIC -DPy_BUILD_CORE -o Python/ceval.s /builddir/build/BUILD/Python-3.1.2/Python/ceval.c -S

grep -E "jmp[[:space:]]\*%" ceval.s |wc -l

95

So it looks like we're okay without turning off any gcc optimizations.


Note You need to log in before you can comment on or make changes to this bug.