Created attachment 493900
gdb backtrace

Description of problem:

gcc -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -Xlinker -export-dynamic -o python-debug \
    Modules/python.o \
    -L. -lpython2.7_d -lpthread -ldl -lutil -lm
/bin/sh: line 1: 21015 Segmentation fault (core dumped) LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.1/build/debug: CC='gcc -pthread' LDSHARED='gcc -pthread -shared ' OPT='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv' ./python-debug -E /builddir/build/BUILD/Python-2.7.1/setup.py -q build
RPM build errors:

Version-Release number of selected component (if applicable):
python-2.7.1-6.fc15
full logs at https://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=197370
A similar problem still exists in python-2.7.2-4.fc16 on ppc64:

/builddir/build/BUILD/Python-2.7.2/Modules/posixmodule.c:7317: warning: the use of `tempnam' is dangerous, better use `mkstemp'
gcc -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wl,-z,relro -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wl,-z,relro -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wl,-z,relro -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv -Xlinker -export-dynamic -o python-debug \
    Modules/python.o \
    -L. -lpython2.7_d -lpthread -ldl -lutil -lm
/bin/sh: line 1: 3599 Segmentation fault (core dumped) LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.2/build/debug: CC='gcc -pthread' LDSHARED='gcc -pthread -shared ' OPT='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -Wl,-z,relro -m64 -mminimal-toc -D_GNU_SOURCE -fPIC -fwrapv' ./python-debug -E /builddir/build/BUILD/Python-2.7.2/setup.py -q build
make: *** [sharedmods] Error 139

full logs at http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=245909

@Jakub: The latest successful python build was python-2.7.1-4.fc15; python-2.7.1-5.fc15 already failed with this python-debug segmentation fault. The difference is that -4 was built with gcc-4.5.1-6 and -5 with gcc-4.6.0-0.12. I wonder if gcc is acting up here.
It is much more probable that this is an application bug (relying on undefined behavior etc.) than a gcc bug, though of course the latter can't be ruled out.

If you suspect a miscompilation, first have somebody familiar with the source code check whether some gcc flags make it work again (e.g. compiling with -O0, or with -O2 -fno-strict-aliasing, etc.). If so, do a binary search between objects compiled with the working options and objects compiled with the non-working ones to narrow the problem down to one source file. If not, do a similar binary search between objects compiled with the older compiler and objects compiled with the new compiler.

Once you know which source file is problematic, first compile it with -W -Wall and look through all the warnings to see whether any of them point at a problem in the code. If not, try to narrow the problem down to a particular function (check whether it still reproduces with -fno-inline, which helps with that), then try to create a self-contained testcase that calls that function with the right arguments and aborts, or otherwise signals, if the function misbehaves.
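The kind of self-contained testcase described above could be sketched roughly as follows. This is only an illustration: suspect_function and run_testcase are hypothetical names, the function body is a placeholder for whatever code was extracted from the problematic source file, and __attribute__((noinline)) is a GCC extension used here so the call is not folded into the caller.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the function extracted from the suspect
   source file. noinline (a GCC extension) keeps the call from being
   inlined, so the function is compiled on its own. */
__attribute__((noinline)) long
suspect_function(long a, long b)
{
    return a * 2 + b;
}

/* Calls the function with known arguments and aborts if the result is
   not the expected one. Returns 0 when the function behaves. */
int
run_testcase(void)
{
    /* volatile stops the compiler from constant-folding the call away */
    volatile long a = 20;
    volatile long b = 2;

    if (suspect_function(a, b) != 42)
        abort();
    return 0;
}
```

A main() that simply returns run_testcase() would turn this into a standalone program that can be built with the working and the non-working compiler or flags and compared.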
Here's the start of the definition of that function:

  static PyObject *
  call_function(PyObject ***pp_stack, int oparg
  #ifdef WITH_TSC
                  , uint64* pintr0, uint64* pintr1
  #endif
                  )

It could be that the WITH_TSC is confused, perhaps, but there is this forwards-declaration:

  #ifdef WITH_TSC
  static PyObject * call_function(PyObject ***, int, uint64*, uint64*);
  #else
  static PyObject * call_function(PyObject ***, int);
  #endif

and it appears to be used consistently throughout.

I don't know if this is significant, but frame #0 in that backtrace is reported with the arguments in reverse order to those of the declaration. Here's the frame from the backtrace:

  #0 call_function (pintr1=0xfffcf8b6be0, pintr0=0xfffcf8b6bd8, oparg=<optimized out>, pp_stack=0xfffcf8b6be8)
     at /builddir/build/BUILD/Python-2.7.1/Python/ceval.c:4105

Every other frame appears to be reported with the arguments in the same order as in the declaration; this one is reported in reverse order.
If I disable line 56 within ppc_getcounter(uint64 *v) in Python/ceval.c, then the problem goes away:

  32 typedef unsigned long long uint64;
  33
  34 /* PowerPC support.
  35    "__ppc__" appears to be the preprocessor definition to detect on OS X, whereas
  36    "__powerpc__" appears to be the correct one for Linux with GCC
  37 */
  38 #if defined(__ppc__) || defined (__powerpc__)
  39
  40 #define READ_TIMESTAMP(var) ppc_getcounter(&var)
  41
  42 static void
  43 ppc_getcounter(uint64 *v)
  44 {
  45     register unsigned long tbu, tb, tbu2;
  46
  47   loop:
  48     asm volatile ("mftbu %0" : "=r" (tbu) );
  49     asm volatile ("mftb %0" : "=r" (tb) );
  50     asm volatile ("mftbu %0" : "=r" (tbu2));
  51     if (__builtin_expect(tbu != tbu2, 0)) goto loop;
  52
  53     /* The slightly peculiar way of writing the next lines is
  54        compiled better by GCC than any other way I tried. */
  55     ((long*)(v))[0] = tbu;
  56     /*((long*)(v))[1] = tb; */ /* <==== this is the bug */
  57 }
  58
  59 #elif defined(__i386__)

  (gdb) p sizeof(long)
  $44 = 8
  (gdb) p sizeof(uint64)
  $45 = 8

Looks like lines 55 and 56 are erroneously assuming that a long is 4 bytes on this arch: line 56 above is trashing the next value on the machine's stack.

The code has been this way since ppc_getcounter was added, in:
  http://hg.python.org/cpython/rev/f455bbe7ea7e

I may have broken this in:
  http://hg.python.org/cpython/rev/419ca089d365/
which was for:
  http://bugs.python.org/issue10655
by (perhaps) generalizing support from ppc to (ppc and ppc64) (not sure about this).
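The size mismatch can be illustrated with a small portable sketch. This is not the candidate patch: the helper names combine_timebase and bytes_stored_by_two_longs are hypothetical, and the sketch assumes that mftbu yields the upper 32 bits of the time base and that the low 32 bits of the mftb result hold the lower half. Building the value with shifts stores exactly sizeof(uint64) bytes regardless of sizeof(long), and avoids writing through a (long*) cast of a uint64 pointer.

```c
#include <stddef.h>

typedef unsigned long long uint64;

/* Number of bytes the two stores ((long*)v)[0] and ((long*)v)[1]
   write in total. This only fits in a uint64 when long is 4 bytes. */
size_t
bytes_stored_by_two_longs(void)
{
    return 2 * sizeof(long);
}

/* Hypothetical helper: compose the 64-bit counter from the two register
   values without depending on the width of long. Assumes the upper half
   is in tbu and the lower half is in the low 32 bits of tb. */
uint64
combine_timebase(unsigned long tbu, unsigned long tb)
{
    uint64 upper = (uint64)(tbu & 0xffffffffUL);
    uint64 lower = (uint64)(tb & 0xffffffffUL);

    return (upper << 32) | lower;
}
```

With sizeof(long) == 8, as gdb reports here, bytes_stored_by_two_longs() is 16 while the destination uint64 is only 8 bytes, so the second store lands past the end of the variable.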
Workaround for now is to stop using "--with-tsc" when configuring the debug build on ppc64.

Fix committed to "python" in rawhide (for f17):
  http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commitdiff;h=76e85fb7737abc82d729292607f9e2759645e29c
Building python-2.7.2-6.fc17 for dist-rawhide
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296206

Fix committed to "python3" in rawhide (for f17):
  http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commitdiff;h=4763ff864f559286fdcf5090d30db55311119ecb
Building python3-3.2.1-4.fc17 for dist-rawhide
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296208
Created attachment 519514
Candidate patch to fix --with-tsc on ppc64, and to fix aliasing violations on 32-bit ppc

Tested and seems to work on ppc64; am about to test on 32-bit ppc.
I've applied the patch from attachment #519514 to both python and python3 in rawhide, and re-enabled --with-tsc on ppc64 for the debug build; rebuilding both now:

python:
  http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commitdiff;h=92ed49e1f9a286b6ee791a29f6b25be191d0c4c5
Building python-2.7.2-7.fc17 for dist-rawhide
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296752

python3:
  http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commitdiff;h=ceb359a69b285160f7997c0b77de1dfd3567e80e
Building python3-3.2.1-5.fc17 for dist-rawhide
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296765
For Fedora 16, let's simply disable --with-tsc on ppc64 debug.

python (f16):
  http://pkgs.fedoraproject.org/gitweb/?p=python.git;a=commitdiff;h=0be4d5a7fc2fbfd4e558e7143abef04cb580c4b9
Building python-2.7.2-4.1.fc16 for f16-candidate
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296788

python3 (f16):
  http://pkgs.fedoraproject.org/gitweb/?p=python3.git;a=commitdiff;h=0c0fcb4642f6d6b385b95231d868d676081e6299
Building python3-3.2.1-2.1.fc16 for f16-candidate
Task info: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296797
Test case: enable tscdump, and run some bytecodes (e.g. by "import logging"):

  $ python-debug -c "import sys; sys.settscdump(True); import logging"
  $ python3-debug -c "import sys; sys.settscdump(True); import logging"

Notes on --with-tsc:
  http://hg.python.org/cpython/file/f455bbe7ea7e/Misc/SpecialBuilds.txt
[All of the builds succeeded] Do I need to do a Bodhi update to F16 to pull the fix in for ppc64, or is this unneeded?
No, a Bodhi update is not needed. I can have a different n-v-r on PPC than on the primary archs, although I try to avoid it if possible. The only requirements are that the patch is in git, so that the next python update will work out of the box on PPC, and that the next n-v-r on the primary archs is higher than what we have on PPC. Both requirements are met and I can just pull in the new package.

Unfortunately python and python3 still fail to build in koji, although the builds progressed beyond the segfault issue. I've opened bugzilla 732998 to track the new problem.
--with-tsc bug and patch reported upstream as http://bugs.python.org/issue12872