Created attachment 331563 [details]
F-11-Alpha-ppc64 - boot.log

Description of problem:
Latest rawhide kernel.ppc64 fails to boot on power5 ppc system. F-11-Alpha worked.

Version-Release number of selected component (if applicable):
kernel-2.6.29-0.99.rc4.ppc64

How reproducible:
Every time

Steps to Reproduce:
1. Install F-11-Alpha
2. yum update
3. reboot

Actual results:
| Elapsed time since release of system processors: 50075 mins 22 secs
Config file read, 1024 bytes
Welcome to Fedora!
Hit <TAB> for boot options
Welcome to yaboot version 1.3.14 (Red Hat 1.3.14-9.fc11)
Enter "help" to get some basic usage information
boot: 2.6.29-0.99.rc4
Please wait, loading kernel...
Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 03600000, size: 3515 Kbytes
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 rhgb quiet root=/dev/SNAKEVG/SNAKEROOT
memory layout at init:
  alloc_bottom : 000000000396f000
  alloc_top : 0000000008000000
  alloc_top_hi : 00000000f5000000
  rmo_top : 0000000008000000
  ram_top : 00000000f5000000
Looking for displays
instantiating rtas at 0x00000000076a1000 ... done
boot cpu hw idx 0000000000000000
starting cpu hw idx 0000000000000002... done
starting cpu hw idx 0000000000000004... done
starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
DEFAULT CATCH!, exception-handler=DEFAULT CATCH!, exception-handler=fffffffffffffff6 at %SRR0: 0000000000c3bf3c %SRR1: 800000000000b002 Call History ------------ throw - 93903c $call-method - 946d5c (poplocals) - 93a758 key-fillq - 94727c ?xoff - 947378 (poplocals) - 93a758 (stdout-write) - 9479a4 (type) - 947a30 _syscatch - 94df8c _exception - 94d500 <excp> - 939890 _syscatch - 94def0 _syscatch - 94def0 invalid pointer - 3d6010013ae96970 invalid pointer - 3aab52803d201001 invalid pointer - 3d6010013c00cccc Client's Fix Pt Regs: 00 00100000000001f4 0000000060000000 00000000deadbeef fffffffffffffffc 04 0000000000000000 0000000000000000 0000000000000000 0000000000000001 08 0000000000001000 0000030002001000 0000000000000003 0000000000007000 0c 0000000022000044 0000000000000000 000000000021a354 000000000021a3a4 10 0000000000e3dd70 0000000000e3dd70 0000000000c4e728 0000000000c546e8 14 0000000000000000 0000000000c81948 0000000000000028 00000000024d15f8 18 0000000000c13000 0000000000c38000 0000000000c15000 0000000000c16fc0 1c 0000000000c20000 0000000000c3fdf0 0000000000c11fd8 0000000000c10fe8 Special Regs: %IV: 00000300 %CR: 8a000044 %XER: 20000000 %DSISR: 08000000 %SRR0: 0000000000c3bf3c %SRR1: 800000000000b002 %LR: 0000000000c3bed0 %CTR: 0000000000000000 %DAR: 0000000060000000 Virtual PID = 2 PFW: Unable to send error log! ofdbg 0 > DEFAULT CATCH!, exception-handler=fff00700 :EFAULT CATCH!, at exception-handler= fff00300 %SRR0 at %SRR0:DEFAULT CATCH!, r=000000000000000GDEFAULT CATCH!, %SRR1: DEFAULT CATCH!, 0000000000000002 Call History ------------ evaluate - invalid pointer - invalid pointer - a invalid pointer - 0 eval - S catch - ? 
display-checkpoint - (poplocals) - ¯ (checkpoint) - ë ?xoff - 7 (poplocals) - ¯ (stdout-write) - (emit) - ó (cr - cr - 3 _syscatch - ï My Fix Pt Regs: 00 0000000000c479f0 0000000000000000 00000000deadbeef 0000000000e3dd00 04 0000000000000041 000000000000001a ffffffffffffffbf 0000000000c03010 08 0000000008000000 80000000001f9ca0 80000000001f9ca0 80000000001af548 0c 0000000000004000 0000000000000000 80000000001af46c 0000000000c00060 10 80000000747bf404 0000000000e3dd70 0000000000c4a630 0000000000c685a9 14 0000000000c174ff 0000000000000001 0000000000000000 0000000000000000 18 0000000000c13000 0000000000c38000 0000000000c14ec0 0000000000c16f40 1c 0000000000c20000 0000000000c3fdf0 0000000000c11f40 0000000000c10ff8 Special Regs: %IV: 00000700 %CR: 42000000 %XER: 00000002 %DSISR: 00000000 %SRR0: 0000000000c479f0 %SRR1: 8000000000023002 %LR: 0000000000c4a638 %CTR: 0000000000c479f0 %DAR: 0000000000000000 Virtual PID = 6 PFW: Unable to send error log! þí`0(m"¯t(`6Ð jµð, unknown word Expected results: * kernel should boot Additional info: * See attached successful boot log from F-11-Alpha-ppc64
Set Architecture to powerpc as that matches the bug description.
Set arch to ppc6, since that matches the bug description better. :) My YDL PowerStation is failing similarly, and jwb's G5 seems to be dying around the same spot too.
fail... ppc64, I meant... PowerStation isn't going into the exception handler, it's just hanging, but it's at more or less the exact same spot.
might be related to bug #485267 ?
(In reply to comment #4) > might be related to bug #485267 ? Don't think so, we're barely even getting from yaboot to the kernel here, and at least the PowerStation runs F10 just fine, including dual-head X and everything.
I've noticed odd behavior since the Alpha kernels. Oopsing on ssh, etc. So I've started the equivalent of 'koji bisect'. So far, the good kernels are:

kernel-2.6.28-3.fc11.ppc64
kernel-2.6.29-0.18.rc0.git9.fc11.ppc64
kernel-2.6.29-0.40.rc1.git6.fc11.ppc64

Starting with kernel-2.6.29-0.53.rc2.git1.fc11.ppc64 I get oopses from sig 4s.
kernel-2.6.29-0.48.rc2.git1.fc11 works without oopses. It's the last successful build before 0.53. The changelog for 0.53 is:

* Mon Jan 26 2009 Kyle McMartin <kyle>
- Update git-linus.diff to bf50c903faba4ec7686ee8a570ac384b0f20814d.
- drm-next.patch merged.
- linux-2.6.28-sunrpc-ipv6-rpcbind.patch: update for Kconfig moves.

* Sat Jan 24 2009 Hans de Goede <hdegoede>
- Fix atk0110 sensor numbering

* Fri Jan 23 2009 Hans de Goede <hdegoede>
- Change acpi_enforce_resources default to strict, this will cause hwmon drivers which clash with io resources reserved by ACPI to no longer load, avoiding both the ACPI code and the native driver trying to drive the same IC at the same time
- Add ASUS ACPI hwmon interface driver (atk0110), this will give (restore) hwmon functionality on most ASUS boards through the firmware

Since acpi and atk0110 don't apply to this class of machine, perhaps either the drm-next.patch or git-linus.diff are the "bad" changes.
For grins, I tried a vanilla -rc5 build. This hangs after opening the console device, so whatever change causes this problem is in the upstream kernel. Seems somewhere between rc2 and rc5. Joy.
I spent most of yesterday doing a git bisect between -rc2 and -rc5. None of the kernels worked. They all hung after opening the display device (or more likely failed after but the output wasn't caught on the screen).

I spoke to Ben Herrenschmidt a bit last night. He tried a g5_defconfig build and a build using the fedora .config file. Both of his kernels worked on his dual G5 which is mostly identical to mine. We have fairly sizeable toolchain differences though, since he was building on some Ubuntu box and I was building on rawhide.

For grins this morning, I did a 'make local' of a devel kernel on my F9 box. Copied this over and rebooted and it works just fine.

[jwboyer@localhost ~]$ uname -a
Linux localhost.localdomain 2.6.29-0.119.rc5.fc11.ppc64 #1 SMP Tue Feb 17 07:00:20 EST 2009 ppc64 ppc64 ppc64 GNU/Linux

I'm beginning to suspect that it might be the toolchain in rawhide.
[jwboyer@localhost boot]$ file vmlinuz-2.6.29-0.124.rc5.fc11.ppc64 vmlinuz-2.6.29-0.124.rc5.fc11.ppc64: ELF 64-bit MSB shared object, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, stripped [jwboyer@localhost boot]$ strings vmlinuz-2.6.29-0.124.rc5.fc11.ppc64 | grep gcc Linux version 2.6.29-0.124.rc5.fc11.ppc64 (jwboyer) (gcc version 4.4.0 20090213 (Red Hat 4.4.0-0.18) (GCC) ) #1 SMP Tue Feb 17 08:13:43 EST 2009 The above kernel fails to boot. Installing identical kernel built on F9: [jwboyer@localhost ~]$ sudo yum remove kernel-2.6.29-0.124.rc5.fc11.ppc64 Loaded plugins: refresh-packagekit Setting up Remove Process Resolving Dependencies --> Running transaction check ---> Package kernel.ppc64 0:2.6.29-0.124.rc5.fc11 set to be erased --> Finished Dependency Resolution Dependencies Resolved ================================================================================ Package Arch Version Repository Size ================================================================================ Removing: kernel ppc64 2.6.29-0.124.rc5.fc11 installed 95 M Transaction Summary ================================================================================ Install 0 Package(s) Update 0 Package(s) Remove 1 Package(s) Is this ok [y/N]: y Downloading Packages: Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Erasing : kernel 1/1 Removed: kernel.ppc64 0:2.6.29-0.124.rc5.fc11 Complete! 
[jwboyer@localhost ~]$ sudo yum localinstall --nogpgcheck ./kernel-2.6.29-0.124.rc5.fc11.ppc64.rpm Loaded plugins: refresh-packagekit Setting up Local Package Process Examining ./kernel-2.6.29-0.124.rc5.fc11.ppc64.rpm: kernel-2.6.29-0.124.rc5.fc11.ppc64 Marking ./kernel-2.6.29-0.124.rc5.fc11.ppc64.rpm as an update to kernel-2.6.29-0.74.rc3.git3.fc11.ppc64 Marking ./kernel-2.6.29-0.124.rc5.fc11.ppc64.rpm as an update to kernel-2.6.29-0.119.rc5.fc11.ppc64 Resolving Dependencies --> Running transaction check ---> Package kernel.ppc64 0:2.6.29-0.124.rc5.fc11 set to be installed --> Finished Dependency Resolution Dependencies Resolved ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: kernel ppc64 2.6.29-0.124.rc5.fc11 ./kernel-2.6.29-0.124.rc5.fc11.ppc64.rpm 91 M Transaction Summary ================================================================================ Install 1 Package(s) Update 0 Package(s) Remove 0 Package(s) Total download size: 91 M Is this ok [y/N]: y Downloading Packages: Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Installing : kernel 1/1 Installed: kernel.ppc64 0:2.6.29-0.124.rc5.fc11 Complete! [jwboyer@localhost ~]$ cd /boot [jwboyer@localhost boot]$ file vmlinuz-2.6.29-0.124.rc5.fc11.ppc64 vmlinuz-2.6.29-0.124.rc5.fc11.ppc64: ELF 64-bit MSB shared object, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, stripped [jwboyer@localhost boot]$ strings vmlinuz-2.6.29-0.124.rc5.fc11.ppc64 | grep gcc Linux version 2.6.29-0.124.rc5.fc11.ppc64 (jwboyer.homelinux.org) (gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Tue Feb 17 08:05:39 EST 2009 That kernel boots. It does get an oops in some of the compat stuff, but it at least boots.
Jakub, any ideas on this one?
I have one more experiment I'd like to try. Rawhide has CONFIG_RELOCATABLE=y set for ppc64 kernels. Going to try to disable that, rebuild on both F9 and rawhide, and see what happens.
Try the usual stuff, unless it is obvious where the problem is, if it works compiled with gcc 4.3.2 and doesn't with 4.4.0, do a binary search among .o files built by both compilers to narrow down which .c file matters, then try to narrow it to a function, build a self-contained testcase from it. If e.g. building the kernel with 4.4.0 but say -O1 instead of -O2 (or some other option combination) works, you can do the binary search between -O1 and -O2 built objects etc.
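A minimal sketch of the binary search over object files described above, assuming a single culprit object and a manual "relink and reboot" step hidden behind a callback. The names find_bad_object, boots_fn and simulated_boot are hypothetical and only illustrate the search; they are not part of any kernel or gcc tooling.

```c
#include <stddef.h>

/*
 * Hypothetical callback: returns nonzero if a kernel linked with the first
 * `count` objects taken from the gcc 4.4 tree (and the rest from the
 * gcc 4.3 tree) still boots. In practice this is a manual relink + reboot.
 */
typedef int (*boots_fn)(size_t count, void *ctx);

/*
 * Binary search for the object file whose gcc 4.4 build breaks the boot.
 * Assumes n >= 1, a single culprit, boots(0) succeeds and boots(n) fails.
 * Returns the 0-based index of the culprit in the ordered object list.
 */
size_t find_bad_object(size_t n, boots_fn boots, void *ctx)
{
    size_t lo = 0;   /* largest count known to boot */
    size_t hi = n;   /* smallest count known to fail */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (boots(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi - 1;   /* adding the object at index hi-1 flipped the result */
}

/* Simulated oracle for demonstration: *ctx holds the culprit's index. */
int simulated_boot(size_t count, void *ctx)
{
    size_t culprit = *(size_t *)ctx;
    return count <= culprit;   /* fails once the culprit is included */
}
```

With roughly a thousand objects this needs about ten relink/reboot cycles. If the failure location is already suspected, testing that single .o first is of course cheaper.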
(In reply to comment #12) > I have one more experiment I'd like to try. Rawhide has CONFIG_RELOCATABLE=y > set for ppc64 kernels. Going to try to disable that, rebuild on both F9 and > rawhide, and see what happens. CONFIG_RELOCATABLE not set still doesn't help rawhide/gcc 4.4 builds. It seems to make the oops on ssh I was seeing with gcc 4.3.2 go away though. Interesting, but not really that much help.
(In reply to comment #13) > Try the usual stuff, unless it is obvious where the problem is, if it works > compiled with gcc 4.3.2 and doesn't with 4.4.0, do a binary search among .o > files built by both compilers to narrow down which .c file matters, then try to > narrow it to a function, build a self-contained testcase from it. If e.g. > building the kernel with 4.4.0 but say -O1 instead of -O2 (or some other option > combination) works, you can do the binary search between -O1 and -O2 built > objects etc. OK. So I took "binary search among .o files" to mean: "copy the .o files from gcc 4.4 into a gcc 4.3 build tree. recompile vmlinux. test" Hopefully I got that part right. If not, please yell now because that's what I did. Going on the assumption above, and given the proximity of the failure/hang, I copied arch/powerpc/kernel/prom_init.o from the gcc 4.4 build tree to the gcc 4.3 build tree and redid 'make vmlinux'. Copied the resulting vmlinux onto the machine and rebooted, and it hangs just like a full kernel built with 4.4.
Created attachment 332273 [details] objdump -d output of gcc 4.3 prom_init.o
Created attachment 332274 [details] objdump -d output of gcc 4.4 prom_init.o
Attached the objdump output of the differing .o files above. I haven't had time to look at them in detail, but in my 5 minute glance at them I noticed a distinct lack of mtctr instructions in functions that use va_start/va_end/var args stuff in the gcc4.4 file. See call_prom_ret for example.
The smaller number of mtctr instructions is due to the compiler no longer using bdnz (i.e., branch on count register) loops with this particular gcc 4.4 revision.
I tried rebuilding the kernel with -O2 instead of -Os using gcc 4.4. The following was emitted during the build:

drivers/md/bitmap.c: In function ‘bitmap_count_page’:
drivers/md/bitmap.c:1070: internal compiler error: in reload, at reload1.c:1173
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cc5gfyny.out file, please attach this to your bugreport.
Oh, and:

net/ipv6/addrconf_core.c: In function ‘__ipv6_addr_type’:
net/ipv6/addrconf_core.c:77: internal compiler error: in reload, at reload1.c:1173
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cchwwh9u.out file, please attach this to your bugreport.

I'll attach those files shortly
Created attachment 332296 [details] ice output
Created attachment 332297 [details] ice output2
This is all with gcc-4.4.0-0.18.ppc. I noticed there is an update in rawhide today to -19, but when looking at the changelog I can't see how any of the mentioned PRs would apply here.
Both ICEs are likely the same bug, reduced testcase:

/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-options "-O2 -mtune=cell -mminimal-toc" { target { powerpc*-*-* && lp64 } } } */

struct A
{
  char *a;
  unsigned int b : 1;
  unsigned int c : 31;
};

struct B
{
  struct A *d;
};

void
foo (struct B *x, unsigned long y)
{
  if (x->d[y].c)
    return;
  if (x->d[y].b)
    x->d[y].a = 0;
}

Will look into this tomorrow. That said, kernel not booting with -Os is unrelated to this, so if you could make progress on finding in which function things went wrong in prom_init.o and ideally with what arguments it has been called (i.e. try to make a self-contained testcase from it), it would be greatly appreciated.
The ICEs are now tracked as http://gcc.gnu.org/PR39226 upstream. In the light of that, can you try prom_init.c built with 4.4 and -Os, but without -mtune=cell (say -mtune=power4)? Also, do you have a rough idea into which function in prom_init.c to look? And, can you attach preprocessed prom_init.i and the list of gcc options used to compile it? Thanks.
(In reply to comment #26) > The ICEs are now tracked as http://gcc.gnu.org/PR39226 upstream. In the light > of that, can you try prom_init.c built with 4.4 and -Os, but without -mtune=cell > (say -mtune=power4)? Yes. Actually, because of some config options set for the kernel, both -mtune=power4 and -mtune=cell are getting passed. I believe the latter one "wins". I'll unset the option that causes it to be tuned for cell today. > Also, do you have a rough idea into which function in prom_init.c to look? Unfortunately, not yet. The sucky part about prom_init.c is that all of it runs before the kernel has relocated itself to the normal addresses. It is still doing calls into OF for various things in this file before doing that relocation. I made a bit more progress on finding a failing function by commenting out the prom_check_displays function, which was causing the screen to be blanked. After I did that, I see that I get an exception from OF during what I believe is the scan_dt_build_struct function, which is called from flatten_device_tree. More info as I find it. > And, can you attach preprocessed prom_init.i and the list of gcc options used > to compile it? I'll get the .i file as soon as I can. The compile options for both 4.3 and 4.4 are here: http://fpaste.org/paste/3900
Created attachment 332404 [details] prom_init.i output from gcc 4.4 prom_init.i output, generated with: gcc -m64 -Wp,-MD,arch/powerpc/kernel/.prom_init.o.d -nostdinc -isystem /usr/lib/gcc/ppc64-redhat-linux/4.4.0/include -Iinclude -I/home/jwboyer/src/kernel/devel/kernel-2.6.28/linux-2.6.28.ppc64/arch/powerpc/include -include include/linux/autoconf.h -D__KERNEL__ -Iarch/powerpc -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -msoft-float -pipe -Iarch/powerpc -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -mtune=cell -mno-altivec -mno-spe -mspe=no -funit-at-a-time -mno-string -Wa,-maltivec -Wframe-larger-than=2048 -fno-stack-protector -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -mno-minimal-toc -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(prom_init)" -D"KBUILD_MODNAME=KBUILD_STR(prom_init)" -E -o arch/powerpc/kernel/prom_init.i arch/powerpc/kernel/prom_init.c
Thanks. Could you also try building prom_init.c with -O0 and see if that doesn't help? If -O0 works, GCC 4.4 has __attribute__((__optimize__(N))) for N 0-3, which you perhaps could use to narrow it down to a particular function (compile everything with -Os, and for a bunch of functions add __attribute__((__optimize__(0))) to compile them at -O0).
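For illustration, a small sketch of what tagging individual functions with the optimize attribute could look like. The macro NO_OPT and the functions suspect_a and suspect_b are hypothetical placeholders, not code from prom_init.c; the guard assumes that only real GCC (4.4 or later) honours the attribute, so it expands to nothing elsewhere.

```c
/*
 * Per-function optimization override, available since GCC 4.4.
 * Compilers that do not implement it (e.g. clang) get an empty macro.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((__optimize__(0)))
#else
#define NO_OPT
#endif

/* Hypothetical suspect: built at -O0 even when the file is built with -Os. */
NO_OPT unsigned int suspect_a(unsigned long virt, unsigned long size)
{
    return (unsigned int)(virt + size);
}

/* Not tagged: compiled at whatever level the command line requests. */
unsigned int suspect_b(unsigned long virt, unsigned long size)
{
    return (unsigned int)(virt + size);
}
```

Tagging half of the functions in the file, rebuilding with -Os and retesting narrows the miscompiled function down in a few rounds, in the same spirit as the .o file search.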
(In reply to comment #29) > Thanks. Could you also try building prom_init.c with -O0 and see if that > doesn't help? It does actually. Both -O0 and -O1 builds of prom_init.c allow the system to boot (when the .o is copied to the 4.3 kernel and the vmlinux is rebuilt with it.) -O2 and -Os fail similarly. > If -O0 works, GCC 4.4 has __attribute__((__optimize__(N))) for N > 0-3, which you perhaps could use to narrow it down to a particular function > (compile everything with -Os, and for a bunch of functions add > __attribute__((__optimize__(0))) to compile them at -O0. I'll try this now.
Oh, I did try removing -mtune=cell; however, that does not appear to make a difference.
Created attachment 332437 [details] Instrumented prom_init.c It seems prom_claim is the function that needs the -O0 optimization. Building this C file with: gcc -m64 -Wp,-MD,arch/powerpc/kernel/.prom_init.o.d -nostdinc -isystem /usr/lib/gcc/ppc64-redhat-linux/4.4.0/include -Iinclude -I/home/jwboyer/src/kernel/devel/kernel-2.6.28/linux-2.6.28.ppc64/arch/powerpc/include -include include/linux/autoconf.h -D__KERNEL__ -Iarch/powerpc -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -msoft-float -pipe -Iarch/powerpc -mminimal-toc -mtraceback=none -mcall-aixdesc -mtune=power4 -mtune=cell -mno-altivec -mno-spe -mspe=no -funit-at-a-time -mno-string -Wa,-maltivec -Wframe-larger-than=2048 -fno-stack-protector -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -mno-minimal-toc -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(prom_init)" -D"KBUILD_MODNAME=KBUILD_STR(prom_init)" -c -o arch/powerpc/kernel/prom_init.o arch/powerpc/kernel/prom_init.c copying the resulting .o to the gcc 4.3 tree, and booting the resulting vmlinux works.
Looking at that function, I noticed that the bulk of it should be optimized out given that:

if (align == 0 && (OF_WORKAROUNDS & OF_WA_CLAIM))

will always evaluate to false since OF_WORKAROUNDS is #defined to 0 on CONFIG_PPC64. So the function logically boils down to this:

static unsigned int __init prom_claim(unsigned long virt, unsigned long size, unsigned long align)
{
        struct prom_t *_prom = &RELOC(prom);

        return call_prom("claim", 3, 1, (prom_arg_t)virt, (prom_arg_t)size, (prom_arg_t)align);
}

Doing an objdump -d of the bad prom_init.o, I can't see anywhere that call_prom is actually called, and call_prom is not an inlined function in this case.

The call_prom call is pretty important here. prom_claim is called from alloc_up and alloc_down, which are used to claim memory from OF (roughly speaking). These findings are also consistent with the small amount of crash data I was seeing, as the offending offset was near a make_room call, which calls alloc_up (which is supposed to call prom_claim->call_prom).
With -Os and no optimize attribute prom_claim is versioned (so that it is only called with 2 arguments instead of 3, align is not used) and in the tree optimized dump I still see it (and prom_printf ... trying: 0x... calls before that). Nothing in the assembly references those, so it is optimized out during RTL optimizations.
Actually no, it is just section anchoring, as all the strings are in .rodata and so they are referenced through .LANCHOR1 + offset. In my .s file "claim" string is at .LANCHOR1 + 898, and the T.657 function calls call_prom with that string as the first argument.
BTW, couldn't the struct prom_t *_prom = &RELOC(prom); line be moved into if (align == 0 && ...) body? Seems it isn't needed outside of that if's body and as it contains a function call, it can't be optimized out. That's just random optimization idea, I really still have no idea where to look for a problem.
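A rough sketch of that idea, using stand-ins rather than the real prom_init.c definitions. Everything here is hypothetical scaffolding: reloc_prom stubs RELOC(prom), call_prom_claim stubs call_prom("claim", ...), the OF_WA_CLAIM bit value is made up, and the workaround body is elided because it is not shown in this bug. The counter only demonstrates that the relocation helper is no longer called when the workaround is compiled out.

```c
/* Stand-ins for illustration only; the real definitions live in prom_init.c. */
#define OF_WORKAROUNDS 0      /* 0 on CONFIG_PPC64, as noted earlier in this bug */
#define OF_WA_CLAIM    1      /* hypothetical bit value */

struct prom_t { int dummy; };
static struct prom_t prom;
int reloc_calls;              /* counts how often the stub below runs */

/* Stub for RELOC(prom), which is said to involve a function call. */
struct prom_t *reloc_prom(void)
{
    reloc_calls++;
    return &prom;
}

/* Stub for call_prom("claim", 3, 1, ...): simply echoes virt back. */
unsigned long call_prom_claim(unsigned long virt, unsigned long size,
                              unsigned long align)
{
    (void)size;
    (void)align;
    return virt;
}

unsigned int prom_claim_sketch(unsigned long virt, unsigned long size,
                               unsigned long align)
{
    if (align == 0 && (OF_WORKAROUNDS & OF_WA_CLAIM)) {
        /* Only this path needs the relocated pointer. */
        struct prom_t *_prom = reloc_prom();
        (void)_prom;
        /* ... workaround path elided ... */
    }
    return (unsigned int)call_prom_claim(virt, size, align);
}
```

With OF_WORKAROUNDS defined to 0 the condition is a compile-time constant, so the whole block, including the helper call, can be dropped by the compiler.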
(In reply to comment #35) > Actually no, it is just section anchoring, as all the strings are in .rodata > and so they are referenced through .LANCHOR1 + offset. > In my .s file "claim" string is at .LANCHOR1 + 898, and the T.657 function > calls call_prom with that string as the first argument. Yes, I think you're right. I see similar things in my objdump. I'll attach the good and bad objdumps i have.
Created attachment 332463 [details] objdump -d output of good prom_init.o
Created attachment 332465 [details] objdump -d output of failing prom_init.o
Just so that everyone sees what I'm seeing, here are some pictures of the early prom stuff from a good and bad kernel. (This isn't part of dmesg, hence the jpgs). Good: http://jwboyer.fedorapeople.org/ppc64-good.jpg Bad: http://jwboyer.fedorapeople.org/ppc64-bad.jpg You can definitely see oddness in the alloc_up call that is done. Bad returns ffffffffffffffff (-1?), while the good succeeds.
Thanks, that was enough to find out what's wrong. Surprisingly, this doesn't appear to be a regression, but a long standing ppc -m64 sibcall optimization bug.

extern void abort (void);

__attribute__ ((noinline)) static int
foo (int x)
{
  return x;
}

__attribute__ ((noinline)) unsigned int
bar (int x)
{
  return foo (x + 6);
}

unsigned long l = (unsigned int) -4;

int
main (void)
{
  if (bar (-10) != l)
    abort ();
  return 0;
}

works when compiled with -m32 (any optimization level) or -m64 -O{0,1}, or -m64 -O{2,3,s} -fno-optimize-sibling-calls, but aborts for -m64 -O{2,3,s}, with all of 4.1.x, 4.3.x and trunk GCCs.

PPC64 psABI says:
"Functions shall return values of type int, long, enum, short, and char, or a pointer to any type, as unsigned or signed integers as appropriate, zero- or sign-extended to 64 bits if necessary, in r3."
which really means that it is not valid to do a sibcall in between a function that returns < 64-bit signed integral and a function that returns < 64-bit unsigned integral, as the value must be sign-extended to 64-bits in the first case and zero-extended in the second case. Surprises me this wasn't discovered years ago.
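To make the mismatch concrete, here is a small portable sketch of the two 64-bit images of the value -4 that the psABI wording implies. The helper names r3_for_int and r3_for_unsigned are hypothetical; they only model what the return register would have to contain, they do not reproduce the miscompilation itself.

```c
#include <stdint.h>

/* Register image required when a function declared to return int returns v. */
uint64_t r3_for_int(int32_t v)
{
    return (uint64_t)(int64_t)v;     /* sign-extended to 64 bits */
}

/* Register image required when a function declared to return unsigned int returns v. */
uint64_t r3_for_unsigned(uint32_t v)
{
    return (uint64_t)v;              /* zero-extended to 64 bits */
}
```

In the testcase, foo hands back the int -4 as 0xfffffffffffffffc, while the caller of bar expects the unsigned image 0x00000000fffffffc. When bar tail-calls foo, no instruction runs after foo returns to redo the extension, so the comparison against l fails.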
(In reply to comment #41) > Thanks, that was enough to find out what's wrong. Surprisingly, this doesn't > appear to be a regression, but a long standing ppc -m64 sibcall optimization > bug. Awesome. Thanks Jakub. Doing some bugzilla housekeeping to mark this against gcc and put it as an F11-Beta blocker. As soon as we get a fixed gcc, I'll be happy to test.
I've built local kernels and used the kernels from koji built with gcc-4.4.0-21 and they all work now (aside from the unrelated module loading bug). I believe this bug can be closed out. Thanks again Jakub
Yup, recent kernels boot on the powerstation too, closing bug.
The -O2 -mtune=cell ICEs discussed in #c20 through #c26 should be now fixed in gcc-4.4.0-0.22.