Epiphany and galeon crash on ppc; backtrace looks like this (this one is epiphany)... #0 0x0eaf5e18 in signal_emit_unlocked_R (node=<value optimized out>, detail=<value optimized out>, instance=<value optimized out>, emission_return=<value optimized out>, instance_and_params=<value optimized out>) at gsignal.c:2285 #1 0x0eaf7abc in IA__g_signal_emit_valist (instance=<value optimized out>, signal_id=<value optimized out>, detail=<value optimized out>, var_args=<value optimized out>) at gsignal.c:2199 First inclination, given frame #1, was to look for va_list abuse, but I don't see anything obvious. The crash is in accumulate(), which is inlined in signal_emit_unlocked_R(). And the 'accumulator' variable seems to be NULL: 0xeaf5e04 <signal_emit_unlocked_R+1364>: lwz r9,112(r1) 0xeaf5e08 <signal_emit_unlocked_R+1368>: lwz r4,104(r1) 0xeaf5e0c <signal_emit_unlocked_R+1372>: addi r29,r1,176 0xeaf5e10 <signal_emit_unlocked_R+1376>: mr r3,r31 0xeaf5e14 <signal_emit_unlocked_R+1380>: mr r5,r29 0xeaf5e18 <signal_emit_unlocked_R+1384>: lwz r10,0(r9) 0xeaf5e1c <signal_emit_unlocked_R+1388>: lwz r6,4(r9) 0xeaf5e20 <signal_emit_unlocked_R+1392>: mtctr r10 0xeaf5e24 <signal_emit_unlocked_R+1396>: bctrl 0xeaf5e28 <signal_emit_unlocked_R+1400>: mr r31,r3 0xeaf5e2c <signal_emit_unlocked_R+1404>: mr r3,r29 (gdb) p accumulator $8 = <value optimized out> (gdb) p $r9 $9 = 0 Looks like r9 is 'accumulator', we're loading accumulator->data into r6 at 0xeaf5e1c and loading accumulator->func into r10 at 0xeaf5e18, where we crash. But since the accumulate() function is supposed to return TRUE if (!accumulator), how can we ever reach that line of code with it being NULL?
I modified the accumulate() function so it looks like this: if (!accumulator) return TRUE; g_warning("accumulator is %p\n", accumulator); continue_emission = accumulator->func (ihint, return_accu, handler_return, accumulator->data); I now see it crash like this: (galeon:12846): GLib-GObject-WARNING **: accumulator is (nil) Program received signal SIGSEGV, Segmentation fault. 0x0e7c6e80 in signal_emit_unlocked_R (node=0x1027e910, detail=0, instance=0x1093d400, emission_return=0x0, instance_and_params=0xff9e8978) at gsignal.c:2285 2285 continue_emission = accumulator->func (ihint, return_accu, handler_return, accumulator->data); Removing 'static inline' from the offending function makes galeon and epiphany work. As does building with -O1 or -Os instead of -O2. Building with '-O1 -fgcse' breaks it.
Created attachment 308912 [details] gobect.i gcc -pthread -O1 -fgcse -g -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -c gsignal.i -fPIC -o gsignal.o
Weird. I see every call to g_log with the "accumulator is %p\n" format string preceeded by beq 3, .LXXXX which jumps around it, and there is lwz 8,28(22) # <variable>.accumulator, stw 8,60(1) # accumulator, cmpwi 3,8,0 #, tmp1199, which should dominate all those jumps. As cr3 should be call-saved, I don't see a problem with that. Can you find out where between this and the g_log which prints accumulator is (nil) %cr3 changed? Thanks.
Curiouser and curiouser. I get to the point it's going to fail by putting a breakpoint on gtk_mozilla_realize, then when I hit that putting a breakpoint on signal_emit_unlocked_R. Hit 'continue' 86 times, and then the _next_ time I hit continue, it'll crash. However, if I step through it manually or do 'fini', it gets to the end of the function just fine. After doing this, I set breakpoints at the point that %cr3 is set, and the point it's going to crash. I 'continue' again, and I don't hit the breakpoint at the beginning of the function, or the breakpoint where %cr3 is set. But I _do_ hit the breakpoint right before it crashes. How did we get back into the function? Breakpoint 2, signal_emit_unlocked_R (node=0x1018c668, detail=0, instance=0x105e7980, emission_return=0x0, instance_and_params=0xfff0a748) at gsignal.c:2303 2303 GValue *return_accu, accu = { 0, }; (gdb) fini Run till exit from #0 signal_emit_unlocked_R (node=0x1018c668, detail=0, instance=0x105e7980, emission_return=0x0, instance_and_params=0xfff0a748) at gsignal.c:2303 IA__g_signal_emit_valist (instance=0x105e7980, signal_id=264874048, detail=0, var_args=0xfff0aa14) at gsignal.c:2220 2220 g_free (error); Value returned is $1 = 0 (gdb) break *signal_emit_unlocked_R+472 Breakpoint 3 at 0xe7c8e88: file gsignal.c, line 2334. (gdb) break *signal_emit_unlocked_R+1992 Breakpoint 4 at 0xe7c9478: file gsignal.c, line 2282. (gdb) disp/i $pc 1: x/i $pc 0xe7cae88 <IA__g_signal_emit_valist+1960>: lwz r29,28(r1) (gdb) disp/x $cr 2: /x $cr = 0x24042442 (gdb) c Continuing. Breakpoint 4, signal_emit_unlocked_R (node=0x101bb518, detail=0, instance=0x1065b4c0, emission_return=0x0, instance_and_params=0xfff0b018) at gsignal.c:2282 2282 if (!accumulator) 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c9478 <signal_emit_unlocked_R+1992>: beq- cr3,0xe7c94e4 <signal_emit_unlocked_R+2100> (gdb) p accumulator $2 = (SignalAccumulator *) 0x0 (gdb) si 2284 g_warning("accumulator is %p\n", accumulator); 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c947c <signal_emit_unlocked_R+1996>: lwz r3,-32768(r30) (gdb) 0x0e7c9480 2284 g_warning("accumulator is %p\n", accumulator); 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c9480 <signal_emit_unlocked_R+2000>: li r4,16 (gdb) 0x0e7c9484 2284 g_warning("accumulator is %p\n", accumulator); 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c9484 <signal_emit_unlocked_R+2004>: lwz r5,-32308(r30) (gdb) 0x0e7c9488 2284 g_warning("accumulator is %p\n", accumulator); 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c9488 <signal_emit_unlocked_R+2008>: lwz r6,60(r1) (gdb) 0x0e7c948c 2284 g_warning("accumulator is %p\n", accumulator); 2: /x $cr = 0x22042424 1: x/i $pc 0xe7c948c <signal_emit_unlocked_R+2012>: crclr 4*cr1+eq (gdb) p/x $r6 $3 = 0x0 (gdb) p/x $r8 $4 = 0x0
*** Bug 449883 has been marked as a duplicate of this bug. ***
OK, I'm slightly less confused now. The function is recursing, through enough other functions that I hadn't noticed. During the offending call, %cr3 is getting stomped on during the call to g_closure_invoke(). It's supposed to be caller-saved, but it's not getting preserved. 0x0e7c9304 2384 g_closure_invoke (class_closure, 2: x/i $pc 0xe7c9304 <signal_emit_unlocked_R+1180>: bl 0xe7af570 <IA__g_closure_invoke> 1: /x $cr = 0x28022822 (gdb) IA__g_closure_invoke (closure=0x101bb600, return_value=0x0, n_param_values=1, param_values=0xff9120f8, invocation_hint=0xff912064) at gclosure.c:465 465 { 2: x/i $pc 0xe7af570 <IA__g_closure_invoke>: mflr r0 1: /x $cr = 0x28022822 (gdb) fini Run till exit from #0 0x0e7af574 in IA__g_closure_invoke (closure=0x101bb600, return_value=0x0, n_param_values=1, param_values=0xff9120f8, invocation_hint=0xff912064) at gclosure.c:465 signal_emit_unlocked_R (node=0x101bb618, detail=0, instance=0x1065c0c0, emission_return=0x0, instance_and_params=0xff9120f8) at gsignal.c:2283 2283 if (!accumulator) 2: x/i $pc 0xe7c9308 <signal_emit_unlocked_R+1184>: bne- cr3,0xe7c9318 <signal_emit_unlocked_R+1200> 1: /x $cr = 0x22042484
It's supposed to be callee-saved, I mean. More tea...
Any advice on finding _where_ %cr3 is getting corrupted? It could be anywhere -- there's a _huge_ stack of functions being called from g_closure_invoke(), and when we get back, it's been changed. It's probably not even gsignal.c which is actually at fault -- it's just that when it's compiled with -fgcse and actually _relies_ on %cr3 being callee-saved, it breaks due to the problem which is elsewhere. Is there a way to track %cr through the call stack somehow? I've tried building (just gsignal.c for now) with -finstrument-functions and printing %cr in my __cyg_profile_func_enter() and __cyg_profile_func_exit() routines, but that just gets me a lot of output and no smoking gun -- and it's particularly hard to interpret because gcc emits the call to __cyg_profile_func_exit() before actually restoring %cr anyway.
Is it feasible to hack the compiler to caller-save these condition registers, but rather than just restoring them, instead just _check_ that the called function didn't trample on them -- and trap immediately if it did?
I believe the function stomping on %cr3 is libm's lroundl. It uses %cr3 without saving and restoring it.
This is due to a bug in an lroundl patch I provided late last year. I used a callee-save cr field without saving/restoring. I'll correct it shortly with a patch to mainline GLIBC and will append the patch here.
Looks like Jakub beat me to it.
Jakub, any word on when we can have a glibc erratum pushed for F-9? I've built it locally with your patch and confirmed that inkscape, epiphany, galeon and vlc no longer crash.
ping
David, could you state for reference where this glibc patch is available publicly?
http://sourceware.org/ml/libc-hacker/2008-06/msg00001.html
Created attachment 311216 [details] Patch current for Fedora 9 glibc This patch modifies the one listed in Comment #16 so that it applies against glibc-2.8 (the version presently in Fedora 9). I have found that this patch fixes this bug on my 32-bit PowerPC laptop. I tested epiphany and inkscape.
glibc-2.8-7 has been submitted as an update for Fedora 9
glibc-2.8-7 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-6260
glibc-2.8-8 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.