Bug 450790 - glibc lroundl() clobbers call-saved condition registers.
Summary: glibc lroundl() clobbers call-saved condition registers.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 9
Hardware: powerpc
OS: Linux
low
low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 449883 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-06-11 01:11 UTC by David Woodhouse
Modified: 2008-07-23 07:08 UTC (History)
3 users (show)

Fixed In Version: 2.8-8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-23 07:08:33 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
gobect.i (451.19 KB, text/plain)
2008-06-11 09:12 UTC, David Woodhouse
no flags Details
Patch current for Fedora 9 glibc (4.75 KB, patch)
2008-07-08 01:43 UTC, W. Michael Petullo
no flags Details | Diff

Description David Woodhouse 2008-06-11 01:11:07 UTC
Epiphany and galeon crash on ppc; backtrace looks like this (this one is
epiphany)...

#0  0x0eaf5e18 in signal_emit_unlocked_R (node=<value optimized out>, 
    detail=<value optimized out>, instance=<value optimized out>, 
    emission_return=<value optimized out>, 
    instance_and_params=<value optimized out>) at gsignal.c:2285
#1  0x0eaf7abc in IA__g_signal_emit_valist (instance=<value optimized out>, 
    signal_id=<value optimized out>, detail=<value optimized out>, 
    var_args=<value optimized out>) at gsignal.c:2199

First inclination, given frame #1, was to look for va_list abuse, but I don't
see anything obvious. The crash is in accumulate(), which is inlined in
signal_emit_unlocked_R(). And the 'accumulator' variable seems to be NULL:

0xeaf5e04 <signal_emit_unlocked_R+1364>:	lwz     r9,112(r1)
0xeaf5e08 <signal_emit_unlocked_R+1368>:	lwz     r4,104(r1)
0xeaf5e0c <signal_emit_unlocked_R+1372>:	addi    r29,r1,176
0xeaf5e10 <signal_emit_unlocked_R+1376>:	mr      r3,r31
0xeaf5e14 <signal_emit_unlocked_R+1380>:	mr      r5,r29
0xeaf5e18 <signal_emit_unlocked_R+1384>:	lwz     r10,0(r9)
0xeaf5e1c <signal_emit_unlocked_R+1388>:	lwz     r6,4(r9)
0xeaf5e20 <signal_emit_unlocked_R+1392>:	mtctr   r10
0xeaf5e24 <signal_emit_unlocked_R+1396>:	bctrl
0xeaf5e28 <signal_emit_unlocked_R+1400>:	mr      r31,r3
0xeaf5e2c <signal_emit_unlocked_R+1404>:	mr      r3,r29
(gdb) p accumulator
$8 = <value optimized out>
(gdb) p $r9
$9 = 0

Looks like r9 is 'accumulator', we're loading accumulator->data into r6 at
0xeaf5e1c and loading accumulator->func into r10 at 0xeaf5e18, where we crash.

But since the accumulate() function is supposed to return TRUE if
(!accumulator), how can we ever reach that line of code with it being NULL?

Comment 1 David Woodhouse 2008-06-11 09:09:58 UTC
I modified the accumulate() function so it looks like this:

  if (!accumulator)
    return TRUE;
  g_warning("accumulator is %p\n", accumulator);
  continue_emission = accumulator->func (ihint, return_accu, handler_return,
accumulator->data);


I now see it crash like this:

(galeon:12846): GLib-GObject-WARNING **: accumulator is (nil)


Program received signal SIGSEGV, Segmentation fault.
0x0e7c6e80 in signal_emit_unlocked_R (node=0x1027e910, detail=0,
instance=0x1093d400, emission_return=0x0, 
    instance_and_params=0xff9e8978) at gsignal.c:2285
2285	  continue_emission = accumulator->func (ihint, return_accu,
handler_return, accumulator->data);

Removing 'static inline' from the offending function makes galeon and epiphany
work. As does building with -O1 or -Os instead of -O2. Building with '-O1
-fgcse' breaks it.



Comment 2 David Woodhouse 2008-06-11 09:12:09 UTC
Created attachment 308912 [details]
gobect.i

gcc -pthread -O1 -fgcse -g -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -c gsignal.i  -fPIC  -o gsignal.o

Comment 3 Jakub Jelinek 2008-06-11 10:45:59 UTC
Weird.  I see every call to g_log with the "accumulator is %p\n" format string
preceeded by beq 3, .LXXXX which jumps around it, and there is
        lwz 8,28(22)     # <variable>.accumulator,
        stw 8,60(1)      # accumulator,
        cmpwi 3,8,0      #, tmp1199,
which should dominate all those jumps.  As cr3 should be call-saved, I don't see
a problem with that.  Can you find out where between this and the g_log which
prints accumulator is (nil) %cr3 changed?  Thanks.

Comment 4 David Woodhouse 2008-06-11 14:48:51 UTC
Curiouser and curiouser. I get to the point it's going to fail by putting a
breakpoint on gtk_mozilla_realize, then when I hit that putting a breakpoint on
signal_emit_unlocked_R. Hit 'continue' 86 times, and then the _next_ time I hit
continue, it'll crash.

However, if I step through it manually or do 'fini', it gets to the end of the
function just fine. After doing this, I set breakpoints at the point that %cr3
is set, and the point it's going to crash. I 'continue' again, and I don't hit
the breakpoint at the beginning of the function, or the breakpoint where %cr3 is
set. But I _do_ hit the breakpoint right before it crashes. How did we get back
into the function?


Breakpoint 2, signal_emit_unlocked_R (node=0x1018c668, detail=0,
instance=0x105e7980, emission_return=0x0, 
    instance_and_params=0xfff0a748) at gsignal.c:2303
2303      GValue *return_accu, accu = { 0, };
(gdb) fini
Run till exit from #0  signal_emit_unlocked_R (node=0x1018c668, detail=0,
instance=0x105e7980, emission_return=0x0, 
    instance_and_params=0xfff0a748) at gsignal.c:2303
IA__g_signal_emit_valist (instance=0x105e7980, signal_id=264874048, detail=0,
var_args=0xfff0aa14) at gsignal.c:2220
2220              g_free (error);
Value returned is $1 = 0
(gdb)  break *signal_emit_unlocked_R+472
Breakpoint 3 at 0xe7c8e88: file gsignal.c, line 2334.
(gdb)  break *signal_emit_unlocked_R+1992
Breakpoint 4 at 0xe7c9478: file gsignal.c, line 2282.
(gdb) disp/i $pc
1: x/i $pc
0xe7cae88 <IA__g_signal_emit_valist+1960>:      lwz     r29,28(r1)
(gdb) disp/x $cr
2: /x $cr = 0x24042442
(gdb) c
Continuing.

Breakpoint 4, signal_emit_unlocked_R (node=0x101bb518, detail=0,
instance=0x1065b4c0, emission_return=0x0, 
    instance_and_params=0xfff0b018) at gsignal.c:2282
2282      if (!accumulator)
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c9478 <signal_emit_unlocked_R+1992>:        beq-    cr3,0xe7c94e4
<signal_emit_unlocked_R+2100>
(gdb) p accumulator
$2 = (SignalAccumulator *) 0x0
(gdb) si
2284      g_warning("accumulator is %p\n", accumulator);
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c947c <signal_emit_unlocked_R+1996>:        lwz     r3,-32768(r30)
(gdb) 
0x0e7c9480      2284      g_warning("accumulator is %p\n", accumulator);
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c9480 <signal_emit_unlocked_R+2000>:        li      r4,16
(gdb) 
0x0e7c9484      2284      g_warning("accumulator is %p\n", accumulator);
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c9484 <signal_emit_unlocked_R+2004>:        lwz     r5,-32308(r30)
(gdb) 
0x0e7c9488      2284      g_warning("accumulator is %p\n", accumulator);
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c9488 <signal_emit_unlocked_R+2008>:        lwz     r6,60(r1)
(gdb) 
0x0e7c948c      2284      g_warning("accumulator is %p\n", accumulator);
2: /x $cr = 0x22042424
1: x/i $pc
0xe7c948c <signal_emit_unlocked_R+2012>:        crclr   4*cr1+eq
(gdb) p/x $r6
$3 = 0x0
(gdb) p/x $r8
$4 = 0x0

Comment 5 David Woodhouse 2008-06-12 08:05:00 UTC
*** Bug 449883 has been marked as a duplicate of this bug. ***

Comment 6 David Woodhouse 2008-06-12 11:28:54 UTC
OK, I'm slightly less confused now. The function is recursing, through enough
other functions that I hadn't noticed. During the offending call, %cr3 is
getting stomped on during the call to g_closure_invoke(). It's supposed to be
caller-saved, but it's not getting preserved. 

0x0e7c9304	2384	      g_closure_invoke (class_closure,
2: x/i $pc
0xe7c9304 <signal_emit_unlocked_R+1180>:	bl      0xe7af570 <IA__g_closure_invoke>
1: /x $cr = 0x28022822
(gdb) 
IA__g_closure_invoke (closure=0x101bb600, return_value=0x0, n_param_values=1,
param_values=0xff9120f8, invocation_hint=0xff912064)
    at gclosure.c:465
465	{
2: x/i $pc
0xe7af570 <IA__g_closure_invoke>:	mflr    r0
1: /x $cr = 0x28022822
(gdb) fini
Run till exit from #0  0x0e7af574 in IA__g_closure_invoke (closure=0x101bb600,
return_value=0x0, n_param_values=1, 
    param_values=0xff9120f8, invocation_hint=0xff912064) at gclosure.c:465
signal_emit_unlocked_R (node=0x101bb618, detail=0, instance=0x1065c0c0,
emission_return=0x0, instance_and_params=0xff9120f8)
    at gsignal.c:2283
2283	  if (!accumulator)
2: x/i $pc
0xe7c9308 <signal_emit_unlocked_R+1184>:	bne-    cr3,0xe7c9318
<signal_emit_unlocked_R+1200>
1: /x $cr = 0x22042484


Comment 7 David Woodhouse 2008-06-12 11:30:26 UTC
It's supposed to be callee-saved, I mean. More tea...

Comment 8 David Woodhouse 2008-06-12 14:53:52 UTC
Any advice on finding _where_ %cr3 is getting corrupted? 

It could be anywhere -- there's a _huge_ stack of functions being called from
g_closure_invoke(), and when we get back, it's been changed. 

It's probably not even gsignal.c which is actually at fault -- it's just that
when it's compiled with -fgcse and actually _relies_ on %cr3 being callee-saved,
it breaks due to the problem which is elsewhere.

Is there a way to track %cr through the call stack somehow? 

I've tried building (just gsignal.c for now) with -finstrument-functions and
printing %cr in my __cyg_profile_func_enter() and __cyg_profile_func_exit()
routines, but that just gets me a lot of output and no smoking gun -- and it's
particularly hard to interpret because gcc emits the call to
__cyg_profile_func_exit() before actually restoring %cr anyway.


Comment 9 David Woodhouse 2008-06-12 15:32:11 UTC
Is it feasible to hack the compiler to caller-save these condition registers,
but rather than just restoring them, instead just _check_ that the called
function didn't trample on them -- and trap immediately if it did?


Comment 10 David Woodhouse 2008-06-12 18:10:06 UTC
I believe the function stomping on %cr3 is libm's lroundl. It uses %cr3 without
saving and restoring it.



Comment 11 Ryan S. Arnold 2008-06-12 19:13:13 UTC
This is due to a bug in an lroundl patch I provided late last year.  I used a
callee-save cr field without saving/restoring.  I'll correct it shortly with a
patch to mainline GLIBC and will append the patch here.

Comment 12 Ryan S. Arnold 2008-06-12 19:23:30 UTC
Looks like Jakub beat me to it.

Comment 13 David Woodhouse 2008-06-14 08:19:01 UTC
Jakub, any word on when we can have a glibc erratum pushed for F-9? 

I've built it locally with your patch and confirmed that inkscape, epiphany,
galeon and vlc no longer crash.

Comment 14 David Woodhouse 2008-06-25 23:07:40 UTC
ping

Comment 15 W. Michael Petullo 2008-07-06 06:58:03 UTC
David, could you state for reference where this glibc patch is available publicly?

Comment 17 W. Michael Petullo 2008-07-08 01:43:09 UTC
Created attachment 311216 [details]
Patch current for Fedora 9 glibc

This patch modifies the one listed in Comment #16 so that it applies against
glibc-2.8 (the version presently in Fedora 9). I have found that this patch
fixes this bug on my 32-bit PowerPC laptop. I tested epiphany and inkscape.

Comment 18 Fedora Update System 2008-07-08 20:27:17 UTC
glibc-2.8-7 has been submitted as an update for Fedora 9

Comment 19 Fedora Update System 2008-07-09 21:46:01 UTC
glibc-2.8-7 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update glibc'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-6260

Comment 20 Fedora Update System 2008-07-23 07:08:23 UTC
glibc-2.8-8 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.