Bug 489290

Summary: Crash in fontconfig+pango+thunderbird with -O3
Product: [Fedora] Fedora
Reporter: Behdad Esfahbod <behdad>
Component: thunderbird
Assignee: Jan Horak <jhorak>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 11
CC: dirtyepic, esigra, gecko-bugs-nobody, jakub, matt
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-28 11:25:38 UTC

Description Behdad Esfahbod 2009-03-09 11:23:13 UTC
A couple of weeks ago I started hitting a crash with recompiled rawhide fontconfig+pango+thunderbird, and by now I'm fairly confident it's a gcc bug. It isn't fixed with the current rawhide gcc either. My previous gcc update was from Feb 9.

It only happens with -O3, so I'm giving up for now. But if you recompile rawhide fontconfig with -O3 and try to start thunderbird, you should hit it. The crash is at the very beginning of FcFontSetMatchInternal; if I compile that function at an optimization level below 3, the crash goes away.

I can attach the disassembly output.

Comment 1 Jakub Jelinek 2009-03-09 11:32:18 UTC
What is that confidence based on? It could be a strict aliasing violation or some other kind of undefined behavior. In any case, try __attribute__((__optimize__(2))) vs. __attribute__((__optimize__(3))) on the function you suspect. If that makes a difference, find out which arguments it is called with when it misbehaves and how the results differ, then try to create a self-contained testcase that calls the function with those arguments and checks the outcome.
Disassembly isn't very useful when you don't know what to look for.
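A minimal sketch of the A/B testcase described above, assuming GCC 4.4 or later (which supports the optimize function attribute; other compilers may ignore it with a warning). The function body and the names SUSPECT_BODY, suspect_o2, suspect_o3, same_result and NUM_VALUES are hypothetical placeholders, not fontconfig code. The same body is compiled once at level 2 and once at level 3, both copies are kept out of line with __noinline__, and the harness reports whether they agree for a given input:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VALUES 8

/* Hypothetical stand-in for the suspect function's body. Expanding the
 * same macro into both functions guarantees identical source, so any
 * difference in behavior comes from the optimization level alone. */
#define SUSPECT_BODY                               \
    double best[NUM_VALUES];                       \
    double total = 0;                              \
    for (int i = 0; i < NUM_VALUES; i++)           \
        best[i] = 0;                               \
    for (size_t j = 0; j < n; j++)                 \
        if (in[j] > best[j % NUM_VALUES])          \
            best[j % NUM_VALUES] = in[j];          \
    for (int i = 0; i < NUM_VALUES; i++)           \
        total += best[i];                          \
    return total;

/* Same body, compiled as if with -O2 ... */
__attribute__((__noinline__, __optimize__(2)))
static double suspect_o2(const double *in, size_t n) { SUSPECT_BODY }

/* ... and as if with -O3. */
__attribute__((__noinline__, __optimize__(3)))
static double suspect_o3(const double *in, size_t n) { SUSPECT_BODY }

/* Returns 1 if both builds produce the same result for these arguments. */
int same_result(const double *in, size_t n)
{
    assert(in != NULL || n == 0);
    return suspect_o2(in, n) == suspect_o3(in, n);
}
```

Feeding same_result the arguments captured from the failing run (for example, from a debugger) turns "it crashes at -O3" into a reproducible, self-contained check that can be attached to a compiler bug report.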

Comment 2 Behdad Esfahbod 2009-03-09 11:54:25 UTC
__attribute__((__optimize__(2))) works fine; __attribute__((__optimize__(3))) makes it crash. The crash happens when a local variable is written to. This is the beginning of the function:

__attribute__((__optimize__(3)))
static FcPattern *
FcFontSetMatchInternal (FcConfig    *config,
                        FcFontSet   **sets,
                        int         nsets,
                        FcPattern   *p,
                        FcResult    *result)
{
    double          score[NUM_MATCH_VALUES], bestscore[NUM_MATCH_VALUES];
    int             f;
    FcFontSet       *s;
    FcPattern       *best;
    int             i;
    int             set;

    for (i = 0; i < NUM_MATCH_VALUES; i++)
        bestscore[i] = 0;


The crash happens when bestscore is written to. If I move the bestscore initialization further down, it crashes there instead. That means I can't really test the function much. I'm mostly out of ideas on how to debug this, but I'm still investigating.

Comment 3 Jakub Jelinek 2009-03-09 12:06:09 UTC
Is the function inlined into another one or not? Use
__attribute__((__optimize__(3),__noinline__))
to find out. Does it also fail if you drop the static from the function (and adjust the prototype as well, if there is one)?
Do the arguments matter for the crash (e.g. if you call FcFontSetMatchInternal from the debugger early on with 0, 0, 0, 0, 0, does it crash in the same spot)?

In any case, I'll need a preprocessed source and the complete set of command line options used to trigger it.

Comment 4 Behdad Esfahbod 2009-03-09 12:35:56 UTC
Hmm, I think I see what's wrong now:

The code causing this is the SSE2 instruction:

0x00442f31 <FcFontSetMatchInternal+33>:	movapd %xmm0,0x40(%esp)

However, %esp has the value:

esp            0xbfffc554	0xbfffc554

which is not 16-byte aligned, so the store target 0x40(%esp) = 0xbfffc594 is not 16-byte aligned either. As far as I can tell, movapd requires a 16-byte-aligned memory operand and faults otherwise.
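The alignment arithmetic can be checked with a small sketch. The helper names is_aligned16 and esp_operand are hypothetical; the register value and displacement are the ones from the gdb output above. An aligned SSE store such as movapd raises a general-protection fault (delivered as SIGSEGV on Linux) when its memory operand is not a multiple of 16:

```c
#include <assert.h>
#include <stdint.h>

/* 1 if addr is a multiple of 16, as movapd's memory operand must be. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Effective address of a "disp(%esp)" operand. */
uintptr_t esp_operand(uintptr_t esp, uintptr_t disp)
{
    return esp + disp;
}
```

With %esp = 0xbfffc554, the operand 0x40(%esp) is 0xbfffc594, which leaves a remainder of 4 modulo 16. Since 0x40 is itself a multiple of 16, the store can only be aligned if %esp is, which is why the fault points back at the stack pointer rather than at the function's code.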

(OK, yes, I forgot to mention that I'm passing command-line options targeting the Pentium M.)

Comment 5 Jakub Jelinek 2009-03-09 12:47:43 UTC
Then can you please find out what misaligns the stack pointer? Normally on i?86 and x86_64, %esp/%rsp should be 16-byte aligned (in particular, the stack pointer should be 16-byte aligned just before the call insn pushes the return address).
Are you calling this code from functions compiled with -mpreferred-stack-boundary=2, or something similar?
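A sketch of the i386 entry invariant, assuming GCC's default 16-byte preferred stack boundary (-mpreferred-stack-boundary=4). The helper names are hypothetical. If %esp is a multiple of 16 right before call, then after the 4-byte return address is pushed, %esp + 4 is a multiple of 16 on entry to the callee. A caller built with -mpreferred-stack-boundary=2 only guarantees 4-byte alignment, so its callees can see entry values that fail this check:

```c
#include <assert.h>
#include <stdint.h>

/* %esp on entry to the callee: call pushes a 4-byte return address. */
uintptr_t esp_after_call(uintptr_t esp_before_call)
{
    return esp_before_call - 4;
}

/* 1 if %esp on function entry satisfies the 16-byte convention,
 * i.e. %esp + 4 (the address just above the return address) is a
 * multiple of 16. */
int entry_esp_ok(uintptr_t esp_at_entry)
{
    return ((esp_at_entry + 4) & 0xF) == 0;
}
```

Because the compiler lays out aligned locals as fixed offsets from %esp under this assumption, a single misaligned caller misaligns every slot in the callee's frame, including the bestscore array written by movapd above.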

Comment 6 Behdad Esfahbod 2009-03-09 12:59:00 UTC
The crash only happens with thunderbird and firefox, so I suspect they do something funky with the stack. %esp is already unaligned at the beginning of that function call, and it is also unaligned some 20 function calls further up the call chain.

With rawhide:

- run "thunderbird -g"
- in gdb do:

    b gtk_entry_new
    r
    c
    info registers

Observe that esp is unaligned.

Should this be reassigned to a Gecko component?

Comment 7 Jakub Jelinek 2009-03-09 13:08:27 UTC
It should be reassigned to whatever component misaligned the stack.
Just do a bt, then go up one frame at a time and look at the address where each frame's saved eip is stored. I'd guess main will be OK, probably a couple of the functions below it in the call trace too, and then you'll hit something that is misaligned.
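The frame walk above can be sketched as a scan over the saved-eip addresses that gdb's "info frame" reports for each frame ("eip at 0x..."). Assumptions: the addresses are listed from the outermost frame (main) to the innermost, which is the reverse of gdb's frame numbering, and the i386 convention that %esp + 4 is 16-byte aligned on entry holds, so each slot should sit at an address congruent to 12 modulo 16. The function name first_misaligned_frame is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* eip_slots[k] is the address where frame k's return address is stored,
 * ordered from the outermost frame to the innermost. Returns the index of
 * the first frame whose slot is not 12 mod 16, or -1 if all look fine. */
long first_misaligned_frame(const uintptr_t *eip_slots, size_t n)
{
    assert(eip_slots != NULL || n == 0);
    for (size_t k = 0; k < n; k++)
        if ((eip_slots[k] & 0xF) != 12)
            return (long)k;
    return -1;
}
```

If index k is returned, the function in frame k - 1 (its caller) is the most likely culprit, since it is the code that executed the call with a misaligned %esp.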

Comment 8 Behdad Esfahbod 2009-03-09 13:37:02 UTC
Doing "info frame" suggests that all frames up to 99 are unaligned. Trying it on frame 100 causes a gdb assertion failure. Reassigning to thunderbird for further inspection. Thanks for the help, Jakub.

Comment 9 Bug Zapper 2009-06-09 12:02:22 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Bug Zapper 2010-04-27 13:08:22 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 11 Bug Zapper 2010-06-28 11:25:38 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.