Bug 196157

Summary: mixed pic/non-pic with export-dynamic breaks thread-local storage (TLS)
Product: Fedora
Reporter: Ben Liblit <liblit>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Version: 5
Hardware: i386
OS: Linux
Fixed In Version: binutils-2.17.50.0.2-4
Doc Type: Bug Fix
Last Closed: 2006-06-30 08:34:06 UTC
Attachments:
    source files and script to demonstrate the problem

Description Ben Liblit 2006-06-21 16:58:01 UTC
Description of problem:

I have a collection of object files that manipulate a variable in thread-local
storage (TLS).  Some of these object files were compiled with "-fpic" and some
were not.  The final executable is being linked using "-Wl,--export-dynamic".

I find that the object files compiled without "-fpic" see one address for the
TLS variable, while the object files compiled with "-fpic" see a different
address.  Changing the value at one address has no effect on the other, of
course.  It's like I have two completely different variables.  Of the two, only
the one seen by the non-fpic code has the expected initial value.  That suggests
that the non-fpic one is the "right" one in some sense, and that the
pic-compiled code is getting a "wrong" address located somewhere else.
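The attached sources are not reproduced in this report.  Purely as a hypothetical
illustration, the kind of code being described might look like the sketch below,
with the TLS definition and an init() routine collapsed into one file.  The names
tls_value and trace_tls are assumptions, not taken from the attachment.  In the
real setup the variable and its users are split across separately compiled files
such as main.c and init.c, only some of them built with "-fpic", and the program
is linked with "-Wl,--export-dynamic" (roughly "gcc -c main.c", "gcc -fpic -c
init.c", "gcc -Wl,--export-dynamic main.o init.o", although the attached run
script may differ).  A single file like this cannot trigger the bug; it only
shows how each piece of code reports the address it sees.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the report's TLS variable; 92 is the
   initial value quoted in the report. */
__thread int tls_value = 92;

/* Print where the calling code sees the TLS variable and what it reads
   there, loosely following the report's "*address == value" lines. */
static void trace_tls(const char *file, int line, const char *func,
                      const char *label)
{
    printf("%s:%2d: in %s(): %-18s *%p == %d\n", file, line, func, label,
           (void *)&tls_value, tls_value);
}

/* Hypothetical stand-in for init() from init.c: show the value it sees,
   then overwrite it with 14. */
void init(void)
{
    trace_tls(__FILE__, __LINE__, __func__, "initial value:");
    tls_value = 14;
    /* The write is always visible to the code that made it; the bug is
       that differently compiled objects may be reading another copy. */
    assert(tls_value == 14);
    trace_tls(__FILE__, __LINE__, __func__, "after assignment:");
}
```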

If I compile everything with "-fpic", the problem goes away.  If I compile
nothing with "-fpic", the problem goes away.  If I omit "-Wl,--export-dynamic"
at link time, the problem goes away.  If I don't make the variable thread-local,
the problem goes away.  Only with all of these things going on at the same time
does the bug appear.

Perhaps now you understand why the summary for this report is so densely worded.
 :-)

It's not clear to me whether this is a gcc bug, a linker bug, or perhaps even a
runtime loader bug.  I'm starting this with gcc, somewhat arbitrarily.  We might
need to reassign it if we determine the root cause is elsewhere.

Version-Release number of selected component (if applicable):

    gcc-4.1.1-1.fc5
    glibc-2.4-8
    binutils-2.16.91.0.6-5

How reproducible:

The problem as described is 100% reproducible.  I will attach a small collection
of files, including a test script, that can be used to demonstrate the bug.

Steps to Reproduce:
1. Unpack the "bug.tar.gz" archive attached to this report.
2. Run the "run" script.
  
Actual results:

The TLS variable has different addresses in main.o and init.o.  Only the one in
main.o has the proper initial value (92), and assigning to one does not affect
the other:

    main.c: 7: in main(): before init:      *0xb7effa9c == 92
    init.c: 9: in init(): initial value:    *0xb7effaa4 == -1209008392
    init.c:11: in init(): after assignment: *0xb7effaa4 == 14
    main.c: 9: in main(): after init:       *0xb7effa9c == 92


Expected results:

All object files should agree on the variable's address and should see its
initial value as 92.  After init() changes this to 14, main() should also see
the value as 14:

    main.c: 7: in main(): before init:      *0xb7f46a9c == 92
    init.c: 9: in init(): initial value:    *0xb7f46a9c == 92
    init.c:11: in init(): after assignment: *0xb7f46a9c == 14
    main.c: 9: in main(): after init:       *0xb7f46a9c == 14
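The expected behaviour can be turned into a small self-check.  The sketch below
is hypothetical: shared_tls, tls_addr_from_main(), tls_addr_from_init() and
check_expected_results() are invented for illustration.  The idea is to put each
accessor in a separately compiled object, one built with "-fpic" and one without,
so that each returns the address its own code computes for the variable.  Shown
in a single file, the two accessors trivially agree; with the affected binutils,
the actual results above suggest the first assertion would fail.

```c
#include <assert.h>

/* Hypothetical TLS variable, initialised to 92 as in the report. */
__thread int shared_tls = 92;

/* Hypothetical accessors. In a real self-test each would live in its own
   object file, one compiled with -fpic and one without, so that each
   returns the address its own code computes for shared_tls. Here both
   are in one file and therefore always agree. */
int *tls_addr_from_main(void) { return &shared_tls; }
int *tls_addr_from_init(void) { return &shared_tls; }

/* Check the expectations listed under "Expected results". Meant to run
   once, before anything else has written to shared_tls. */
void check_expected_results(void)
{
    int *seen_by_main = tls_addr_from_main();
    int *seen_by_init = tls_addr_from_init();

    /* Both sides must name the same object ... */
    assert(seen_by_main == seen_by_init);
    /* ... and both must see its initializer. */
    assert(*seen_by_init == 92);

    *seen_by_init = 14;          /* what init() does in the report */
    assert(*seen_by_main == 14); /* main() must observe the write */
}
```

Because assert() aborts on the first failure (unless NDEBUG is defined), a
mislinked program would stop at the address comparison instead of quietly
carrying on with two copies of the variable.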

Additional info:

It doesn't matter which object is compiled with "-fpic".  Whichever object was
compiled using "-fpic", that's the object that will see the copy of the variable
that was not initialized to 92.

I named a few actions above that eliminate the problem (e.g. removing the linker
flag or not mixing pic/non-pic).  However, none of these are really viable
workarounds for me.  The example I've attached to this bug report is *much*
simplified from the original context in which I'm seeing the problem.  The
original context is an instrumented build of gnome-panel for the Cooperative Bug
Isolation Project (http://www.cs.wisc.edu/cbi/).  gnome-panel does not use
"-fpic" but does use "-Wl,--export-dynamic", whereas CBI's instrumentation
infrastructure has to be compiled using "-fpic" since it is sometimes linked in
with shared libraries.  So in this original context, I really don't have any
choice but to mix pic/non-pic and to use the "--export-dynamic" linker flag. 
Until this bug is resolved somehow, I cannot post working instrumented
gnome-panel packages.  :-(

Comment 1 Ben Liblit 2006-06-21 16:58:01 UTC
Created attachment 131294
source files and script to demonstrate the problem

Comment 2 Jakub Jelinek 2006-06-23 13:55:11 UTC
http://sources.redhat.com/ml/binutils/2006-06/msg00351.html
For FC-5 binutils the patch reversion can't be done (the bogus patch was applied
only after FC-5 froze), so only the last hunk in bfd/elf32-i386.c, together
with the ld/testsuite/ld-i386/tlsbin.dd fix, is needed.

Comment 3 Jakub Jelinek 2006-06-30 08:34:06 UTC
Should be fixed in binutils-2.17.50.0.2-4 in rawhide.

Comment 4 Ben Liblit 2006-06-30 22:20:27 UTC
Thank you for the speedy response to this somewhat obscure problem, Jakub!

Regarding the fix in rawhide, can you tell me if this affects build-time tools
only, or is this also a change to the run-time loader?  That is, if I were to
use the rawhide binutils to *create* the executable, would it work properly on a
different machine that was still using the standard FC5 binutils without this fix?

Comment 5 Jakub Jelinek 2006-06-30 22:26:47 UTC
Yes, the bug is only in binutils, not glibc nor gcc.

Comment 6 Ben Liblit 2006-06-30 23:04:54 UTC
Ah, OK.  Somehow I'd convinced myself that binutils included the runtime loader.
 I see now that glibc provides that.  So a binutils-only bug means it only needs
to be fixed on the developer's machine.  Got it.

Thanks again!