Bug 87836

Summary: Static linked binary threading problems
Product: [Retired] Red Hat Linux
Component: libc
Version: 9
Hardware: athlon
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Shawn Walker <drevil>
Assignee: Jakub Jelinek <jakub>
CC: mitr
Doc Type: Bug Fix
Last Closed: 2003-04-09 07:01:47 UTC

Attachments:
Makefile used to build static binary (flags: none)

Description Shawn Walker 2003-04-02 23:42:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
Recently I had the opportunity to port "Adventure Game Studio" to Linux. I've
been statically linking all the libraries to avoid the C++ ABI issues since it
is not an open source game.

Under RedHat 8.0, the statically linked binary I made worked perfectly; it also
worked under Debian Woody 3.0 and other distributions.

Under RedHat 9.0, it spuriously aborts for some reason. Running it through
gdb shows the following:

(gdb) file ags
Reading symbols from ags...done.
(gdb) r
Starting program: /home/swalker/games/kq2vga/ags
Adventure Creator v2.54 Interpreter
Copyright (c) 1999-2001 Chris Jones
ACI version 2.54.525
 
Program received signal SIG32, Real-time event 32.
0x081cebc9 in sigsuspend ()
(gdb) bt
#0  0x081cebc9 in sigsuspend ()
#1  0xbfffe320 in ?? ()
#2  0x0815990d in __pthread_wait_for_restart_signal ()
#3  0x08159b6b in pthread_create ()
(gdb) c
Continuing.
CD-ROM Audio support enabled.
Pentium Pro CPU detected.
Music file found and initialized.
 
Program received signal SIG32, Real-time event 32.
0x081cebc9 in sigsuspend ()
(gdb) bt
#0  0x081cebc9 in sigsuspend ()
#1  0xbfffe4c0 in ?? ()
#2  0x0815990d in __pthread_wait_for_restart_signal ()
#3  0x08159b6b in pthread_create ()
(gdb) c
Continuing.
Checking sound inits.
 
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x4002e2e6 in ?? ()
#2  0x40033631 in ?? ()
#3  0x40028b99 in ?? ()
#4  0x082101ca in _dl_init ()
#5  0x081eba6c in dl_open_worker ()
#6  0x081eaddb in _dl_catch_error ()
#7  0x081eb647 in _dl_open ()
#8  0x081ecbb8 in do_dlopen ()
#9  0x081eaddb in _dl_catch_error ()
#10 0x081ecacd in __libc_dlopen_mode ()
#11 0x081e8f4c in __nss_lookup_function ()
#12 0x081e9906 in __nss_lookup ()
#13 0x081e9f0a in __nss_passwd_lookup ()
#14 0x081e575b in getpwuid_r ()
#15 0x081e5430 in getpwuid ()
#16 0x0813db16 in _xdga_private_create_screen ()
(gdb) c
Continuing.
 
Program received signal SIGABRT, Aborted.
0x081ceb61 in kill ()
(gdb) bt
#0  0x081ceb61 in kill ()
#1  0x0815aa92 in __pthread_raise ()
#2  0x081ceea9 in abort ()
#3  0x08132e8d in _xwin_signal_handler ()
#4  <signal handler called>
(gdb) c
Continuing.
 
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
(gdb)

Now the *really* weird part is that this statically linked binary will run just
fine on Debian Woody 3.0, but it won't run under RedHat 9, the platform I
compiled it on!

Dynamically linked versions of the executable also work just fine. I'll
be glad to share what linking process I use if that will help. But this is a
serious issue for me, as it's preventing me from making another release of the game.

On a side note, I'm seeing this same issue when compiling the latest version of
subversion from the svn repository.

*ANY* suggestions or feedback would be appreciated.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
See description.

Actual Results:  I get "aborted" at the command line.

Expected Results:  Statically linked binary should run fine on RedHat 9, just as
it does on Debian Woody 3.0.

Additional info:

Comment 1 Jakub Jelinek 2003-04-04 14:54:44 UTC
At the point where it segfaults under gdb, I'd need the content of
/proc/<pid_of_the_debugged_program>/maps
to find out where exactly it is crashing.
Also, you might want to check ftp://people.redhat.com/jakub/glibc/errata/9/
which is the current RHL9 glibc errata candidate.

Comment 2 Shawn Walker 2003-04-04 16:01:33 UTC
Okay, here's a cat of the map file at the point that it seg faults in GDB:

08048000-082aa000 r-xp 00000000 03:07 906758     /home/swalker/games/kq2vga/ags
082aa000-082d4000 rw-p 00261000 03:07 906758     /home/swalker/games/kq2vga/ags
082d4000-085b1000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 295125     /etc/ld.so.cache
40025000-40034000 r-xp 00000000 03:08 1015896    /lib/libpthread-0.10.so
40034000-40037000 rw-p 0000f000 03:08 1015896    /lib/libpthread-0.10.so
40037000-40077000 rw-p 00000000 00:00 0
40077000-401aa000 r-xp 00000000 03:08 1015874    /lib/libc-2.3.2.so
401aa000-401ae000 rw-p 00132000 03:08 1015874    /lib/libc-2.3.2.so
401ae000-401b0000 rw-p 00000000 00:00 0
401b0000-401c5000 r-xp 00000000 03:08 1015867    /lib/ld-2.3.2.so
401c5000-401c6000 rw-p 00014000 03:08 1015867    /lib/ld-2.3.2.so
401c6000-408cb000 rw-p 00000000 00:00 0
408cb000-408d6000 r-xp 00000000 03:08 1015888    /lib/libnss_files-2.3.2.so
408d6000-408d7000 rw-p 0000a000 03:08 1015888    /lib/libnss_files-2.3.2.so
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff1000-c0000000 rwxp ffff2000 00:00 0

BTW, what do the contents of maps tell you exactly? Forgive me for being ignorant.

I'll check out the glibc errata too. Thanks!
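
For readers with the same question: each line of a maps file gives an address
range, permissions, file offset, device, inode and (if any) the backing file,
so an unresolved frame address from a backtrace can be matched to the mapping
it falls in. Below is a minimal Python sketch of that lookup. The function
names are illustrative only and do not come from any tool used in this report;
the data is an excerpt of the dump above and the unresolved frames #1 to #3 of
the SIGSEGV backtrace in the description.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        # The sixth field (the path) is absent for anonymous mappings.
        path = fields[5] if len(fields) > 5 else "[anonymous]"
        regions.append((start, end, fields[1], path))
    return regions


def resolve(regions, address):
    """Return (path, offset into the mapping) for address, or None."""
    for start, end, _perms, path in regions:
        if start <= address < end:
            return path, address - start
    return None


# Excerpt of the maps dump from Comment 2.
MAPS = """\
08048000-082aa000 r-xp 00000000 03:07 906758     /home/swalker/games/kq2vga/ags
40025000-40034000 r-xp 00000000 03:08 1015896    /lib/libpthread-0.10.so
40077000-401aa000 r-xp 00000000 03:08 1015874    /lib/libc-2.3.2.so
408cb000-408d6000 r-xp 00000000 03:08 1015888    /lib/libnss_files-2.3.2.so
"""

if __name__ == "__main__":
    regions = parse_maps(MAPS)
    # Frames #1-#3 that gdb could not symbolize in the SIGSEGV backtrace.
    for frame in (0x4002E2E6, 0x40033631, 0x40028B99):
        print(hex(frame), resolve(regions, frame))
```

With this excerpt, all three frames fall inside the range mapped from
/lib/libpthread-0.10.so, which is the kind of information the maps file adds
to a backtrace full of "??" entries.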

Comment 3 Shawn Walker 2003-04-04 16:42:53 UTC
Alright, I've installed/upgraded *all* of the glibc errata packages you pointed
to, still have the issue, here's a dump of the maps file for the process:

08048000-082ab000 r-xp 00000000 03:07 906758     /home/swalker/games/kq2vga/ags
082ab000-082d5000 rw-p 00262000 03:07 906758     /home/swalker/games/kq2vga/ags
082d5000-085b2000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 294943     /etc/ld.so.cache
4001a000-40025000 r-xp 00000000 03:08 1015887    /lib/libnss_files-2.3.2.so
40025000-40026000 rw-p 0000a000 03:08 1015887    /lib/libnss_files-2.3.2.so
40026000-40035000 r-xp 00000000 03:08 1015895    /lib/libpthread-0.10.so
40035000-40038000 rw-p 0000f000 03:08 1015895    /lib/libpthread-0.10.so
40038000-40078000 rw-p 00000000 00:00 0
40078000-401ad000 r-xp 00000000 03:08 1015873    /lib/libc-2.3.2.so
401ad000-401b1000 rw-p 00134000 03:08 1015873    /lib/libc-2.3.2.so
401b1000-401b3000 rw-p 00000000 00:00 0
401b3000-401c8000 r-xp 00000000 03:08 1015849    /lib/ld-2.3.2.so
401c8000-401c9000 rw-p 00014000 03:08 1015849    /lib/ld-2.3.2.so
401c9000-408ce000 rw-p 00000000 00:00 0
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff2000-c0000000 rwxp ffff3000 00:00 0

I'll try static linking it against the glibc debug library next instead of the
normal one. Thanks.

Comment 4 Shawn Walker 2003-04-04 16:50:57 UTC
Okay, here are the results of a backtrace and maps dump when static linking
against the DEBUG versions of the libdl, libc, libm and libpthread using the
*errata* glibc packages:

gdb backtrace:
(gdb) r
Starting program: /home/swalker/games/kq2vga/ags
Adventure Creator v2.54 Interpreter
Copyright (c) 1999-2001 Chris Jones
ACI version 2.54.525
 
Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54      ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c: No such file or
directory.
        in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0  0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1  0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at
pthread.c:1151
#2  0x08159bc7 in __pthread_create_2_1 (thread=0xfffffffc, attr=0xfffffffc,
start_routine=0xfffffffc, arg=0xfffffffc) at restart.h:34
(gdb) c
Continuing.
CD-ROM Audio support enabled.
Pentium Pro CPU detected.
Music file found and initialized.
 
Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54      in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0  0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1  0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at
pthread.c:1151
#2  0x08159bc7 in __pthread_create_2_1 (thread=0xfffffffc, attr=0xfffffffc,
start_routine=0xfffffffc, arg=0xfffffffc) at restart.h:34
(gdb) c
Continuing.
Checking sound inits.
 
Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54      in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0  0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at
../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1  0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at
pthread.c:1151
#2  0x08156d54 in __pthread_cond_wait (cond=0x0, mutex=0x841a780) at restart.h:34
#3  0x08178564 in _XDisplayLockWait ()
#4  0x08178ab5 in XLockDisplay ()
#5  0x0813d42c in _xwin_set_window_title ()
#6  0x0812c323 in main ()
#7  0x081c8748 in __libc_start_main (main=0x812c300 <main>, argc=1,
ubp_av=0xbfffe114, init=0x81c88f4 <__libc_csu_init>, fini=0x81c8948
<__libc_csu_fini>,
    rtld_fini=0, stack_end=0x0) at ../sysdeps/generic/libc-start.c:193
(gdb) c
Continuing.
 
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()

maps dump:
08048000-082ab000 r-xp 00000000 03:07 906758     /home/swalker/games/kq2vga/ags
082ab000-082d5000 rw-p 00262000 03:07 906758     /home/swalker/games/kq2vga/ags
082d5000-085b2000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 294943     /etc/ld.so.cache
4001a000-40025000 r-xp 00000000 03:08 1015887    /lib/libnss_files-2.3.2.so
40025000-40026000 rw-p 0000a000 03:08 1015887    /lib/libnss_files-2.3.2.so
40026000-40035000 r-xp 00000000 03:08 1015895    /lib/libpthread-0.10.so
40035000-40038000 rw-p 0000f000 03:08 1015895    /lib/libpthread-0.10.so
40038000-40078000 rw-p 00000000 00:00 0
40078000-401ad000 r-xp 00000000 03:08 1015873    /lib/libc-2.3.2.so
401ad000-401b1000 rw-p 00134000 03:08 1015873    /lib/libc-2.3.2.so
401b1000-401b3000 rw-p 00000000 00:00 0
401b3000-401c8000 r-xp 00000000 03:08 1015849    /lib/ld-2.3.2.so
401c8000-401c9000 rw-p 00014000 03:08 1015849    /lib/ld-2.3.2.so
401c9000-408ce000 rw-p 00000000 00:00 0
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff1000-c0000000 rwxp ffff2000 00:00 0


Comment 5 Shawn Walker 2003-04-04 16:53:38 UTC
Ooops, I forgot the backtrace right when it segfaults, here's that, sorry:

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x4002f3a6 in ?? ()
#2  0x40034761 in ?? ()
#3  0x40029b99 in ?? ()
#4  0x082109ea in _dl_init (main_map=0x85b1630, argc=1, argv=0xbfffe114,
env=0xbfffe11c) at dl-init.c:68
#5  0x081ebd44 in dl_open_worker (a=0xbfffd420) at dl-open.c:448
#6  0x081eb0af in _dl_catch_error (objname=0xbfffd418, errstring=0xbfffd41c,
operate=0x81eba28 <dl_open_worker>, args=0xbfffd420) at dl-error.c:162
#7  0x081eb917 in _dl_open (file=0xbfffd5b0 "libnss_files.so.2", mode=1,
caller=0x0) at dl-open.c:495
#8  0x081ece94 in do_dlopen (ptr=0xbfffd580) at dl-libc.c:80
#9  0x081eb0af in _dl_catch_error (objname=0xbfffd578, errstring=0xbfffd57c,
operate=0x81ece80 <do_dlopen>, args=0xbfffd580) at dl-error.c:162
#10 0x081ecda9 in __libc_dlopen_mode (name=0x0, mode=0) at dl-libc.c:42
#11 0x081e9208 in __nss_lookup_function (ni=0x8443378, fct_name=0x828fdf2
"getpwuid_r") at nsswitch.c:342
#12 0x081e9bd2 in __nss_lookup (ni=0xbfffd664, fct_name=0x828fdf2 "getpwuid_r",
fctp=0xbfffd668) at nsswitch.c:148
#13 0x081ea1d6 in __nss_passwd_lookup (ni=0xbfffd664, fct_name=0x828fdf2
"getpwuid_r", fctp=0xbfffd668) at XXX-lookup.c:73
#14 0x081e5a37 in __getpwuid_r (uid=500, resbuf=0x83a426c, buffer=0x85b0ca0
"?8\006\b", buflen=1024, result=0xbfffd6a8) at getXXbyYY_r.c:183
#15 0x081e5708 in getpwuid (uid=500) at getXXbyYY.c:109
#16 0x0813db16 in _xdga_private_create_screen ()


Comment 6 Jakub Jelinek 2003-04-04 16:55:22 UTC
Where is /lib/libpthread.so.0 coming in there? glibc NSS functions certainly
don't dlopen libpthread, so I guess it is your application which does that,
right?
As for whether you want a statically linked binary or not, almost always you
don't want a statically linked binary. If you want to link libstdc++ statically,
use -Bstatic -lstdc++ -Bdynamic .

Comment 7 Shawn Walker 2003-04-04 17:22:04 UTC
I'm not personally opening pthread in my program, however one of the libraries I
use may be, although I'm not personally aware of that.

As far as the static linking goes, it's required for my sanity. As I mentioned
earlier, this is *not* an open source game, though it's not a commercial game
either. As such I can only distribute binaries to users. The reason I static
link the binaries is because of the C++ ABI issues introduced by GCC3.x, and
because I had problems with users being unable to run the binaries on their
systems if I didn't static link libc and everything else, they would get weird
unresolved symbols or other things. I do provide a dynamic binary as well, but
most people can only get the static binary to work because of the ABI issues.

The way you specified to static link against stdc++ is not the way I was told to
do it by people on the gcc-help list or involved with the gcc project, I was
told that I had to link specifically against libstdc++ manually and that I
shouldn't expect things to be easy as they were to static link in gcc2.95 (which
worked just the way you describe above).

Here's the main issue I'm trying to figure out: why does this binary, when
statically linked and compiled under RedHat 9, *not* work on RedHat 9, the box I
compiled it on, yet work if I reboot into Debian Woody 3.0? That, and the fact
that this same process worked great under RedHat 8.0, is what I can't explain.

I'm not mad or anything and I'm more than grateful for all the help and input
you've provided so far. I'm willing to do anything to help you get to the bottom
of this issue that I can legally get away with ;)

You can see my discussion with the people on gcc-help here:
http://gcc.gnu.org/ml/gcc-help/2002-11/msg00085.html

I had a few personal responses too from people I believe are directly involved
with the project that basically told me that I couldn't expect it to be "easy"
to do. Thanks.

I will gladly share with you which libraries I static link against if you'd
like, or attach the Makefile for your perusal (which I'm sure isn't the
prettiest, but it is small) if you so desire.


Comment 8 Shawn Walker 2003-04-04 17:26:51 UTC
By the way, it's completely statically linked, meaning ldd -r produces "not a
dynamic executable".

Comment 9 Jakub Jelinek 2003-04-07 12:05:56 UTC
Have you been using -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic at the end of the gcc command line
or at the start? Only the former can work.
If the statically linked program dlopens libpthread, it is really outside of the supported
usage (dlopen in statically linked programs has very limited support, just
so that NSS and iconv work).

Comment 10 Shawn Walker 2003-04-07 15:58:35 UTC
Created attachment 90960 [details]
Makefile used to build static binary

Well, we're starting to get somewhere, you're the first person that has managed
to show me how I could get libstdc++ to properly link under gcc3.2

However, it doesn't resolve the issue, again the binary will run on Debian
Woody 3.0, but not on RedHat 9, the platform it was compiled on.

I've attached the Makefile I use to build the binary, you'll want to look at
the static-release target. Please tell me if I've done anything wrong here.

One of the first things you'll notice is that I do not add -Bdynamic at the end
of the LDFLAGS_STATIC variable because I don't want *anything* dynamically
linked.

Lastly, I do not use dlopen in my program. However, maybe one of the libraries
I link to is. I have to link against the X11 libraries for this program, so
that may be the dlopen problem spot.

Comment 11 Shawn Walker 2003-04-07 16:00:39 UTC
"Well, we're starting to get somewhere, you're the first person that has managed
to show me how I could get libstdc++ to properly link under gcc3.2"

I should clarify that statement, what I mean is that you're the first person
that has been able to show me how to link it without resorting to
/usr/lib/libc.a and so forth.

Comment 12 Jakub Jelinek 2003-04-08 15:05:19 UTC
Can you please start by removing -ldl from the LDFLAGS_STATIC line and see what
library/object uses dlopen, then check what they are dlopening?

Comment 13 Shawn Walker 2003-04-08 18:05:20 UTC
Ok. I removed -ldl from the flags line, allegro (http://alleg.sf.net) came up as
the one using dlopen. It *does not dlopen pthread though*. They dlopen a set of
modules they have, "drivers":

alleg-vga.so
alleg-fbcon.so
alleg-dga2.so
alleg-esddigi.so  - esd Sound driver
alleg-artsdigi.so - kde arts sound driver

They use RTLD_NOW when they do dlopen on them.
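
Dropping -ldl makes the linker complain about whichever objects reference
dlopen. A complementary way to get the same answer, not used in this report,
is to scan the output of `nm -A` over the static archives for undefined (U)
references to the symbol. The Python sketch below assumes GNU nm's default
output format; the function name and the sample archive and object names are
invented for illustration and are not Allegro's real file names.

```python
def objects_referencing(nm_output, symbol):
    """Return objects whose `nm -A` lines list `symbol` as undefined (U)."""
    users = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a symbol line
        sym_type, name = fields[-2], fields[-1]
        if sym_type == "U" and name == symbol:
            # With -A, the first field is "archive:member:" optionally
            # followed by the symbol value; strip that trailing part.
            owner = fields[0].rsplit(":", 1)[0]
            if owner not in users:
                users.append(owner)
    return users


# Hypothetical sample in the shape of `nm -A libexample.a` output.
SAMPLE = """\
libexample.a:loader.o:         U dlopen
libexample.a:loader.o:00000000 T load_driver
libexample.a:gfx.o:         U memcpy
libexample.a:gfx.o:00000010 T draw
"""

if __name__ == "__main__":
    print(objects_referencing(SAMPLE, "dlopen"))
```

This only tells you which object calls dlopen; what it opens at run time still
has to be checked in that library's sources or by tracing the program.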

I finally found which library, and under what specific condition, causes the
problem. For some reason, going full screen in the application (which
activates DGA2) and dlopen'ing the arts module is where I have the problem. As
long as it doesn't load the "arts" module, all is well. It appears that an
invalid handle for some reason in that specific case gets returned for the arts
library? I'm not sure yet. I probably need to pursue this with the allegro
development team to see if we can figure out what's triggering it, but I'm not
sure where to even begin. Why only the arts library triggers this is beyond me.

Thanks for your help in pointing me in the right direction to get to the root of
this matter. I'm not sure that there's really anything you can do from here. I
don't really expect you to either. Although it still puzzles me why this worked
under 8.0 just fine, but not under 9.


Comment 14 Shawn Walker 2003-04-09 00:42:38 UTC
I should note that they're using the "artsc" library, because they're using the
C interface to Arts as Allegro is written in C.

Comment 15 Jakub Jelinek 2003-04-09 07:01:47 UTC
And artsc links against libpthread. This means that you really cannot link
statically in your program and if it worked before it was by sheer luck.
When there are two different copies of libpthread in your program, things
really cannot work properly (one statically linked and one dynamically linked,
each using a different pthread_self scheme etc).
I don't understand why you insist on full statical link, everybody has libc.so,
libdl.so, libm.so, libpthread.so, libX11.so and they are upward binary compatible
using symbol versioning (or in case of libX11 that their external interface is
not changing). This means that if you link against say glibc 2.1.x, the program
ought to work just fine on glibc 2.[123].x and later.
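
The upward-compatibility rule described here can be pictured as a comparison
of version tags: a dynamically linked binary records which GLIBC_x.y symbol
versions it needs, and it can only load where the installed glibc is at least
as new as the highest of them. The Python sketch below is a simplified model
under that assumption. The function names and the example tag list are
illustrative, it handles numeric GLIBC_ tags only, and it ignores every
library other than glibc.

```python
def parse_version(tag):
    """Turn a tag such as 'GLIBC_2.1.3' into the tuple (2, 1, 3)."""
    prefix = "GLIBC_"
    if not tag.startswith(prefix):
        raise ValueError("not a GLIBC version tag: %r" % tag)
    return tuple(int(part) for part in tag[len(prefix):].split("."))


def minimum_glibc(required_tags):
    """The highest tag a binary references is the oldest glibc it can use."""
    return max(parse_version(tag) for tag in required_tags)


def can_run_on(required_tags, installed):
    """True if the installed glibc version (a tuple) is new enough."""
    return minimum_glibc(required_tags) <= installed


if __name__ == "__main__":
    # Example tags for a binary built against glibc 2.1.x (illustrative).
    tags = ["GLIBC_2.0", "GLIBC_2.1", "GLIBC_2.1.3"]
    for installed in [(2, 1, 3), (2, 2, 5), (2, 3, 2)]:
        print(installed, can_run_on(tags, installed))
```

The reverse direction is what fails: a binary that references a tag newer than
the installed glibc provides cannot be loaded there.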

Comment 16 Shawn Walker 2003-04-09 14:33:30 UTC
"I don't understand why you insist on full statical link, everybody has libc.so,
libdl.so, libm.so, libpthread.so, libX11.so and they are upward binary compatible
using symbol versioning (or in case of libX11 that their external interface is
not changing). This means that if you link against say glibc 2.1.x, the program
ought to work just fine on glibc 2.[123].x and later."

Like I said, I thought I should be able to as well, but I've had users report
bizarre errors on their systems like "GLIBC 2.0
symbols not found" or something like that, even though they're using a GLIBC 2.2
or 2.1 system. The binaries don't work. They got these errors with older
binaries I produced under RedHat 8.0. None of it makes sense to me; all I know
is that since I started static linking it all, users have stopped complaining about
binaries not working right. Like I said, I didn't think there was much you could
do about it, but how does one resolve screwy issues like this when distributing
binaries?

Comment 17 Shawn Walker 2003-04-09 14:35:34 UTC
Again, I just want to thank you for your assistance. I doubt I would have found
this issue and solved a few others without your additional knowledge.

Under RedHat 9, how can I produce libraries compatible with GLIBC 2.1 and 2.2
systems?