Bug 87836
Summary: Static linked binary threading problems
Product: [Retired] Red Hat Linux
Component: libc
Version: 9
Hardware: athlon
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Shawn Walker <drevil>
Assignee: Jakub Jelinek <jakub>
CC: mitr
Last Closed: 2003-04-09 07:01:47 UTC
Description
Shawn Walker
2003-04-02 23:42:38 UTC
At the point where it segfaults under gdb, I'd need the contents of /proc/<pid_of_the_debugged_program>/maps to find out exactly where it is crashing. Also, you might want to check ftp://people.redhat.com/jakub/glibc/errata/9/, which is the current RHL9 glibc errata candidate.

Okay, here's a cat of the maps file at the point where it segfaults in GDB:

08048000-082aa000 r-xp 00000000 03:07 906758 /home/swalker/games/kq2vga/ags
082aa000-082d4000 rw-p 00261000 03:07 906758 /home/swalker/games/kq2vga/ags
082d4000-085b1000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 295125 /etc/ld.so.cache
40025000-40034000 r-xp 00000000 03:08 1015896 /lib/libpthread-0.10.so
40034000-40037000 rw-p 0000f000 03:08 1015896 /lib/libpthread-0.10.so
40037000-40077000 rw-p 00000000 00:00 0
40077000-401aa000 r-xp 00000000 03:08 1015874 /lib/libc-2.3.2.so
401aa000-401ae000 rw-p 00132000 03:08 1015874 /lib/libc-2.3.2.so
401ae000-401b0000 rw-p 00000000 00:00 0
401b0000-401c5000 r-xp 00000000 03:08 1015867 /lib/ld-2.3.2.so
401c5000-401c6000 rw-p 00014000 03:08 1015867 /lib/ld-2.3.2.so
401c6000-408cb000 rw-p 00000000 00:00 0
408cb000-408d6000 r-xp 00000000 03:08 1015888 /lib/libnss_files-2.3.2.so
408d6000-408d7000 rw-p 0000a000 03:08 1015888 /lib/libnss_files-2.3.2.so
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff1000-c0000000 rwxp ffff2000 00:00 0

BTW, what do the contents of maps tell you exactly? Forgive my ignorance. I'll check out the glibc errata too. Thanks!
Alright, I've installed/upgraded *all* of the glibc errata packages you pointed to and still have the issue. Here's a dump of the maps file for the process:

08048000-082ab000 r-xp 00000000 03:07 906758 /home/swalker/games/kq2vga/ags
082ab000-082d5000 rw-p 00262000 03:07 906758 /home/swalker/games/kq2vga/ags
082d5000-085b2000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 294943 /etc/ld.so.cache
4001a000-40025000 r-xp 00000000 03:08 1015887 /lib/libnss_files-2.3.2.so
40025000-40026000 rw-p 0000a000 03:08 1015887 /lib/libnss_files-2.3.2.so
40026000-40035000 r-xp 00000000 03:08 1015895 /lib/libpthread-0.10.so
40035000-40038000 rw-p 0000f000 03:08 1015895 /lib/libpthread-0.10.so
40038000-40078000 rw-p 00000000 00:00 0
40078000-401ad000 r-xp 00000000 03:08 1015873 /lib/libc-2.3.2.so
401ad000-401b1000 rw-p 00134000 03:08 1015873 /lib/libc-2.3.2.so
401b1000-401b3000 rw-p 00000000 00:00 0
401b3000-401c8000 r-xp 00000000 03:08 1015849 /lib/ld-2.3.2.so
401c8000-401c9000 rw-p 00014000 03:08 1015849 /lib/ld-2.3.2.so
401c9000-408ce000 rw-p 00000000 00:00 0
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff2000-c0000000 rwxp ffff3000 00:00 0

I'll try static linking it against the glibc debug library next instead of the normal one. Thanks.

Okay, here are the results of a backtrace and maps dump when static linking against the DEBUG versions of libdl, libc, libm and libpthread using the *errata* glibc packages:

gdb backtrace:

(gdb) r
Starting program: /home/swalker/games/kq2vga/ags
Adventure Creator v2.54 Interpreter
Copyright (c) 1999-2001 Chris Jones
ACI version 2.54.525

Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54 ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c: No such file or directory.
in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0 0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1 0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at pthread.c:1151
#2 0x08159bc7 in __pthread_create_2_1 (thread=0xfffffffc, attr=0xfffffffc, start_routine=0xfffffffc, arg=0xfffffffc) at restart.h:34
(gdb) c
Continuing.
CD-ROM Audio support enabled.
Pentium Pro CPU detected.
Music file found and initialized.

Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54 in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0 0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1 0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at pthread.c:1151
#2 0x08159bc7 in __pthread_create_2_1 (thread=0xfffffffc, attr=0xfffffffc, start_routine=0xfffffffc, arg=0xfffffffc) at restart.h:34
(gdb) c
Continuing.
Checking sound inits.

Program received signal SIG32, Real-time event 32.
0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
54 in ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c
(gdb) bt
#0 0x0815a6e0 in __pthread_sigsuspend (set=0xfffffffc) at ../linuxthreads/sysdeps/unix/sysv/linux/pt-sigsuspend.c:54
#1 0x08159965 in __pthread_wait_for_restart_signal (self=0x82cdfc0) at pthread.c:1151
#2 0x08156d54 in __pthread_cond_wait (cond=0x0, mutex=0x841a780) at restart.h:34
#3 0x08178564 in _XDisplayLockWait ()
#4 0x08178ab5 in XLockDisplay ()
#5 0x0813d42c in _xwin_set_window_title ()
#6 0x0812c323 in main ()
#7 0x081c8748 in __libc_start_main (main=0x812c300 <main>, argc=1, ubp_av=0xbfffe114, init=0x81c88f4 <__libc_csu_init>, fini=0x81c8948 <__libc_csu_fini>, rtld_fini=0, stack_end=0x0) at ../sysdeps/generic/libc-start.c:193
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()

maps dump:

08048000-082ab000 r-xp 00000000 03:07 906758 /home/swalker/games/kq2vga/ags
082ab000-082d5000 rw-p 00262000 03:07 906758 /home/swalker/games/kq2vga/ags
082d5000-085b2000 rwxp 00000000 00:00 0
40000000-40002000 rw-p 00000000 00:00 0
40002000-4001a000 r--p 00000000 03:08 294943 /etc/ld.so.cache
4001a000-40025000 r-xp 00000000 03:08 1015887 /lib/libnss_files-2.3.2.so
40025000-40026000 rw-p 0000a000 03:08 1015887 /lib/libnss_files-2.3.2.so
40026000-40035000 r-xp 00000000 03:08 1015895 /lib/libpthread-0.10.so
40035000-40038000 rw-p 0000f000 03:08 1015895 /lib/libpthread-0.10.so
40038000-40078000 rw-p 00000000 00:00 0
40078000-401ad000 r-xp 00000000 03:08 1015873 /lib/libc-2.3.2.so
401ad000-401b1000 rw-p 00134000 03:08 1015873 /lib/libc-2.3.2.so
401b1000-401b3000 rw-p 00000000 00:00 0
401b3000-401c8000 r-xp 00000000 03:08 1015849 /lib/ld-2.3.2.so
401c8000-401c9000 rw-p 00014000 03:08 1015849 /lib/ld-2.3.2.so
401c9000-408ce000 rw-p 00000000 00:00 0
bf400000-bf401000 ---p 00000000 00:00 0
bf401000-bf600000 rwxp 00001000 00:00 0
bf600000-bf601000 ---p 00000000 00:00 0
bf601000-bf800000 rwxp 00001000 00:00 0
bfff1000-c0000000 rwxp ffff2000 00:00 0

Oops, I forgot the backtrace right when it segfaults. Here it is, sorry:

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x4002f3a6 in ?? ()
#2 0x40034761 in ?? ()
#3 0x40029b99 in ?? ()
#4 0x082109ea in _dl_init (main_map=0x85b1630, argc=1, argv=0xbfffe114, env=0xbfffe11c) at dl-init.c:68
#5 0x081ebd44 in dl_open_worker (a=0xbfffd420) at dl-open.c:448
#6 0x081eb0af in _dl_catch_error (objname=0xbfffd418, errstring=0xbfffd41c, operate=0x81eba28 <dl_open_worker>, args=0xbfffd420) at dl-error.c:162
#7 0x081eb917 in _dl_open (file=0xbfffd5b0 "libnss_files.so.2", mode=1, caller=0x0) at dl-open.c:495
#8 0x081ece94 in do_dlopen (ptr=0xbfffd580) at dl-libc.c:80
#9 0x081eb0af in _dl_catch_error (objname=0xbfffd578, errstring=0xbfffd57c, operate=0x81ece80 <do_dlopen>, args=0xbfffd580) at dl-error.c:162
#10 0x081ecda9 in __libc_dlopen_mode (name=0x0, mode=0) at dl-libc.c:42
#11 0x081e9208 in __nss_lookup_function (ni=0x8443378, fct_name=0x828fdf2 "getpwuid_r") at nsswitch.c:342
#12 0x081e9bd2 in __nss_lookup (ni=0xbfffd664, fct_name=0x828fdf2 "getpwuid_r", fctp=0xbfffd668) at nsswitch.c:148
#13 0x081ea1d6 in __nss_passwd_lookup (ni=0xbfffd664, fct_name=0x828fdf2 "getpwuid_r", fctp=0xbfffd668) at XXX-lookup.c:73
#14 0x081e5a37 in __getpwuid_r (uid=500, resbuf=0x83a426c, buffer=0x85b0ca0 "?8\006\b", buflen=1024, result=0xbfffd6a8) at getXXbyYY_r.c:183
#15 0x081e5708 in getpwuid (uid=500) at getXXbyYY.c:109
#16 0x0813db16 in _xdga_private_create_screen ()

Where is /lib/libpthread.so.0 coming in there? glibc NSS functions certainly don't dlopen libpthread, so I guess it is your application which does that, right? As for whether you want a statically linked binary or not, you almost never want one. If you want to link libstdc++ statically, use -Bstatic -lstdc++ -Bdynamic.
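The libpthread question above follows from matching the unresolved frames of the SIGSEGV backtrace (#1 to #3) against the maps dump: each maps line gives an address range, its permissions, and the file backing it. A minimal sketch of that lookup in Python, using a four-line excerpt of the last maps dump; the helper names parse_maps and find_mapping are made up for illustration and are not part of any tool mentioned in this report:

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int   # first address of the region (inclusive)
    end: int     # end address of the region (exclusive)
    perms: str   # e.g. "r-xp"
    path: str    # backing file, or "" for anonymous memory


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of /proc/<pid>/maps into a list of Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping whose range contains addr, or None if unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# Excerpt of the maps dump taken with the debug-linked binary.
MAPS_EXCERPT = """\
08048000-082ab000 r-xp 00000000 03:07 906758 /home/swalker/games/kq2vga/ags
40026000-40035000 r-xp 00000000 03:08 1015895 /lib/libpthread-0.10.so
40035000-40038000 rw-p 0000f000 03:08 1015895 /lib/libpthread-0.10.so
40078000-401ad000 r-xp 00000000 03:08 1015873 /lib/libc-2.3.2.so
"""

if __name__ == "__main__":
    maps = parse_maps(MAPS_EXCERPT)
    # Frames #1 to #4 of the SIGSEGV backtrace.
    for addr in (0x4002F3A6, 0x40034761, 0x40029B99, 0x082109EA):
        m = find_mapping(maps, addr)
        print(hex(addr), m.path if m else "unmapped")
```

With this excerpt, frames #1 to #3 fall inside the executable mapping of /lib/libpthread-0.10.so, while frame #4 (_dl_init) is in the static binary itself, which is consistent with a dynamically loaded libpthread running inside a statically linked program.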
I'm not personally opening pthread in my program; one of the libraries I use may be, although I'm not aware of that.

As far as the static linking goes, it's required for my sanity. As I mentioned earlier, this is *not* an open source game, though it's not a commercial game either. As such I can only distribute binaries to users. The reason I static link the binaries is because of the C++ ABI issues introduced by GCC3.x, and because I had problems with users being unable to run the binaries on their systems if I didn't static link libc and everything else; they would get weird unresolved symbols or other things. I do provide a dynamic binary as well, but most people can only get the static binary to work because of the ABI issues.

The way you specified to static link against stdc++ is not the way I was told to do it by people on the gcc-help list or involved with the gcc project. I was told that I had to link specifically against libstdc++ manually and that I shouldn't expect static linking to be as easy as it was in gcc2.95 (which worked just the way you describe above).

Here's the main issue I'm trying to figure out: why does this binary, when statically linked and compiled under RedHat 9, *not* work on RedHat 9, the box I compiled it on, but work if I reboot into Debian Woody 3.0? That's what I'm trying to figure out, along with the fact that this same process worked great under RedHat 8.0.

I'm not mad or anything and I'm more than grateful for all the help and input you've provided so far. I'm willing to do anything to help you get to the bottom of this issue that I can legally get away with ;)

You can see my discussion with the people on gcc-help here: http://gcc.gnu.org/ml/gcc-help/2002-11/msg00085.html

I had a few personal responses too, from people I believe are directly involved with the project, that basically told me that I couldn't expect it to be "easy" to do. Thanks.
I will gladly share with you which libraries I static link against if you'd like, or attach the Makefile for your perusal (which I'm sure isn't the prettiest, but it is small) if you so desire.

By the way, it's completely statically linked, meaning ldd -r produces "not a dynamic executable".

Have you been using -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic at the end of the gcc command line or at the start? Only the former can work. If the statically linked program dlopens libpthread, it is really outside of the supported usage (dlopen in statically linked programs has very limited support, just so that NSS and iconv work).

Created attachment 90960: Makefile used to build static binary
Well, we're starting to get somewhere, you're the first person that has managed
to show me how I could get libstdc++ to properly link under gcc3.2.
However, it doesn't resolve the issue; again, the binary will run on Debian
Woody 3.0, but not on RedHat 9, the platform it was compiled on.
I've attached the Makefile I use to build the binary, you'll want to look at
the static-release target. Please tell me if I've done anything wrong here.
One of the first things you'll notice is that I do not add -Bdynamic at the end
of the LDFLAGS_STATIC variable because I don't want *anything* dynamically
linked.
Lastly, I do not use dlopen in my program. However, maybe one of the libraries
I link to does. I have to link against the X11 libraries for this program, so
that may be the dlopen problem spot.

"Well, we're starting to get somewhere, you're the first person that has managed to show me how I could get libstdc++ to properly link under gcc3.2"

I should clarify that statement: what I mean is that you're the first person who has been able to show me how to link it without resorting to /usr/lib/libc.a and so forth.

Can you please start by removing -ldl from the LDFLAGS_STATIC line and see which library/object uses dlopen, then check what it is dlopening?

Ok. I removed -ldl from the flags line, and allegro (http://alleg.sf.net) came up as the one using dlopen. It *does not dlopen pthread though*. They dlopen a set of modules they have, "drivers":

alleg-vga.so
alleg-fbcon.so
alleg-dga2.so
alleg-esddigi.so - esd sound driver
alleg-artsdigi.so - kde arts sound driver

They use RTLD_NOW when they do dlopen on them.

I finally found which library, and under what specific condition, causes the problem. The problem occurs when going full screen in the application (which activates DGA2) and dlopen'ing the arts module. As long as it doesn't load the "arts" module, all is well. It appears that, for some reason, an invalid handle gets returned for the arts library in that specific case? I'm not sure yet. I probably need to pursue this with the allegro development team to see if we can figure out what's triggering it, but I'm not sure where to even begin. Why only the arts library triggers this is beyond me.

Thanks for your help in pointing me in the right direction to get to the root of this matter. I'm not sure that there's really anything you can do from here, and I don't really expect you to either. Although it still puzzles me why this worked just fine under 8.0, but not under 9.

I should note that they're using the "artsc" library, because they're using the C interface to Arts, as Allegro is written in C.

And artsc links against libpthread. This means that you really cannot link statically in your program, and if it worked before it was by sheer luck.
When there are two different copies of libpthread in your program (one statically linked and one dynamically linked, each using a different pthread_self scheme etc.), things really cannot work properly. I don't understand why you insist on a fully static link: everybody has libc.so, libdl.so, libm.so, libpthread.so, libX11.so and they are upward binary compatible using symbol versioning (or, in the case of libX11, their external interface is not changing). This means that if you link against say glibc 2.1.x, the program ought to work just fine on glibc 2.[123].x and later.

"I don't understand why you insist on full statical link, everybody has libc.so, libdl.so, libm.so, libpthread.so, libX11.so and they are upward binary compatible using symbol versioning (or in case of libX11 that their external interface is not changing). This means that if you link against say glibc 2.1.x, the program ought to work just fine on glibc 2.[123].x and later."

Like I said, I thought I should be able to as well, but I've had users report bizarre errors that pop up on their systems, like "GLIBC 2.0 symbols not found" or something like that, even though they're using a GLIBC 2.2 or 2.1 system. The binaries don't work. They got these errors with older binaries I produced under RedHat 8.0. None of it makes sense to me; all I know is that since I started static linking it all, users have stopped complaining about binaries not working right.

Like I said, I didn't think there was much you could do about it, but how does one resolve screwy issues like this when distributing binaries?

Again, I just want to thank you for your assistance. I doubt I would have found this issue and solved a few others without your additional knowledge.

Under RedHat 9, if I want to produce libraries compatible with GLIBC 2.1 and 2.2 systems, how can I?
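The thread ends without an answer to that last question. Since the compatibility argument above rests on symbol versioning, one way to at least see which glibc a dynamically linked binary demands is to look at the versioned symbols it imports, for example in the output of objdump -T. A rough Python sketch of that check follows; the function name required_glibc is hypothetical and the sample dump is illustrative text, not output taken from the binary in this report:

```python
import re
from typing import Optional, Tuple


def required_glibc(symbol_dump: str) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBC_x.y[.z] version tag found in a symbol dump.

    Tags without a numeric version (such as GLIBC_PRIVATE) are ignored.
    Returns None when no versioned glibc symbol is present.
    """
    tags = re.findall(r"GLIBC_(\d+(?:\.\d+)*)", symbol_dump)
    versions = [tuple(int(part) for part in tag.split(".")) for tag in tags]
    return max(versions) if versions else None


# Illustrative dynamic symbol table excerpt (assumed, not from this report).
SAMPLE_DUMP = """\
00000000      DF *UND*  00000000  GLIBC_2.0   printf
00000000      DF *UND*  00000000  GLIBC_2.1   fopen
00000000      DF *UND*  00000000  GLIBC_2.3   __ctype_b_loc
"""

if __name__ == "__main__":
    newest = required_glibc(SAMPLE_DUMP)
    print("needs glibc >=", ".".join(str(n) for n in newest))
```

Versions are compared as integer tuples rather than strings so that, for instance, 2.10 sorts above 2.3. A binary whose highest tag is newer than the glibc on a user's system will fail to start there, so building against the oldest glibc you intend to support keeps this number low.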