This is not reproducible with FC3 and gcc-c++-3.4.3-22.fc3. Upstream developers have not heard any complaints so far either, and their software is built for Windows, MacOS and Linux regularly. I cannot rule out that GCC miscompiles something here, hence this report (upstream received a report earlier). [...]

"wesnoth" 0.9.2 from the Fedora Extras devel tree crashes when built for FC4 (gcc-c++-4.0.0-8) or Rawhide (gcc-c++-4.0.0-12):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 54066096 (zombie)]
0x00a353a8 in ?? ()
(gdb) bt
#0  0x00a353a8 in ?? ()
#1  0x0026ab7a in __nptl_deallocate_tsd () from /lib/libpthread.so.0
#2  0x0026bb8e in start_thread () from /lib/libpthread.so.0
#3  0x00c79dee in clone () from /lib/libc.so.6
(gdb) t
[Current thread is 4 (Thread 54066096 (zombie))]

Valgrind says:

==9255==
==9255== Thread 4:
==9255== Jump to the invalid address stated on the next line
==9255==    at 0x1BAC23A8: ???
==9255==    by 0x26BB8D: start_thread (in /lib/libpthread-2.3.5.so)
==9255==    by 0xC79DED: clone (in /lib/libc-2.3.5.so)
==9255==  Address 0x1BAC23A8 is not stack'd, malloc'd or (recently) free'd
==9255==
==9255== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----

The other three threads are in __kernel_vsyscall. The thread functions are global.

Steps to reproduce:
1) build packages from Fedora Extras CVS devel/wesnoth
2) install "wesnoth"
3) start "wesnoth"
4) enter the multiplayer menu and connect to the official server
[...]

Building without optimisations makes no difference. The program uses SDL as a pthread wrapper, plus SDL_net. Rebuilding SDL without optflags doesn't change anything either. Simple SDL thread and SDL network test code doesn't crash, so it is not an obvious fault in SDL. The first "wesnoth" version that introduced SDL and SDL_net usage, 0.8.6, which is several months old, is also the first version that crashes when built for FC4.
It is much more likely that this is an application bug (in either wesnoth or SDL*) than a glibc bug (and I'd say it surely has nothing to do with gcc). The above backtrace is what happens if some thread-specific data destructor is invalid at thread exit time. See pthread_key_create(3p) and pthread_key_delete(3p). Most likely a shared library registered a destructor with pthread_key_create, forgot to delete that key (pthread_key_delete) in its destructor code, and was then unloaded (dlclose) by the application. Can you please debug whether this is the case?
SDL doesn't use pthread_key_{create,delete} at all. Wesnoth doesn't dlclose anything either, and it only uses SDL's thread API, where the user provides a thread function pointer and a pointer to user data. Downgrading SDL to a rebuild of FC3's SDL version doesn't change anything. The crash happens when a thread function returns, while the main program is waiting in SDL_WaitThread. Debugging inside ddd, I can at least confirm your theory: inside __nptl_deallocate_tsd, when I step through the loop, the fourth data pointer is non-NULL, and the destructor called for it apparently points into nirvana (once it stepped into a completely unrelated library, /usr/lib/libaudiofile.so.0). Now I need to find out where the destructor pointer comes from...
Hmm, while stepping through the code, the calls of pthread_key_create return into libstdc++. Calls of pthread_key_delete are missing...

Breakpoint 2, __pthread_key_create (key=0x45d400, destr=0x8000) at pthread_key_create.c:35
(gdb) bt
#0  __pthread_key_create (key=0x45d400, destr=0x8000) at pthread_key_create.c:35
#1  0x003bc460 in __gnu_cxx::__pool<true>::_M_initialize () from /usr/lib/libstdc++.so.6
#2  0x003bd691 in __gnu_cxx::__pool<true>::_M_reclaim_block () from /usr/lib/libstdc++.so.6
#3  0x0026e92b in ?? () from /lib/libpthread.so.0
#4  0x003bd001 in __gnu_cxx::__pool<true>::_M_reclaim_block () from /usr/lib/libstdc++.so.6
#5  0x0040727d in std::string::_Rep::_S_create () from /usr/lib/libstdc++.so.6
#6  0x0040b25f in std::operator+<char, std::char_traits<char>, std::allocator<char> > () from /usr/lib/libstdc++.so.6
#7  0x0040be7c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
#8  0x083998d5 in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at widgets/slider.cpp:24
#9  0x083a25aa in __do_global_ctors_aux ()
#10 0x0804cf3d in _init ()
#11 0x083a2556 in __libc_csu_init ()
#12 0x00bc3d83 in __libc_start_main (main=0x8166626 <main>, argc=1, ubp_av=0xbf8d3a54, init=0x83a253c <__libc_csu_init>, fini=0x83a258c <__libc_csu_fini>, rtld_fini=0xb9ef2d <_dl_fini>, stack_end=0xbf8d3a4c) at ../sysdeps/generic/libc-start.c:183
#13 0x0804e271 in _start ()
Thanks. Then it really is a bug in the libstdc++ mt allocator.
Can you please attach whatever code you are using to reproduce?
cat > O.c <<EOF
#include <dlfcn.h>
#include <pthread.h>

void *
tf (void *arg)
{
  void *h = dlopen ("./libO.so", RTLD_LAZY);
  void (*fn) (void);
  if (!h)
    return 0;
  fn = (void (*) (void)) dlsym (h, "foo");
  if (fn)
    fn ();
  dlclose (h);
  return 0;
}

int
main (void)
{
  pthread_t th;
  pthread_create (&th, NULL, tf, NULL);
  pthread_join (th, NULL);
  return 0;
}
EOF
cat > libO.C <<EOF
#include <string>

extern "C" void
foo (void)
{
  std::string s;
  s += "hello";
}
EOF
g++ -g -O2 -shared -fpic -o libO.so libO.C
gcc -g -O2 -o O O.c -ldl -lpthread

(This is when mt is the default allocator; I'm not sure what the testcase would look like when e.g. new is the default allocator but you want to force use of the mt allocator.) pthread_key_delete needs to be called somewhere in libstdc++.so's destructors, but late enough that __pool<true>::_M_get_thread_id() is no longer called.
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00478.html
The libstdc++ rpm changelog says that 4.0.1-1 is supposed to fix this, but it still crashes in the same way. The segfault in Blender (bug 157922) seems to occur in the same part of libstdc++.
This patch is supposed to fix crashes that are seen when dlclosing libstdc++.so.6. The backtrace given in #157922 is different and has nothing to do with this bug.
Sorry. You misunderstood me. libstdc++-4.0.1-1 does not fix this.
The testcase I added in #6 certainly works just fine with libstdc++-4.0.1-[12], but segfaults with older libstdc++ rpms. So, what backtrace do you get on the crash, and if it is from __nptl_deallocate_tsd, which routine used to contain the $pc address?
This is what I see:

(gdb) print destr
$11 = (void (*)(void *)) 0xb61d48 <__gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true>::_S_destroy_thread_key(void*)>
(gdb) stepi
0x00b61d48 in ?? ()
(gdb) bt
#0  0x00b61d48 in ?? ()
#1  0x0092cb7a in __nptl_deallocate_tsd () at pthread_create.c:153
#2  0x0092db97 in start_thread (arg=0x43acbb0) at pthread_create.c:268
#3  0x0079d34e in ?? () from /lib/libc.so.6

pthread_create.c:153:
      /* Call the user-provided destructor.  */
      __pthread_keys[idx].destr (data);
An additional observation, as in comment 3: a breakpoint on pthread_key_delete doesn't catch anything.
Can you attach an LD_DEBUG=all dump of the program? For me pthread_key_delete is certainly called from within libstdc++.so.6's destructors:

18660: calling fini: /usr/lib/libstdc++.so.6 [0]
18660:
18660: symbol=__cxa_finalize; lookup in file=./O [0]
18660: symbol=__cxa_finalize; lookup in file=/lib/libdl.so.2 [0]
18660: symbol=__cxa_finalize; lookup in file=/lib/libpthread.so.0 [0]
18660: symbol=__cxa_finalize; lookup in file=/lib/libc.so.6 [0]
18660: binding file /usr/lib/libstdc++.so.6 [0] to /lib/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.1.3]
18660: symbol=pthread_key_delete; lookup in file=./O [0]
18660: symbol=pthread_key_delete; lookup in file=/lib/libdl.so.2 [0]
18660: symbol=pthread_key_delete; lookup in file=/lib/libpthread.so.0 [0]
18660: binding file /usr/lib/libstdc++.so.6 [0] to /lib/libpthread.so.0 [0]: normal symbol `pthread_key_delete'
http://home.arcor.de/ms2002sep/tmp/wesnoth-LD_DEBUG.txt.bz2 (730 K compressed, 34 M uncompressed)
[...]
Here, a breakpoint catches pthread_key_delete only after the segfault has happened:

[New Thread 34855856 (LWP 847)]
[New Thread 106400688 (LWP 848)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 106400688 (zombie)]
0x00543d48 in ?? ()
(gdb) c
Fatal signal: Segmentation Fault (SDL Parachute Deployed)
[Thread 24366000 (LWP 846) exited]

Breakpoint 3, pthread_key_delete (key=4) at pthread_key_delete.c:31
(gdb)
And libstdc++.so got unmapped before the segfault? That would make no sense, as pthread_key_delete then certainly couldn't be called afterwards. Perhaps what you are seeing is not libstdc++.so being unmapped before the crash, but simply some instruction in _S_destroy_thread_key causing the segfault. That could have two causes IMHO: the far more likely one is a bug in the memory management of wesnoth (or of its libraries), where it overflows some buffer into the mt allocator's control structures (similar to how you crash in an unrelated malloc call if you overflow some heap buffer); the other possibility is an mt allocator bug. Running the program under strace could reveal whether libstdc++.so is mapped at that point or not.
Created attachment 116641: wesnoth-strace.txt.gz

Normal strace? Or any special options? [...] I don't think I would get "Cannot access memory at address 0x492d48" in gdb for merely corrupted memory. Whatever user-provided destructor it tries to call, it's not there anymore. [...] Wesnoth is in Fedora Extras Development (both CVS and the repository, including debuginfo), so you could have a look yourself. Memory corruption would imply that the bug has been there for almost a year (since v0.8.6, the first version to implement threaded networking) and has gone unnoticed on distributions older than FC4/gcc4.
strace -f at least, though perhaps strace -E LD_DEBUG=all -f would be even more helpful. I'm aware that wesnoth is in Extras, but unfortunately I couldn't reproduce it remotely on my FC4 workstation (it apparently requires running from the local console), so it will have to wait a few days.
Created attachment 116645: wesnoth-strace-f.txt.gz (71 K)

http://home.arcor.de/ms2002sep/tmp/wesnoth-LD_DEBUG_strace-f.txt.bz2 (1.5 M compressed, >100 M uncompressed)
The wesnoth-LD_DEBUG_strace-f.txt dump tells me that at the point of the segfault libstdc++.so.6 is still mapped, and that the thread in which the segfault happens (tid 1557) is currently being cancelled. You should be able to verify easily in the debugger that libstdc++.so.6 is still mapped: just wait in gdb until it segfaults and print /proc/<pid>/maps from another terminal. So the question really is on which exact instruction in the routine it segfaults, what the registers were, etc.
http://savannah.nongnu.org/bugs/?func=detailitem&item_id=13404

Well, it doesn't crash inside the thread function. The thread function terminates with return 0, and then, via pthread_exit, it enters __nptl_deallocate_tsd (), where the aforementioned crash happens. Valgrind says the called address "is not stack'd, malloc'd or (recently) free'd".
Of course the address is not supposed to be malloc'd, free'd or on the stack. The question is if that address corresponds to the text segment of libstdc++.so.6 and if that part of libstdc++.so.6's text segment contains the code it is supposed to contain.
I did the following: set a breakpoint in pthread_key_create and collected all destructor addresses. The last one before I make the program crash is this:

Breakpoint 5, __pthread_key_create (key=0x937144, destr=0x11cd48 <__gnu_cxx::__common_pool_policy<__gnu_cxx::__pool, true>::_S_destroy_thread_key(void*)>) at pthread_key_create.c:49

A look at /proc/.../maps does not show any memory area for that address. libstdc++ was here:

033fe000-034de000 r-xp 00000000 03:08 909190     /usr/lib/libstdc++.so.6.0.5
034de000-034e3000 rwxp 000e0000 03:08 909190     /usr/lib/libstdc++.so.6.0.5

Then:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 46013360 (zombie)]
0x0011cd48 in ?? ()
Dump of assembler code from 0x11cd48 to 0x11ce48:
0x0011cd48:     Cannot access memory at address 0x11cd48

And again, /proc does not show any memory area that includes 0x11cd48:

00111000-00115000 r-xp 00000000 03:08 482527     /lib/libnss_dns-2.3.90.so
00115000-00116000 r-xp 00003000 03:08 482527     /lib/libnss_dns-2.3.90.so
00116000-00117000 rwxp 00004000 03:08 482527     /lib/libnss_dns-2.3.90.so
001b1000-001ba000 r-xp 00000000 03:08 482529     /lib/libnss_files-2.3.90.so
001ba000-001bb000 r-xp 00008000 03:08 482529     /lib/libnss_files-2.3.90.so
001bb000-001bc000 rwxp 00009000 03:08 482529     /lib/libnss_files-2.3.90.so
[...]

I repeated this and found destr pointing into /usr/lib/libartscbackend.so.0.0.0, which was no longer mapped when it segfaulted. [...] A third attempt again found destr pointing into /usr/lib/libartscbackend.so.0.0.0, which again was no longer mapped when it segfaulted.

$ ldd /usr/lib/libartscbackend.so.0.0.0 | grep -e 'thread\|std'
        libgthread-2.0.so.0 => /usr/lib/libgthread-2.0.so.0 (0x0011a000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00854000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00f1f000)
Hmm... SDL_audio dlopens libartscbackend, which triggers the aforementioned call of pthread_key_create; then SDL_audio dlcloses libartscbackend, but the breakpoint catches no pthread_key_delete call.
Created attachment 116749: tiny testcase

As a status update, I've reached this point early this morning:

$ g++ -g -Wall -I/usr/include/SDL testcase_161061.cpp -o testcase_161061 -lSDL
$ ./testcase_161061
Segmentation fault
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00993.html