Description of problem:
Simple and complex multithreaded applications which use dynamic libraries occasionally crash with "segmentation fault".

Version-Release number of selected component (if applicable):
glibc-2.3.4-2.9

How reproducible:
Sometimes.

Steps to Reproduce:
1. Copy and save the following C source files:

-------------------- krol.c --------------------------
#include <pthread.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#define __USE_GNU
#include <dlfcn.h>

void dlcheck(void *p)
{
    if(!p) {
        printf("[MAIN] ERROR: %s\n", dlerror());
        exit(1);
    }
}

int main()
{
    char library[] = "./libMyLib.so";
    void *lib_handle;
    void (*lib_init)() = NULL;
    void (*lib_exit)() = NULL;

    printf("[MAIN] start\n");
    lib_handle = dlopen(library, RTLD_LAZY);
    dlcheck(lib_handle);
    lib_init = dlsym(lib_handle, "lib_init");
    dlcheck(lib_init);
    lib_exit = dlsym(lib_handle, "lib_exit");
    dlcheck(lib_exit);
    printf("[MAIN] init ok\n");
    printf("[MAIN] calling lib_init\n");
    lib_init();
    printf("[MAIN] lib_init ok\n");
    printf("[MAIN] calling lib_exit\n");
    lib_exit();
    printf("[MAIN] lib_exit ok\n");
    printf("[MAIN] exiting\n");
    return 0;
}
------------------------------------------------------

-------------------- lib1.c --------------------------
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define N 2

pthread_t threads[N];
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void check(const char *func, int err)
{
    if(err != 0) {
        printf("%s FAILED: %s\n", func, strerror(err));
        exit(2);
    }
}

void *thread_starter(void *d)
{
    int err;

    printf("[thread] started\n");
    err = pthread_mutex_lock (&mut);
    check("pthread_mutex_lock", err);
    while(1) {
        printf("[thread] working...\n");
        err = pthread_cond_wait (&cond, &mut);
        check("pthread_cond_wait", err);
    }
}

void lib_init()
{
    int err, i;

    printf("[lib] loading\n");
    for(i = 0; i < N; i++) {
        err =
            pthread_create(&threads[i], NULL, thread_starter, NULL);
        check("pthread_create", err);
    }
    printf("[lib] loaded ok\n");
}

void lib_exit()
{
    int err, i;

    printf("[lib] unloading\n");
    for(i = 0; i < N; i++) {
        err = pthread_cancel(threads[i]);
        check("pthread_cancel", err);
    }
    printf("[lib] unloaded ok\n");
}
------------------------------------------------------

2. Compile them:

gcc -Wall -shared lib1.c -o libMyLib.so -ldl -pthread
gcc -Wall krol.c -o krol -ldl -pthread

3. Run the following command (I used the BASH shell):

i=0 ; while true; do echo "---$i" ; ./krol || break ; (( ++i )) ; done

Actual results:
The program occasionally crashes with SIGSEGV at different points in time. Below are two example outputs which I observed:

......
---15
[MAIN] start
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
[thread] started
Segmentation fault

......
---632
[MAIN] start
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] started
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
Segmentation fault

Expected results:
The program must never crash, so the command must never finish.

Additional info:
I used the gdb debugger to catch the crash. In most cases the program finishes successfully, but it nevertheless crashes sometimes. Below are a few example session logs:

- 1 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15709)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15712)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15713)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218552912 (zombie)]
0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
(gdb) bt
#0 0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
#1 0x009c50ad in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#2 0x009c514f in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#3 0x0096e2aa in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
#4 0x0096bf81 in __pthread_unwind () from /lib/tls/libpthread.so.0
#5 0x009663eb in sigcancel_handler () from /lib/tls/libpthread.so.0
#6 <signal handler called>
#7 0x007eb2fc in __write_nocancel () from /lib/tls/libc.so.6
#8 0x00790bdf in _IO_new_file_write () from /lib/tls/libc.so.6
#9 0x0078f63b in _IO_new_do_write () from /lib/tls/libc.so.6
#10 0x007900e8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#11 0x00790d02 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#12 0x0076cef8 in vfprintf () from /lib/tls/libc.so.6
#13 0x00775450 in printf () from /lib/tls/libc.so.6
#14 0x0011f841 in thread_starter () from ./libMyLib.so
#15 0x00967341 in start_thread () from /lib/tls/libpthread.so.0
#16 0x007f9fee in clone () from /lib/tls/libc.so.6
(gdb)

- 2 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15867)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15868)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15869)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started
[Thread -1208063056 (LWP 15868) exited]
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
Couldn't get registers: No such process.
(gdb)

- 3 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15870)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15871)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15872)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] started
[thread] working...
(no debugging symbols found)
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting

Program exited normally.
(gdb)

- 4 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15990)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15991)]
[thread] started
[New Thread -1218552912 (LWP 15992)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] working...
[thread] started
(no debugging symbols found)
[thread] working...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218552912 (zombie)]
0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
(gdb) bt
#0 0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
#1 0x009c50ad in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#2 0x009c514f in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#3 0x0096e2aa in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
#4 0x0096bf81 in __pthread_unwind () from /lib/tls/libpthread.so.0
#5 0x009663eb in sigcancel_handler () from /lib/tls/libpthread.so.0
#6 <signal handler called>
#7 0x007eb2fc in __write_nocancel () from /lib/tls/libc.so.6
#8 0x00790bdf in _IO_new_file_write () from /lib/tls/libc.so.6
#9 0x0078f63b in _IO_new_do_write () from /lib/tls/libc.so.6
#10 0x007900e8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#11 0x00790d02 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#12 0x0076cef8 in vfprintf () from /lib/tls/libc.so.6
#13 0x00775450 in printf () from /lib/tls/libc.so.6
#14 0x00111877 in thread_starter () from ./libMyLib.so
#15 0x00967341 in start_thread () from /lib/tls/libpthread.so.0
#16 0x007f9fee in clone () from /lib/tls/libc.so.6
(gdb)

- 5 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 17238)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 17239)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 17240)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
[Thread -1218552912 (LWP 17240) exited]
ptrace: No such process.
[Switching to Thread -1218552912 (zombie)]
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x00966801 in __nptl_death_event () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0x00966801 in __nptl_death_event () from /lib/tls/libpthread.so.0
Error accessing memory address 0x966800: No such process.
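
A possible variant of lib1.c for narrowing this down is sketched below. It is an assumption for experimentation, not part of the original reproducer and not the fix shipped in the erratum. In lib1.c as posted, lib_exit() cancels the worker threads but never joins them, so main() can return while a cancelled thread is still unwinding. The sketch adds a hypothetical lib_exit_join() that waits for each cancelled thread, and a hypothetical cleanup handler unlock_mut(). The handler is needed because pthread_cond_wait() reacquires the mutex before a cancellation is acted upon; without it, the first cancelled thread would leave mut locked and the join on the second thread could hang. Whether this changes the crash rate is not known; session 1 above faults before lib_exit() has returned, so it may not.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 2

static pthread_t threads[N];
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void check(const char *func, int err)
{
    if (err != 0) {
        printf("%s FAILED: %s\n", func, strerror(err));
        exit(2);
    }
}

/* Hypothetical cleanup handler: releases mut if the thread is
 * cancelled while it owns the mutex. */
static void unlock_mut(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *thread_starter(void *d)
{
    (void)d;
    printf("[thread] started\n");
    check("pthread_mutex_lock", pthread_mutex_lock(&mut));
    pthread_cleanup_push(unlock_mut, &mut);
    while (1) {
        printf("[thread] working...\n");
        check("pthread_cond_wait", pthread_cond_wait(&cond, &mut));
    }
    pthread_cleanup_pop(0); /* never reached; pairs with the push above */
    return NULL;
}

void lib_init(void)
{
    int i;

    printf("[lib] loading\n");
    for (i = 0; i < N; i++)
        check("pthread_create",
              pthread_create(&threads[i], NULL, thread_starter, NULL));
    printf("[lib] loaded ok\n");
}

/* Hypothetical replacement for lib_exit(): cancel every worker, then
 * wait until each one has finished unwinding. Returns the number of
 * threads joined. */
int lib_exit_join(void)
{
    int i, joined = 0;
    void *res;

    printf("[lib] unloading\n");
    for (i = 0; i < N; i++)
        check("pthread_cancel", pthread_cancel(threads[i]));
    for (i = 0; i < N; i++) {
        check("pthread_join", pthread_join(threads[i], &res));
        assert(res == PTHREAD_CANCELED); /* workers never return normally */
        joined++;
    }
    printf("[lib] unloaded ok\n");
    return joined;
}
```

To try it with krol.c, the dlsym() lookup for "lib_exit" would have to be pointed at "lib_exit_join" instead; the build commands from step 2 stay the same.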
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0510.html