Bug 168266 - Multithreaded applications which use dynamic libraries occasionally crash with SIGSEGV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: glibc
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409
Reported: 2005-09-14 07:49 UTC by Evgeny Baskakov
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2006-0510
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:33:32 UTC
Target Upstream Version:
Embargoed:



Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2006:0510 | 0 | normal | SHIPPED_LIVE | glibc bug fix update | 2006-08-09 04:00:00 UTC

Description Evgeny Baskakov 2005-09-14 07:49:20 UTC
Description of problem:

Simple and complex multithreaded applications which use dynamic libraries
occasionally crash with "segmentation fault".

Version-Release number of selected component (if applicable):

glibc-2.3.4-2.9

How reproducible:

Sometimes.

Steps to Reproduce:

1. Copy and save the following C source files:

-------------------- krol.c --------------------------
/* Feature-test macro: must precede every #include to take effect. */
#define _GNU_SOURCE

#include <pthread.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <dlfcn.h>

void dlcheck(void *p) {
    if(!p) {
        printf("[MAIN] ERROR: %s\n", dlerror());
        exit(1);
    }
}

int main() {
    char library[] = "./libMyLib.so";
    void *lib_handle;
    void (*lib_init)() = NULL;
    void (*lib_exit)() = NULL;

    printf("[MAIN] start\n");

    lib_handle = dlopen(library, RTLD_LAZY);
    dlcheck(lib_handle);

    lib_init = dlsym(lib_handle, "lib_init");
    dlcheck(lib_init);

    lib_exit = dlsym(lib_handle, "lib_exit");
    dlcheck(lib_exit);

    printf("[MAIN] init ok\n");
    printf("[MAIN] calling lib_init\n");

    lib_init();

    printf("[MAIN] lib_init ok\n");
    printf("[MAIN] calling lib_exit\n");

    lib_exit();

    printf("[MAIN] lib_exit ok\n");
    printf("[MAIN] exiting\n");

    return 0;
}
------------------------------------------------------

-------------------- lib1.c --------------------------
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define N 2

pthread_t threads[N];

pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void check(const char *func, int err) {
  if(err != 0) {
      printf("%s FAILED: %s\n", func, strerror(err));
      exit(2);
  }
}

void *thread_starter(void *d) {
  int err;

  printf("[thread] started\n");

  err = pthread_mutex_lock (&mut);
  check("pthread_mutex_lock", err);

  while(1) {
    printf("[thread] working...\n");

    err = pthread_cond_wait (&cond, &mut);
    check("pthread_cond_wait", err);
  }
}


void lib_init() {
  int err, i;

  printf("[lib] loading\n");

  for(i = 0; i < N; i++) {
     err = pthread_create(&threads[i], NULL, thread_starter, NULL);
     check("pthread_create", err);
  }

  printf("[lib] loaded ok\n");
}


void lib_exit() {
  int err, i;

  printf("[lib] unloading\n");
 
  for(i = 0; i < N; i++) {
    err = pthread_cancel(threads[i]);
    check("pthread_cancel", err);
  }

  printf("[lib] unloaded ok\n");
}
------------------------------------------------------

2. Compile them:

gcc -Wall -shared lib1.c -o libMyLib.so -ldl -pthread
gcc -Wall krol.c -o krol -ldl -pthread

3. Run the following command (I used the BASH shell):

i=0 ; while true; do echo "---$i" ; ./krol || break ; (( ++i )) ; done
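
For anyone who wants the exact termination cause rather than the shell's
"Segmentation fault" message, the loop above can be approximated in C. This
is only a sketch: run_once and run_until_failure are hypothetical helper
names, not part of the reproducer, and the sentinel value -1000 is an
arbitrary choice for "fork or waitpid failed".

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` once with no arguments.
 * Returns the child's exit status (>= 0), the negated signal number
 * if a signal killed it, or -1000 if fork/waitpid failed. */
int run_once(const char *path) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        execlp(path, path, (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1000;
}

/* Hypothetical helper: repeat run_once until the first non-zero result
 * or until max_runs iterations have passed. Returns the number of
 * successful runs; *result holds the last run's result. */
int run_until_failure(const char *path, int max_runs, int *result) {
    int i;

    for (i = 0; i < max_runs; i++) {
        *result = run_once(path);
        if (*result != 0)
            break;
    }
    return i;
}
```

Calling run_until_failure("./krol", 1000, &r) from a main() mirrors the
shell loop with an upper bound on iterations; a negative r is the negated
number of the signal that terminated the child.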
  
Actual results:

The program occasionally crashes with SIGSEGV at different points in time.
Below are two example outputs which I observed:

......---15
[MAIN] start
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
[thread] started
Segmentation fault

......---632
[MAIN] start
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] started
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
Segmentation fault

Expected results:

The program should never crash, so the loop in step 3 should never terminate.
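
For comparison, here is a variant of the library that waits for the
cancelled threads before lib_exit returns. It is a sketch of a possible
mitigation, not something confirmed in this report: it assumes the crash
involves worker threads that are still unwinding while main() is already
exiting. The unlock_mutex cleanup handler is an addition that the original
lib1.c does not have; without it, a cancelled thread would terminate while
holding the mutex and pthread_join could block on the remaining thread.

```c
#include <pthread.h>

#define N 2

static pthread_t threads[N];
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Cleanup handler (not in the original lib1.c): a thread cancelled inside
 * pthread_cond_wait re-acquires the mutex before its cleanup handlers run,
 * so the mutex has to be released here. */
static void unlock_mutex(void *m) {
    pthread_mutex_unlock((pthread_mutex_t *)m);
}

static void *thread_starter(void *d) {
    (void)d;
    pthread_mutex_lock(&mut);
    pthread_cleanup_push(unlock_mutex, &mut);
    for (;;)
        pthread_cond_wait(&cond, &mut);   /* the only cancellation point */
    pthread_cleanup_pop(1);               /* never reached; pairs with push */
    return NULL;
}

/* Returns 0 on success, otherwise the pthread_create error code. */
int lib_init(void) {
    int i, err;

    for (i = 0; i < N; i++) {
        err = pthread_create(&threads[i], NULL, thread_starter, NULL);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Cancel every worker, then join each one so that none is still
 * unwinding when the caller goes on to exit. Returns 0 on success. */
int lib_exit(void) {
    int i, err;
    void *ret;

    for (i = 0; i < N; i++) {
        err = pthread_cancel(threads[i]);
        if (err != 0)
            return err;
    }
    for (i = 0; i < N; i++) {
        err = pthread_join(threads[i], &ret);
        if (err != 0)
            return err;
        if (ret != PTHREAD_CANCELED)
            return -1;
    }
    return 0;
}
```

Unlike the reproducer, the worker here does not call printf, so
cancellation can only be acted on inside pthread_cond_wait.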

Additional info:

I used the gdb debugger to catch the crash. In most runs the program
finishes successfully, but it still crashes occasionally.

Below are a few example session logs:

- 1 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15709)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15712)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15713)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218552912 (zombie)]
0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
(gdb) bt
#0  0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
#1  0x009c50ad in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#2  0x009c514f in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#3  0x0096e2aa in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
#4  0x0096bf81 in __pthread_unwind () from /lib/tls/libpthread.so.0
#5  0x009663eb in sigcancel_handler () from /lib/tls/libpthread.so.0
#6  <signal handler called>
#7  0x007eb2fc in __write_nocancel () from /lib/tls/libc.so.6
#8  0x00790bdf in _IO_new_file_write () from /lib/tls/libc.so.6
#9  0x0078f63b in _IO_new_do_write () from /lib/tls/libc.so.6
#10 0x007900e8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#11 0x00790d02 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#12 0x0076cef8 in vfprintf () from /lib/tls/libc.so.6
#13 0x00775450 in printf () from /lib/tls/libc.so.6
#14 0x0011f841 in thread_starter () from ./libMyLib.so
#15 0x00967341 in start_thread () from /lib/tls/libpthread.so.0
#16 0x007f9fee in clone () from /lib/tls/libc.so.6
(gdb)


- 2 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15867)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15868)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15869)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started
[Thread -1208063056 (LWP 15868) exited]
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
Couldn't get registers: No such process.
(gdb)


- 3 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15870)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15871)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 15872)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] started
[thread] working...
(no debugging symbols found)
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting

Program exited normally.
(gdb)


- 4 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 15990)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 15991)]
[thread] started
[New Thread -1218552912 (LWP 15992)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
[thread] working...
[thread] started
(no debugging symbols found)
[thread] working...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218552912 (zombie)]
0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
(gdb) bt
#0  0x009c4733 in _Unwind_FindEnclosingFunction () from /lib/libgcc_s.so.1
#1  0x009c50ad in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#2  0x009c514f in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#3  0x0096e2aa in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
#4  0x0096bf81 in __pthread_unwind () from /lib/tls/libpthread.so.0
#5  0x009663eb in sigcancel_handler () from /lib/tls/libpthread.so.0
#6  <signal handler called>
#7  0x007eb2fc in __write_nocancel () from /lib/tls/libc.so.6
#8  0x00790bdf in _IO_new_file_write () from /lib/tls/libc.so.6
#9  0x0078f63b in _IO_new_do_write () from /lib/tls/libc.so.6
#10 0x007900e8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#11 0x00790d02 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#12 0x0076cef8 in vfprintf () from /lib/tls/libc.so.6
#13 0x00775450 in printf () from /lib/tls/libc.so.6
#14 0x00111877 in thread_starter () from ./libMyLib.so
#15 0x00967341 in start_thread () from /lib/tls/libpthread.so.0
#16 0x007f9fee in clone () from /lib/tls/libc.so.6
(gdb)


- 5 -------------------------------------------------------------
Starting program: /home/jek/threads2/krol
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1208060224 (LWP 17238)]
(no debugging symbols found)
(no debugging symbols found)
[MAIN] start
(no debugging symbols found)
[MAIN] init ok
[MAIN] calling lib_init
[lib] loading
[New Thread -1208063056 (LWP 17239)]
[thread] started
[thread] working...
[New Thread -1218552912 (LWP 17240)]
[lib] loaded ok
[MAIN] lib_init ok
[MAIN] calling lib_exit
[lib] unloading
(no debugging symbols found)
[thread] started
[thread] started
[lib] unloaded ok
[MAIN] lib_exit ok
[MAIN] exiting
[Thread -1218552912 (LWP 17240) exited]
ptrace: No such process.
[Switching to Thread -1218552912 (zombie)]
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x00966801 in __nptl_death_event () from /lib/tls/libpthread.so.0
(gdb) bt
#0  0x00966801 in __nptl_death_event () from /lib/tls/libpthread.so.0
Error accessing memory address 0x966800: No such process.

Comment 7 Bob Johnson 2006-04-11 16:13:33 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 12 Red Hat Bugzilla 2006-08-10 21:33:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0510.html


