Description of problem:
My application is a multi-threaded program. The main thread waits for user input on the console, and a few other threads wait for requests coming from clients. Since we moved to Advanced Server 2.1, something peculiar is happening. If a client sends a shutdown command, it is picked up by one of the request listener threads; that thread does some shutdown-related processing and then calls 'exit()'. All other threads are gone, but the thread handling the shutdown request hangs in exit(). If I run ps, the process shows up as a zombie (with PPID 1). I took the same binaries and ran the same test on Red Hat Linux 7.3, and things seem to work fine. When I try to debug this on AS 2.1 using gdb, gdb also hangs when the thread calls 'exit()', and if I try to interrupt it, gdb reports 'internal error'. Since things work fine on Red Hat 7.3, I am assuming it is most likely something to do with glibc or the kernel. I am not doing any async I/O. The stack looks like this:

#0  0x402dbbe5 in __sigsuspend (set=0x42c278bc) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40271249 in __pthread_wait_for_restart_signal (self=0x42c27be0) at pthread.c:1019
#2  0x40272a9c in __pthread_lock (lock=0x403e1df0, self=0x42c27be0) at spinlock.c:149
#3  0x4026fd06 in __pthread_mutex_lock (mutex=0x403e1de0) at mutex.c:109
#4  0x40272072 in __flockfile (stream=0x403e1ec0) at lockfile.c:39
#5  0x4032aa49 in _IO_flush_all () at genops.c:825
#6  0x4032b739 in _IO_cleanup () at genops.c:903
#7  0x402de5c2 in exit (status=0) at exit.c:74
#8  0x080551f8 in adRequest (tctx=0x80a64e4) at agtreq.c:1694
#9  0x4026ec2f in pthread_start_thread (arg=0x42c27be0) at manager.c:279

Version-Release number of selected component (if applicable):
I am using the 2.4.9-e.27smp kernel and glibc-2.2.4-32.8.

How reproducible:
I tried writing a reproducible case independent of my app, but couldn't.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
The same application runs fine on Red Hat 7.3 with the 2.4.18* kernel and glibc version 2.2.5-43. I tried both compiling on Red Hat 7.3 and running on Red Hat 7.3, AND compiling on AS 2.1 and running on Red Hat 7.3. It works fine on 7.3 either way. When I compile on AS 2.1 and run on AS 2.1, the bug reproduces.
Now I have a reproducible case. I am attaching a simple test program which reproduces this:

#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* sleep */
#include <pthread.h>

#ifndef AIX
#define LOOPER 100000000
#define MOD 10000000
#else
#define LOOPER 400000000
#define MOD 40000000
#endif

/* Thread start routine: pthread_create expects void *(*)(void *). */
void *thr_fun(void *arg)
{
    int i;
#if 1
    sleep(10);
#endif
    printf("from thread just before calling exit\n");
    exit(0);
    return NULL;   /* not reached */
}

int main(void)
{
    pthread_t thread;
    long i;
    pthread_attr_t attr;

    pthread_attr_init(&attr);   /* initialize attr with default attributes */
    if (pthread_create(&thread, &attr, thr_fun, NULL))
        printf("thread create failed\n");
    getchar();   /* main thread blocks here waiting for console input */
    return 0;
}

To compile it:

gcc -o linuxhang linuxhang.c -L/usr/lib -lpthread

Run this program and it should die in 10 seconds, but one thread hangs in exit().

- Shailesh
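A hedged reading of the stack trace in the original report: frames #4 to #7 show exit() running _IO_cleanup() and _IO_flush_all(), which takes a stdio stream's lock through flockfile. One plausible explanation (an assumption; nothing in this report confirms it) is that the main thread, blocked in getchar(), still holds the lock on stdin, so the exiting thread waits for it indefinitely. The sketch below does not reproduce the hang. It only illustrates that a stream lock held by one thread is unavailable to another. The names stream_lock_is_busy, probe_thread and struct probe are hypothetical and introduced purely for illustration.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper state (not from the report). */
struct probe {
    FILE *stream;
    int busy;   /* set to 1 if the stream lock could not be taken */
};

/* Runs in a second thread and tries to take the stream lock without blocking. */
static void *probe_thread(void *arg)
{
    struct probe *p = arg;

    if (ftrylockfile(p->stream) == 0) {
        funlockfile(p->stream);   /* lock was free; release it again */
        p->busy = 0;
    } else {
        p->busy = 1;              /* some other thread owns the lock */
    }
    return NULL;
}

/* Returns 1 if another thread holds the stream's lock, 0 if it is free,
   and -1 if the probe thread could not be created. */
int stream_lock_is_busy(FILE *stream)
{
    struct probe p = { stream, 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, probe_thread, &p) != 0)
        return -1;
    pthread_join(t, NULL);
    return p.busy;
}
```

As a usage example, a caller that does flockfile(stdin) and then calls stream_lock_is_busy(stdin) should get 1, and should get 0 again after funlockfile(stdin). A blocking flockfile call in the probe thread, which is what the trace shows exit() doing, would wait instead of returning.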
Please try ftp://people.redhat.com/jakub/glibc/errata/2.2.4-32.11/
I tried downloading this patch, but the glibc-common package for i686 is missing. - Shailesh
Of course, that's how it has been since the introduction of glibc-common (which was introduced for this reason). If you are on an i686 machine, you need to install the *.i686.rpm packages where they are available and the *.i386.rpm versions of the remaining ones.
Ping! Can you confirm your problem went away? I'll close the bug soon if I don't hear anything.