Bug 106960 - multi threaded process hangs on exit().
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: glibc
Version: 2.1
Hardware: i686 Linux
Priority: high, Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Depends On:
Blocks: 106715
Reported: 2003-10-13 19:22 EDT by Shailesh Phansalkar
Modified: 2007-11-30 17:06 EST

Doc Type: Bug Fix
Last Closed: 2004-10-30 05:37:50 EDT
Description Shailesh Phansalkar 2003-10-13 19:22:34 EDT
Description of problem:

My application is a multi-threaded program. The main thread waits for user 
input on the console, and a few other threads wait for requests coming from 
clients. Since we moved to Advanced Server 2.1, something peculiar has been 
happening. If a client sends a shutdown command, it is picked up by one of 
the request listener threads; that thread does some shutdown-related 
processing and then calls 'exit()'. All other threads go away, but the thread 
handling the shutdown request hangs in exit(). In ps it shows up as a zombie 
(with ppid 1). I took the same binaries and ran the same test on Red Hat 
Linux 7.3, and it works fine there. When I try to debug this on AS 2.1 with 
gdb, gdb also hangs when the thread calls 'exit()', and if I try to break in, 
gdb reports an 'internal error'.
Since things work fine on Red Hat 7.3, I assume it most likely has something 
to do with glibc or the kernel.

I am not doing any async IO.

The stack trace of the hung thread looks like this:

#0  0x402dbbe5 in __sigsuspend (set=0x42c278bc)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40271249 in __pthread_wait_for_restart_signal (self=0x42c27be0)
    at pthread.c:1019
#2  0x40272a9c in __pthread_lock (lock=0x403e1df0, self=0x42c27be0)
    at spinlock.c:149
#3  0x4026fd06 in __pthread_mutex_lock (mutex=0x403e1de0) at mutex.c:109
#4  0x40272072 in __flockfile (stream=0x403e1ec0) at lockfile.c:39
#5  0x4032aa49 in _IO_flush_all () at genops.c:825
#6  0x4032b739 in _IO_cleanup () at genops.c:903
#7  0x402de5c2 in exit (status=0) at exit.c:74
#8  0x080551f8 in adRequest (tctx=0x80a64e4) at agtreq.c:1694
#9  0x4026ec2f in pthread_start_thread (arg=0x42c27be0) at manager.c:279

Version-Release number of selected component (if applicable):

I am using the 2.4.9-e.27smp kernel and glibc-2.2.4-32.8.

How reproducible:

I tried writing a reproducible case independent of my app, but couldn't.

Additional info:

The same application runs fine on Red Hat 7.3 with the 2.4.18* kernel and 
glibc 2.2.5-43. I tried both compiling on Red Hat 7.3 and running on Red Hat 
7.3, and compiling on AS 2.1 and running on Red Hat 7.3. It works fine on 7.3 
either way.

When I compile on AS 2.1 and run on AS 2.1, the bug reproduces.
Comment 1 Shailesh Phansalkar 2003-10-17 14:30:44 EDT
I now have a reproducible case. I am attaching a simple test program that 
reproduces the problem:

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* sleep() */
#include <pthread.h>
#ifndef AIX
#define LOOPER 100000000
#define MOD     10000000
#else
#define LOOPER 400000000
#define MOD     40000000
#endif
/* Thread entry point: wait 10 seconds, then terminate the whole process. */
void *thr_fun(void *arg)
{
    (void)arg;
    sleep(10);
    printf("from thread just before calling exit\n");
    exit(0);
}
int main(void)
{
    pthread_t thread;
    pthread_attr_t attr;
    pthread_attr_init(&attr); /* initialize attr with default attributes */
    if (pthread_create(&thread, &attr, thr_fun, NULL))
        printf("thread create failed\n");

    getchar(); /* main thread blocks on stdin while the other thread exits */
    return 0;
}

To compile it:
gcc -o linuxhang linuxhang.c -L/usr/lib -lpthread

Run this program. It should die in 10 seconds, but one thread hangs in exit().

- Shailesh
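
For illustration only, and not something proposed in this report: frames #4 to #6 of the trace show exit() blocked in __flockfile() while flushing all stdio streams. If the cause is that the main thread holds the stdin lock while it is blocked in getchar() (an assumption; the report does not confirm it), one possible workaround is for the worker to flush only the streams it wrote to and then call _exit(), which skips the stdio cleanup pass. The names shutdown_from_worker() and run_shutdown_demo() are hypothetical, and the demo runs the scenario in a forked child only so that its exit status can be checked.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the report): leave the process from a worker
 * thread without running exit()'s stdio cleanup. Only stdout and stderr are
 * flushed; fflush(NULL) is avoided because it visits every open stream,
 * including stdin. */
static void shutdown_from_worker(int status)
{
    fflush(stdout);
    fflush(stderr);
    _exit(status);          /* skips atexit() handlers and stdio cleanup */
}

/* Worker thread: stands in for the request listener that handles shutdown. */
static void *worker(void *arg)
{
    (void)arg;
    printf("worker: shutting down\n");
    shutdown_from_worker(0);
    return NULL;            /* not reached */
}

/* Runs the scenario in a child process so the caller can observe the result.
 * Returns the child's exit status, or -1 on failure. */
int run_shutdown_demo(void)
{
    pid_t pid;
    int wstatus;

    fflush(stdout);         /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(1);
        getchar();          /* main thread waits on stdin, as in the report */
        for (;;)
            pause();        /* keep waiting if stdin was already at EOF */
    }
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling run_shutdown_demo() from main() returns the child's exit status. Because _exit() also skips atexit() handlers, this sketch only fits programs that do not depend on them.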
Comment 2 Jakub Jelinek 2003-11-03 17:59:59 EST
Please try ftp://people.redhat.com/jakub/glibc/errata/2.2.4-32.11/
Comment 3 Shailesh Phansalkar 2003-11-03 18:50:13 EST
I tried downloading this update, but the glibc-common package for i686 is 
missing.

- Shailesh
Comment 4 Jakub Jelinek 2003-11-03 18:54:53 EST
Of course, that's how it has been since the introduction of glibc-common
(which was introduced for this reason).
If you are on an i686 machine, you need to install the *.i686.rpm packages
where they are available and the *.i386.rpm versions of the remaining
ones.
Comment 6 Ulrich Drepper 2004-09-28 03:49:13 EDT
Ping!  Can you confirm your problem went away?  I'll close the bug
soon if I don't hear anything.
