Bug 106960

Summary: multi-threaded process hangs on exit().
Product: Red Hat Enterprise Linux 2.1
Reporter: Shailesh Phansalkar <shailesh_phansalkar>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 2.1
CC: drepper, fweimer, shailesh_phansalkar
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-10-30 09:37:50 UTC
Bug Blocks: 106715    

Description Shailesh Phansalkar 2003-10-13 23:22:34 UTC
Description of problem:

My application is a multi-threaded program. The main thread waits for user
input on the console, and a few other threads wait for requests coming from
clients. Since we moved to Advanced Server 2.1, something peculiar has been
happening. If a client sends a shutdown command, it is picked up by one of the
request listener threads, which does some shutdown-related processing and then
calls exit(). All the other threads are gone, but the thread handling the
shutdown request hangs in exit(). In ps it shows up as a zombie (with a ppid
of 1). I took the same binaries and ran the same test on Red Hat Linux 7.3,
and things work fine there. When I try to debug this on AS 2.1 with gdb, gdb
also hangs when the thread calls exit(), and if I try to interrupt it, gdb
reports an 'internal error'.
Since things work fine on Red Hat Linux 7.3, I am assuming this most likely
has something to do with glibc or the kernel.

I am not doing any async IO.

The stack trace looks like this:

#0  0x402dbbe5 in __sigsuspend (set=0x42c278bc)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x40271249 in __pthread_wait_for_restart_signal (self=0x42c27be0)
    at pthread.c:1019
#2  0x40272a9c in __pthread_lock (lock=0x403e1df0, self=0x42c27be0)
    at spinlock.c:149
#3  0x4026fd06 in __pthread_mutex_lock (mutex=0x403e1de0) at mutex.c:109
#4  0x40272072 in __flockfile (stream=0x403e1ec0) at lockfile.c:39
#5  0x4032aa49 in _IO_flush_all () at genops.c:825
#6  0x4032b739 in _IO_cleanup () at genops.c:903
#7  0x402de5c2 in exit (status=0) at exit.c:74
#8  0x080551f8 in adRequest (tctx=0x80a64e4) at agtreq.c:1694
#9  0x4026ec2f in pthread_start_thread (arg=0x42c27be0) at manager.c:279

Version-Release number of selected component (if applicable):

I am using the 2.4.9-e.27smp kernel and glibc-2.2.4-32.8.

How reproducible:

I tried writing a reproducible case independent of my app, but couldn't.

Additional info:

The same application runs fine on Red Hat Linux 7.3 with the 2.4.18* kernel
and glibc 2.2.5-43. I tried both compiling and running on Red Hat Linux 7.3,
and compiling on AS 2.1 and running on Red Hat Linux 7.3. It works fine on 7.3
either way.

When I compile on AS 2.1 and run on AS 2.1, the bug reproduces.

Comment 1 Shailesh Phansalkar 2003-10-17 18:30:44 UTC
I now have a reproducible case. I am attaching a simple test program that
reproduces this:

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* sleep() */
#include <pthread.h>

/* Worker thread: sleeps 10 seconds, then ends the whole process by
 * calling exit() from a thread other than main(). */
static void *thr_fun(void *arg)
{
    (void)arg;
    sleep(10);
    printf("from thread just before calling exit\n");
    exit(0);   /* on AS 2.1 with glibc-2.2.4-32.8, this call hangs */
}

int main(void)
{
    pthread_t thread;
    pthread_attr_t attr;

    pthread_attr_init(&attr); /* initialize attr with default attributes */
    if (pthread_create(&thread, &attr, thr_fun, NULL))
        printf("thread create failed\n");
    pthread_attr_destroy(&attr);

    /* The main thread waits for console input, like the real application. */
    getchar();
    return 0;
}

To compile it:
gcc -o linuxhang linuxhang.c -L/usr/lib -lpthread

Run this program: it should exit after 10 seconds, but the thread that calls
exit() hangs instead.

- Shailesh
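Frames #2 to #5 of the stack trace in the description show exit() waiting in
_IO_flush_all() for a stdio stream lock via __flockfile(). One plausible
reading, which is an assumption and is not confirmed anywhere in this bug, is
that another thread already owns that stream's lock: in the test program, the
main thread sits in getchar() on stdin while the worker thread calls exit().
The following minimal C sketch illustrates that condition using only the POSIX
flockfile(), ftrylockfile() and funlockfile() calls; struct holder,
holder_start(), holder_stop() and stream_lock_busy() are hypothetical helper
names, not glibc internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a thread blocked inside a locking stdio
 * call, such as the test program's main thread sitting in getchar(). */
struct holder {
    FILE *stream;
    pthread_t tid;
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int locked;   /* 1 once the helper thread owns the stream lock */
    int release;  /* set to 1 to let the helper drop the stream lock */
};

static void *hold_stream_lock(void *arg)
{
    struct holder *h = arg;

    flockfile(h->stream);                 /* acquire the stream lock */
    pthread_mutex_lock(&h->mu);
    h->locked = 1;
    pthread_cond_broadcast(&h->cv);
    while (!h->release)                   /* stay "blocked" while holding it */
        pthread_cond_wait(&h->cv, &h->mu);
    pthread_mutex_unlock(&h->mu);
    funlockfile(h->stream);               /* release the stream lock */
    return NULL;
}

/* Start the helper and return only once it really owns STREAM's lock. */
static int holder_start(struct holder *h, FILE *stream)
{
    h->stream = stream;
    h->locked = 0;
    h->release = 0;
    pthread_mutex_init(&h->mu, NULL);
    pthread_cond_init(&h->cv, NULL);
    if (pthread_create(&h->tid, NULL, hold_stream_lock, h) != 0)
        return -1;
    pthread_mutex_lock(&h->mu);
    while (!h->locked)
        pthread_cond_wait(&h->cv, &h->mu);
    pthread_mutex_unlock(&h->mu);
    return 0;
}

/* Let the helper drop the lock, then wait for it to finish. */
static void holder_stop(struct holder *h)
{
    pthread_mutex_lock(&h->mu);
    h->release = 1;
    pthread_cond_broadcast(&h->cv);
    pthread_mutex_unlock(&h->mu);
    pthread_join(h->tid, NULL);
    pthread_mutex_destroy(&h->mu);
    pthread_cond_destroy(&h->cv);
}

/* Return 1 if another thread owns STREAM's lock, meaning a locking
 * operation on it (like the flush exit() was doing in the trace) would
 * block. Stream locks are recursive, so the owner itself would get 0. */
static int stream_lock_busy(FILE *stream)
{
    if (ftrylockfile(stream) != 0)
        return 1;
    funlockfile(stream);
    return 0;
}
```

In use, stream_lock_busy(stdin) returns 0 on its own, 1 between
holder_start(&h, stdin) and holder_stop(&h), and 0 again afterwards.
ftrylockfile() is used so that the probe itself can never block the way
exit() did.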


Comment 2 Jakub Jelinek 2003-11-03 22:59:59 UTC
Please try ftp://people.redhat.com/jakub/glibc/errata/2.2.4-32.11/

Comment 3 Shailesh Phansalkar 2003-11-03 23:50:13 UTC
I tried downloading this update, but it seems to be missing glibc-common for
i686.

- Shailesh


Comment 4 Jakub Jelinek 2003-11-03 23:54:53 UTC
Of course; that is how it has been since the introduction of glibc-common
(which was introduced for this very reason).
If you are on an i686 machine, you need to install the *.i686.rpm packages
where they are available and the *.i386.rpm versions of the remaining
ones.

Comment 6 Ulrich Drepper 2004-09-28 07:49:13 UTC
Ping!  Can you confirm your problem went away?  I'll close the bug
soon if I don't hear anything.