Bug 126111 - pthread_key_create destructor function, and pthread_join don't work during shared library destructors
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: glibc
Version: 3.0
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
Depends On:
Reported: 2004-06-16 07:02 UTC by Noam Lampert
Modified: 2007-11-30 22:07 UTC

Clone Of:
Last Closed: 2004-09-10 19:53:20 UTC

Attachments:
source of reproducer sample (5.72 KB, text/plain)
2004-06-16 07:03 UTC, Noam Lampert

Description Noam Lampert 2004-06-16 07:02:17 UTC
Description of problem:
When a shared library, during its destructors, performs the sequence 
pthread_cancel(tid), pthread_join(tid), then according to the debugger 
the thread dies, but the destructor functions registered with 
pthread_key_create are not executed, and pthread_join() hangs 
indefinitely. It is possible that the debugger is not telling the 
complete truth and the thread is still alive.

A sample program demonstrating this is attached.
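
For readers without the attachment, the expected behaviour can be 
sketched as follows. This is not the attached reproducer; it is a 
minimal illustration, run from ordinary code rather than from a shared 
library destructor, of a key destructor firing when a thread is 
cancelled and joined. The names key_destructor, worker and 
cancel_and_join_runs_key_destructor are made up for this sketch.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_key_t key;
static int destructor_ran;   /* set by the key destructor */
static int dummy_value;      /* non-NULL value so the destructor is invoked */

/* Hypothetical key destructor: only records that it was called. */
static void key_destructor(void *value)
{
    (void)value;
    destructor_ran = 1;
}

/* Worker thread: stores a thread-specific value, then waits in a
   cancellation point (sleep) until it is cancelled. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_setspecific(key, &dummy_value);
    for (;;)
        sleep(1);
    return NULL;
}

/* Returns 1 if the thread was cancelled and its key destructor ran,
   0 if not, -1 on setup failure. */
int cancel_and_join_runs_key_destructor(void)
{
    pthread_t tid;
    void *ret = NULL;

    destructor_ran = 0;
    if (pthread_key_create(&key, key_destructor) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    pthread_cancel(tid);      /* deferred: acted on at the next sleep() */
    pthread_join(tid, &ret);  /* the step that hangs in the reported case */
    pthread_key_delete(key);

    return destructor_ran && ret == PTHREAD_CANCELED;
}
```

In the reported failure the same cancel/join pair is issued from a 
shared library destructor instead, and the join never returns.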

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. extract the sample shar file by executing it
2. build the program using 'gmake'
3. setenv LD_LIBRARY_PATH .
4. ./exe

Actual Results:  Error: key destructor not called
(and infinite hang)

Expected Results:  Success
(and process exit)

Additional info:

Comment 1 Noam Lampert 2004-06-16 07:03:21 UTC
Created attachment 101183
source of reproducer sample

Comment 2 Noam Lampert 2004-06-16 07:05:51 UTC
I forgot to mention that this is a regression.

In RH EL 3.0 update 1 the sample works (the bug does not occur)
In RH EL 3.0 update 2 this bug is easily reproduced.

Comment 3 Jakub Jelinek 2004-06-16 08:56:45 UTC
I wonder how this could work in U1.
The problem is:
1) shared library destructors are executed with the dl_load_lock
   held to ensure no new shared libraries are loaded during running
   of the destructors.
   This is in the initial thread
2) when a thread is to be cancelled, it uses the unwinder in libgcc_s
   to unwind through the frames, run any pthread cleanups and class
   destructors on the way up
3) the unwinder in libgcc_s uses dl_iterate_phdr interface to query
   all currently loaded shared libraries (this is executed in the
   context of the child thread)
4) dl_iterate_phdr acquires the dl_load_lock, to make sure no new
   shared library is loaded and especially that no shared library
   is unloaded while executing this function.
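   But dl_load_lock, although it is a recursive lock, is already held
   by the initial thread, so the child thread gets stuck here until
   the initial thread releases it after it is done with its destructors.
   The initial thread, in turn, is blocked in pthread_join waiting for
   the child, so neither thread can make progress.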

Comment 4 Yuval Kfir 2004-07-01 11:48:32 UTC
I think it worked in U1 because the regression was introduced by the 
patch glibc-dladdr-locking.patch of 2004-02-20.  Removing this patch 
fixes the problem.  I don't suppose this is the way you want to 
proceed though.

Comment 5 Jakub Jelinek 2004-09-10 19:53:20 UTC
This should be fixed in U3.
