Bug 126111 - pthread_key_create destructor function, and pthread_join don't work during shared library destructors
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: glibc
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2004-06-16 03:02 EDT by Noam Lampert
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: 2.3.2-95.24
Doc Type: Bug Fix
Last Closed: 2004-09-10 15:53:20 EDT

Attachments
source of reproducer sample (5.72 KB, text/plain)
2004-06-16 03:03 EDT, Noam Lampert

Description Noam Lampert 2004-06-16 03:02:17 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 
MyIE2; .NET CLR 1.1.4322)

Description of problem:
When a shared library's destructor calls pthread_cancel(tid) followed 
by pthread_join(tid), the debugger reports that the thread has died, 
but the destructor functions registered with pthread_key_create are 
not executed and pthread_join() hangs indefinitely. It is possible 
that the debugger is not telling the whole truth and that the thread 
is still alive.

A sample program demonstrating this is attached.
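
The attachment itself is not reproduced here. As a rough illustration of the calls involved, the sketch below runs the same pthread_key_create / pthread_cancel / pthread_join sequence from ordinary code rather than from a shared library destructor, which is the case where it is expected to work. The names demo_key, key_dtor, worker and run_cancel_join_demo are hypothetical and are not taken from the attached reproducer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* All names below are illustrative; none come from the attached sample. */
static pthread_key_t demo_key;
static volatile int key_dtor_calls;   /* incremented by the key destructor */

/* Key destructor: runs at thread exit for a non-NULL thread-specific value. */
static void key_dtor(void *value)
{
    assert(value != NULL);
    key_dtor_calls++;
}

static void *worker(void *arg)
{
    static int marker;
    (void)arg;
    /* Store a non-NULL value so the destructor has something to clean up. */
    pthread_setspecific(demo_key, &marker);
    for (;;)
        pause();   /* pause() is a cancellation point */
    return NULL;
}

/* Returns 0 when the thread was cancelled, joined, and the key
 * destructor ran exactly once; non-zero otherwise. */
int run_cancel_join_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    key_dtor_calls = 0;
    if (pthread_key_create(&demo_key, key_dtor) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    pthread_key_delete(demo_key);

    if (result != PTHREAD_CANCELED)
        return 1;
    return key_dtor_calls == 1 ? 0 : 2;
}
```

Calling run_cancel_join_demo() from main should return 0. The report concerns the same sequence issued from a shared library's destructor, where on the affected glibc the join never returns.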

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. extract the sample shar file by executing it
2. build the program using 'gmake'
3. setenv LD_LIBRARY_PATH .
4. ./exe

Actual Results:  Error: key destructor not called
(and infinite hang)

Expected Results:  Success
(and process exit)

Additional info:
Comment 1 Noam Lampert 2004-06-16 03:03:21 EDT
Created attachment 101183 [details]
source of reproducer sample
Comment 2 Noam Lampert 2004-06-16 03:05:51 EDT
I forgot to mention that this is a regression.

In RH EL 3.0 update 1 the sample works (the bug does not occur).
In RH EL 3.0 update 2 the bug is easily reproduced.
Comment 3 Jakub Jelinek 2004-06-16 04:56:45 EDT
I wonder how this could work in U1.
The problem is:
1) shared library destructors are executed with the dl_load_lock
   held to ensure no new shared libraries are loaded during running
   of the destructors.
   This happens in the initial thread.
2) when a thread is to be cancelled, it uses the unwinder in libgcc_s
   to unwind through the frames, run any pthread cleanups and class
   destructors on the way up
3) the unwinder in libgcc_s uses dl_iterate_phdr interface to query
   all currently loaded shared libraries (this is executed in the
   context of the child thread)
4) dl_iterate_phdr acquires the dl_load_lock, to make sure no new
   shared library is loaded and especially that no shared library
   is unloaded while executing this function.
   But dl_load_lock, although it is a recursive lock, is already held
   by the initial thread, so the child thread gets stuck here until
   the initial thread releases it after it is done with its destructors.
   Since the initial thread is itself blocked in pthread_join waiting
   for the child thread, neither thread can make progress.
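
The locking behaviour in step 4 can be modelled in isolation. The sketch below is not glibc code: fake_load_lock is a hypothetical stand-in for dl_load_lock, and pthread_mutex_trylock is used in place of a blocking lock so that the example terminates instead of hanging. It shows that a recursive mutex can be re-acquired only by the thread that already owns it, while any other thread is refused.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for dl_load_lock; not the real glibc lock. */
static pthread_mutex_t fake_load_lock;

/* Child thread: try to take the lock without blocking and report the result. */
static void *child_try(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_trylock(&fake_load_lock);
    if (*rc == 0)
        pthread_mutex_unlock(&fake_load_lock);
    return NULL;
}

/* Returns the child's trylock result while the calling thread holds the lock. */
int child_result_while_owner_holds(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int child_rc = -1;
    int relock;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&fake_load_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    /* The "initial thread" takes the lock, as during library destructors. */
    pthread_mutex_lock(&fake_load_lock);

    /* Recursion helps only the owner: a second acquisition succeeds here. */
    relock = pthread_mutex_trylock(&fake_load_lock);
    assert(relock == 0);
    pthread_mutex_unlock(&fake_load_lock);

    /* A different thread is refused while the owner still holds the lock. */
    pthread_create(&tid, NULL, child_try, &child_rc);
    pthread_join(tid, NULL);

    pthread_mutex_unlock(&fake_load_lock);
    pthread_mutex_destroy(&fake_load_lock);
    return child_rc;
}
```

With a blocking pthread_mutex_lock in the child instead of the trylock, the pthread_join in the owner would wait forever, which is the shape of the hang described above.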
Comment 4 Yuval Kfir 2004-07-01 07:48:32 EDT
I think it worked in U1 because the regression was introduced by the 
patch glibc-dladdr-locking.patch of 2004-02-20.  Removing this patch 
fixes the problem.  I don't suppose this is the way you want to 
proceed though.
Comment 5 Jakub Jelinek 2004-09-10 15:53:20 EDT
This should be fixed in U3.
