Bug 111548 - Calling pthread_cancel in a multi-threaded C++ application abort()s the app.
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: glibc
Hardware: i386 Linux
Priority: medium, Severity: high
Assigned To: Jakub Jelinek
Brian Brock
Reported: 2003-12-05 01:31 EST by Dan Nuffer
Modified: 2007-11-30 17:06 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-09-28 05:23:21 EDT
Attachments:
Program that demonstrates the problem. (550 bytes, text/plain)
2003-12-05 01:38 EST, Dan Nuffer
Description Dan Nuffer 2003-12-05 01:31:54 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)

Description of problem:
OpenWBEM (http://openwbem.sf.net/) is a multi-threaded C++ program.
When running the unit tests for the Thread class, one of the tests
cancels a thread by calling pthread_cancel().  This normally works
just fine, but on RH 3.0 (and Fedora 1) abort() is quite often
called after the following message has been printed to stderr:
FATAL: exception not rethrown

This happens quite often (~3/4 of the time) on a dual-CPU box,
and less often (~1/10 of the time) on a single-CPU box.

The first function called from the thread function contains a handler
that is essentially
catch (...)
It is there to prevent any unexpected exceptions from propagating up any
further, which would cause a segfault.

The message seems to imply that the exception needs to be rethrown.

I think this is wrong.  The new forced stack unwinding for thread
cancellation hasn't made cancellation any easier or safer to use.  It
seems as if it will /always/ abort the app.  If the exception is not
caught, you segfault; if it is caught, you segfault.  How do
you stop it?
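
For readers hitting the same message, here is a minimal sketch of the rethrow it asks for. It assumes a later GCC release whose <cxxabi.h> exposes the abi::__forced_unwind type (a GCC extension that may not exist in the toolchain discussed in this report). The names guarded_thread and cancel_is_clean are hypothetical and are not taken from OpenWBEM.

```cpp
#include <cxxabi.h>    // abi::__forced_unwind (GCC extension; assumed available)
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-in for the first function called from the thread function.
static void* guarded_thread(void*) {
    try {
        for (;;) {
            pause();   // cancellation point; cancellation is deferred by default
        }
    } catch (abi::__forced_unwind&) {
        throw;         // cancellation unwind: must be rethrown, never swallowed
    } catch (...) {
        // ordinary C++ exceptions are still stopped here, as before
    }
    return nullptr;
}

// Starts the thread, cancels it, and reports whether the join saw a
// normal cancellation (PTHREAD_CANCELED) rather than an abort.
bool cancel_is_clean() {
    pthread_t t;
    if (pthread_create(&t, nullptr, guarded_thread, nullptr) != 0) return false;
    pthread_cancel(t);
    void* result = nullptr;
    if (pthread_join(t, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

The ordering of the handlers matters: the forced-unwind handler has to come before catch (...), otherwise the catch-all swallows the cancellation and the runtime aborts.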

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Check out the code from OpenWBEM CVS.
2. Build it.
3. Run "make check" in the test/unit subdir.

Actual Results: The test aborted with the following error:
FATAL: exception not rethrown

Expected Results: It should have finished successfully.

Additional info:
I did read the small section in the release notes about this, and
found the information rather sparse, and I couldn't find any other
information anywhere.
OpenWBEM doesn't use throw() or -fno-exceptions.
The whole thing about disabling/enabling cancellation whenever calling
a C function is completely impractical.  If cancellation can't unwind
the stack correctly while ignoring any throw() or catch(...){} code
that would impede a normal exception, then it should work as it did
before and not bother to unwind the stack.
Comment 1 Dan Nuffer 2003-12-05 01:38:39 EST
Created attachment 96366
Program that demonstrates the problem.

This program will seem to work most of the time, but if you run it repeatedly,
it will eventually fail:

[dan@heather tmp]$ while ./a.out >/dev/null; do true; done
FATAL: exception not rethrown
Comment 2 Jakub Jelinek 2003-12-11 16:32:29 EST
Well, the std::cout << line certainly cannot come after setting the
cancellation type to asynchronous (see http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_09.html#tag_02_09_05_04 ).
But moving it before the two pthread_* calls doesn't seem to cure the
situation, nor does adding the -fasynchronous-unwind-tables command-line option.
Comment 3 Ulrich Drepper 2003-12-11 18:15:02 EST
The problem seems to be that the I/O code in libstdc++ calls
cancelable functions but doesn't have unwind info for the entire call
path.  This is the backtrace:

#0  unwind_cleanup (reason=_URC_FOREIGN_EXCEPTION_CAUGHT, exc=0x40b52dd0)
    at unwind.c:100
#1  0x4bd4f198 in _Unwind_DeleteException (exc=0x40b52dd0) at
#2  0x4bd203b0 in __cxa_end_catch ()
    at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:117
#3  0x4bd04e77 in std::ostream::write(char const*, int) (this=0x8049de0,
    __s=0x4001ecec "U\211�WS�", __n=8) at ios_base.h:121
#4  0x4bd055bf in std::basic_ostream<char, std::char_traits<char> >&
std::operator<< <std::char_traits<char> >(std::basic_ostream<char,
std::char_traits<char> >&, char const*) (__out=@0x8049de0,
__s=0x8048b5c "started\n")
    at ostream.tcc:651
#5  0x0804892a in the_thread(void*) () at u.cc:12

This is in the corrected version where async cancellation is only
enabled later.

We need to look at every place in libstdc++ where it calls cancelable
functions and make sure all callers of those functions (transitively)
are compiled with unwind info.
Comment 4 Ulrich Drepper 2003-12-11 19:45:33 EST
Complete backtrace from the point the cancellation was thrown:

/usr/src/libc/obj/nptl/libpthread.so.0 [0x400236f0]
/usr/src/libc/obj/elf/ld.so [0x40000c22]
/usr/src/libc/obj/libc.so.6(__write+0x4b) [0x4010b9db]
/usr/src/libc/obj/libc.so.6(_IO_file_write+0x3f) [0x400a6e2f]
/usr/src/libc/obj/libc.so.6 [0x400a5dbe]
/usr/src/libc/obj/libc.so.6(_IO_do_write+0x36) [0x400a5d56]
/usr/src/libc/obj/libc.so.6(_IO_file_overflow+0x159) [0x400a6469]
/usr/src/libc/obj/libc.so.6(_IO_file_xsputn+0xc1) [0x400a6f51]
/usr/src/libc/obj/libc.so.6(_IO_fwrite+0x12f) [0x4009bb5f]
/usr/lib/libstdc++.so.5(_ZNSo5writeEPKci+0x53) [0x6fff43]
/tmp/W(_Z10the_threadPv+0x20) [0x8048c0a]
/usr/src/libc/obj/nptl/libpthread.so.0 [0x4001cc5c]
/usr/src/libc/obj/libc.so.6(__clone+0x5a) [0x401198ca]
Comment 5 Gav Wood 2004-03-16 18:16:25 EST
I get this problem too, using a glibc 2.3.3 snapshot (dated
2004-02-07) and a (vanilla) 2.6.3 kernel with NPTL enabled.
This is the only reference to this bug I can find on Google, but
it's hampering my coding :-(.
How to reproduce:
Take two threads, A and B, a mutex M, and a condition C:
A acquires M, sleeps for a second, then releases M and exits normally.
B acquires M, then waits on C indefinitely.
main() starts A, then B, then cancels/joins B, then cancels A if it is
still running.
What should happen:
0. Both threads start.
1. A acquires M; B blocks, waiting for M to become unlocked.
2. main() cancels B; the cancellation is deferred, so main() blocks in the join.
3. A unlocks M.
4. A exits; B acquires M and blocks on C indefinitely.
5. B, having reached a cancellation point, is cancelled.
6. The program exits.
What actually happens:
0-3. Correct.
4. Immediately after A's "main" function exits, the error
"FATAL: exception not rethrown" is printed and the program aborts.
I'm having trouble getting gdb to function correctly, so I can't
really give an accurate backtrace, but it appears to be much the
same problem described above.
Is there any news on a workaround or fix?
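
The scenario above can be sketched roughly as follows. This is a reconstruction from the description only, not the poster's code: the names thread_a, thread_b, unlock_mutex and run_scenario are hypothetical, the hold time is a parameter rather than a fixed second, and A is simply joined instead of being cancelled. It uses deferred cancellation with a cleanup handler so that M is released when B is cancelled inside pthread_cond_wait().

```cpp
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  C = PTHREAD_COND_INITIALIZER;

// Cleanup handler: when B is cancelled in pthread_cond_wait() it holds M
// again, so the handler has to release it.
static void unlock_mutex(void* m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
}

// Thread A: hold M for a while, release it, exit normally.
static void* thread_a(void* hold_usec) {
    pthread_mutex_lock(&M);
    usleep(*static_cast<unsigned*>(hold_usec));
    pthread_mutex_unlock(&M);
    return nullptr;
}

// Thread B: acquire M, then wait on C indefinitely.
static void* thread_b(void*) {
    pthread_mutex_lock(&M);                  // not a cancellation point
    pthread_cleanup_push(unlock_mutex, &M);
    for (;;) {
        pthread_cond_wait(&C, &M);           // cancellation point
    }
    pthread_cleanup_pop(1);                  // never reached; pairs with push
    return nullptr;
}

// Runs the scenario; true if B was cancelled cleanly and M ended up unlocked.
bool run_scenario(unsigned hold_usec) {
    pthread_t a, b;
    if (pthread_create(&a, nullptr, thread_a, &hold_usec) != 0) return false;
    if (pthread_create(&b, nullptr, thread_b, nullptr) != 0) {
        pthread_join(a, nullptr);
        return false;
    }
    pthread_cancel(b);                       // deferred until B reaches cond_wait
    void* b_result = nullptr;
    pthread_join(b, &b_result);
    pthread_join(a, nullptr);
    bool mutex_free = (pthread_mutex_trylock(&M) == 0);
    if (mutex_free) pthread_mutex_unlock(&M);
    return b_result == PTHREAD_CANCELED && mutex_free;
}
```

Calling run_scenario(1000000) corresponds to the one-second sleep in the description; a shorter value only makes the run faster.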
Comment 6 Jakub Jelinek 2004-03-16 18:30:12 EST
Gav, if you don't use PTHREAD_CANCEL_ASYNCHRONOUS, it is unrelated
and you should file a new bug report instead of appending to an unrelated
one.  Can you come up with a simple testcase on which you can reproduce
the problem?
Comment 7 Gav Wood 2004-03-16 19:18:41 EST
Done; the bug number is #118490.
Comment 8 Ulrich Drepper 2004-09-28 05:23:21 EDT
I'm closing the bug.  The original poster never got back and everything
points to using library functions while async cancel mode is enabled.
This is always, 100% of the time, forbidden.
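
A minimal sketch of what that rule permits, assuming a platform where unwind information is available for every instruction of the thread function (the -fasynchronous-unwind-tables option mentioned in Comment 2 may be needed where it is not the default). Library calls happen only while cancellation is still deferred, and asynchronous mode covers nothing but a pure computation loop. The names worker, in_loop and cancel_compute_loop are hypothetical.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <cstdio>

static std::atomic<bool> in_loop{false};

static void* worker(void*) {
    // Library I/O is done while the cancel type is still the default (deferred).
    std::fputs("started\n", stderr);
    in_loop.store(true);

    int old_type = 0;
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);

    // From here on: no library calls, only computation.
    volatile unsigned long counter = 0;
    for (;;) {
        counter = counter + 1;
    }
    return nullptr;   // never reached
}

// Starts the worker, waits until it is about to spin, cancels it, and
// reports whether the join saw a normal cancellation.
bool cancel_compute_loop() {
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return false;
    while (!in_loop.load()) {
        sched_yield();
    }
    pthread_cancel(t);
    void* result = nullptr;
    if (pthread_join(t, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

If the loop ever needed to call into a library, the thread would first have to switch back with pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, ...).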
Comment 9 Raj Devanesan 2005-04-15 02:56:44 EDT
Hi, at Lehman we are porting our machines to Red Hat AS 3.0. When I try to port my
code from the previous version to AS 3.0, I have the same problem. I could not
come up with any decent solution to address this issue. If the thread code is
calling a "non-yielding" method from a vendor library, there is no way we can
exit the thread other than by calling pthread_cancel().

Modifying the non-yielding vendor API to make it yielding (giving up control) is not
possible. Is there any way I can make use of pthread_exit(), or any other
method, to solve this problem? My temporary solution is to set
LD_ASSUME_KERNEL=2.49.., which makes the program use LinuxThreads.
