Bug 111548

| Field | Value |
|---|---|
| Summary | Calling pthread_cancel in a multi-threaded C++ application abort()s the app |
| Product | Red Hat Enterprise Linux 3 |
| Component | glibc |
| Version | 3.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED NOTABUG |
| Severity | high |
| Priority | medium |
| Reporter | Dan Nuffer <redhatbugzilla> |
| Assignee | Jakub Jelinek <jakub> |
| QA Contact | Brian Brock <bbrock> |
| CC | bkoz, drepper, francois-xavier.kowalski, gav, tdevanes |
| Doc Type | Bug Fix |
| Last Closed | 2004-09-28 09:23:21 UTC |

Description
Dan Nuffer
2003-12-05 06:31:54 UTC
Created attachment 96366
Program that demonstrates the problem.
This program will seem to work most of the time, but if you run it repeatedly,
it will eventually fail:
[dan@heather tmp]$ while ./a.out >/dev/null; do true; done
FATAL: exception not rethrown
Aborted

Well, the std::cout << line certainly cannot come after setting the cancellation type to asynchronous (see http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_09.html#tag_02_09_05_04 ). But moving it before the two pthread_* calls doesn't seem to cure the situation, nor does adding the -fasynchronous-unwind-tables command-line option. The problem seems to be that the I/O code in libstdc++ calls cancelable functions but doesn't have unwind info for the entire call path. This is the backtrace:

    #0 unwind_cleanup (reason=_URC_FOREIGN_EXCEPTION_CAUGHT, exc=0x40b52dd0) at unwind.c:100
    #1 0x4bd4f198 in _Unwind_DeleteException (exc=0x40b52dd0) at unwind.inc:268
    #2 0x4bd203b0 in __cxa_end_catch () at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:117
    #3 0x4bd04e77 in std::ostream::write(char const*, int) (this=0x8049de0, __s=0x4001ecec "U\211�WS�", __n=8) at ios_base.h:121
    #4 0x4bd055bf in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (__out=@0x8049de0, __s=0x8048b5c "started\n") at ostream.tcc:651
    #5 0x0804892a in the_thread(void*) () at u.cc:12

This is in the corrected version where async cancellation is only enabled later. We need to look at every place in libstdc++ where it calls cancelable functions and make sure all callers of those functions (transitively) are compiled with unwind info.
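
The ordering rule from the comment above (no library I/O once the cancellation type is asynchronous) can be sketched roughly as follows. This is not the attached test program, which is not reproduced in this report. Only `the_thread` and the "started" message come from the backtrace; `spinning`, `counter` and `cancel_compute_thread` are hypothetical names. The sketch also assumes a toolchain that emits unwind tables for the compute loop. It shows the ordering only, and the comment above reports that this ordering alone did not cure the abort with the libstdc++ of that time.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <iostream>

// Hypothetical helper state, not taken from the bug report.
std::atomic<bool> spinning{false};   // set once the worker has finished its I/O
volatile unsigned long counter = 0;  // volatile so the loop performs real work

extern "C" void* the_thread(void*) {
    int old;
    // Cancellation stays disabled while calling into libstdc++ / libc.
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    std::cout << "started\n" << std::flush;
    spinning.store(true);

    // POSIX lists pthread_setcanceltype and pthread_setcancelstate as
    // async-cancel-safe. After these two calls the thread makes no
    // further library calls.
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);

    for (;;) {
        counter = counter + 1;  // pure computation only
    }
    return nullptr;  // not reached
}

// Starts the worker, cancels it, and reports whether it ended by cancellation.
bool cancel_compute_thread() {
    pthread_t t;
    if (pthread_create(&t, nullptr, the_thread, nullptr) != 0) return false;
    while (!spinning.load()) sched_yield();  // wait until the I/O is done
    if (pthread_cancel(t) != 0) return false;
    void* result = nullptr;
    if (pthread_join(t, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

Calling `cancel_compute_thread()` from `main()` should return true when the worker is cancelled inside the loop. Any `std::cout` or other library call placed after the switch to `PTHREAD_CANCEL_ASYNCHRONOUS` would violate the rule cited above.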

Complete backtrace from the point the cancellation was thrown:

    /usr/src/libc/obj/nptl/libpthread.so.0 [0x400236f0]
    /usr/src/libc/obj/elf/ld.so [0x40000c22]
    /usr/src/libc/obj/libc.so.6(__write+0x4b) [0x4010b9db]
    /usr/src/libc/obj/libc.so.6(_IO_file_write+0x3f) [0x400a6e2f]
    /usr/src/libc/obj/libc.so.6 [0x400a5dbe]
    /usr/src/libc/obj/libc.so.6(_IO_do_write+0x36) [0x400a5d56]
    /usr/src/libc/obj/libc.so.6(_IO_file_overflow+0x159) [0x400a6469]
    /usr/src/libc/obj/libc.so.6(_IO_file_xsputn+0xc1) [0x400a6f51]
    /usr/src/libc/obj/libc.so.6(_IO_fwrite+0x12f) [0x4009bb5f]
    /usr/lib/libstdc++.so.5(_ZNSt12__basic_fileIcE6xsputnEPKci+0x38) [0x71abb8]
    /usr/lib/libstdc++.so.5(_ZNSt13basic_filebufIcSt11char_traitsIcEE22_M_convert_to_externalEPciRiS4_+0x1d1) [0x6cd361]
    /usr/lib/libstdc++.so.5(_ZNSt13basic_filebufIcSt11char_traitsIcEE18_M_really_overflowEi+0xf1) [0x6cd0f1]
    /usr/lib/libstdc++.so.5(_ZNSt13basic_filebufIcSt11char_traitsIcEE8overflowEi+0x9c) [0x6ccffc]
    /usr/lib/libstdc++.so.5(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKci+0x94) [0x709ee4]
    /usr/lib/libstdc++.so.5(_ZNSt13basic_filebufIcSt11char_traitsIcEE6xsputnEPKci+0x38) [0x6cd8b8]
    /usr/lib/libstdc++.so.5(_ZNSo5writeEPKci+0x53) [0x6fff43]
    /usr/lib/libstdc++.so.5(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0xff) [0x7006ef]
    /tmp/W(_Z10the_threadPv+0x20) [0x8048c0a]
    /usr/src/libc/obj/nptl/libpthread.so.0 [0x4001cc5c]
    /usr/src/libc/obj/libc.so.6(__clone+0x5a) [0x401198ca]

I get this problem too, using a glibc 2.3.3 snapshot (dated 2004-02-07) and a (vanilla) 2.6.3 kernel with NPTL enabled. This is the only reference to this bug I can find on Google, but it's hampering my coding :-(

How to reproduce: take two threads, A and B, a mutex M, and a condition C. A acquires M, sleeps for a second, then frees M and exits normally. B acquires M, then waits on C indefinitely. main() starts A, then B, then cancels/joins B, then cancels A if it is still running.

What should happen:

0. Both threads start.
1. A acquires M; B blocks, waiting for M to become unlocked.
2. B is cancelled, which is deferred, blocking main().
3. A unlocks M.
4. A exits; B acquires M, then blocks on C indefinitely.
5. B, having reached a cancellation point, is cancelled.
6. Program exits.

What actually happens:

0-3. Correct.
4. Immediately after A's "main" function exits, the error "FATAL: exception not rethrown" is printed and the program aborts.

I'm having trouble getting gdb to function correctly, so I can't really give an accurate backtrace, but it appears to be much the same problem as described above. Is there any news on a workaround/fix?

Cheers,
gav

Gav, if you don't use PTHREAD_CANCEL_ASYNCHRONOUS, it is unrelated and you should file a new bug report instead of appending to an unrelated one. Can you come up with a simple testcase on which you can reproduce the problem?

Done - bug number is #118490.
gav

I'm closing the bug. The original poster never got back, and everything points to library functions being used while async cancel mode is enabled. This is always, 100% of the time, forbidden.

Hi, at Lehman we are porting our machines to Red Hat AS 3.0. When I try to port my code from the previous version to AS 3.0, I run into the same problem. I could not come up with any decent solution to address this issue. If the thread code is calling a "non-yielding" method from a vendor library, there is no way we can exit the thread other than calling pthread_cancel. Modifying the non-yielding vendor API to be yielding (giving up control) is not possible. Is there any way I can make use of pthread_exit(), or any other method, to solve this problem? My temporary solution is to set LD_ASSUME_KERNEL=2.4.19, which makes use of LinuxThreads.
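
For reference, the two-thread scenario described earlier in the comments (threads A and B, mutex M, condition C, deferred cancellation only) might look roughly like the sketch below. It is not gav's actual test case, which was filed separately as bug 118490. The function names and the cleanup handler `unlock_mutex` are assumptions added for illustration. The handler is there because pthread_cond_wait reacquires the mutex before a cancelled thread runs its cleanup handlers.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical names throughout; only A, B, M and C come from the comment.
pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t C = PTHREAD_COND_INITIALIZER;

extern "C" {

// Cleanup handler: releases M when B is cancelled inside pthread_cond_wait.
void unlock_mutex(void* m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
}

// Thread A: hold M for a second, release it, exit normally.
void* thread_a(void*) {
    pthread_mutex_lock(&M);
    sleep(1);
    pthread_mutex_unlock(&M);
    return nullptr;
}

// Thread B: acquire M, then wait on C forever. pthread_mutex_lock is not a
// cancellation point, so a pending cancel is acted on in pthread_cond_wait.
void* thread_b(void*) {
    pthread_mutex_lock(&M);
    pthread_cleanup_push(unlock_mutex, &M);
    for (;;) {
        pthread_cond_wait(&C, &M);  // cancellation point; nobody signals C
    }
    pthread_cleanup_pop(1);         // not reached, pairs with the push above
    return nullptr;
}

}  // extern "C"

// Runs the scenario: start A, start B, cancel and join B, then join A.
// Returns true if B ended by cancellation and M was left unlocked.
bool run_scenario() {
    pthread_t a, b;
    if (pthread_create(&a, nullptr, thread_a, nullptr) != 0) return false;
    if (pthread_create(&b, nullptr, thread_b, nullptr) != 0) return false;
    pthread_cancel(b);
    void* b_result = nullptr;
    pthread_join(b, &b_result);
    pthread_join(a, nullptr);
    bool mutex_free = pthread_mutex_trylock(&M) == 0;
    if (mutex_free) pthread_mutex_unlock(&M);
    return b_result == PTHREAD_CANCELED && mutex_free;
}
```

The sketch does not depend on which thread wins the race for M. If B gets the mutex first, it releases it inside pthread_cond_wait, and A proceeds while B is being cancelled.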