Bug 104689

Summary: Crash in ld-linux.so test IBM Tivoli Access Manager WebSEAL
Product: Red Hat Enterprise Linux 3
Reporter: Matt Rodkey <mrodkey>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 3.0
CC: bugproxy, tfinucan
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-09-26 13:19:30 EDT

Description Matt Rodkey 2003-09-19 01:14:38 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1; Linux)

Description of problem:
This occurred while using the "junction delete" feature of WebSEAL.
Here is the gdb stack trace:

(gdb) run -foreground
[Thread debugging using libthread_db enabled]
[New Thread -1218532992 (LWP 17193)]

Access Manager WebSEAL Version (Build 030917, Debug)

Copyright (C) IBM Corporation 1994-2002.  All Rights Reserved.

[New Thread 82824112 (LWP 17200)]
[New Thread 129706928 (LWP 17201)]
[New Thread 28683184 (LWP 17202)]
[New Thread 43838384 (LWP 17203)]
[New Thread 54328240 (LWP 17204)]
[New Thread 64818096 (LWP 17205)]
[New Thread 93313968 (LWP 17206)]
[New Thread 103803824 (LWP 17207)]
[New Thread 8244144 (LWP 17208)]
[New Thread 11500464 (LWP 17209)]
[New Thread 13581232 (LWP 17210)]
[New Thread 13847472 (LWP 17211)]
[New Thread 14449584 (LWP 17212)]
[New Thread 15825840 (LWP 17213)]
[New Thread 16092080 (LWP 17214)]
[New Thread 16571312 (LWP 17215)]
[New Thread 28949424 (LWP 17216)]
[New Thread 111356848 (LWP 17217)]
[New Thread 29215664 (LWP 17218)]
[New Thread 29481904 (LWP 17219)]
[New Thread 29748144 (LWP 17220)]
[New Thread 30014384 (LWP 17221)]
[New Thread 30280624 (LWP 17222)]
[New Thread 30546864 (LWP 17223)]
[New Thread 30813104 (LWP 17224)]
[New Thread 31079344 (LWP 17225)]
[New Thread 31345584 (LWP 17226)]
[New Thread 31611824 (LWP 17227)]
[New Thread 31878064 (LWP 17228)]
[New Thread 166095792 (LWP 17229)]
[New Thread 165227440 (LWP 17230)]
[New Thread 32144304 (LWP 17231)]
[New Thread 65084336 (LWP 17232)]
[New Thread 168557488 (LWP 17233)]
[New Thread 65350576 (LWP 17234)]
[New Thread 65616816 (LWP 17235)]
[New Thread 65883056 (LWP 17236)]
[New Thread 160132016 (LWP 17237)]
[New Thread 66149296 (LWP 17238)]
[New Thread 158419888 (LWP 17239)]
[New Thread 66415536 (LWP 17240)]
[New Thread 66681776 (LWP 17241)]
[New Thread 66948016 (LWP 17242)]
[New Thread 67214256 (LWP 17243)]
[New Thread 67480496 (LWP 17244)]
[New Thread 70867888 (LWP 17245)]
[New Thread 67746736 (LWP 17246)]
[New Thread 68012976 (LWP 17247)]
[New Thread 116358064 (LWP 17248)]
[New Thread 68279216 (LWP 17249)]
[New Thread 68545456 (LWP 17250)]
[New Thread 68811696 (LWP 17251)]
[New Thread 69077936 (LWP 17252)]
[New Thread 69344176 (LWP 17253)]
[New Thread 69610416 (LWP 17254)]
[New Thread 132475824 (LWP 17255)]
[New Thread 69876656 (LWP 17256)]
[New Thread 70142896 (LWP 17257)]
[New Thread 147082160 (LWP 17258)]
[New Thread 115551152 (LWP 17259)]
[New Thread 70409136 (LWP 17260)]
[New Thread 183159728 (LWP 17261)]

Program received signal SIG32, Real-time event 32.
[Switching to Thread 70409136 (LWP 17260)]
0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c

Program received signal SIG32, Real-time event 32.
[Switching to Thread 115551152 (LWP 17259)]
0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c

Program received signal SIG32, Real-time event 32.
[Switching to Thread 8244144 (LWP 17208)]
0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c
[New Thread 140544944 (LWP 17262)]

Program received signal SIG32, Real-time event 32.
[Switching to Thread 140544944 (LWP 17262)]
0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c

Program received signal SIGABRT, Aborted.
0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x0094ac02 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00988c79 in raise () from /lib/tls/libc.so.6
#2  0x0098a4a3 in abort () from /lib/tls/libc.so.6
#3  0x0069d487 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.5
#4  0x0069d4d4 in std::terminate() () from /usr/lib/libstdc++.so.5
#5  0x0069d2fc in __gxx_personality_v0 () from /usr/lib/libstdc++.so.5
#6  0x00ab2332 in _Unwind_RaiseException () from /lib/libgcc_s.so.1
#7  0x00ab23cb in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#8  0x00f87cb4 in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
#9  0x00f85eb6 in __pthread_unwind () from /lib/tls/libpthread.so.0
#10 0x00f81408 in sigcancel_handler () from /lib/tls/libpthread.so.0
#11 <signal handler called>
#12 0x086084f4 in ?? ()
Previous frame inner to this frame (corrupt stack?)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start WebSEAL	
2. Create Junction
3. Delete Junction

Actual Results:  WebSEAL core dumps with the above stack trace

Expected Results:  Junction should be deleted

Additional info:
Comment 1 Jakub Jelinek 2003-09-19 09:40:02 EDT
This is not a crash in ld-linux.so; it is std::terminate() aborting (the normal
std::terminate() behaviour).
From the backtrace, this is most likely because a thread was cancelled while
executing code which was not expected to throw exceptions. That would most
likely point to an application bug.
As a workaround, you can force LinuxThreads, which never uses exceptions for
cancellation (which means that destructors and cleanups are not run when
cancelling a thread).
Comment 2 Travis Finucane 2003-09-19 11:32:00 EDT
Thanks for your quick feedback.
Some additional information not available at the time the bug was opened:
Sure enough, the SIG32 signal (or SIGABRT) comes a short time after
calls to pthread_cancel().

When the thread leading to the SIGABRT is created, here are the attribute settings:

    #define LINUX_THREAD_STACK_SIZE (256*1024)

    pthread_attr_t thread_attr;
    pthread_attr_init(&thread_attr);  /* attributes must be initialized before use */
    pthread_attr_setstacksize(&thread_attr, LINUX_THREAD_STACK_SIZE);
    rc = pthread_create(tp, &thread_attr, func, arg);

We never call pthread_setcanceltype(). The default cancel type should be
Comment 3 Matt Rodkey 2003-09-19 12:26:50 EDT
This is a blocking bug for the Tivoli Access Manager WebSEAL Team
Comment 4 Matt Rodkey 2003-09-19 13:12:12 EDT
From Jakub:  "As workaround, you can force Linuxthreads, which never use
exceptions for cancellation (which means that destructors and cleanups are
not run when cancelling a thread)."

How would we do this?  Link options?

Comment 5 Matt Rodkey 2003-09-19 13:44:23 EDT
Using LD_ASSUME_KERNEL=2.4.0 makes this problem go away. It looks like a
problem in NPTL.
Comment 6 Matt Rodkey 2003-09-19 13:46:25 EDT
FYI, this does not happen on RH9, which also uses NPTL, only on RHEL 3.
Comment 7 Jakub Jelinek 2003-09-19 14:03:49 EDT
A thread can be PTHREAD_CANCEL_ASYNCHRONOUS either because you set it so
using pthread_setcanceltype, or temporarily when executing a cancellable
system call.
All functions which call cancellable glibc functions (those without
throw() on their prototype, maybe with the exception of functions which
are without throw() just because they call user defined callbacks which
might throw exceptions) must either allow exceptions (i.e. avoid throw()) or
must be compiled without exceptions (-fno-exceptions, the default for C code,
unlike C++).
This is necessary so that thread cancellation can DTRT with C++ destructors
and is new to post RHL9 NPTL.
If it is too hard to change your application to this, you can either put small
C wrappers around cancellable glibc functions (or C++ compiled without -fexceptions),
in which case you'll get the old linuxthreads behaviour where during pthread_cancel
no destructors are run for objects created in the thread, or use linuxthreads
(LD_ASSUME_KERNEL=2.4.19 gives you a faster linuxthreads implementation than 2.4.0).
Comment 8 Matt Wilson 2003-09-25 20:35:45 EDT
Is this explanation acceptable?
Comment 9 Matt Rodkey 2003-09-26 13:19:30 EDT
We are happy with the LD_ASSUME_KERNEL solution. Closing this bug.