Bug 37933

Summary: incompatibility when C++ code is dynamically loaded into a process that is not a C++ program and subsequently throws an exception
Product: [Retired] Red Hat Linux
Reporter: IBM Bug Proxy <bugproxy>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 7.1
CC: byron, dcampbell, nphilipp
Hardware: i586
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-10-02 17:58:47 UTC
Attachments:
  test case
  testcase for uncaught exception

Description IBM Bug Proxy 2001-04-26 22:11:10 UTC
The incompatibility is encountered when C++ code is dynamically loaded into a
process that is not a C++ program and subsequently throws an exception (error
or normal).  __throw() in libstdc++ is called, which in turn calls
__frame_state_for(), which is in glibc.  __frame_state_for() fills in a
structure in storage provided by __throw(), but __throw() believes the
structure is smaller than __frame_state_for() does.  The write therefore
overruns the structure and overwrites the return address and the pointer on
the stack to the previous stack frame.  __throw() then erroneously decides
that it has reached the top of the stack without finding an exception handler,
so it terminates the process.  The problem is avoided in most C++ programs
because __frame_state_for() also exists in libstdc++, and that copy is
compatible with __throw().  If the program is C++, libstdc++ is loaded prior
to glibc, so the loader happens to choose compatible versions.
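
The size mismatch described above can be illustrated with a small simulation.
This is only a sketch: the two struct layouts are hypothetical stand-ins (the
real frame_state structure in the gcc runtime has different members), the
extra field names are borrowed from the fix discussion in this report, and
simulate_overrun() is an invented helper. A flat buffer plays the role of the
caller's stack frame so that nothing is actually corrupted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layouts, for illustration only.
struct OldFrameState {        // size the caller (__throw) reserves
    void* cfa;
    void* eh_ptr;
    long  cfa_offset;
};

struct NewFrameState {        // size the callee (__frame_state_for) writes
    void* cfa;
    void* eh_ptr;
    long  cfa_offset;
    long  base_offset;        // assumed extra field
    char  indirect;           // assumed extra field
};

// Returns how many bytes past the caller's structure were overwritten.
// The buffer stands in for the caller's frame: the first
// sizeof(OldFrameState) bytes are the structure, the remainder represents
// whatever follows it (saved frame pointer, return address).
std::size_t simulate_overrun() {
    const unsigned char kCanary = 0xAA;
    unsigned char frame[sizeof(NewFrameState)];
    std::memset(frame, kCanary, sizeof frame);

    NewFrameState filled;
    std::memset(&filled, 0, sizeof filled);      // zero fields and padding
    assert(sizeof frame >= sizeof filled);       // keep the demo in bounds
    std::memcpy(frame, &filled, sizeof filled);  // callee writes its own size

    std::size_t clobbered = 0;
    for (std::size_t i = sizeof(OldFrameState); i < sizeof frame; ++i)
        if (frame[i] != kCanary) ++clobbered;
    return clobbered;
}
```

Every byte counted by simulate_overrun() corresponds to memory the caller
never set aside for the structure, which is how the saved frame pointer and
return address come to be damaged in the real case.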

We tried to find a programmatic way to force the version of
__frame_state_for() in libstdc++ to be used, but it is too late to do this
when the C++ code is loaded dynamically, which is the first opportunity that
our code has.  It is important to note that this problem is the result of
libstdc++ and glibc-2.2.x having been compiled with different versions of the
compiler and their associated runtime routines.  The routine in question,
__frame_state_for(), is part of the compiler runtime and gets statically
linked into both libstdc++ and glibc.

One of the main reasons for this bug report is that we encountered the
following situation:

1) We wrote a Perl program and used Perl-to-C extensions
2) Those extensions called some C++ object files

The result is a C program (in this case the Perl interpreter) calling C++
code, which ends in a core file.
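
For reference, the shape of the C++ code involved can be sketched as follows.
This is a minimal sketch, not the actual code from the failing application:
parse_positive() and cxx_entry() are hypothetical names. It shows a C-callable
entry point that a C host such as the Perl interpreter could call after
loading the library, with the exception thrown and caught entirely inside the
C++ code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C++ worker that reports failure by throwing.
static int parse_positive(int value) {
    if (value <= 0)
        throw std::invalid_argument("value must be positive");
    return value * 2;
}

// Hypothetical C-callable entry point. Returns 0 on success and -1 if a
// C++ exception was caught; no exception leaves this function.
extern "C" int cxx_entry(int value, int* result) {
    assert(result != nullptr);
    try {
        *result = parse_positive(value);
        return 0;
    } catch (const std::exception&) {
        return -1;            // handled inside the C++ code
    } catch (...) {
        return -1;            // never let anything unwind into the C host
    }
}
```

With a working unwinder the handler in cxx_entry() is found and the host only
ever sees a return code. In the failure described here the handler exists but
is never reached, because the unwinder gives up before finding it.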

Specifically, the problem is that the runtime libs of the GCC compiler are
statically linked at their compile time (this needs to happen). Thus the C lib
is out of sync with the C++ libs, hence the core dump when switching between
the two.

This results in NO backward compatibility of GCC code. Programs compiled on
RedHat 6.2 will cease to work on 7.0 or 7.1 (in the context of stack unwinding
from, for example, a C++ throw).

Thus if RedHat does not fix this problem, we need to know what their statement
on compatibility is. We as IBM need to understand how many versions of a piece
of code we need to maintain. This is critical! Is it per release (i.e. 6.2 or
7.0 or 7.1), per patch family, or per patch?

If they fix this, then our code will work on 6.2 and 7.1. But if this is not
fixed, we may need to maintain code streams for every unique version of GCC.
We need to understand what that set of versions is, since we need to plan our
streams.

Comment 1 Jakub Jelinek 2001-04-28 18:00:25 UTC
We're working on a fix (for the time being I have a gcc patch where
__frame_state_for looks up its parent to see whether it should or should not
fill in base_offset and indirect fields, in the common case this seems to work
just in a few additional instructions; glibc would need to be recompiled
with that gcc afterwards).
Changing compat-libstdc++ would be cleaner, but would only cover some cases
(ie. when linked dynamically against that libstdc++; linking in libstdc++
statically or not linking libstdc++ at all while using throw would still
cause problems).
Until errata for this are released I can only assure you that, to my knowledge,
gcc-2.96-RH is binary compatible in the whole series, so you'd basically
just need to maintain two sets of packages - for 6.x and for 7.x.
Building separate 7.x packages has the advantage that you don't force people
to install compatibility packages which they will not usually have installed
(unlike the standard 2.96 libstdc++).

Comment 2 Jakub Jelinek 2001-06-06 14:38:58 UTC
Ouch, forgot to update this.
This has been fixed since at least glibc-2.2.3-7 and gcc-2.96-84.

Comment 3 Byron Guernsey 2001-06-06 16:04:00 UTC
We've had a similar bug which might be a separate issue.  I tried upgrading our 
systems to the gcc/glibc/stdc++ from Rawhide 1.0 and even rebuilt the glibc and 
stdc++ with the new compiler with no luck.

The problem is similar to this, except that we have this problem with normal 
static linking. If we have a function which throws a C++ exception and we put 
that function in its own .o or .a file, neither the main program nor the 
function itself can catch() the exception.  The program segfaults with a stack 
trace like:

#1  0x4006d432 in raise (sig=6) at signals.c:65 
#2  0x400fdec8 in abort () at ../sysdeps/generic/abort.c:88 
#3  0x40043f1b in __default_terminate () at ../../gcc/libgcc2.c:3034 
#4  0x4004728e in terminate () from /usr/lib/libstdc++-libc6.2-2.so.3 
#5  0x0805f81d in otl_tmpl_cursor<otl_exc, otl_conn, otl_cur, 
otl_var>::describe_column (this=0x8085e10, col=@0x805f818, 
    column_num=134678472) at otl.h:3087 
#6  0x40044831 in find_exception_handler (pc=0x805f7eb, table=0x80707c8, 
eh_info=0x8085e10, rethrow=1, cleanup=0xbfffac7c) 
    at ../../gcc/libgcc2.c:3168 
#7  0x40044a82 in throw_helper (eh=0x8085e38, pc=0x805f814, 
my_udata=0xbfffae80, offset_p=0xbfffae7c) at ../../gcc/libgcc2.c:3168 
#8  0x40044f6f in __rethrow (index=0x806fb98) at ../../gcc/libgcc2.c:3168 
#9  0x0805f815 in otl_tmpl_cursor<otl_exc, otl_conn, otl_cur, 
otl_var>::describe_column (this=0x8083ef0, col=@0x402dd008, 
    column_num=1) at otl.h:3087 
#10 0x0805d1d4 in otl_tmpl_select_stream<otl_exc, otl_conn, otl_cur, otl_var, 
otl_sel, tagTIMESTAMP_STRUCT>::get_select_list ( 
    this=0x8083ef0) at otl.h:3832 
#11 0x0805b784 in otl_tmpl_select_stream<otl_exc, otl_conn, otl_cur, otl_var, 
otl_sel, tagTIMESTAMP_STRUCT>::otl_tmpl_select_stream 
    (this=0x8083ef0, aoverride=0x8080d54, arr_size=50, 
    sqlstm=0xbfffe6e0 "select  type_1, domain, currencycode, amount, 
recuramount, periodtype, periodlength,  recurringflags, recurringcount, 
billflags, locationcode, privilegegranted, privilegerevoked, diskquotagranted,  
ma"..., db=@0x80765c0, implicit_select=0) 
    at otl.h:3729 
#12 0x0805a5d4 in otl_stream::open (this=0x8080b58, arr_size=50, 
    sqlstm=0xbfffe6e0 "select  type_1, domain, currencycode, amount, 
recuramount, periodtype, periodlength,  recurringflags, recurringcount, 
billflags, locationcode, privilegegranted, privilegerevoked, diskquotagranted,  
ma"..., db=@0x80765c0, implicit_select=0) 
    at otl.h:8577 
#13 0x08052602 in ISunRawGetChargeType (pResult=0x80720dc) at charge.cpp:239 
#14 0x080519f4 in ISunGetChargeType (iType=1, pInfo=0xbffff650, 
pAsyncResult=0x80720dc) at charge.cpp:63 
#15 0x0804a109 in main (argc=1, argv=0xbffffb2c) at billingapi_test.cpp:156 
#16 0x400eb0be in __libc_start_main (main=0x8049c70 <main>, argc=1, 
ubp_av=0xbffffb2c, init=0x80493d0 <_init>, 

The same code, in 1 .cpp file, will catch this exception fine. I thought it was 
related or attributable to this bug somehow, but the solution for this bug did 
not solve our problem. 

What did solve our problem was using "kgcc" from the compat packages to compile 
our program.  When compiled with kgcc it works perfectly.  It took long nights 
to figure this out.  We have tested gcc-2.96.85 from Rawhide and it did not 
compile the code correctly.  Attached is some test code which generates the 
error using an ODBC driver.  See the readme in the tar/bz for compile 
instructions (short).  It does require unixodbc and a mysql database, but may 
work with iodbc and another database as long as the #define's are changed.


Comment 4 Byron Guernsey 2001-06-06 16:09:44 UTC
Created attachment 20432 [details]
test case

Comment 5 Rolf Mueller 2001-11-23 09:26:19 UTC
We have the impression that the bug still exists and would like to have a 
clarification whether - to your knowledge - there are still situations where 
problems can occur.

We are using the Java 2 SDK from Sun Version 1.3.1_01 (with native threads).
The Java VM is started first, later on - via JNI - a shared "bridging" library 
written in C++ is loaded which then pulls in other C++ libraries.

When Java calls C++, an exception handler (try/catch) is always set up before 
other C++ routines are called. But when one of these nested C++ routines throws 
an exception, the whole process crashes with core dump. This effect does not
occur if the highest level C++ interface is called from a simple C++ main 
program instead of the Java VM.

We are using Red Hat 7.1 with the updated gcc suite 2.96-85 and glibc 2.2.4-19
(where this bug #37933 should have been fixed). All C++ code has been 
recompiled, but as a matter of fact the Java VM comes in binary form.


Comment 6 Jakub Jelinek 2001-11-23 10:44:38 UTC
I cannot reproduce report 2 (and it has nothing to do with report 1,
unless -lodbc has been compiled with egcs++ and even then, it would simply
segfault, not end up in terminate()).
As for report 3, I cannot do anything unless I have a self-contained testcase.
The interesting things to know are: which EH registry the Java libs and your
C++ libs are registering at (i.e. which __register_frame_info they are calling;
is it in glibc, or in libstdc++, and which one?), which routines throw the
exception (your 2.96-RH compiled C++ libs?), which routines the exception
should go through, and where it should be caught (in the JDK?).
Note that the only case guaranteed to work is throwing an exception from a
routine compiled with g++ version X which will be caught by a routine compiled
with the same g++ version
(here version means egcs 1.x, gcc 2.95.x, gcc 2.96-RH, gcc 3).
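
One way to make the "same g++ version" condition checkable at run time is for
each C++ library to record the compiler it was built with. The following is
only a sketch under assumptions: cxx_built_with() and same_toolchain() are
hypothetical helpers, not part of gcc, glibc or libstdc++. It relies on the
__VERSION__ macro that GCC predefines. Comparing the full strings is stricter
than the version families listed above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper each C++ library could export so that a host can log
// which compiler built it. Falls back to "unknown" if the macro is missing.
extern "C" const char* cxx_built_with() {
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown";
#endif
}

// Hypothetical check: treat two libraries as safe to throw across only if
// their recorded compiler strings are identical.
bool same_toolchain(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}
```

A host could log the string from every library it loads and refuse to mix
libraries whose strings differ. That would not fix an unwinder mismatch, but
it would turn a crash into a diagnosable message.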

Comment 7 Need Real Name 2002-01-02 21:49:20 UTC
I have a similar problem using Linux 2.4.2-2smp with
gcc-2.96-85
glibc-2.2.4-19.3

An exception thrown from a shared library isn't caught by the caller. I have 
created a testcase which reproduces this. I will attach it.

Comment 8 Need Real Name 2002-01-02 21:50:59 UTC
Created attachment 41651 [details]
testcase for uncaught exception

Comment 9 Need Real Name 2002-01-04 18:30:14 UTC
Excuse me, it seems my problem is not related to this issue -- it disappears 
when I link with 'g++' instead of 'ld'. Is this documented anywhere so I can 
kick myself for not having read it? :-)

Comment 10 IBM Bug Proxy 2002-04-29 17:28:53 UTC
We would like to verify whether the original problem is fixed in glibc-2.2.3-7
and gcc-2.96-84, given that rolf.mueller thinks the problem still exists.

Comment 11 Jakub Jelinek 2004-10-02 17:58:47 UTC
If you are able to reproduce with current releases, please reopen.