The incompatibility appears when C++ code is dynamically loaded into a process that is not a C++ program and that code then throws an exception (either for an error or in normal operation). libstdc++ calls __throw(), which in turn calls __frame_state_for() in glibc. __frame_state_for() fills in a structure in storage provided by __throw(), but __throw() believes the structure is smaller than __frame_state_for() does. The extra bytes overwrite the return address and the saved pointer to the previous stack frame. __throw() then wrongly decides that it has reached the top of the stack without finding an exception handler, so it terminates the process. Most C++ programs avoid the problem because __frame_state_for() also exists in libstdc++ and that copy is compatible with __throw(). In a C++ program, libstdc++ is loaded before glibc, so the loader happens to choose compatible versions. We tried to find a programmatic way to force the libstdc++ version of __frame_state_for() to be used. However, our first opportunity is when the C++ code is loaded dynamically, and by then it is too late. Note that this problem occurs because libstdc++ and glibc-2.2.x were compiled with different versions of the compiler and its associated runtime routines. The routine in question, __frame_state_for(), is part of the compiler runtime and is statically linked into both libstdc++ and glibc. The main reason we are reporting this bug is that we hit the following situation: 1) we wrote a Perl program and used Perl-to-C extensions; 2) those extensions called some C++ object files. The result is a C program, in this case the Perl interpreter, calling C++ code and producing a core file. Specifically, the GCC runtime routines are statically linked into each library at that library's compile time (this needs to happen). As a result, the C library is out of sync with the C++ library, and the process dumps core when control passes between the two.
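The overrun described above can be modelled without actually smashing a real stack, which would be undefined behaviour. The sketch below is a byte-level model under stated assumptions. The two structure layouts are hypothetical stand-ins that differ only by two trailing fields, and the "stack frame" is a plain byte array in which the caller reserves sizeof(frame_state_old) bytes with the saved return address immediately after them. The real libgcc layouts and the real stack arrangement differ and are compiler-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the two frame_state layouts (not the real
// libgcc definitions). Only the size difference matters for this model.
struct frame_state_old {            // layout __throw() was compiled against
    void *cfa;
    void *eh_ptr;
    long cfa_offset;
    long reg_or_offset[16];
};

struct frame_state_new {            // layout the mismatched callee fills in
    frame_state_old base;
    long base_offset;               // extra fields the caller does not know about
    char indirect;
};

// Stand-in for the mismatched __frame_state_for(): it writes a complete
// frame_state_new (all zero bytes here) into storage supplied by the caller.
static void frame_state_for_new(unsigned char *storage) {
    assert(storage != nullptr);
    std::memset(storage, 0, sizeof(frame_state_new));
}

// Byte-level model of part of __throw()'s frame: the caller reserves only
// sizeof(frame_state_old) bytes, and the saved return address sits right
// after that storage. Returns the return address as it reads after the call.
static std::uintptr_t simulate_throw() {
    static_assert(sizeof(frame_state_new) >= sizeof(frame_state_old) + sizeof(std::uintptr_t),
                  "model assumes the larger layout overruns at least one pointer-sized slot");
    unsigned char frame[sizeof(frame_state_new) + 2 * sizeof(std::uintptr_t)] = {};
    const std::size_t ret_slot = sizeof(frame_state_old);  // just past the caller's struct
    std::uintptr_t ret = 0x0805f815u;                       // arbitrary non-zero "return address"
    std::memcpy(frame + ret_slot, &ret, sizeof ret);

    frame_state_for_new(frame);   // writes sizeof(frame_state_new) bytes, not sizeof(frame_state_old)

    std::memcpy(&ret, frame + ret_slot, sizeof ret);
    return ret;                   // now 0: an unwinder would read this as "top of stack"
}
```

In this model, simulate_throw() returns 0 whenever the static_assert holds. The zero return mirrors the report: the unwinder sees a clobbered return address, concludes that no handler exists, and terminates.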
This means there is NO backward compatibility for GCC-compiled code. Programs compiled on Red Hat 6.2 will stop working on 7.0 or 7.1 when the stack is unwound, for example by a C++ throw. If Red Hat does not fix this problem, we need to know Red Hat's statement on compatibility. We at IBM need to understand how many versions of a piece of code we must maintain. This is critical! Is it per release (i.e. 6.2, 7.0 or 7.1), per patch family, or per patch? If this is fixed, our code will work on both 6.2 and 7.1. If it is not fixed, we may need to maintain a separate code stream for every unique version of GCC. We need to know which of these cases applies so that we can plan our streams.
We're working on a fix. For the time being I have a gcc patch in which __frame_state_for looks up its parent (caller) to decide whether to fill in the base_offset and indirect fields. In the common case this seems to work and costs only a few additional instructions. glibc would then need to be recompiled with the patched gcc. Changing compat-libstdc++ would be cleaner, but it would only cover some cases, namely programs linked dynamically against that libstdc++. Linking libstdc++ statically, or using throw without linking libstdc++ at all, would still cause problems. Until an erratum for this is released, I can assure you that, to my knowledge, gcc-2.96-RH is binary compatible across the whole series. You would therefore only need to maintain two sets of packages: one for 6.x and one for 7.x. Building separate 7.x packages has an advantage: you don't force people to install compatibility packages, which they usually will not have installed (unlike the standard 2.96 libstdc++).
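The idea behind the gcc patch, filling the extra fields only for callers that expect them, can be sketched as follows. This is a hypothetical model, not the actual patch. The real routine inspects its caller to decide, while here the caller's layout is passed in explicitly through an invented caller_layout enum. The structure layouts are also simplified stand-ins. The key property is that an old-layout caller never receives more bytes than it reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical simplified layouts: the "new" state appends two fields.
struct frame_state_old {
    void *cfa;
    long cfa_offset;
    long reg_or_offset[16];
};

struct frame_state_new {
    frame_state_old base;   // common prefix that both kinds of caller understand
    long base_offset;
    char indirect;
};

// Hypothetical stand-in for the decision the patch makes by examining its
// caller: here the caller simply states which layout it was compiled with.
enum class caller_layout { old_layout, new_layout };

// Number of bytes the patched routine may write for this kind of caller.
static std::size_t bytes_to_fill(caller_layout who) {
    return who == caller_layout::new_layout ? sizeof(frame_state_new)
                                            : sizeof(frame_state_old);
}

// Fills the caller's storage without ever writing past what it reserved.
static std::size_t frame_state_for_patched(void *storage, caller_layout who) {
    assert(storage != nullptr);
    frame_state_new fs{};         // value-initialised: every field starts at zero
    fs.base.cfa_offset = 16;      // arbitrary example values
    fs.base_offset = 8;           // only meaningful to new-layout callers
    fs.indirect = 1;
    const std::size_t n = bytes_to_fill(who);
    std::memcpy(storage, &fs, n); // old-layout callers get only the common prefix
    return n;
}
```

Because `base` is the first member, the first sizeof(frame_state_old) bytes of a frame_state_new are exactly an old-layout structure. Copying only that prefix is therefore safe for old callers.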
Ouch, forgot to update this. This has been fixed since at least glibc-2.2.3-7 and gcc-2.96-84.
We've had a similar bug, which might be a separate issue. I tried upgrading our systems to the gcc/glibc/libstdc++ from Rawhide 1.0, and even rebuilt glibc and libstdc++ with the new compiler, with no luck. The problem is similar to this one, except that we see it with normal static linking. If a function throws a C++ exception and we put that function in its own .o or .a file, neither the main program nor the function itself can catch() the exception. The program segfaults with a stack trace like:
#1  0x4006d432 in raise (sig=6) at signals.c:65
#2  0x400fdec8 in abort () at ../sysdeps/generic/abort.c:88
#3  0x40043f1b in __default_terminate () at ../../gcc/libgcc2.c:3034
#4  0x4004728e in terminate () from /usr/lib/libstdc++-libc6.2-2.so.3
#5  0x0805f81d in otl_tmpl_cursor<otl_exc, otl_conn, otl_cur, otl_var>::describe_column (this=0x8085e10, col=@0x805f818, column_num=134678472) at otl.h:3087
#6  0x40044831 in find_exception_handler (pc=0x805f7eb, table=0x80707c8, eh_info=0x8085e10, rethrow=1, cleanup=0xbfffac7c) at ../../gcc/libgcc2.c:3168
#7  0x40044a82 in throw_helper (eh=0x8085e38, pc=0x805f814, my_udata=0xbfffae80, offset_p=0xbfffae7c) at ../../gcc/libgcc2.c:3168
#8  0x40044f6f in __rethrow (index=0x806fb98) at ../../gcc/libgcc2.c:3168
#9  0x0805f815 in otl_tmpl_cursor<otl_exc, otl_conn, otl_cur, otl_var>::describe_column (this=0x8083ef0, col=@0x402dd008, column_num=1) at otl.h:3087
#10 0x0805d1d4 in otl_tmpl_select_stream<otl_exc, otl_conn, otl_cur, otl_var, otl_sel, tagTIMESTAMP_STRUCT>::get_select_list (this=0x8083ef0) at otl.h:3832
#11 0x0805b784 in otl_tmpl_select_stream<otl_exc, otl_conn, otl_cur, otl_var, otl_sel, tagTIMESTAMP_STRUCT>::otl_tmpl_select_stream (this=0x8083ef0, aoverride=0x8080d54, arr_size=50, sqlstm=0xbfffe6e0 "select type_1, domain, currencycode, amount, recuramount, periodtype, periodlength, recurringflags, recurringcount, billflags, locationcode, privilegegranted, privilegerevoked, diskquotagranted, ma"..., db=@0x80765c0, implicit_select=0) at otl.h:3729
#12 0x0805a5d4 in otl_stream::open (this=0x8080b58, arr_size=50, sqlstm=0xbfffe6e0 "select type_1, domain, currencycode, amount, recuramount, periodtype, periodlength, recurringflags, recurringcount, billflags, locationcode, privilegegranted, privilegerevoked, diskquotagranted, ma"..., db=@0x80765c0, implicit_select=0) at otl.h:8577
#13 0x08052602 in ISunRawGetChargeType (pResult=0x80720dc) at charge.cpp:239
#14 0x080519f4 in ISunGetChargeType (iType=1, pInfo=0xbffff650, pAsyncResult=0x80720dc) at charge.cpp:63
#15 0x0804a109 in main (argc=1, argv=0xbffffb2c) at billingapi_test.cpp:156
#16 0x400eb0be in __libc_start_main (main=0x8049c70 <main>, argc=1, ubp_av=0xbffffb2c, init=0x80493d0 <_init>,
The same code in a single .cpp file catches this exception fine. I thought this was related to this bug somehow, but the fix for this bug did not solve our problem. What did solve it was compiling our program with "kgcc" from the compat packages. When compiled with kgcc, it works perfectly. It took long nights to figure this out. We have tested gcc-2.96-85 from Rawhide, and it did not compile the code correctly. Attached is some test code that triggers the error through an ODBC driver. See the readme in the tarball for the (short) compile instructions. It requires unixODBC and a MySQL database, but may work with iODBC and another database as long as the #defines are changed.
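The shape of the failing setup is a throwing function in its own object file, called and caught from another. It can be sketched as below. This is not the attached ODBC test case. describe_column and call_and_catch are hypothetical stand-ins, and the two halves are shown in one block for convenience. For a real split, put the first half in thrower.cpp and the second in main.cpp with a declaration `void describe_column(int);`. Then build with `g++ -c thrower.cpp`, `ar rcs libthrower.a thrower.o` and `g++ -o repro main.cpp -L. -lthrower`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- would live in thrower.cpp, compiled to its own thrower.o / libthrower.a ----
// Hypothetical stand-in for the routine that throws (describe_column in the trace).
void describe_column(int column_num) {
    if (column_num < 0)
        throw std::runtime_error("bad column " + std::to_string(column_num));
}

// ---- would live in main.cpp, which only sees the declaration ----
// Returns true if the exception thrown in the other object file is caught here.
bool call_and_catch(int column_num) {
    try {
        describe_column(column_num);
    } catch (const std::runtime_error &) {
        return true;   // expected path with a consistent, working toolchain
    }
    return false;      // no exception was thrown
}
```

With a working toolchain, call_and_catch(-1) returns true. In the build described above, the equivalent code ended in terminate() instead, and it behaved correctly only when compiled with kgcc.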
Created attachment 20432: test case
We have the impression that the bug still exists and would like clarification on whether, to your knowledge, there are still situations in which problems can occur. We are using Sun's Java 2 SDK version 1.3.1_01 (with native threads). The Java VM is started first. Later, via JNI, it loads a shared "bridging" library written in C++, which then pulls in other C++ libraries. When Java calls into C++, an exception handler (try/catch) is always set up before any other C++ routines are called. But when one of these nested C++ routines throws an exception, the whole process crashes with a core dump. This does not happen if the top-level C++ interface is called from a simple C++ main program instead of from the Java VM. We are using Red Hat 7.1 with the updated gcc suite 2.96-85 and glibc 2.2.4-19, in which this bug (#37933) should have been fixed. All C++ code has been recompiled, but the Java VM itself is only available in binary form.
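The boundary pattern described here can be sketched as follows: a C-linkage entry point wraps every call into the C++ libraries in try/catch, so that no exception ever propagates into the non-C++ host. The names are hypothetical. A real JNI entry point would use the JNIEXPORT/JNICALL signature generated for the Java class, which is not shown here. Note that this pattern only helps if the unwinder can reach the catch block. It cannot compensate for a mismatched __frame_state_for(), which is exactly the failure being reported.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical nested C++ routine in one of the pulled-in libraries.
static int nested_routine(int value) {
    if (value < 0)
        throw std::invalid_argument("negative value");
    return value * 2;
}

// Hypothetical entry point of the "bridging" library. extern "C" gives it C
// linkage so a non-C++ host can call it; no exception may escape past here.
extern "C" int bridge_call(int value, int *result) {
    try {
        *result = nested_routine(value);
        return 0;                  // success
    } catch (const std::exception &) {
        return -1;                 // standard C++ error mapped to a status code
    } catch (...) {
        return -2;                 // any other exception type
    }
}
```

The host sees only integer status codes. Any diagnostic detail would have to be passed back separately, for example by converting it into a Java exception on the JNI side.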
I cannot reproduce report 2. It also has nothing to do with report 1, unless -lodbc was compiled with egcs++, and even then it would simply segfault rather than end up in terminate(). As for report 3, I cannot do anything without a self-contained testcase. The interesting things to know are:
- which EH registry the Java libs and your C++ libs register with, i.e. which __register_frame_info they call: the one in glibc, or the one in libstdc++ (and if so, which libstdc++?);
- which routines throw the exception (your C++ libs compiled with 2.96-RH?);
- which routines the exception should pass through;
- where it should be caught (in the JDK?).
Note that only one case is guaranteed to work: an exception thrown from a routine compiled with g++ version X and caught by a routine compiled with the same g++ version. Here "version" means egcs 1.x, gcc 2.95.x, gcc 2.96-RH or gcc 3.
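The compatibility rule stated above can be expressed as a small check. The sketch below is an illustrative assumption, not something the thread prescribes. It maps the (major, minor) numbers that these compilers report through `__GNUC__` and `__GNUC_MINOR__` onto the families listed above. egcs 1.0/1.1 report 2.90/2.91, and all 2.96 builds are treated as 2.96-RH. It then treats a throw as guaranteed only when the thrower and catcher fall in the same known family. Following the thread, all gcc 3 releases are grouped together.

```cpp
#include <cassert>

// EH-compatibility families as grouped in the thread (illustrative mapping).
enum class eh_family { egcs_1x, gcc_295, gcc_296_rh, gcc_3, unknown };

// Classifies a compiler by the numbers it reports in __GNUC__ / __GNUC_MINOR__.
constexpr eh_family classify(int major, int minor) {
    return major == 2 && (minor == 90 || minor == 91) ? eh_family::egcs_1x
         : major == 2 && minor == 95                  ? eh_family::gcc_295
         : major == 2 && minor == 96                  ? eh_family::gcc_296_rh
         : major == 3                                 ? eh_family::gcc_3
                                                      : eh_family::unknown;
}

// A throw is only guaranteed to be caught when both sides were built by
// compilers from the same known family.
constexpr bool throw_guaranteed(int thrower_major, int thrower_minor,
                                int catcher_major, int catcher_minor) {
    return classify(thrower_major, thrower_minor) != eh_family::unknown &&
           classify(thrower_major, thrower_minor) ==
               classify(catcher_major, catcher_minor);
}
```

A library could record its own `__GNUC__`/`__GNUC_MINOR__` at build time so that such a check can be run when it is loaded. Whether that is worthwhile depends on how many toolchains are in play.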
I have a similar problem using Linux 2.4.2-2smp with gcc-2.96-85 and glibc-2.2.4-19.3: an exception thrown from a shared library isn't caught by the caller. I have created a testcase that reproduces this and will attach it.
Created attachment 41651: testcase for uncaught exception
Excuse me, it seems my problem is not related to this issue -- it disappears when I link with 'g++' instead of 'ld'. Is this documented anywhere so I can kick myself for not having read it? :-)
We would like to verify whether the original problem is fixed in glibc-2.2.3-7 and gcc-2.96-84, since rolf.mueller thinks the problem still exists.
If you are able to reproduce with current releases, please reopen.