Description of problem:
The Sybase ODBC driver crashes when moved to RHEL3.x with unixODBC2.2.8-2.3.0.2. It works on RHEL2.1 with unixODBC 2.0.7, and it works when we move to RHEL4.x with unixODBC 2.2.11, so something regressed between RHEL2.1 and RHEL4.x for unixODBC. Moving to RHEL4.x, where it works, is not an option for some of our clients. Attached is a tar file for the repro. You will need Sybase ASE to run the repro.

Version-Release number of selected component (if applicable):
RHEL3.x unixODBC2.2.8-2.3.0.2

How reproducible:

ODBC Basic Sample

Purpose
-------
This directory holds a sample ODBC application that illustrates how to connect to a datasource and execute a SQL statement. Source code for the application is in simple.cpp.

ODBC Driver Manager location
----------------------------
Linux / MacOSX
--------------
The sample code assumes the ODBC Driver Manager libraries are installed in the /usr/lib directory. If they are installed in a different directory, edit the "build" and "makefile" files to correct the location.

Procedure
---------
Linux / MacOSX
--------------
To compile the application:
# ./build sample
To run the application:
# ./build run
To delete all generated files:
# ./build clean
----------------
Before running the sample code, edit the /etc/odbcinst.ini file to add the driver (you need su permission to edit that file):

[Adaptive Server Enterprise]
Description = Sybase ODBC Driver
Driver = /the path you put the driver so/libsybdrvodb.so
FileUsage = 1

You also need to add a sampledsn entry in $HOME/.odbc.ini. For example:

[sampledsn]
Driver = Adaptive Server Enterprise
UserID = sa
Server = qablade2
Port = 6001
Database = pubs2
Password =
UseCursor = 1

Steps to Reproduce:
See above.

Actual results:
should not get a crash

Expected results:

Additional info:
Created attachment 150054 [details] repro file
Sorry, but I don't have Sybase. Please provide a reproducer that does not involve any proprietary software, and I'll be glad to look into it.
Red Hat QA has access to Sybase, and the Sybase engineer can come on site in Westford. We can set up the reproducer and give you access and instructions to demonstrate the problem. Barry
Yipes, you put a root password into a publicly readable bugzilla entry? Please change it forthwith, preferably to something less guessable, and then send me the password in private mail.
The crash appears to be happening in an atexit callback routine:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1076472512 (LWP 18428)]
0x403007d2 in ?? ()
(gdb) bt
#0  0x403007d2 in ?? ()
#1  0x401888f3 in exit () from /lib/tls/libc.so.6
#2  0x401737fc in __libc_start_main () from /lib/tls/libc.so.6
#3  0x08048a09 in _start ()

Now, there are no atexit callbacks in unixODBC, nor in the "simple.cpp" source code you provided. A string search suggests that libsybdrvodb.so contains atexit calls. So at this point my position is that you have the source code needed to debug the problem, and I don't ...
yongzhi I am looking at it now. But there should be something in unixODBC related to the segmentation fault (maybe it causes the issue indirectly). Our driver works fine with RHEL 4.0, and our unit tests work well when they call the ODBC driver directly, but if the calls go through unixODBC (the driver manager), the same segmentation fault happens. Could you look into what happens in unixODBC after our driver's SQLDisconnect returns?
yongzhi I ran the test with LD_DEBUG set to symbols; exit() was called from unixODBC, and it was called after shmdt. How did I get this? First, I ran the test with our driver built on RHEL 3.0 and got output like this:

20636: symbol=uodbc_close_stats; lookup in file=./simple
20636: symbol=uodbc_close_stats; lookup in file=/usr/lib/libodbc.so.1
20636: symbol=shmdt; lookup in file=./simple
20636: symbol=shmdt; lookup in file=/usr/lib/libodbc.so.1
20636: symbol=shmdt; lookup in file=/usr/lib/libstdc++.so.5
20636: symbol=shmdt; lookup in file=/lib/tls/libm.so.6
20636: symbol=shmdt; lookup in file=/lib/libgcc_s.so.1
20636: symbol=shmdt; lookup in file=/lib/tls/libc.so.6
Segmentation fault (core dumped)

Then I ran the test on the same machine with the driver built on RHEL 2 (that driver works fine):

17642: symbol=shmdt; lookup in file=./simple
17642: symbol=shmdt; lookup in file=/usr/lib/libodbc.so.1
17642: symbol=shmdt; lookup in file=/usr/lib/libstdc++-libc6.2-2.so.3
17642: symbol=shmdt; lookup in file=/lib/i686/libm.so.6
17642: symbol=shmdt; lookup in file=/lib/i686/libc.so.6
17642: symbol=exit; lookup in file=./simple
17642: symbol=exit; lookup in file=/usr/lib/libodbc.so.1
17642: symbol=exit; lookup in file=/usr/lib/libstdc++-libc6.2-2.so.3
17642: symbol=exit; lookup in file=/lib/i686/libm.so.6
17642: symbol=exit; lookup in file=/lib/i686/libc.so.6
17642: symbol=__deregister_frame_info; lookup in file=./simple
17642: symbol=__deregister_frame_info; lookup in file=/usr/lib/libodbc.so.1
17642: symbol=__deregister_frame_info; lookup in file=/usr/lib/libstdc++-libc6.2-2.so.3
17642:
17642: calling fini: /usr/lib/libodbc.so.1

Because the driver manager is the same, it should behave consistently. So I think exit is what caused the segmentation fault. What's more, in both cases libsybdrvodb.so has already reached "calling fini" before shmdt is reached.
yongzhi I should correct my words: I should have said that exit() was not called from libsybdrvodb.so. The crash happens between "calling fini libsybdrvodb.so" and "calling fini: /usr/lib/libodbc.so".
OK, so reading between the lines I guess you are saying that (a) the sybase driver relies on some shared memory, and (b) the crash is happening because it tries to touch the shared memory after it's already been shmdt()'d? The stack trace I showed indicates that the test program isn't calling exit() explicitly at all, but rather that is happening implicitly after return from main(). So I think that the problem must be one of atexit callbacks happening in a different order than you are expecting. The man page says that atexit callbacks are supposed to happen in reverse order of registration, so either that's not happening (in which case this is a glibc bug) or there is some difference in the order in which the callbacks get set up. So I recommend tracing the startup part of the test to see what order things happen in.
yongzhi After more research, I think I know what the issue is, but I need your fix. When the size of our libsybdrvodb.so is greater than or equal to 3173295, the segmentation fault happens; when it is less than or equal to 3108985, the segmentation fault is gone. So it crashes at some magic number between those two sizes. It may be a Red Hat 3.0 issue or a unixODBC issue; could you have a look at it?
That seems moderately unlikely --- I am not aware of anything that would depend directly on the size of a .so file. What exactly did you change to cause the change in .so file size?
I changed the ODBC driver so that it really does nothing in connect and disconnect. That lets me randomly remove any objects from a static library (say foo.a) that our driver links against at build time. Note that the changed driver uses nothing from foo.a in the test (the test only connects and disconnects). That way I can get .so files of different sizes. When the .so is small enough, the segmentation fault is gone. SQLConnect and SQLDisconnect do nothing but let unixODBC load and unload the ODBC driver .so, and the only difference between tests is the size. The segmentation fault is exactly the same as with the real functional driver.
Hmmm ... maybe the size of the .so affects the memory layout, specifically the address at which the shmem segment gets mapped? Not sure how that would translate into a problem for you, but something to think about.
yongzhi I ran a test which just calls dlopen("libsybdrvodb.so") and then dlclose on it. There is a segmentation fault at exit() if libsybdrvodb.so was built on RHEL 3.0. I also noticed that:

unixODBC 2.2.8 (the one in RHEL 3.0) calls dlclose to unload libsybdrvodb.so, so the segmentation fault happens in exit() (I know that from checking the bindings);
unixODBC 2.2.11 (the one in RHEL 4.0) somehow does not call dlclose, so the repro code passes on a RHEL 4.0 machine.

There may be two ways to solve the problem:
1. Some Linux expert tells us how to fix our driver so that dlclose no longer causes the segmentation fault (the same source code built on RHEL 2 does not have the dlclose problem), or
2. Fix unixODBC 2.2.8 to do something similar to 2.2.11.
So what's your code doing at dlopen and dlclose times? (These will call _init and _fini functions, or constructor/destructor routines, if you have them...) This sounds to me like nothing so much as a bug in the _fini function --- perhaps depending on a variable that isn't really initialized, but happens to have the right value in the RHEL2 environment?
yongzhi We changed a build flag to fix our driver. Now the issue is solved.
Problem solved by Sybase. Closing as "Notabug" for Red Hat.