Bug 91100 - glibc 2.0 compatibility still seems problematic
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 9
Hardware: athlon Linux
Priority: medium
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2003-05-18 00:03 EDT by Kyle Ferrio
Modified: 2016-11-24 09:59 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-11-05 13:47:08 EST
Description Kyle Ferrio 2003-05-18 00:03:49 EDT
Description of problem:

I suspect a problem with glibc 2.0.x compatibility when using shared libs.
I currently have RH9, updated via up2date to glibc version 2.3.2, release 27.9.
I do *not* experience any of the problems described in report 88456
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88456).
Indeed, I can do some pretty brutal compiles without a hitch.  But I do have a
problem compiling GMT (http://gmt.soest.hawaii.edu), which is an
important suite of GPL'ed GIS visualization tools.  The GMT tools build fine,
but segfault extensively.  Here's a typical backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x420747b1 in _int_free () from /lib/tls/libc.so.6
(gdb) bt
#0  0x420747b1 in _int_free () from /lib/tls/libc.so.6
#1  0x42073786 in free () from /lib/tls/libc.so.6
#2  0x4206b831 in fclose@GLIBC_2.0 () from /lib/tls/libc.so.6
#3  0x400a1510 in GMT_epsinfo (
    program=0xbffff8db "/home/kbf/GMT/GMT3.4.3/bin/psbasemap")
    at gmt_support.c:1836
#4  0x08049aae in main (argc=7, argv=0xbfffdf14) at psbasemap.c:160
#5  0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6

Darn!  Seeing free() segfault is scary.  As you can see, this was triggered by a
call to fclose() on a stream which I confirmed was valid and had even been
written to before the fclose() set off the bomb.

Version-Release number of selected component (if applicable):

glibc-2.3.2-27.9 (and friends)

How reproducible:
100%

Steps to Reproduce:
1. Build GMT 3.4.x with shared libs.
2. Run example01 as part of the build-verification. 
3. Watch it segfault.
4. Repeat for most programs in the GMT suite.
    
Actual results: 
segfault

Expected results: 
no segfault :)

Additional info: 
Compiling GMT statically sidesteps the whole issue.  
But I want my shared libs.  
I can provide more detail on the GMT source.  
I have not produced a minimal example.  I may try, depending on the response here.  
Thanks for your attention.
Comment 1 Kyle Ferrio 2003-05-18 11:26:17 EDT
I forgot to mention some details:

GMT builds libgmt.so, with which psbasemap et al. are linked.

The compilation flags for modules of libgmt.so include -ansi and -pedantic.

The only option passed to ld is -shared.

Comment 2 Kyle Ferrio 2003-05-18 17:56:04 EDT
Updates:  

Item #1
Someone suggested that the problem might not be with libc at all and that I
might want to link with mcheck.  So I did.  That caught some unrelated problems,
but does not appear to have shed any light on this report.

Item #2
The 'psbasemap' program consults a GMT configuration file and may therefore run
differently the first time through.  The stacktrace in the original report
occurs on the second and subsequent runs.  The first run executes different
code, which also segfaults:

# first time after removing old config files!
$ gdb psbasemap
(gdb) run -R0/6.5/0/9 -Jx1i -B0 -P -K -U"Example 1 in Cookbook" > example_01.ps
Program received signal SIGSEGV, Segmentation fault.
0x420744f5 in _int_malloc () from /lib/tls/libc.so.6
(gdb) bt
#0  0x420744f5 in _int_malloc () from /lib/tls/libc.so.6
#1  0x42073d0e in calloc () from /lib/tls/libc.so.6
#2  0x40009a23 in _dl_new_object () from /lib/ld-linux.so.2
#3  0x40005b2b in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#4  0x400046fb in _dl_map_object_internal () from /lib/ld-linux.so.2
#5  0x4210efab in dl_open_worker () from /lib/tls/libc.so.6
#6  0x4000c816 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#7  0x4210ee19 in _dl_open () from /lib/tls/libc.so.6
#8  0x42110b78 in do_dlopen () from /lib/tls/libc.so.6
#9  0x4000c816 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#10 0x42110a3e in __libc_dlopen_mode () from /lib/tls/libc.so.6
#11 0x420ef6dc in __nss_lookup_function () from /lib/tls/libc.so.6
#12 0x420ef31b in __nss_lookup () from /lib/tls/libc.so.6
#13 0x420f1099 in __nss_passwd_lookup () from /lib/tls/libc.so.6
#14 0x420abdb9 in getpwuid_r@@GLIBC_2.1.2 () from /lib/tls/libc.so.6
#15 0x420ab692 in getpwuid () from /lib/tls/libc.so.6
#16 0x400a1647 in GMT_epsinfo (
    program=0xbffff8db "/home/kbf/GMT/GMT3.4.3/bin/psbasemap")
    at gmt_support.c:1868
#17 0x08049a76 in main (argc=7, argv=0xbfffed94) at psbasemap.c:160
#18 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6

I subsequently single-stepped through GMT_epsinfo() right up to the getpwuid()
call.  Nothing looks obviously wrong to me yet.

As before, I don't have this problem if I link everything statically.
I presently have no clue.  Any help interpreting these traces is welcome.  

Thanks,
Kyle
Comment 3 Kyle Ferrio 2003-05-20 19:59:43 EDT
A temporary workaround is to export LD_ASSUME_KERNEL=2.4.1

Possibly resolved in CVS, per Ulrich's note on May 6th in the changelog:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/ChangeLog?rev=1.7625&content-type=text/x-cvsweb-markup&cvsroot=glibc

I have not built a new glibc, but having traced every error back to a preceding
fclose() makes me suspect that this is it.

Kyle
Comment 4 Jakub Jelinek 2003-05-21 06:23:59 EDT
What does this have in common with glibc 2.0 compatibility?
Is at least one of the shared libraries or the main program linked on RHL 5.x
or earlier?
Comment 5 Kyle Ferrio 2003-05-21 15:53:55 EDT
Jakub,

Everything is compiled and linked from source on RH9.  No exceptions.

Both the GMT shared lib (libgmt.so) and the main program (in this case called 
psbasemap) are linked with glibc on RH9.  The examples I am reporting are 
triggered by calls to a function called GMT_epsinfo, which is a member of 
libgmt, which is linked against glibc on RH9.  

The problematic calls appear to be

fclose@GLIBC_2.0
getpwuid_r@@GLIBC_2.1.2 [not just fclose, as I erroneously stated above]
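
A small standalone test might help isolate the two calls listed above from the
rest of GMT.  What follows is only a sketch: it is not taken from the GMT
sources, and the names try_fclose, try_getpwuid and REPRO_MAIN, as well as the
scratch file name, are hypothetical.

```c
/* Sketch of a standalone test for the two libc calls seen in the traces.
   Not taken from the GMT sources; all names here are hypothetical. */
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a scratch file, write one line, and close it, mirroring the
   fclose() that crashes in the first trace.  Returns 0 on success. */
int try_fclose(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputs("test\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Look up the current user, mirroring the getpwuid() call that crashes
   in the second trace.  Returns 0 if a passwd entry was found. */
int try_getpwuid(void)
{
    struct passwd *pw = getpwuid(getuid());
    return pw != NULL ? 0 : -1;
}

#ifdef REPRO_MAIN
/* Optional driver; compile with -DREPRO_MAIN to get an executable. */
int main(void)
{
    int rc_close = try_fclose("repro_scratch.txt");
    int rc_pw = try_getpwuid();
    printf("fclose: %d, getpwuid: %d\n", rc_close, rc_pw);
    return (rc_close == 0 && rc_pw == 0) ? 0 : 1;
}
#endif
```

To resemble the failing setup, the two functions would presumably have to be
built into a shared library linked the same way as libgmt.so (only -shared
passed to ld, per comment 1) and called from a separate main program.  Whether
that is enough to trigger the crash is untested.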

I suspect Ulrich's May 6th commit to the glibc cvs *might* resolve at least the 
fclose issue; I'm not sure.  Meanwhile, setting LD_ASSUME_KERNEL=2.4.1 seems to 
be an effective workaround.

I don't expect RH to debug 3rd-party software.  You guys are already doing an 
amazing job.  GMT did compile and run on RH8.  Others have also experienced 
this problem with GMT when moving from RH8 to RH9, and it is my belief that 
some people who need GMT are holding off on adoption of RH9.  So I bring this 
to RH's attention as a courtesy, not as a complaint.  Please let me know if I 
can be of further assistance, and keep up the great work!

Thanks,
Kyle
Comment 6 Kyle Ferrio 2003-05-21 16:01:48 EDT
Jakub,

I think I misunderstood your question.  This system does not have any old 
glibc installed.  It is a fresh install of RH9 on a brand-new disk.  I thought 
of this as a "glibc 2 compatibility" issue because the traces included symbols 
which =appear= to reference earlier glibc versions in the 2.x series.  It is 
quite possible that I have chosen a poor name for the bug.  If I understood it, 
I'd probably be submitting a patch instead of a bug.  I apologize for any 
confusion my malapropism caused.

Thanks,
Kyle
Comment 7 Ulrich Drepper 2003-10-03 16:54:33 EDT
We've made a number of changes in glibc which are all in the Fedora Core glibc.
 Can you try that glibc version?
Comment 8 Ulrich Drepper 2003-11-05 13:47:08 EST
No reply in more than a month.  I'm closing the bug now.  There is a
test version of the next RHL9 errata at

  ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/

which does fix a few more compatibility issues with programs compiled
against glibc 2.0.
Comment 9 Eric Desjardins 2004-06-04 09:21:26 EDT
It seems that I have a very similar problem when not using
LD_ASSUME_KERNEL on RHEL WS 3.0.

I have a very similar stack trace and smartheap is telling me that
fclose is trying to free invalid memory.

Has this been reported again? Is there a patch or a fix?

Thanks,
Eric
