Red Hat Bugzilla – Bug 91100
glibc 2.0 compatibility still seems problematic
Last modified: 2016-11-24 09:59:24 EST
Description of problem:
I suspect a problem with glibc 2.0.x compatibility using shared libs.
I currently have RH9 with glibc updated via up2date to version 2.3.2, release 27.9.
I do *not* experience any of the problems described in report …
Indeed, I can do some pretty brutal compiles without a hitch. But I do have a
problem compiling GMT (http://gmt.soest.hawaii.edu), which is an
important suite of GPL'ed GIS visualization tools. The GMT tools build fine,
but segfault extensively. Here's a typical backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x420747b1 in _int_free () from /lib/tls/libc.so.6
#0 0x420747b1 in _int_free () from /lib/tls/libc.so.6
#1 0x42073786 in free () from /lib/tls/libc.so.6
#2 0x4206b831 in fclose@GLIBC_2.0 () from /lib/tls/libc.so.6
#3 0x400a1510 in GMT_epsinfo (program=0xbffff8db
#4 0x08049aae in main (argc=7, argv=0xbfffdf14) at psbasemap.c:160
#5 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6
Darn! Seeing free() segfault is scary. As you can see, this was triggered by a
call to fclose() on a stream which I confirmed was valid, and which had even
been written to before the fclose() set off the bomb.
Version-Release number of selected component (if applicable):
glibc-2.3.2-27.9 (and friends)
Steps to Reproduce:
1. Build GMT 3.4.x with shared libs.
2. Run example01 as part of the build-verification.
3. Watch it segfault.
4. Repeat for most programs in the GMT suite.
Expected results:
no segfault :)
Compiling GMT statically sidesteps the whole issue.
But I want my shared libs.
I can provide more detail on the GMT source.
I have not produced a minimal example. I may try, depending on the response here.
Thanks for your attention.
I forgot to mention some details:
GMT builds libgmt.so, with which psbasemap et al. are linked.
The compilation flags for modules of libgmt.so include -ansi and -pedantic.
The only option passed to ld is -shared.
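No minimal example exists yet. As a hedged starting point, the sketch below exercises the same libc calls that show up in the two backtraces: getpwuid() on the current uid, then writing to a stream and closing it with fclose(). The function name repro_epsinfo_calls() and the output path used to call it are hypothetical; this is not GMT's GMT_epsinfo(), whose source is not shown in this report. It is written in C89 style so it also compiles under the -ansi -pedantic flags used for libgmt.so.

```c
/* Hypothetical reproducer, NOT GMT code: it only mimics the libc calls
 * seen in the backtraces (getpwuid(), then fprintf() + fclose()).
 * C89 style so it builds with -ansi -pedantic. */
#define _POSIX_C_SOURCE 200112L /* expose getpwuid()/getuid() under -ansi */

#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if every call succeeded, -1 otherwise. */
int repro_epsinfo_calls(const char *path)
{
    struct passwd *pw;
    const char *user = "unknown";
    FILE *fp;

    assert(path != NULL);

    /* Second backtrace: getpwuid() -> NSS -> dlopen() -> calloc(). */
    pw = getpwuid(getuid());
    if (pw != NULL && pw->pw_name != NULL)
        user = pw->pw_name;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* First backtrace: free() reached from fclose() on a stream that
       had already been written to successfully. */
    if (fprintf(fp, "user: %s\n", user) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

To mirror the GMT setup, this file could be built into a shared library with the same -ansi -pedantic and -shared flags, and a small main() linked against it could call repro_epsinfo_calls() twice in a row (the report notes the fclose() crash appears on the second and later runs). Whether it crashes or not, comparing it with GMT_epsinfo() should narrow down where to look.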
Someone suggested that the problem might not be with libc at all and that I
might want to link with mcheck. So I did. That caught some unrelated problems,
but does not appear to have shed any light on this report.
The 'psbasemap' program consults a GMT configuration file and may therefore run
differently the first time through. The stacktrace in the original report
occurs on the second and subsequent runs. The first run executes different
code, which also segfaults:
# first time after removing old config files!
(gdb) run -R0/6.5/0/9 -Jx1i -B0 -P -K -U"Example 1 in Cookbook" > example_01.ps
Program received signal SIGSEGV, Segmentation fault.
0x420744f5 in _int_malloc () from /lib/tls/libc.so.6
#0 0x420744f5 in _int_malloc () from /lib/tls/libc.so.6
#1 0x42073d0e in calloc () from /lib/tls/libc.so.6
#2 0x40009a23 in _dl_new_object () from /lib/ld-linux.so.2
#3 0x40005b2b in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#4 0x400046fb in _dl_map_object_internal () from /lib/ld-linux.so.2
#5 0x4210efab in dl_open_worker () from /lib/tls/libc.so.6
#6 0x4000c816 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#7 0x4210ee19 in _dl_open () from /lib/tls/libc.so.6
#8 0x42110b78 in do_dlopen () from /lib/tls/libc.so.6
#9 0x4000c816 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#10 0x42110a3e in __libc_dlopen_mode () from /lib/tls/libc.so.6
#11 0x420ef6dc in __nss_lookup_function () from /lib/tls/libc.so.6
#12 0x420ef31b in __nss_lookup () from /lib/tls/libc.so.6
#13 0x420f1099 in __nss_passwd_lookup () from /lib/tls/libc.so.6
#14 0x420abdb9 in getpwuid_r@@GLIBC_2.1.2 () from /lib/tls/libc.so.6
#15 0x420ab692 in getpwuid () from /lib/tls/libc.so.6
#16 0x400a1647 in GMT_epsinfo (
#17 0x08049a76 in main (argc=7, argv=0xbfffed94) at psbasemap.c:160
#18 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6
I subsequently single-stepped through GMT_epsinfo() right up to the getpwuid()
call. Nothing looks obviously wrong to me yet.
As before, I don't have this problem if I link everything statically.
At present I have no clue. Any help interpreting these traces is welcome.
A temporary workaround is: export LD_ASSUME_KERNEL=2.4.1
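Since every frame in both backtraces runs through /lib/tls/libc.so.6, one way to confirm that the workaround really changes which C library gets loaded is to print the glibc version and threads implementation with and without the variable set. The sketch below assumes a glibc system; describe_libc() is a made-up name, while gnu_get_libc_version(), gnu_get_libc_release() and confstr(_CS_GNU_LIBPTHREAD_VERSION, ...) are existing glibc interfaces.

```c
/* Hypothetical helper: reports which glibc build and threads library the
 * dynamic loader actually picked, so a run with and without
 * LD_ASSUME_KERNEL=2.4.1 can be compared. Assumes glibc. */
#define _GNU_SOURCE

#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes "glibc <version> (<release>), threads: <impl>" into buf.
   Returns the snprintf() result (length that would have been written). */
int describe_libc(char *buf, size_t len)
{
    char threads[64];

    assert(buf != NULL && len > 0);

    /* confstr() returns 0 if the value is not available; otherwise it
       copies a NUL-terminated (possibly truncated) string into threads. */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, threads, sizeof threads) == 0)
        strcpy(threads, "unknown");

    return snprintf(buf, len, "glibc %s (%s), threads: %s",
                    gnu_get_libc_version(), gnu_get_libc_release(),
                    threads);
}
```

On RH9 one would expect the threads string to switch from NPTL to LinuxThreads when LD_ASSUME_KERNEL=2.4.1 is exported, but that expectation is an assumption, not something confirmed in this report.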
Possibly resolved in CVS, per Ulrich's note on May 6th in the changelog:
I have not built a new glibc, but having traced every error back to a preceding
fclose() makes me suspect that this is it.
What does this have in common with glibc 2.0 compatibility?
Is at least one of the shared libraries or the main program linked on RHL 5.x?
Everything is compiled and linked from source on RH9. No exceptions.
Both the GMT shared lib (libgmt.so) and the main program (in this case called
psbasemap) are linked with glibc on RH9. The examples I am reporting are
triggered by calls to a function called GMT_epsinfo, which is a member of
libgmt, itself linked against glibc on RH9.
The problematic calls appear to be
getpwuid_r@@GLIBC_2.1.2 [not just fclose, as I erroneously stated above]
I suspect Ulrich's May 6th commit to the glibc CVS *might* resolve at least the
fclose issue; I'm not sure. Meanwhile, setting LD_ASSUME_KERNEL=2.4.1 seems to
be an effective workaround.
I don't expect RH to debug 3rd-party software. You guys are already doing an
amazing job. GMT did compile and run on RH8. Others have also experienced
this problem with GMT when moving from RH8 to RH9, and it is my belief that
some people who need GMT are holding off on adoption of RH9. So I bring this
to RH's attention as a courtesy, not as a complaint. Please let me know if I
can be of further assistance, and keep up the great work!
I think I misunderstood your question. This system does not have any old
glibc installed. It is a fresh install of RH9 on a brand-new disk. I thought
of this as a "glibc 2 compatibility" issue because the traces included symbols
which *appear* to reference earlier glibc versions in the 2.x series. It is
quite possible that I have chosen a poor name for the bug. If I understood it,
I'd probably be submitting a patch instead of a bug. I apologize for any
confusion my malapropism caused.
We've made a number of changes in glibc which are all in the Fedora Core glibc.
Can you try that glibc version?
No reply in more than a month. I'm closing the bug now. There is a
test version of the next RHL9 errata at
which does fix a few more compatibility issues with programs compiled
against glibc 2.0.
It seems that I have a very similar problem when not using
LD_ASSUME_KERNEL on RHEL WS 3.0.
I have a very similar stack trace, and SmartHeap is telling me that
fclose is trying to free invalid memory.
Has this been reported again? Is there a patch or a fix?