Bug 89698

Summary: Statically linked programs segfault
Product: [Retired] Red Hat Linux
Component: glibc
Version: 9
Hardware: i386
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: high
Priority: medium
Reporter: Trent Piepho <tpiepho>
Assignee: Jakub Jelinek <jakub>
QA Contact: Brian Brock <bbrock>
CC: fweimer, michael.wilkes, rjd
Doc Type: Bug Fix
Last Closed: 2006-08-04 20:23:08 UTC

Description Trent Piepho 2003-04-26 01:30:31 UTC
Description of problem:
A static executable linked against glibc-2.2.x or older will
segfault when it calls gethostbyname on a system running glibc-2.3.x.

Compile this program with static linking under Red Hat 6.x or 7.x
and try to run it on Red Hat 9:

#include <netdb.h>
int main(void) { gethostbyname("localhost"); return 0; }

It will segfault.

This makes it almost impossible to distribute binaries for Red Hat
Linux.  If I compile dynamically on Red Hat 7.x, users of 9 won't have the right
version of libgmp or a dozen other libraries, and users of 6.x won't have the
right version of glibc.  If I compile dynamically on 9, 7.x users won't have the
right glibc version or a dozen other libraries.  If I compile statically, it
still doesn't work because the libnss modules aren't backward compatible.


Version-Release number of selected component (if applicable):
glibc-2.3.2-27.9

Comment 1 Ulrich Drepper 2003-04-27 18:28:06 UTC
First of all, static linking does *not* guarantee the binary runs everywhere if
the program uses NSS.  To the contrary, it makes it less likely.

Second, I cannot reproduce any problem.  A binary statically linked on
RHL7.2 runs just fine on an RHL9 system.  You have to be much more specific.
What services are used?

Comment 2 Trent Piepho 2003-04-28 22:47:54 UTC
The steps to reproduce are quite simple.
On a system with glibc-2.2.5-43 (or glibc-2.2.4-32) and gcc-2.96-113, compile the
following two-line program with the gcc option -static (no other options are
necessary):

#include <netdb.h>
int main(void) { gethostbyname("localhost"); return 0; }

Then run the program on a Red Hat 9 system with glibc-2.3.2-27.9.  The program
will segfault.  The hosts entry in /etc/nsswitch.conf is the default (files, dns).
The backtrace from gdb is:
#0  0x08082901 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
#1  0x0806e3bf in dl_open_worker (a=0xbfffed10) at dl-open.c:294
#2  0x08052a1b in _dl_catch_error (objname=0xbfffed08, errstring=0xbfffed0c, operate=0x806dfa8 <dl_open_worker>, args=0xbfffed10) at dl-error.c:152
#3  0x0806e4fb in _dl_open (file=0xbfffee80 "libnss_files.so.2", mode=1, caller=0x0) at dl-open.c:407
#4  0x0805ab5e in do_dlopen (ptr=0xbfffee58) at dl-libc.c:78
#5  0x08052a1b in _dl_catch_error (objname=0xbfffee50, errstring=0xbfffee54, operate=0x805ab48 <do_dlopen>, args=0xbfffee58) at dl-error.c:152
#6  0x0805aa51 in __libc_dlopen (__name=0xbfffee80 "libnss_files.so.2") at dl-libc.c:42
#7  0x080588ba in __nss_lookup_function (ni=0x80afd28, fct_name=0x80a7c60 "gethostbyname_r") at nsswitch.c:340
#8  0x0805922a in __nss_lookup (ni=0xbfffef90, fct_name=0x80a7c60 "gethostbyname_r", fctp=0xbfffef94) at nsswitch.c:147
#9  0x0804d27f in __gethostbyname_r (name=0x8097ec8 "localhost", resbuf=0x80ae3d8, buffer=0x80af630 "", buflen=1024, result=0xbfffefd4, h_errnop=0xbfffefd8) at ../nss/getXXbyYY_r.c:168
#10 0x0804d077 in gethostbyname (name=0x8097ec8 "localhost") at ../nss/getXXbyYY.c:131
#11 0x080481f3 in main ()
#12 0x080482da in __libc_start_main (main=0x80481e0 <main>, argc=1, ubp_av=0xbffff074, init=0x80480b4 <_init>, fini=0x8097ea0 <_fini>, rtld_fini=0, stack_end=0xbffff06c) at ../sysdeps/generic/libc-start.c:129

If the libnss_files.so from glibc 2.3 isn't backward compatible, then why does
it have the same major version number?  With the version number the same, it is
_impossible_ to run a program compiled statically with glibc 2.2 on a glibc 2.3
system.
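
For readers following the backtrace: frames #6 and #7 show the static binary calling dlopen on "libnss_files.so.2", a name derived from the hosts entry in /etc/nsswitch.conf. The sketch below only illustrates that naming step under the assumption that each service maps to libnss_<service>.so.2, as the backtrace suggests. The helper names nss_module_soname and list_hosts_modules are hypothetical and are not glibc functions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: map one service name from an nsswitch.conf line
 * (e.g. "files" or "dns") to the shared object name that gets dlopen'ed.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int nss_module_soname(const char *service, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "libnss_%s.so.2", service);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Print the module name for every service listed after the colon of a
 * line such as "hosts: files dns". Bracketed action items such as
 * [NOTFOUND=return] are skipped. Returns the number of modules printed.
 */
static int list_hosts_modules(const char *line)
{
    char copy[256];
    char soname[64];
    int count = 0;
    int in_action = 0;
    const char *colon = strchr(line, ':');

    if (colon == NULL)
        return 0;
    snprintf(copy, sizeof copy, "%s", colon + 1);

    for (char *tok = strtok(copy, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (in_action || tok[0] == '[') {
            /* Stay in "action" mode until the closing bracket is seen. */
            in_action = (strchr(tok, ']') == NULL);
            continue;
        }
        if (nss_module_soname(tok, soname, sizeof soname) == 0) {
            printf("%s\n", soname);
            count++;
        }
    }
    return count;
}
```

Calling list_hosts_modules("hosts: files dns") prints one module name per service. Because the ".2" suffix is the same for the glibc 2.2 and 2.3 modules, a static binary built against 2.2 ends up loading the 2.3 module, which is the situation questioned above.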

Comment 3 Ulrich Drepper 2003-04-30 07:34:25 UTC
I have told you that

a) I cannot reproduce the problem.  I'm using that glibc-2.2.4-32 and
gcc-2.96-112.7.2

and

b) that statically linked programs have no right to assume compatibility if they
are using NSS.

Anyway, since I cannot reproduce any problem I have to assume it's some local
bogosity on your system.

Comment 4 Louie 2004-01-07 21:56:57 UTC
I am an author of the distributed computing project Seventeen or Bust 
(http://www.seventeenorbust.com/) and this problem has been the main 
reason I have stopped supporting Linux.  I got tired of recompiling 
dozens of binaries for every version of glibc, only to have them 
segfault because of buggy NSS code that I can't fix.

This problem has existed for over a year, and it is not specific to 
the reporter's system.  I find that any statically linked program 
which uses gethostbyname() will segfault on any machine except one 
built with the exact same version of libc.  This makes it impossible 
to distribute static binaries for Linux which require networking 
support.  Incredibly frustrating.

Oddly enough, this problem disappears if gethostbyname is passed an 
IP address in dot notation (e.g. 127.0.0.1) instead of a hostname (e.g. 
localhost).
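
The dotted-address observation is consistent with the backtrace: the crash happens while an NSS module is being loaded, and an address literal can be converted without any name service lookup. A minimal sketch of making that explicit in an application follows, assuming the program can get by with IPv4 literals. The helper name parse_ipv4_literal is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Returns 1 and fills *addr if `name` is a dotted-quad IPv4 literal,
 * 0 otherwise. inet_pton only parses text; it does not perform a name
 * service lookup, so no libnss_*.so module is loaded on this path.
 */
static int parse_ipv4_literal(const char *name, struct in_addr *addr)
{
    return inet_pton(AF_INET, name, addr) == 1;
}
```

A static binary could try this first and fall back to gethostbyname only for real hostnames, which narrows the crash to the hostname case but does not remove it.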

Here is a bt from one of the many machines I can cause this problem 
on:

Program received signal SIGSEGV, Segmentation fault.
0x080b08f5 in _dl_relocate_object ()
(gdb) bt
#0  0x080b08f5 in _dl_relocate_object ()
#1  0x080a7643 in dl_open_worker ()
#2  0x08092253 in _dl_catch_error ()
#3  0x080a7837 in _dl_open ()
#4  0x08093306 in do_dlopen ()
#5  0x08092253 in _dl_catch_error ()
#6  0x080931f9 in __libc_dlopen ()
#7  0x0808c4c6 in __nss_lookup_function ()
#8  0x0808cb4e in __nss_lookup ()
#9  0x0808d337 in __nss_hosts_lookup ()
#10 0x08070fbc in gethostbyname_r ()
#11 0x08070db7 in gethostbyname ()

Cheers,
Louie

Comment 5 Jakub Jelinek 2004-01-07 22:09:27 UTC
So why do you link statically instead of dynamically?
That's almost always a bad idea.
If you look at e.g. Solaris, you cannot link a statically linked
application using gethostbyname at all (and for a reason).
That routine in /usr/lib/libnsl.a uses dlopen/dlsym/dlerror and
Solaris provides no libdl.a library.
In GLIBC, you can link such programs but they are guaranteed to work
only if run against the same glibc as they have been linked against.
Current glibc even issues a link time warning about it.

If you need to link some specific library into the program, you should
use -Bstatic -lthatlibrary -Bdynamic instead and keep libraries included
in glibc linked in dynamically.  That way symbol versioning ensures binary
compatibility.
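
As a rough illustration of the mixed linking described above, the sketch below keeps glibc (and therefore NSS) dynamic while one third-party library is linked statically. The build line in the comment assumes GNU ld driven through gcc, where linker options need the -Wl, prefix. libgmp stands in for "thatlibrary", and lookup.c and first_ipv4 are hypothetical names. The function itself does not call libgmp; it only shows the lookup that stays on the dynamic glibc.

```c
/*
 * Build sketch (assumption: gcc with GNU ld, libgmp.a installed):
 *
 *   gcc lookup.c -o lookup -Wl,-Bstatic -lgmp -Wl,-Bdynamic
 *
 * Everything after -Wl,-Bdynamic, including the implicit -lc, is
 * linked dynamically again.
 */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Resolve `name` and write its first IPv4 address in dotted form to `out`.
 * Returns 0 on success, -1 on failure.
 */
static int first_ipv4(const char *name, char *out, size_t outlen)
{
    struct hostent *he = gethostbyname(name);
    struct in_addr addr;

    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;
    memcpy(&addr, he->h_addr_list[0], sizeof addr);
    return inet_ntop(AF_INET, &addr, out, (socklen_t)outlen) != NULL ? 0 : -1;
}
```

Running ldd on the resulting binary should list libc.so.6 but not libgmp, which is one way to check that the split worked as intended.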

Comment 6 Trent Piepho 2004-01-08 06:09:49 UTC
I have two reasons to link statically instead of dynamically.

1. Binary portability.  If you dynamically link against glibc-2.3,
then the binary won't run on a system with glibc-2.2, glibc-2.0,
libc5, or any future glibc.  You now have the exact same problem of
needing a different binary for every different glibc version. 

2. Security.  A dynamically linked glibc makes it very easy to
override a C library function with a custom version.  This makes it a
lot easier to fake out a licensing system or cheat at an online game. 
Yes, you can still modify binaries or modify the kernel, but
LD_PRELOAD=flexlm_crack.so is a lot easier.

Comment 7 Jakub Jelinek 2004-01-08 09:12:33 UTC
You're wrong about "or any future glibc". If you link dynamically
against glibc-2.3, it will run against any future glibc (assuming
it doesn't poke into glibc internals etc.).  If you link against
say glibc-2.1, it will run against glibc-2.2, glibc-2.3 and later glibcs
as well.
If you link statically but your program is not self-contained
(e.g. because it uses NSS/iconv/locales), then you certainly don't get
any portability advantages, just disadvantages, because the binary
portability is suddenly not with the library you linked against and any
future versions, but just the single one you linked against.
If you think that when you link statically against say glibc 2.0
your program will run on a libc5 system, it will not.
What you perhaps could do is ship all the NSS modules, locale definitions,
etc. that you use together with the statically linked binary and set
LD_LIBRARY_PATH in the statically linked program to point to the
directory containing the modules (and libc.so, ld.so, etc.).
But that can be fairly huge, and you risk not including some NSS module
needed on the target system.  Alternatively, call a small dynamically linked
helper application for NSS etc. from the statically linked program.
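
One possible shape for the helper approach mentioned last is sketched here. It assumes the target system provides the dynamically linked getent(1) utility and a POSIX shell. The static program runs the helper with popen and reads the address from its output, so the NSS lookup happens in the helper process rather than in the static binary. The function names first_field and resolve_via_getent are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Copy the first whitespace-delimited field of `line` into `out`.
 * In "getent hosts" output that field is the address.
 * Returns 0 on success, -1 if the line is empty or the field does not fit.
 */
static int first_field(const char *line, char *out, size_t outlen)
{
    size_t start = strspn(line, " \t");
    size_t len = strcspn(line + start, " \t\r\n");

    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, line + start, len);
    out[len] = '\0';
    return 0;
}

/*
 * Resolve `name` by running getent in a child process.
 * The name is placed on a shell command line, so this sketch accepts
 * only letters, digits, '.' and '-', and rejects a leading '-'.
 * Returns 0 on success, -1 on any failure.
 */
static int resolve_via_getent(const char *name, char *out, size_t outlen)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-";
    char cmd[300];
    char line[512];
    FILE *p;
    int rc = -1;

    if (name[0] == '\0' || name[0] == '-' ||
        strspn(name, allowed) != strlen(name))
        return -1;
    if (snprintf(cmd, sizeof cmd, "getent hosts %s", name) >= (int)sizeof cmd)
        return -1;

    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    if (fgets(line, sizeof line, p) != NULL)
        rc = first_field(line, out, outlen);
    pclose(p);
    return rc;
}
```

The cost is one extra process per lookup and a dependency on the helper being installed, so this suits programs that resolve only a few names.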

Comment 8 Michael Wilkes 2004-07-21 04:27:56 UTC
We are also seeing this bug when running statically linked code.
It showed up after upgrading glibc rpm sets from 2.3.2-11.9 to
2.3.2-27.9.7
Why did we upgrade glibc? Because Red Hat says it will "resolve
vulnerabilities and address several bugs". Looks like it introduces or
exposes some too.
Why do we run statically linked code? Because it is a highly
specialised  piece of software, we don't have the source code, and the
authors will not change the way it is distributed. We don't pay for it
(so no leverage there) but it is essential.
Looks like we downgrade glibc until it is fixed.


Comment 9 Robert DiGrazia 2005-02-11 17:11:09 UTC
This issue has cost us, our customers, and our prospective
customers too much time and money.  The bottom line is that
we cannot support Redhat Linux 7.3.  Whether we can support
some other distribution remains to be seen.   We are very
reluctantly canceling Linux support for our products.  

(Aside: we have all the respect in the world for the Linux
and Redhat programmers.  I have been writing computer programs
since 1962, and I know how difficult it is.  Even in the days
when computer programs occupied only 1000 bits, the things
hardly ever worked.)


In case anyone is interested:

The trouble starts with a call to check out a license feature
using FLEXlm.  This makes Macrovision look bad, but it
isn't Macrovision's fault.

Running on Redhat 7.3, statically linked, we see this:

Starting program: xxx.vhd 
// Built Tue Nov 23 2004 Linux 2.4.18-3 
// Running.. 

Program received signal SIGSEGV, Segmentation fault. 
0x081cbf61 in _dl_relocate_object () 
(gdb) bt 
#0  0x081cbf61 in _dl_relocate_object () 
#1  0x081c69b3 in dl_open_worker () 
#2  0x081c558f in _dl_catch_error () 
#3  0x081c6ba7 in _dl_open () 
#4  0x081a2a06 in do_dlopen () 
#5  0x081c558f in _dl_catch_error () 
#6  0x081a28f9 in __libc_dlopen () 
#7  0x0819e9be in __nss_lookup_function () 
#8  0x0819f046 in __nss_lookup () 
#9  0x0819f23f in __nss_passwd_lookup () 
#10 0x0819c4b0 in getpwuid_r () 
#11 0x0819c2df in getpwuid () 
#12 0x0814e314 in lc_username () 
#13 0x081675ad in l_conn_msg () 
#14 0x0815702f in l_try_connect () 
#15 0x08156e37 in l_connect_host_or_list () 
#16 0x08156cb9 in l_connect () 
#17 0x0813fa89 in checkout_from_server () 
#18 0x0813e2e5 in lm_start_real () 
#19 0x0813de4b in l_checkout () 
#20 0x0813dcb1 in lc_checkout () 
#21 0x0814e934 in lp_checkout () 
...




And here is info on our Linux and libraries (I've removed
people's names):


Red Hat Linux release 7.3 (Valhalla)
Kernel 2.4.18-3 on an i686

----------------------
./libc-2.2.5.so
GNU C Library stable release version 2.2.5, by  et al.
Copyright (C) 1992-2001, 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 2.96 20000731 (Red Hat Linux 7.3 2.96-110).
Compiled on a Linux 2.4.9-9 system on 2002-04-15.
Available extensions:
       GNU libio by 
       crypt add-on version 2.1 by  and others
       The C stubs add-on version 2.1.2.
       linuxthreads-0.9 by 
       BIND-8.2.3-T5B
       NIS(YP)/NIS+ NSS modules 0.19 by 
       Glibc-2.0 compatibility add-on by 
       libthread_db work sponsored by 
Report bugs using the `glibcbug' script to <bugs>.


----------------
libdl-2.2.5.so

----------------

gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)


Comment 10 Bill Nottingham 2006-08-04 20:23:08 UTC
Red Hat Linux and Red Hat Powertools are currently no longer supported by Red
Hat, Inc. In an effort to clean up bugzilla, we are closing all bugs in MODIFIED
state for these products.

However, we do want to make sure that nothing important slips through the
cracks. If, in fact, these issues are not resolved in a current Fedora Core
Release (such as Fedora Core 5), please open a new issue stating so. Thanks.