Bug 89698
Summary: | Statically linked programs segfault | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Trent Piepho <tpiepho> |
Component: | glibc | Assignee: | Jakub Jelinek <jakub> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | fweimer, michael.wilkes, rjd |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-08-04 20:23:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Trent Piepho
2003-04-26 01:30:31 UTC
First of all, static linking does *not* guarantee that the binary runs everywhere if the program uses NSS. On the contrary, it makes that less likely. Second, I cannot reproduce any problem: a statically linked binary built on RHL7.2 runs just fine on an RHL9 system. You have to be much more specific. What services are used?

The steps to reproduce are quite simple. On a system with glibc-2.2.5-43 (or glibc-2.2.4-32) and gcc-2.96-113, compile the following two-line program with the gcc option -static (no other options are necessary):

    #include <netdb.h>
    int main(void) { gethostbyname("localhost"); return 0; }

Then run the program on a Red Hat 9 system with glibc-2.3.2-27.9. The program will segfault. /etc/hosts is the default (files, dns). The backtrace from gdb is:

    #0  0x08082901 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
    #1  0x0806e3bf in dl_open_worker (a=0xbfffed10) at dl-open.c:294
    #2  0x08052a1b in _dl_catch_error (objname=0xbfffed08, errstring=0xbfffed0c, operate=0x806dfa8 <dl_open_worker>, args=0xbfffed10) at dl-error.c:152
    #3  0x0806e4fb in _dl_open (file=0xbfffee80 "libnss_files.so.2", mode=1, caller=0x0) at dl-open.c:407
    #4  0x0805ab5e in do_dlopen (ptr=0xbfffee58) at dl-libc.c:78
    #5  0x08052a1b in _dl_catch_error (objname=0xbfffee50, errstring=0xbfffee54, operate=0x805ab48 <do_dlopen>, args=0xbfffee58) at dl-error.c:152
    #6  0x0805aa51 in __libc_dlopen (__name=0xbfffee80 "libnss_files.so.2") at dl-libc.c:42
    #7  0x080588ba in __nss_lookup_function (ni=0x80afd28, fct_name=0x80a7c60 "gethostbyname_r") at nsswitch.c:340
    #8  0x0805922a in __nss_lookup (ni=0xbfffef90, fct_name=0x80a7c60 "gethostbyname_r", fctp=0xbfffef94) at nsswitch.c:147
    #9  0x0804d27f in __gethostbyname_r (name=0x8097ec8 "localhost", resbuf=0x80ae3d8, buffer=0x80af630 "", buflen=1024, result=0xbfffefd4, h_errnop=0xbfffefd8) at ../nss/getXXbyYY_r.c:168
    #10 0x0804d077 in gethostbyname (name=0x8097ec8 "localhost") at ../nss/getXXbyYY.c:131
    #11 0x080481f3 in main ()
    #12 0x080482da in __libc_start_main (main=0x80481e0 <main>, argc=1, ubp_av=0xbffff074, init=0x80480b4 <_init>, fini=0x8097ea0 <_fini>, rtld_fini=0, stack_end=0xbffff06c) at ../sysdeps/generic/libc-start.c:129

If the libnss_files.so from glibc 2.3 isn't backward compatible, then why does it have the same major version number? With the version number unchanged, it is _impossible_ to run a program compiled statically against glibc 2.2 on a glibc 2.3 system.

I have told you that a) I cannot reproduce the problem (I'm using glibc-2.2.4-32 and gcc-2.96-112.7.2), and b) statically linked programs have no right to assume compatibility if they use NSS. Anyway, since I cannot reproduce any problem, I have to assume it's some local bogosity on your system.

I am an author of the distributed computing project Seventeen or Bust (http://www.seventeenorbust.com/), and this problem has been the main reason I have stopped supporting Linux. I got tired of having to recompile dozens of binaries for every version of glibc when they just end up segfaulting due to buggy code in NSS that I can't fix. This problem has existed for over a year. The problem in this report is not local to him. I find that any statically linked program that uses gethostbyname() will segfault on any machine except one built with exactly the same version of libc. This makes it impossible to distribute static binaries for Linux that require networking support. Incredibly frustrating. Oddly enough, this problem disappears if gethostbyname is passed an IP address in dot notation (i.e. 127.0.0.1) instead of a hostname (i.e. localhost). Here is a bt from one of the many machines I can cause this problem on:

    Program received signal SIGSEGV, Segmentation fault.
    0x080b08f5 in _dl_relocate_object ()
    (gdb) bt
    #0  0x080b08f5 in _dl_relocate_object ()
    #1  0x080a7643 in dl_open_worker ()
    #2  0x08092253 in _dl_catch_error ()
    #3  0x080a7837 in _dl_open ()
    #4  0x08093306 in do_dlopen ()
    #5  0x08092253 in _dl_catch_error ()
    #6  0x080931f9 in __libc_dlopen ()
    #7  0x0808c4c6 in __nss_lookup_function ()
    #8  0x0808cb4e in __nss_lookup ()
    #9  0x0808d337 in __nss_hosts_lookup ()
    #10 0x08070fbc in gethostbyname_r ()
    #11 0x08070db7 in gethostbyname ()

Cheers,
Louie

So why do you link statically instead of dynamically? That's almost always a bad idea. If you look at e.g. Solaris, you cannot link a statically linked application using gethostbyname at all (and for a reason): that routine in /usr/lib/libnsl.a uses dlopen/dlsym/dlerror, and Solaris provides no libdl.a library. In glibc you can link such programs, but they are guaranteed to work only if run against the same glibc they were linked against. Current glibc even issues a link-time warning about it. If you need to link some specific library into the program, you should use -Bstatic -lthatlibrary -Bdynamic instead and keep the libraries included in glibc linked dynamically. That way, symbol versioning ensures binary compatibility.

I have two reasons to link statically instead of dynamically.

1. Binary portability. If you dynamically link against glibc-2.3, then the binary won't run on a system with glibc-2.2, glibc-2.0, libc5, or any future glibc. You then have exactly the same problem of needing a different binary for every glibc version.
2. Security. A dynamically linked glibc makes it very easy to override a C library function with a custom version. This makes it a lot easier to fake out a licensing system or cheat at an online game. Yes, you can still modify binaries or modify the kernel, but LD_PRELOAD=flexlm_crack.so is a lot easier.

You're wrong about "or any future glibc".
If you link dynamically against glibc-2.3, it will run against any future glibc (assuming it doesn't poke into glibc internals etc.). If you link against, say, glibc-2.1, it will run against glibc-2.2, glibc-2.3 and later glibcs as well. If you link statically but your program is not self-contained (e.g. because it uses NSS/iconv/locales), then you certainly don't get any portability advantages, just disadvantages, because the binary is suddenly no longer compatible with the glibc you linked against and all later versions, but only with that single version. And if you think that a program linked statically against, say, glibc 2.0 will run on a libc5 system, it will not. What you could perhaps do is ship all the NSS modules, locale definitions etc. you use together with the statically linked binary, and tweak LD_LIBRARY_PATH in the statically linked program to point to the directory containing the modules (and libc.so/ld.so etc.). But that can be fairly huge, and you risk omitting some NSS module needed on the target system. Or use a small, dynamically linked helper application for NSS etc. from the statically linked program.

We are also seeing this bug when running statically linked code. It showed up after upgrading the glibc RPM set from 2.3.2-11.9 to 2.3.2-27.9.7. Why did we upgrade glibc? Because Red Hat says it will "resolve vulnerabilities and address several bugs". Looks like it introduces or exposes some too. Why do we run statically linked code? Because it is a highly specialised piece of software, we don't have the source code, and the authors will not change the way it is distributed. We don't pay for it (so no leverage there), but it is essential. Looks like we will downgrade glibc until it is fixed.

This issue has cost us, our customers, and our prospective customers too much time and money. The bottom line is that we cannot support Red Hat Linux 7.3. Whether we can support some other distribution remains to be seen.
We are very reluctantly canceling Linux support for our products. (Aside: we have all the respect in the world for the Linux and Red Hat programmers. I have been writing computer programs since 1962, and I know how difficult it is. Even in the days when computer programs occupied only 1000 bits, the things hardly ever worked.)

In case anyone is interested: the trouble starts with a call to check out a license feature using FlexLm. This makes MacroVision look bad, but it isn't MacroVision's fault. Running on Red Hat 7.3, statically linked, we see this:

    Starting program: xxx.vhd
    // Built Tue Nov 23 2004 Linux 2.4.18-3
    // Running..
    Program received signal SIGSEGV, Segmentation fault.
    0x081cbf61 in _dl_relocate_object ()
    (gdb) bt
    #0  0x081cbf61 in _dl_relocate_object ()
    #1  0x081c69b3 in dl_open_worker ()
    #2  0x081c558f in _dl_catch_error ()
    #3  0x081c6ba7 in _dl_open ()
    #4  0x081a2a06 in do_dlopen ()
    #5  0x081c558f in _dl_catch_error ()
    #6  0x081a28f9 in __libc_dlopen ()
    #7  0x0819e9be in __nss_lookup_function ()
    #8  0x0819f046 in __nss_lookup ()
    #9  0x0819f23f in __nss_passwd_lookup ()
    #10 0x0819c4b0 in getpwuid_r ()
    #11 0x0819c2df in getpwuid ()
    #12 0x0814e314 in lc_username ()
    #13 0x081675ad in l_conn_msg ()
    #14 0x0815702f in l_try_connect ()
    #15 0x08156e37 in l_connect_host_or_list ()
    #16 0x08156cb9 in l_connect ()
    #17 0x0813fa89 in checkout_from_server ()
    #18 0x0813e2e5 in lm_start_real ()
    #19 0x0813de4b in l_checkout ()
    #20 0x0813dcb1 in lc_checkout ()
    #21 0x0814e934 in lp_checkout ()
    ...

And here is info on our Linux and libraries (I've removed people's names):

    Red Hat Linux release 7.3 (Valhalla)
    Kernel 2.4.18-3 on an i686
    ----------------------
    ./libc-2.2.5.so
    GNU C Library stable release version 2.2.5, by et al.
    Copyright (C) 1992-2001, 2002 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Compiled by GNU CC version 2.96 20000731 (Red Hat Linux 7.3 2.96-110).
    Compiled on a Linux 2.4.9-9 system on 2002-04-15.
    Available extensions:
        GNU libio by
        crypt add-on version 2.1 by and others
        The C stubs add-on version 2.1.2.
        linuxthreads-0.9 by
        BIND-8.2.3-T5B
        NIS(YP)/NIS+ NSS modules 0.19 by
        Glibc-2.0 compatibility add-on by
        libthread_db work sponsored by
    Report bugs using the `glibcbug' script to <bugs>.
    ----------------
    libdl-2.2.5.so
    ----------------
    gcc -v
    Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
    gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)

Red Hat Linux and Red Hat Powertools are currently no longer supported by Red Hat, Inc. In an effort to clean up Bugzilla, we are closing all bugs in MODIFIED state for these products. However, we do want to make sure that nothing important slips through the cracks. If, in fact, these issues are not resolved in a current Fedora Core release (such as Fedora Core 5), please open a new issue stating so. Thanks.