Bug 217614
Summary: | PIE executable portability: FC6 ping generates arithmetic exception on CentOS 4.3 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jeff Johnson <n3npq> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6 | CC: | aoliva, herrold, rkhadgar |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2007-07-03 22:18:07 UTC | Type: | --- |
Description
Jeff Johnson
2006-11-28 23:07:09 UTC
FC6+ compiled apps are built by default with -Wl,--hash-style=gnu, which is incompatible with pre-FC6 dynamic linkers; see the FC6 release notes. At the rpm level, all FC6+ built rpms that contain such binaries or shared libraries have an rtld(GNU_HASH) dependency, which is satisfied only by the FC6+ glibc.

In any case, it is wrong to assume that you can take FC6 binaries, even ones built with -Wl,--hash-style=sysv, and run them on a (much older) distro. glibc maintains only backward compatibility, not forward compatibility. So if you don't get bitten by .gnu.hash, you will get bitten by, e.g., missing symbol versions: among others, almost all FC5 and later programs and shared libraries are built with -fstack-protector and reference the __stack_chk_fail symbol (which is of course not available in RHEL4/CentOS4).

Understood. However, please consider the 2nd issue: the CentOS 4.3 ping segfaults with the CentOS 4.3 glibc, because a 2.6.18 kernel is used, unless ld-linux.so.2 is replaced with a later version. That appears solvable, and rather easily, by backporting a few changes to glibc. I'm going to work around it by using a different ELF interpreter for PIE executables with the 2.6.18 kernel. But that's a sick hack; the far better fix would be to backport certain changes to ld-linux.so.2.

For the segfaults, it would be useful to see a backtrace showing where it segfaults and why. If you need many iterations, that likely means the problem is triggered only by some randomization constellations. But it is unclear where the problem is; it could very well be in ping itself, or in glibc, etc.

I'll attach a ping core today, and try to get a backtrace. Yes, randomization. Meanwhile, there is a core file for a very similar problem with sudo attached at #211112. Both ping and sudo are PIE executables; removing -fpie -pie is an alternative fix. I will also attempt to reproduce on RHEL4 userland with the FC6 kernel if that helps.
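As a rough way to tell which hash style a given binary carries, the sketch below classifies a dynamic-section dump from `readelf -d`. The helper name `hash_style` is made up for illustration, and the sketch assumes binutils readelf prints the dynamic tags as `(HASH)` and `(GNU_HASH)`. It only reports which tables are present; it says nothing about symbol-version problems such as __stack_chk_fail.

```shell
#!/bin/sh
# hash_style (hypothetical helper): classify the symbol hash tables
# listed in `readelf -d` output read from stdin.
# Prints one of: gnu, sysv, both, none.
hash_style() {
    awk '
        /\(GNU_HASH\)/ { gnu = 1 }    # DT_GNU_HASH entry (.gnu.hash)
        /\(HASH\)/     { sysv = 1 }   # DT_HASH entry (classic SysV .hash)
        END {
            if (gnu && sysv) print "both"
            else if (gnu)    print "gnu"
            else if (sysv)   print "sysv"
            else             print "none"
        }'
}

# Usage, assuming binutils readelf is installed:
#   readelf -d /bin/ping | hash_style
```

A result of `gnu` would mean the binary has only the new-style table, which is the case described above as incompatible with pre-FC6 dynamic linkers.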
So far the problems with ping (and sudo) are quite reproducible on a variety of "production" machines that are closely monitored. I just happen to use CentOS 4.3, not RHEL4, userland.

[root@gt40 tmp]# cat t
#!/bin/sh
i=0
while /bin/ping -c1 -w1 127.0.0.1 >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"
[root@gt40 tmp]# sh t
t: line 4: 1329 Segmentation fault (core dumped) /bin/ping -c1 -w1 127.0.0.1 >&/dev/null
Iterations: 6642
[root@gt40 tmp]# gdb /bin/ping /var/TKLC/core/core.ping.1329
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/libthread_db.so.1".
Core was generated by `/bin/ping -c1 -w1 127.0.0.1 HEALTH_CHECK_DECODE=/usr/TKLC/plat/bin/ syscheck -de'.
Program terminated with signal 11, Segmentation fault.
#0 0x00c89a3b in ?? ()
(gdb) bt
#0 0x00c89a3b in ?? ()
Cannot access memory at address 0xbf9914d4
(gdb)

Same, with locally compiled ping and symbols:

[root@gt40 tmp]# sh t
/tmp/ping: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped
t: line 5: 2902 Segmentation fault (core dumped) /tmp/ping -c1 -w1 127.0.0.1 >&/dev/null
Iterations: 1292
[root@gt40 tmp]# gdb /tmp/ping /var/TKLC/core/core.ping.2902
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/tls/i686/libthread_db.so.1".
Core was generated by `/tmp/ping -c1 -w1 127.0.0.1 HEALTH_CHECK_DECODE=/usr/TKLC/plat/bin/ syscheck -de'.
Program terminated with signal 11, Segmentation fault.
#0 0x00c89a3b in ?? ()
(gdb) bt
#0 0x00c89a3b in ?? ()
Cannot access memory at address 0xbff43a84
(gdb)

[root@gt40 tmp]# rpm -qif /bin/ping
Name        : iputils                      Relocations: /usr
Version     : 20020927                     Vendor: CentOS
Release     : 18.EL4.2                     Build Date: Sun 01 Jan 2006 09:04:30 AM EST
Install Date: Thu 26 Oct 2006 11:55:48 AM EDT   Build Host: build-i386
Group       : System Environment/Daemons   Source RPM: iputils-20020927-18.EL4.2.src.rpm
Size        : 220328                       License: BSD
Signature   : DSA/SHA1, Sun 01 Jan 2006 11:19:30 PM EST, Key ID a53d0bab443e1821
Packager    : Johnny Hughes <johnny>
Summary     : Network monitoring tools including ping.
Description :
The iputils package contains basic utilities for monitoring a network,
including ping. The ping command sends a series of ICMP protocol
ECHO_REQUEST packets to a specified network host to discover whether
the target machine is alive and receiving network traffic.

Dunno if this script is useful or not, but putting FC6 /lib in /tmp/lib (on the same box as above) appears to "fix" the ping segfault. (load maps verified using strace)

#!/bin/sh
file /tmp/ping
i=0
while /tmp/lib/ld-2.5.90.so --library-path /tmp/lib /tmp/ping -c1 -w1 127.0.0.1 >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"

Here's Yet Another PIE executable segfault, this time gpg (same conditions as above)

#!/bin/sh
i=0
while gpg --version >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"

(the comment below was on a different box)

FWIW, gpg just segfaulted on plain vanilla FC6 running 1.2798 while rebuilding a kernel, signing kernel modules. I have seen that segfault before.
Sorry, no core dump, I will try to reproduce.

Is there any evidence that this is not just a bug in the kernel that causes random corruption of userland memory, rather than a glibc bug? If the original bug reported here is not appropriate, and the remaining one is the same as bug 211112, shouldn't this be closed after transferring the relevant information there?

Apologies for not replying sooner. This issue was tracked down to glibc.i386 being installed instead of glibc.i686. The problem is quite reproducible with glibc.i386 installed, with the sudo/gpg/ping PIE executables. I'm not sure anyone cares about glibc.i386.

How long should it take to trigger this bug on a moderately fast x86_64 dual-core box? It's actually running the x86_64 kernel, but I've tried all of CentOS 4.4/i386, RHEL 4.3/i386 and RHEL 4.4/i386 chroots with glibc.i386 and didn't manage to get crashes after tens of thousands of ping runs. Could it be that I need to boot a 32-bit kernel? Any chance you could let me know whether you get the problem in a chroot inside an x86_64 rawhide or FC6 install? I don't have i386 boxes handy at home any more, so it would take some major work to investigate this further, such as reinstalling one of my boxes as 32-bit. I'm not entirely comfortable playing with CentOS binaries inside the Red Hat VPN. Thanks,

A segfault within 500 -> 10000 iterations (i.e. seconds to 3-4 minutes max) was what I saw. The cpu was a recent dual xeon at 3 GHz or more; I can get details if necessary. The kernel was a slightly modified (mebbe 10 small patches for drivers) FC6 PAE 1.2849 i686 kernel. I can make the kernel package available if necessary. I'll try to reproduce within a chroot and/or on FC6 next week.

The technique in comment #9 in reverse (i.e. using glibc.i386 rather than glibc.i686) might be the easiest reproducer. I'll attempt that on FC6 this weekend.

Is this the prelink bug?

Yes. Search "prelink ping" iirc for the core problem.
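For anyone trying to reproduce this, the loops posted earlier in this report can be generalized into one bounded helper. This is only a sketch: the function name `run_until_failure` and the iteration cap are assumptions added here, not part of the original scripts, which loop without a limit.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]   (hypothetical helper)
# Runs CMD repeatedly until it exits non-zero or MAX runs have succeeded.
# Prints "Iterations: N status: S"; S is 0 when the cap was reached.
run_until_failure() {
    max=$1
    shift
    i=0
    status=0
    while [ "$i" -lt "$max" ]; do
        "$@" >/dev/null 2>&1          # discard output, as the original loops do
        status=$?
        [ "$status" -eq 0 ] || break  # stop at the first failing run
        i=$((i + 1))
    done
    echo "Iterations: $i status: $status"
}

# Usage (paths as used in this report):
#   run_until_failure 20000 /bin/ping -c1 -w1 127.0.0.1
#   run_until_failure 20000 gpg --version
```

A status of 139 (128 + signal 11) would indicate that the command died with a segmentation fault, while status 0 means the cap was reached without a failure.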
*** This bug has been marked as a duplicate of 246623 ***