Bug 217614

Summary:	PIE executable portability: FC6 ping generates arithmetic exception on CentOS 4.3
Product:	[Fedora] Fedora	Reporter:	Jeff Johnson <n3npq>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED DUPLICATE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6	CC:	aoliva, herrold, rkhadgar
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2007-07-03 22:18:07 UTC	Type:	---

Description Jeff Johnson 2006-11-28 23:07:09 UTC

I'm reporting against glibc because the failing component appears to be /lib/ld-linux.so.2.

For various reasons I'm running an FC6 kernel (kernel-2.6.18-1.2798) on top of a CentOS 4.3
userland.

If I copy /bin/ping from FC6 and try to execute, I get an immediate arithmetic exception.

If I replace glibc on CentOS 4.3 with a recompiled version of the FC6 glibc, then the
/bin/ping from FC6 executes correctly. That points to /lib/ld-linux.so.2 as the reason
ET_DYN executables are not portable.

Furthermore, if I use the CentOS 4.3 ping with my FC6 kernel, this tight loop sooner or
later segfaults:

i=0
while ping -c1 -w1 localhost >& /dev/null; do i=$((i + 1)); done
echo $i

Again the segfault disappears if I replace glibc as above.

See also bugzilla #211112 for a very similar issue with sudo, another PIE executable.

Comment 1 Jakub Jelinek 2006-11-28 23:18:30 UTC

FC6+ compiled apps are by default -Wl,--hash-style=gnu, which is incompatible
with pre-FC6 dynamic linkers, see FC6 release notes.
At the rpm level all FC6+ built rpms which have such binaries or shared libraries
have rtld(GNU_HASH) dependency which is only satisfied by FC6+ glibc.

Anyway, assuming that you can take FC6 binaries, even ones built with
-Wl,--hash-style=sysv, and run them on a (much older) distro is wrong.
glibc only maintains backwards compatibility, not forward compatibility.
So, if you don't get bitten by .gnu.hash, you will get bitten e.g.
by missing symbol versions - among others almost all FC5 and later compiled
programs and shared libraries are built with -fstack-protector and reference
__stack_chk_fail symbol (which is of course not available in
RHEL4/Centos4).
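
A quick way to check a given binary for the two incompatibilities described above. This is a minimal sketch, assuming binutils' readelf is installed; the helper names gnu_hash_only and uses_stack_protector are made up for illustration, and both only filter readelf output fed to them on stdin:

```shell
# Hypothetical helpers; both read readelf output on stdin.

# Succeeds if the `readelf -d` output has a DT_GNU_HASH entry but no
# DT_HASH entry, i.e. only the new-style hash table is present.
gnu_hash_only() {
    dyn=$(cat)
    printf '%s\n' "$dyn" | grep -q '(GNU_HASH)' &&
        ! printf '%s\n' "$dyn" | grep -q '(HASH)'
}

# Succeeds if the `readelf -sW` output references __stack_chk_fail,
# the symbol that -fstack-protector code expects glibc to provide.
uses_stack_protector() {
    grep -q '__stack_chk_fail'
}
```

For example, `readelf -d /bin/ping | gnu_hash_only && echo "needs a .gnu.hash-aware ld.so"`, and likewise `readelf -sW /bin/ping | uses_stack_protector`.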

Comment 2 Jeff Johnson 2006-11-29 00:07:05 UTC

Understood.

However, please consider the second issue: the CentOS 4.3 ping segfaults with the CentOS 4.3
glibc when a 2.6.18 kernel is used, unless ld-linux.so.2 is replaced with a later version.

That appears solvable, and rather easily, by backporting a few changes to glibc.

I'm going to solve this by using a different ELF interpreter for PIE executables with the 2.6.18
kernel. But that's a sick hack; the far better fix would be to backport certain changes to
ld-linux.so.2.

Comment 3 Jakub Jelinek 2006-11-29 10:02:52 UTC

For the segfaults it would be useful to see a backtrace showing where it segfaults
and why. If you need many iterations, that likely means there is a problem
triggered only by some randomization constellations.  But it is unclear where
the problem is; it could very well be in ping itself, or in glibc, etc.
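
One way to test the randomization theory is to compare how long the loop survives with and without address randomization. This is a sketch only: count_until_failure is a made-up helper, and the comparison assumes a util-linux setarch that supports -R (disable address-space randomization), which older userlands may lack.

```shell
# Hypothetical helper: run "$@" up to $1 times and print how many
# runs succeeded before the first failure (or before the limit).
count_until_failure() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ] && "$@" >/dev/null 2>&1; do
        i=$((i + 1))
    done
    echo "$i"
}
```

Usage, under those assumptions: `count_until_failure 10000 ping -c1 -w1 127.0.0.1` versus `count_until_failure 10000 setarch "$(uname -m)" -R ping -c1 -w1 127.0.0.1`. As far as I know the kernel drops that personality flag when executing a setuid binary, so a non-setuid copy of ping, run as root, would be needed for the second form to mean anything.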

Comment 4 Jeff Johnson 2006-11-29 12:32:35 UTC

I'll attach a ping core today, and try to get a backtrace. Yes, randomization.

Meanwhile, there is a core file for a very similar problem with sudo attached at #211112.

Both ping and sudo are PIE executables; removing -fpie -pie is an alternative fix.

Comment 5 Jeff Johnson 2006-11-29 12:38:33 UTC

I will also attempt to reproduce on RHEL4 userland with the FC6 kernel if
that helps. So far the problems with ping (and sudo) are quite reproducible
on a variety of "production" machines that are closely monitored. I just happen
to use CentOS 4.3, not RHEL4, userland.

Comment 6 Jeff Johnson 2006-11-29 15:20:13 UTC

[root@gt40 tmp]# cat t
#!/bin/sh

i=0
while /bin/ping -c1 -w1 127.0.0.1 >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"
[root@gt40 tmp]# sh t
t: line 4:  1329 Segmentation fault      (core dumped) /bin/ping -c1 -w1 127.0.0.1 >&/dev/null
Iterations: 6642
[root@gt40 tmp]# gdb /bin/ping /var/TKLC/core/core.ping.1329 
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/libthread_db.so.1".

Core was generated by `/bin/ping -c1 -w1 127.0.0.1 HEALTH_CHECK_DECODE=/usr/TKLC/plat/bin/syscheck -de'.
Program terminated with signal 11, Segmentation fault.
#0  0x00c89a3b in ?? ()
(gdb) bt
#0  0x00c89a3b in ?? ()
Cannot access memory at address 0xbf9914d4
(gdb)
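
When bt cannot unwind like this, the faulting address can still be compared against the register state and the load map. A sketch of a batch run follows; gdb_core_commands is a hypothetical helper name, while the gdb commands it prints and the -batch and -x options are standard gdb.

```shell
# Hypothetical helper: print a gdb command script that inspects a
# core file without relying on debugging symbols or a usable stack.
gdb_core_commands() {
    cat <<'EOF'
info registers
x/4i $pc
info sharedlibrary
EOF
}
```

Usage: `gdb_core_commands > /tmp/core.gdb`, then `gdb -batch -x /tmp/core.gdb /bin/ping /var/TKLC/core/core.ping.1329`. The address ranges printed by info sharedlibrary show which object, if any, contains 0x00c89a3b.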

Comment 7 Jeff Johnson 2006-11-29 15:28:24 UTC

Same, with locally compiled ping and symbols:
[root@gt40 tmp]# sh t
/tmp/ping: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped
t: line 5:  2902 Segmentation fault      (core dumped) /tmp/ping -c1 -w1 127.0.0.1 >&/dev/null
Iterations: 1292
[root@gt40 tmp]# gdb /tmp/ping /var/TKLC/core/core.ping.2902 
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/i686/libthread_db.so.1".

Core was generated by `/tmp/ping -c1 -w1 127.0.0.1 HEALTH_CHECK_DECODE=/usr/TKLC/plat/bin/syscheck -de'.
Program terminated with signal 11, Segmentation fault.
#0  0x00c89a3b in ?? ()
(gdb) bt
#0  0x00c89a3b in ?? ()
Cannot access memory at address 0xbff43a84
(gdb)

Comment 8 Jeff Johnson 2006-11-29 15:31:03 UTC

[root@gt40 tmp]# rpm -qif /bin/ping
Name        : iputils                      Relocations: /usr 
Version     : 20020927                          Vendor: CentOS
Release     : 18.EL4.2                      Build Date: Sun 01 Jan 2006 09:04:30 AM EST
Install Date: Thu 26 Oct 2006 11:55:48 AM EDT      Build Host: build-i386
Group       : System Environment/Daemons    Source RPM: iputils-20020927-18.EL4.2.src.rpm
Size        : 220328                           License: BSD
Signature   : DSA/SHA1, Sun 01 Jan 2006 11:19:30 PM EST, Key ID a53d0bab443e1821
Packager    : Johnny Hughes <johnny>
Summary     : Network monitoring tools including ping.
Description :
The iputils package contains basic utilities for monitoring a network,
including ping. The ping command sends a series of ICMP protocol
ECHO_REQUEST packets to a specified network host to discover whether
the target machine is alive and receiving network traffic.

Comment 9 Jeff Johnson 2006-11-29 16:41:30 UTC

Dunno if this script is useful or not, but putting FC6 /lib in /tmp/lib (on the same box as above) 
appears to "fix" the ping segfault.

(load maps verified using strace)

#!/bin/sh

file /tmp/ping
i=0
while /tmp/lib/ld-2.5.90.so --library-path /tmp/lib /tmp/ping -c1 -w1 127.0.0.1 >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"

Comment 10 Jeff Johnson 2006-11-30 17:54:06 UTC

Here's Yet Another PIE executable segfault, this time gpg (same conditions as above)

#!/bin/sh

i=0
while gpg --version >& /dev/null; do i=$((i + 1)); done
echo "Iterations: $i"

(the comment below was on a different box)
FWIW, gpg just segfaulted on plain vanilla FC6 running 1.2798 while rebuilding a kernel, signing kernel
modules. I have seen that segfault before. Sorry, no core dump; I will try to reproduce.

Comment 11 Alexandre Oliva 2007-01-04 17:09:32 UTC

Is there any evidence that this is not just a bug in the kernel that causes
random corruption of userland memory, rather than a glibc bug?  If the original
bug reported here is not appropriate, and the remaining one is the same as bug
211112, shouldn't this be closed after transferring the relevant information there?

Comment 12 Jeff Johnson 2007-01-04 18:45:18 UTC

Apologies for not replying sooner.

This issue was tracked down to glibc.i386 being installed instead of glibc.i686. The problem is quite
reproducible with glibc.i386 installed, with the sudo/gpg/ping PIE executables.

I'm not sure anyone cares about glibc.i386.
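
To see which glibc build a box has, the installed architecture can be queried from rpm. This is a sketch: i386_builds is a made-up helper name that only filters the query output, and the query format tags are standard rpm ones.

```shell
# Hypothetical helper: read "name-version-release.arch" lines on
# stdin and print only those built for plain i386.
i386_builds() {
    grep '\.i386$'
}
```

Usage: `rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' glibc | i386_builds`. Any output means the i386 package is installed rather than (or alongside) the i686 one.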

Comment 13 Alexandre Oliva 2007-01-27 08:00:38 UTC

How long should it take to trigger this bug on a moderately fast x86_64
dual-core box?  It's actually running the x86_64 kernel, but I've tried all of
Centos 4.4/i386, RHEL 4.3/i386 and RHEL 4.4/i386 chroots with glibc.i386 and
didn't manage to get crashes after tens of thousands of ping runs.  Could it be
that I need to boot a 32-bit kernel?  Any chance you could let me know whether
you get the problem in a chroot inside a x86_64 rawhide or FC6 install?  I don't
have i386 boxes handy at home any more, so it would take some major work to
investigate this further, such as reinstalling one of my boxes as 32-bit.
I'm not entirely comfortable playing with CentOS binaries inside the Red Hat
VPN.  Thanks,

Comment 14 Jeff Johnson 2007-01-27 13:37:00 UTC

A segfault within 500 -> 10000 iterations, (i.e. seconds to 3-4 minutes max) was what I saw.

The cpu was a recent dual xeon at 3 GHz or more, I can get details if necessary.

The kernel was a slightly modified (mebbe 10 small patches for drivers) FC6 PAE 1.2849 i686 kernel.
I can make the kernel package available if necessary.

I'll try to reproduce within a chroot and/or on FC6 next week.

The technique in comment #9 in reverse (i.e. using a glibc.i386 rather than glibc.i686)
might be the easiest reproducer. I'll attempt that on FC6 this weekend.

Comment 15 R P Herrold 2007-07-03 19:40:19 UTC

is this the prelink bug?

Comment 16 Jeff Johnson 2007-07-03 20:16:46 UTC

Yes. Search for "prelink ping", iirc, for the core problem.

Comment 17 Jakub Jelinek 2007-07-03 22:18:07 UTC


*** This bug has been marked as a duplicate of 246623 ***