Bug 246623

Summary: applications compiled with pie flag crash randomly
Product: [Fedora] Fedora
Reporter: ritz <rkhadgar>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 7
CC: atkac, bugzilla, charlieb-fedora-bugzilla, gawith, herrold, jakub, n3npq, redhat-bugzilla, riel, roland, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: 2.6.22.1-41.fc7
Doc Type: Bug Fix
Last Closed: 2007-08-08 18:06:14 UTC
Attachments:
test case (flags: none)
coredump attached. (flags: none)

Description ritz 2007-07-03 14:06:32 UTC
Description of problem:
Applications compiled with the "-pie" flag crash randomly. Test case attached.

Version-Release number of selected component (if applicable):
gcc-4.1.2-12

How reproducible:
frequent

Steps to Reproduce:
1. gcc -pie a.c
2. while true; do ./a.out; done
  
Actual results:
Segmentation fault


Expected results:
no segv

Additional info:

Comment 1 ritz 2007-07-03 14:06:32 UTC
Created attachment 158441 [details]
test case

Comment 2 Jakub Jelinek 2007-07-03 15:01:41 UTC
Which kernel (x86_64, i686, ...)? Can you get a coredump?
There are several other bugreports about this, but I haven't ever been able
to reproduce this.

Comment 3 ritz 2007-07-03 15:10:00 UTC
kernel-PAE-2.6.21-1.3194.fc7
coredump to be attached.

Additionally, setting LD_DEBUG=all delays the onset of this bug.

Comment 4 ritz 2007-07-03 15:12:28 UTC
Created attachment 158453 [details]
coredump attached.

$ file core.32202
core.32202: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style

$ gdb --core=core.8534 ./a.out 
GNU gdb Red Hat Linux (6.6-15.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./a.out SSH_AGENT_PID=2624 HOSTNAME=localhost
SHELL=/bin/bash TERM=xterm DESKTO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00b51410 in _start () from /lib/ld-linux.so.2
(gdb) bt
#0  0x00b51410 in _start () from /lib/ld-linux.so.2
Cannot access memory at address 0xbfc3d814
(gdb) quit

Comment 5 ritz 2007-07-03 15:14:27 UTC
$ uname -a
Linux localhost 2.6.21-1.3194.fc7PAE #1 SMP Wed May 23 22:27:31 EDT 2007 i686
i686 i386 GNU/Linux


Comment 6 Jakub Jelinek 2007-07-03 15:43:11 UTC
Similarly to #245291 and #217614, the core is just weird: e.g. both $esp and
$ebp in the info reg output for the core file are much lower than the address
range of the PT_LOAD segment in the core file that presumably contains the stack.
Also, that PT_LOAD segment contains just the environment and the argv[] array,
nothing else, so I assume the crash happened right after the kernel passed
control to the program - to a wrong place.
Is there any way to convince the kernel to also dump the VMAs (like
/proc/pid/maps) on fatal signals?
This is certainly related to kernel randomization, but it is unclear whether
the bug is on the kernel or the glibc side.  The core files aren't helpful there.


Comment 7 Jakub Jelinek 2007-07-03 22:13:50 UTC
With help from Roland and systemtap I tracked this down to a bug in
linux-2.6-execshield.patch.
stap -v -e '
probe kernel.function("elf_map") {
  if (execname() == "a") { printf("elf_map %x+%x\n", $addr, $total_size) }
}
probe kernel.function("elf_map").return {
  if (execname() == "a") { printf("elf_map_ret %x\n", $return) }
}
probe kernel.function("install_special_mapping") {
  if (execname() == "a") { printf("ism %x+%x\n", $addr, $len) }
}
probe kernel.function("load_elf_binary") {
  if (execname() == "a") { printf("leb\n") }
}
probe kernel.function("do_mmap_pgoff") {
  if (execname() == "a") { printf("dmp %x+%x\n", $addr, $len) }
}
probe kernel.function("do_mmap_pgoff").return {
  if (execname() == "a") { printf("do_mmap_pgoff_ret %x\n", $return) }
}'

showed on the crashed pie (./a):
elf_map ffffffff80000000+0
dmp ffffffff80000000+1000
do_mmap_pgoff_ret ffffffff80000000
elf_map_ret ffffffff80000000
elf_map ffffffff8000163c+0
dmp ffffffff80001000+1000
do_mmap_pgoff_ret ffffffff80001000
elf_map_ret ffffffff80001000
elf_map 0+1c650
dmp 0+1d000
do_mmap_pgoff_ret 466000
elf_map_ret 466000
elf_map 481c80+0
dmp 481000+2000
do_mmap_pgoff_ret 481000
elf_map_ret 481000

This is only reproducible if ld.so is prelinked low (--exec-shield), and what
all the crashed stap dumps seem to have in common is that the crash happened
when elf_map returned 0x466000 (while ld.so was prelinked to 0x467000); no
other address was problematic.

In vanilla kernels load_elf_interp returns e_entry + load_addr and also returns
load_addr via reference.  With linux-2.6-execshield.patch it returns load_addr
and returns the base address (i.e. what elf_map returned for the first PT_LOAD)
via reference.  load_addr in load_elf_interp is the load bias of ld.so, i.e. the
difference between the actual load base address where it has been mapped and
the first PT_LOAD segment's p_vaddr:
load_addr = map_addr - ELF_PAGESTART(vaddr);
In our case, map_addr is 0x466000 (where elf_map mapped it) and vaddr is
0x467000, therefore load_addr is 0xfffff000.  Now, the caller does
                        elf_entry = load_elf_interp(&loc->interp_elf_ex,
                                                    interpreter,
                                                    &interp_map_addr,
                                                    load_bias);
                        if (!BAD_ADDR(elf_entry)) {
                                /* load_elf_interp() returns relocation adjustment */
                                interp_load_addr = elf_entry;
                                elf_entry += loc->interp_elf_ex.e_entry;
                        }
where BAD_ADDR is defined earlier as
#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
But PAGE_MASK is 0xfffff000, so BAD_ADDR(0xfffff000) is true and we hit
                if (BAD_ADDR(elf_entry)) {
                        force_sig(SIGSEGV, current);
                        retval = IS_ERR((void *)elf_entry) ?
                                        (int)elf_entry : -EINVAL;
                        goto out_free_dentry;
                }
a few lines after this.  I believe the easiest fix would be to
#define BAD_ADDR(x) IS_ERR_VALUE(x)
instead (on i?86 this is the same as ((unsigned long)(x) > PAGE_MASK), i.e. >
instead of >=).

Comment 8 Jakub Jelinek 2007-07-03 22:16:55 UTC
*** Bug 245291 has been marked as a duplicate of this bug. ***

Comment 9 Jakub Jelinek 2007-07-03 22:18:11 UTC
*** Bug 217614 has been marked as a duplicate of this bug. ***

Comment 12 Steve 2007-07-06 11:27:20 UTC
RHEL 5 version of this issue:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247169

Comment 13 Nathan G. Grennan 2007-07-06 18:50:04 UTC
I tested the proposed change and it fixed my issue.

RHEL5 bug is private. :\

Comment 14 Nathan G. Grennan 2007-07-13 18:27:22 UTC
Is this going to get fixed in a release any time soon for both Fedora and RHEL?
The bug is marked low severity, but I don't see random program crashes as low.
This stopped a backup job from running one day for me.

Comment 15 Chuck Ebbert 2007-07-13 18:32:23 UTC
Fixed in forthcoming FC6 and F7 kernels based on Linux 2.6.22.


Comment 16 Jeff Johnson 2007-07-13 18:33:27 UTC
This snippet is at 
    https://www.redhat.com/archives/fedora-devel-list/2007-July/msg00733.html

kernel-2.6.23-0.15.rc0.git1.fc8
-------------------------------
* Wed Jul 11 2007 Roland McGrath <roland redhat com>
- core dump enhancement: include first page of ELF files, with sysctl control

* Wed Jul 11 2007 John W. Linville <linville redhat com>
- Reinstate git-wireless-dev.patch
- Add updated iwlwifi driver from intellinuxwireless.org

* Tue Jul 10 2007 Dave Jones <davej redhat com>
- Fix issue with PIE randomization (#246623).

Comment 17 Tomas Mraz 2007-08-08 07:10:54 UTC
*** Bug 250668 has been marked as a duplicate of this bug. ***

Comment 18 Kasper Dupont 2007-08-08 17:59:00 UTC
This bug was said to have been fixed almost a month ago, but I don't see a new
kernel in the yum repository.

Comment 19 Jeff Johnson 2007-08-08 18:05:07 UTC
Then look harder. See comment #16.

Comment 20 Kasper Dupont 2007-08-08 19:01:50 UTC
I have a system with kernel-2.6.20-1.2962.fc6.i586.rpm installed, and yum tells
me there are no updates to be installed.

Comment 21 Chuck Ebbert 2007-08-08 19:07:06 UTC
This is a Fedora 7 bug. But the FC6 kernel was submitted for release today and
should show up soon.

Comment 22 Kasper Dupont 2007-08-08 19:26:37 UTC
I reported 250668 against FC6 and it was marked as a duplicate of this bug. So
is this one bug present in both FC6 and FC7, or is it two different bugs?

Comment 23 Chuck Ebbert 2007-08-08 19:32:58 UTC
This should have been left as two separate bugs for tracking, but it's the same
underlying problem.

Comment 24 Prarit Bhargava 2007-08-14 18:24:28 UTC
*** Bug 247169 has been marked as a duplicate of this bug. ***

Comment 25 Jakub Jelinek 2007-10-09 15:31:40 UTC
*** Bug 225485 has been marked as a duplicate of this bug. ***

Comment 27 Charlie Brady 2010-10-21 16:52:09 UTC
(In reply to comment #13)
> I tested the proposed change and it fixed my issue.
> 
> RHEL5 bug is private. :\

bug 247169 was closed as a duplicate of bug 246623, which is not private, but not RHEL5 either:

https://bugzilla.redhat.com/show_bug.cgi?id=246623

Neither 247169 nor 246623 shows up in the changelog of the RHEL5 kernel. There's no changelog reference to "PIE randomization" either.

My guess is this isn't fixed in RHEL5.

Comment 28 Charlie Brady 2010-10-21 17:05:32 UTC
(In reply to comment #27)

> My guess is this isn't fixed in RHEL5.

But maybe it is, via a slightly different patch, in:

https://bugzilla.redhat.com/show_bug.cgi?id=230339

* Fri Jun 01 2007 Don Zickus <dzickus> [2.6.18-21.el5]
...
- [fs] invalid segmentation violation during exec (Dave Anderson ) [230339]

Comment 29 Charlie Brady 2010-10-21 17:19:03 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=488449

"segfaults from ld-linux-x86-64"

suggests that the fix to the execshield patch was not done correctly.

https://bugzilla.redhat.com/show_bug.cgi?id=488449#c19