Description of problem:
Application compiled with the "-pie" flag crashes. Test case attached.

Version-Release number of selected component (if applicable):
gcc-4.1.2-12

How reproducible:
frequent

Steps to Reproduce:
1. gcc -pie a.c
2. while true; do ./a.out; done

Actual results:
Segmentation fault

Expected results:
no segv

Additional info:
Created attachment 158441 [details] test case
What kernel (x86_64, i686, ...)? Can you get a coredump? There are several other bug reports about this, but I haven't ever been able to reproduce it.
kernel-PAE-2.6.21-1.3194.fc7. Coredump to be attached. Additionally, setting LD_DEBUG=all delays the onset of this bug.
Created attachment 158453 [details]
coredump

attached.

$ file core.32202
core.32202: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style

$ gdb --core=core.8534 ./a.out
GNU gdb Red Hat Linux (6.6-15.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./a.out SSH_AGENT_PID=2624 HOSTNAME=localhost SHELL=/bin/bash TERM=xterm DESKTO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00b51410 in _start () from /lib/ld-linux.so.2
(gdb) bt
#0  0x00b51410 in _start () from /lib/ld-linux.so.2
Cannot access memory at address 0xbfc3d814
(gdb) quit
$ uname -a Linux localhost 2.6.21-1.3194.fc7PAE #1 SMP Wed May 23 22:27:31 EDT 2007 i686 i686 i386 GNU/Linux
Similar to bug #245291 and bug #217614, the core is just weird: e.g. both $esp and $ebp in "info reg" of the core file are much smaller than the actual PT_LOAD segment in the core file that presumably contains the stack. Also, that PT_LOAD segment contains just the environment and the argv[] array, nothing else, so I assume the crash happened right after the kernel passed control to the program - to a wrong place. Is there any way to convince the kernel to also dump the VMAs (like /proc/pid/maps) on fatal signals? This is certainly related to kernel randomization, but it is unclear whether the bug is on the kernel or the glibc side. The core files aren't helpful here.
With help from Roland and systemtap I tracked this down to a bug in linux-2.6-execshield.patch.

stap -v -e '
probe kernel.function("elf_map") {
  if (execname() == "a") printf("elf_map %x+%x\n", $addr, $total_size)
}
probe kernel.function("elf_map").return {
  if (execname() == "a") printf("elf_map_ret %x\n", $return)
}
probe kernel.function("install_special_mapping") {
  if (execname() == "a") printf("ism %x+%x\n", $addr, $len)
}
probe kernel.function("load_elf_binary") {
  if (execname() == "a") printf("leb\n")
}
probe kernel.function("do_mmap_pgoff") {
  if (execname() == "a") printf("dmp %x+%x\n", $addr, $len)
}
probe kernel.function("do_mmap_pgoff").return {
  if (execname() == "a") printf("do_mmap_pgoff_ret %x\n", $return)
}'

showed on the crashed pie (./a):

elf_map ffffffff80000000+0
dmp ffffffff80000000+1000
do_mmap_pgoff_ret ffffffff80000000
elf_map_ret ffffffff80000000
elf_map ffffffff8000163c+0
dmp ffffffff80001000+1000
do_mmap_pgoff_ret ffffffff80001000
elf_map_ret ffffffff80001000
elf_map 0+1c650
dmp 0+1d000
do_mmap_pgoff_ret 466000
elf_map_ret 466000
elf_map 481c80+0
dmp 481000+2000
do_mmap_pgoff_ret 481000
elf_map_ret 481000

This is only reproducible if ld.so is prelinked low (--exec-shield), and what all the crashed stap dumps seem to have in common is that the crash happened when elf_map returned 0x466000 (while ld.so was prelinked to 0x467000); no other address was problematic.

In vanilla kernels load_elf_interp returns e_entry + load_addr and also returns load_addr via reference. With linux-2.6-execshield.patch it returns load_addr, and also returns the base address (i.e. what elf_map returned for the first PT_LOAD) via reference. load_addr in load_elf_interp is the load bias of ld.so, i.e. the difference between the actual base address where it has been mapped and the first PT_LOAD segment's p_vaddr:
load_addr = map_addr - ELF_PAGESTART(vaddr);

In our case, map_addr is 0x466000 (where elf_map mapped it) and vaddr is 0x467000, therefore load_addr is 0xfffff000. Now, the caller does:

elf_entry = load_elf_interp(&loc->interp_elf_ex,
                            interpreter,
                            &interp_map_addr,
                            load_bias);
if (!BAD_ADDR(elf_entry)) {
        /* load_elf_interp() returns relocation adjustment */
        interp_load_addr = elf_entry;
        elf_entry += loc->interp_elf_ex.e_entry;
}

where BAD_ADDR is defined earlier as:

#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)

But PAGE_MASK is 0xfffff000, and so we hit:

if (BAD_ADDR(elf_entry)) {
        force_sig(SIGSEGV, current);
        retval = IS_ERR((void *)elf_entry) ?
                (int)elf_entry : -EINVAL;
        goto out_free_dentry;
}

a few lines after this. I believe the easiest fix would be to

#define BAD_ADDR(x) IS_ERR_VALUE(x)

instead, which on i?86 is the same as ((unsigned long)(x) > PAGE_MASK), i.e. > instead of >=.
*** Bug 245291 has been marked as a duplicate of this bug. ***
*** Bug 217614 has been marked as a duplicate of this bug. ***
RHEL 5 version of this issue: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247169
I tested the proposed change and it fixed my issue. RHEL5 bug is private. :\
Is this going to get fixed in a release any time soon for both Fedora and RHEL? The bug is marked low severity, but I don't consider random program crashes low severity. This stopped a backup job from running one day for me.
Fixed in the forthcoming FC6 and F7 kernels based on Linux 2.6.22.
This snippet is at
https://www.redhat.com/archives/fedora-devel-list/2007-July/msg00733.html

kernel-2.6.23-0.15.rc0.git1.fc8
-------------------------------
* Wed Jul 11 2007 Roland McGrath <roland redhat com>
- core dump enhancement: include first page of ELF files, with sysctl control

* Wed Jul 11 2007 John W. Linville <linville redhat com>
- Reinstate git-wireless-dev.patch
- Add updated iwlwifi driver from intellinuxwireless.org

* Tue Jul 10 2007 Dave Jones <davej redhat com>
- Fix issue with PIE randomization (#246623).
*** Bug 250668 has been marked as a duplicate of this bug. ***
This bug was said to have been fixed almost a month ago, but I don't see a new kernel in the yum repository.
Then look harder. See comment #16.
I have a system with kernel-2.6.20-1.2962.fc6.i586.rpm installed, and yum tells me there are no updates to be installed.
This is a Fedora 7 bug. But the FC6 kernel was submitted for release today and should show up soon.
I reported 250668 against FC6 and it was marked as a duplicate of this bug. So is this one bug present in both FC6 and FC7, or is it two different bugs?
This should have been left as two separate bugs for tracking, but it's the same underlying problem.
*** Bug 247169 has been marked as a duplicate of this bug. ***
*** Bug 225485 has been marked as a duplicate of this bug. ***
(In reply to comment #13)
> I tested the proposed change and it fixed my issue.
>
> RHEL5 bug is private. :\

Bug 247169 was closed as a duplicate of bug 246623, which is not private, but not RHEL5 either:
https://bugzilla.redhat.com/show_bug.cgi?id=246623

Neither 247169 nor 246623 shows up in the changelog of the RHEL5 kernel. There's no changelog reference to "PIE randomization" either. My guess is this isn't fixed in RHEL5.
(In reply to comment #27)
> My guess is this isn't fixed in RHEL5.

But maybe it is, via a slightly different patch, in:
https://bugzilla.redhat.com/show_bug.cgi?id=230339

* Fri Jun 01 2007 Don Zickus <dzickus> [2.6.18-21.el5]
...
- [fs] invalid segmentation violation during exec (Dave Anderson ) [230339]
https://bugzilla.redhat.com/show_bug.cgi?id=488449 "segfaults from ld-linux-x86-64" suggests that the fix to the execshield patch was not done correctly. https://bugzilla.redhat.com/show_bug.cgi?id=488449#c19