From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
Return from a signal handler is unreliable because the kernel uses a dangling value for __kernel_sigreturn when setting up the code for return from the handler. setup_frame() in arch/i386/kernel/signal.c uses

    restorer = current->mm->context.vdso + (long)&__kernel_sigreturn;

whenever !(.sa_flags & SA_RESTORER). Unfortunately: context.vdso is never updated when the user changes the mapping for that page, the mapping is not protected against being changed, and because /proc/PID/auxv is read-only, the user cannot inform the kernel. So any *sigaction() that does not specify SA_RESTORER creates a time bomb for the return from the signal handler.

Some users want to move the AT_SYSINFO_EHDR page in order to maximize contiguous page ranges when dealing with large arrays. The kernel must allow the user to move the AT_SYSINFO page. [If not, then the kernel must object to any mmap/mprotect/munmap/mremap that affects the AT_SYSINFO page.]

The alternatives I see are: the kernel detects mmap(vaddr, PAGE_SIZE, PROT_EXEC, MAP_FIXED, fd, 0) where fd is /proc/self/auxv, then adjusts __kernel_sigreturn; let the user tell the kernel by writing to /proc/PID/auxv (using some protocol such as: seek to AT_SYSINFO_EHDR * sizeof(void *), write the new binary value); or let the user tell the kernel by using a new syscall.

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1369_FC4

How reproducible:
Always

Steps to Reproduce:
1. At Elf32_Ehdr.e_entry, immediately after execve(): find AT_SYSINFO_EHDR, copy that page to a new one, update AT_SYSINFO and AT_SYSINFO_EHDR to point to the new page, discard the old page.
2. Call *sigaction() with a handler but without SA_RESTORER.
3. Receive a signal, and attempt to return from the handler.

Actual Results:
Return from the handler faults with SIGSEGV because the original AT_SYSINFO_EHDR page no longer exists.

Expected Results:
The kernel should not rely on a dangling __kernel_sigreturn when setting up return from a signal handler. Either the kernel should use a fallback, "always works" mechanism, or it should allow [and require] the user to tell the kernel when the user moves the AT_SYSINFO page.

Additional info:
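For readers who want to see where the page in step 1 lives, here is a minimal sketch in C. It assumes glibc's getauxval() (glibc 2.16 or later, so newer than the kernels discussed here); the reporter's testcase instead runs at e_entry before any libc is available. The helper names looks_like_elf() and vdso_base() are made up for illustration.

```c
#include <assert.h>
#include <elf.h>        /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(): assumed available (glibc >= 2.16) */

/* Return 1 if the bytes at p begin with the ELF magic "\177ELF", else 0. */
int looks_like_elf(const unsigned char *p)
{
    assert(p != NULL);
    return memcmp(p, ELFMAG, SELFMAG) == 0;
}

/*
 * Return the vDSO base address that the kernel placed in the auxiliary
 * vector at exec time, or 0 if there is no AT_SYSINFO_EHDR entry.
 * This is only the value recorded at exec: if the process later moves
 * the page, nothing updates this value unless the process does so itself.
 */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

If vdso_base() returns a nonzero address and the page has not been moved, looks_like_elf((const unsigned char *)vdso_base()) should report an ELF header at that address.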
It seems dubious to me that the user should reasonably expect to unmap the kernel-supplied page and not have things go all to hell. If the application needs to ensure that certain ranges of its address space are free, the most proper thing to do is reserve those by using PT_LOAD segments in the executable ELF file. Segments that reserve address space without loading anything can have p_flags=0 to get PROT_NONE mappings that you can later unmap or mmap over. It's possible the kernel has some bug dealing with this, but that would be a separate kernel bug and if a fix is necessary it should be done there. Unless you can convince me otherwise, I'm inclined to resolve this NOTABUG.
I agree that ranges with fixed addresses known in advance should be reserved using PT_LOAD. The problem is for ranges not known in advance, namely all the holes left over after mapping the executable and the PT_INTERP, particularly when ET_DYN, 0==p_vaddr, and randomization are involved. This is a frequent case for PT_INTERP [that has not been prelinked], and an increasingly common case for -fPIE main executables. The kernel randomly places the AT_SYSINFO page somewhere in the holes, and this often fragments the address space unnecessarily. Splitting a 100MB hole into a {30MB hole, 4KB AT_SYSINFO, 70MB hole} is costly because a 75MB array can no longer use that address space if the 4KB AT_SYSINFO page cannot be moved.

Here is a compromise that I could live with: adjust the default policy for AT_SYSINFO placement to be 1 page below the first PT_LOAD of the PT_INTERP (and especially of an ET_DYN PT_INTERP), else 1 page below the first PT_LOAD of the main executable. In particular, if I prelink my PT_INTERP (or use an ET_EXEC PT_INTERP) then in effect I also prelink the AT_SYSINFO page at 4KB less. This would dramatically reduce the unnecessary fragmentation of the address space, while still retaining some security benefits of randomization (if either the PT_INTERP or the main execve() were ET_DYN with 0==p_vaddr). It would also give administrators and users more control, and in an understandable way.
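The fragmentation arithmetic above can be checked with a small sketch in C. The function names and the PAGE_BYTES and MB constants are made up for illustration; the sizes are the ones quoted in the comment.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u             /* 4KB page, as in the comment */
#define MB         (1024u * 1024u)

/*
 * Largest contiguous free run left in a hole of `hole` bytes after one
 * page has been placed `offset` bytes from the start of the hole.
 */
size_t largest_free_run(size_t hole, size_t offset)
{
    assert(offset + PAGE_BYTES <= hole);
    size_t below = offset;                      /* free bytes before the page */
    size_t above = hole - offset - PAGE_BYTES;  /* free bytes after the page  */
    return below > above ? below : above;
}

/* Return 1 if an array of `array_bytes` still fits in the hole, else 0. */
int array_fits(size_t hole, size_t offset, size_t array_bytes)
{
    return largest_free_run(hole, offset) >= array_bytes;
}
```

With the page 30MB into a 100MB hole, array_fits(100 * MB, 30 * MB, 75 * MB) is 0. With the page at the very edge of the hole (offset 0), which is roughly what an "adjacent to an existing mapping" policy achieves, the same call returns 1.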
There is a performance aspect, too. The random placement of the AT_SYSINFO page (linux-gate.so.1) by the kernel can disrupt much of the prelinking for a typical configuration of shared libraries. The kernel places AT_SYSINFO after seeing at most the main execve() and the PT_INTERP. If the kernel happens to choose a page that is preferred by a prelinked .so which is mapped in later by the usual PT_INTERP ld-linux.so.2, then ld-linux must relocate that .so somewhere else. Doing so invokes another randomization by the kernel, which may step on pages preferred by subsequent prelinked .so, and the result can cascade many times. It is not uncommon for KDE, Gnome, even "bare" X11 applications to use a dozen or more prelinked .so. Being forced to abandon the prelinked address costs CPU time and may reduce page sharing.

If the placement policy for AT_SYSINFO were "1 page below [or above] PT_INTERP" then a local administrator (or prelink itself) could take care to avoid that page. The result would be that the randomization would be controlled by the prelink policy (and not directly by the kernel unless 0==p_vaddr), and the kernel would not "accidentally" disrupt placement of an entire configuration of .so.
Here is an example which shows that the kernel placement of linux-gate.so.1 does interfere with a prelinked glibc, forcing libc.so.6 to be relocated at runtime. In this specific case where glibc is the only .so besides ld-linux.so.2, the frequency was 7.4%.
-----
for i in 0 1 2 3 4 5 6 7 8 9; do
for j in 0 1 2 3 4 5 6 7 8 9; do
for k in 0 1 2 3 4 5 6 7 8 9; do
  ldd /bin/cat
done
done
done | grep libc | sort | uniq -c
-----
     74 libc.so.6 => /lib/libc.so.6 (0x00111000)
    926 libc.so.6 => /lib/libc.so.6 (0x009ee000)
-----
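The 7.4% figure is the relocated count divided by the total number of runs. A tiny sketch of that arithmetic in C, using integer per-mille to avoid floating point (the function name is made up for illustration):

```c
#include <assert.h>

/*
 * Relocated runs per thousand, rounded down.
 * `relocated` is the count of runs where libc.so.6 did not land at its
 * prelinked address; `at_prelinked` is the count where it did.
 */
unsigned relocated_per_mille(unsigned relocated, unsigned at_prelinked)
{
    unsigned total = relocated + at_prelinked;
    assert(total > 0);
    return relocated * 1000u / total;
}
```

For the counts above, relocated_per_mille(74, 926) gives 74 per mille, i.e. 7.4%.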
[This comment has been added as a mass update for all FC4 kernel bugs. If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update. If your problem involved being unable to boot, or some hardware not being detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE* installing any kernel updates. If in doubt, you can recreate this file using:

    mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
    mv /etc/modprobe.conf /etc/modprobe.conf.bak
    kudzu

Thank you.
kernel-2.6.12-1.1398_FC4 on i686 still has the original problem: dangling __kernel_sigreturn. I will attach two small files which form a reproducible testcase: start.S and sigreturn.c.

    $ gcc -g -o sigreturn -nostartfiles -nostdlib start.S sigreturn.c
    $ ./sigreturn
    Segmentation fault   # because linux-gate.so.1 page was moved
    $
Created attachment 116950
start.S assembly code for testcase

Created attachment 116951
sigreturn.c for testcase
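The attached files are not reproduced in this thread. As a rough illustration of the failure they trigger, here is a user-space model in C of the restorer choice described in the original report. It is not the attached testcase and not kernel code: struct mm_model, pick_restorer(), in_page() and the offset argument are made up for illustration. Only the SA_RESTORER flag value (0x04000000 on x86 Linux) is taken from the real ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SA_RESTORER_X86 0x04000000u  /* SA_RESTORER on x86 Linux */
#define PAGE_BYTES      4096u

/* Stands in for current->mm->context.vdso: set once at exec, never updated. */
struct mm_model {
    uintptr_t vdso;
};

/*
 * Model of the choice made when building the signal frame: use the
 * caller's restorer if SA_RESTORER is set, otherwise compute one from
 * the vDSO base cached at exec time plus a fixed offset.
 */
uintptr_t pick_restorer(const struct mm_model *mm, unsigned sa_flags,
                        uintptr_t sa_restorer, uintptr_t sigreturn_off)
{
    assert(mm != NULL);
    if (sa_flags & SA_RESTORER_X86)
        return sa_restorer;
    return mm->vdso + sigreturn_off;
}

/* Return 1 if addr lies inside the page starting at page_base, else 0. */
int in_page(uintptr_t addr, uintptr_t page_base)
{
    return addr >= page_base && addr - page_base < PAGE_BYTES;
}
```

If the process copies the page elsewhere and unmaps the original, pick_restorer() without SA_RESTORER still returns an address inside the old page, which is the dangling value that leads to SIGSEGV on return from the handler. Passing SA_RESTORER with a restorer inside the new page avoids the stale base in this model.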
Mass update to all FC4 bugs: An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.
The testcase of comment #6 still fails (gives Segmentation fault because __kernel_sigreturn dangles after the user moves the AT_SYSINFO page) under kernel-2.6.13-1.1526_FC4.

The performance test of comment #4 showed:

     68 libc.so.6 => /lib/libc.so.6 (0x00111000)
    932 libc.so.6 => /lib/libc.so.6 (0x009ee000)

which is 6.8% interference of the AT_SYSINFO page with pre-linked glibc. (This will become worse if the app uses more than one pre-linked shared lib.)
bugzilla mail says

jreiser changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO_REPORTER           |NEEDINFO

but I did not "press any buttons" [except the "Save Changes"] when creating Comment #10. Perhaps >this< comment will be enough to respond to the "NEEDINFO" status.
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
kernel-2.6.14-1.1637_FC4 still gives SIGSEGV because __kernel_sigreturn dangles after the user moves the AT_SYSINFO page (testcase of comment #6).

The performance test of comment #4 showed:

     77 libc.so.6 => /lib/libc.so.6 (0x00111000)
    923 libc.so.6 => /lib/libc.so.6 (0x009e4000)

which is 7.7% interference of the AT_SYSINFO page with pre-linked glibc. The degradation will get worse as the process uses more pre-linked shared libraries (such as any KDE or Gnome application).
Created attachment 122261
put vDSO at STACK_TOP

This patch, linux-2.6-x86-vdso-stacktop.patch against kernel-2.6.14-1.1760_FC5, is a workaround that puts the vDSO at STACK_TOP, 1 page below TASK_SIZE. Because the vDSO page is executable and will reside at the highest user address, exec_shield has no effect; so set /proc/sys/kernel/exec-shield (or the compile-time variable exec_shield in kernel/sysctl.c) to zero.
Created attachment 122364
vDSO: random, STACK_TOP, just below mm->start_code

This patch puts the vDSO at STACK_TOP when exec-shield is 0. Otherwise, another bit in exec_shield chooses between random placement and the page just below current->mm->start_code.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.
The issues persist with kernel-2.6.15-1.1826.2.10_FC5 on i686. [Changed Version to fc5test2.] kernel-2.6.15-1.1895_FC5 hangs starting udev on Athlon SiS730 [see bugzilla #179601.]

The performance degradation of Comment #4 remains about 7%:

     74 libc.so.6 => /lib/libc.so.6 (0x00111000)
    926 libc.so.6 => /lib/libc.so.6 (0x006db000)
Created attachment 126260
exec-shield options for vDSO placement on x86

Two more bits in /proc/sys/kernel/exec-shield control placement of the vDSO on x86: random, just below STACK_TOP, just below .text of the main executable, or just below .text of the PT_INTERP (ld.so).
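The four placement choices can be pictured with a short sketch in C. This is not the attached patch: the thread does not give the bit encoding in exec-shield, so the enum values, struct layout, and place_vdso() are hypothetical, and "just below" is assumed here to mean exactly one 4KB page below the named address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* Hypothetical names for the four policies listed in the comment. */
enum vdso_policy {
    VDSO_RANDOM,
    VDSO_BELOW_STACK_TOP,
    VDSO_BELOW_EXEC_TEXT,
    VDSO_BELOW_INTERP_TEXT
};

/* Hypothetical summary of the addresses known at exec time. */
struct layout {
    uintptr_t stack_top;    /* anchor for the stack-top policy        */
    uintptr_t exec_text;    /* start of .text of the main executable  */
    uintptr_t interp_text;  /* start of .text of PT_INTERP (ld.so)    */
    uintptr_t random_page;  /* page picked by the random policy       */
};

/* Return the page address chosen for the vDSO under the given policy. */
uintptr_t place_vdso(enum vdso_policy policy, const struct layout *l)
{
    uintptr_t anchor;

    assert(l != NULL);
    switch (policy) {
    case VDSO_BELOW_STACK_TOP:   anchor = l->stack_top;   break;
    case VDSO_BELOW_EXEC_TEXT:   anchor = l->exec_text;   break;
    case VDSO_BELOW_INTERP_TEXT: anchor = l->interp_text; break;
    default:                     return l->random_page;   /* VDSO_RANDOM */
    }
    /* Anchors are assumed page-aligned and above the first page. */
    assert(anchor >= PAGE_BYTES && anchor % PAGE_BYTES == 0);
    return anchor - PAGE_BYTES;
}
```

With the three non-random policies the vDSO sits next to a mapping whose address is already fixed or prelinked, so it neither splits a large hole nor lands on a page that a prelinked .so expects to use.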
kernel-2.6.15-1.2054_FC5 still has the same properties here. The performance problem (randomly-placed vDSO interferes with prelinking) sticks out prominently. The Firefox browser often takes over 15 seconds to start (from click on icon in menu bar, until some content is visible from local file:/// home page), which includes over 10 seconds after "Starting Web Browser" disappears but before window appears. Thus it looks like launch has failed. In contrast, with the patch of Comment #18, Firefox always launches in less than 5 seconds. The vDSO is placed just below .text of the PT_INTERP (ld.so), just below the .text of the main executable, or just below STACK_TOP; this is controlled by new bits in exec-shield.
Fixed in 2080_FC5 and rawhide (and the pending FC4 update). Thanks for persevering on this one John.