Description of problem:
A stack overflow can be easily reproduced with a trivial test program. The stack overflow appears to be erroneous, as the program's stack size at the time is well under the process stack size limit (set with ulimit -s). We suspect the stack randomization code may be the culprit.

Version-Release number of selected component (if applicable):

test system (RHEL5 rc1 ppc64):
-----------------------------
# rpm -q redhat-release
redhat-release-5Server-5.0.0.7
# uname -a
Linux trumpkin 2.6.18-8.el5 #1 SMP Fri Jan 26 14:19:36 EST 2007 ppc64 ppc64 ppc64 GNU/Linux

gcc used to build test program (RHEL3 ppc64):
--------------------------------------------
$ gcc --version
gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-42)

How reproducible:
1 in every 10 or so runs of the test program. The problem seems to occur when the stack randomization sets the stack starting point very low (near the libraries that are mapped in under the stack).

Steps to Reproduce:
1. build test program:
   gcc a.c -o a -lstdc++
   (The -lstdc++ was added simply to make sure there were libraries mapped in under the stack, as the problem does not occur otherwise. Also, -g and -O options can be used and the problem still occurs.)
2. run test program in a loop:
   ulimit -s 10000; while :; do ./a.sh "./a"; if [ $? -ne 0 ]; then echo ERROR; break; fi; done
   (a.sh and a.c source are attached, the 10000 can be changed to a higher value and the problem still occurs)

Actual results:
core dump is produced -- running gdb on core produces this:

Core was generated by `./a'.
Program terminated with signal 11, Segmentation fault.
#0  0x100004c0 in func () at a.c:4
4           char buf[500000];
(gdb) disassemble func
Dump of assembler code for function func:
0x100004b4 <func+0>:    mr      r12,r1
0x100004b8 <func+4>:    lis     r0,-8
0x100004bc <func+8>:    ori     r0,r0,24256
0x100004c0 <func+12>:   stwux   r1,r1,r0

So when the stack size is increased (stwux into r1), the program crashes.

Expected results:
No crash.
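For reference, the size of the frame that func() requests can be read off the disassembly above: lis places its signed 16-bit immediate in the upper halfword, ori fills in the lower halfword, and stwux adds the result to r1. The following is only a sketch of that arithmetic; the helper name lis_ori is made up for illustration and is not part of the attached a.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from a.c): the 32-bit value left in a register by
 *   lis rD,hi       rD = hi << 16   (hi is a signed 16-bit immediate)
 *   ori rD,rD,lo    rD |= lo        (lo is an unsigned 16-bit immediate)
 * The low 16 bits after lis are zero, so the OR is the same as an addition.
 */
static int32_t lis_ori(int16_t hi, uint16_t lo)
{
    return (int32_t)hi * 65536 + lo;
}
```

lis_ori(-8, 24256) evaluates to -500032, so the stwux at func+12 moves r1 down by 500032 bytes: the 500000-byte buf plus 32 bytes of frame overhead. In hex that is FFF85EC0, which matches the low 32 bits of GPR00 in the register dump later in this report.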
Additional info: sources attached
Created attachment 149467
test harness (shell script)
Created attachment 149469
test program (C source)
compile with: gcc a.c -o a -lstdc++
The cause of this is the RHEL ppc64 kernel having 64k pages on by default. The problem is in fs/binfmt_elf.c:randomize_stack_top(), which has this code:

#ifndef STACK_RND_MASK
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))	/* 8MB of VA */
#endif

static unsigned long randomize_stack_top(unsigned long stack_top)
{
	unsigned int random_variable = 0;

	if ((current->flags & PF_RANDOMIZE) &&
		!(current->personality & ADDR_NO_RANDOMIZE)) {
		random_variable = get_random_int() & STACK_RND_MASK;
		random_variable <<= PAGE_SHIFT;
	}
#ifdef CONFIG_STACK_GROWSUP
	return PAGE_ALIGN(stack_top) + random_variable;
#else
	return PAGE_ALIGN(stack_top) - random_variable;
#endif
}

If you have 64k pages, this makes your randomization 128MB. Coincidentally, in the new binary format, only 128MB is left between the top of process memory and the first mapping, so for a stack rlimit of < 128MB you stand a non-zero chance of randomizing your stack base away entirely and thus producing random crashes.
Sorry, that's code from the proposed fix on lkml. The true define in the affected kernel is:

#define STACK_RND_MASK 0x7ff		/* with 4K pages 8MB of VA */
The fix is now committed to mainline as

commit d1cabd63262707ad5d6bb730f25b7a2852734595
Author: James Bottomley <James.Bottomley>
Date:   Fri Mar 16 13:38:35 2007 -0800

    [PATCH] fix process crash caused by randomisation and 64k pages
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch as put into 2.6.21-rc4-git2:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 51db118..a2fceba 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -507,7 +507,7 @@ out:
 #define INTERPRETER_ELF 2
 
 #ifndef STACK_RND_MASK
-#define STACK_RND_MASK 0x7ff		/* with 4K pages 8MB of VA */
+#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))	/* 8MB of VA */
 #endif
 
 static unsigned long randomize_stack_top(unsigned long stack_top)
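To make the effect of the one-line change concrete, here is a small user-space sketch of the largest offset randomize_stack_top() can subtract from the stack top, before and after the patch. The function names rnd_span_old and rnd_span_new are illustrative only and do not exist in the kernel; the page shift is passed in as a parameter instead of coming from PAGE_SHIFT.

```c
#include <assert.h>

/*
 * Largest stack-top offset in bytes for a given page shift, i.e.
 * (STACK_RND_MASK << PAGE_SHIFT). Hypothetical user-space helpers.
 */
static unsigned long rnd_span_old(unsigned int page_shift)
{
    /* Before the patch: the mask is 0x7ff regardless of page size. */
    return 0x7ffUL << page_shift;
}

static unsigned long rnd_span_new(unsigned int page_shift)
{
    /* After the patch: the mask shrinks as the page size grows. */
    assert(page_shift >= 12);   /* the shift count must not go negative */
    return (0x7ffUL >> (page_shift - 12)) << page_shift;
}
```

With 4K pages (shift 12) both versions give 0x7ff000, just under 8MB. With 64K pages (shift 16) the old mask gives 0x7ff0000, just under 128MB, while the patched mask gives 0x7f0000, just under 8MB again.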
I agree that the patch should be applied, but I cannot reproduce this. If compiled natively on the RHEL5 machine with gcc 4.1.1, it runs with no problem. But the test directions indicate that the test program is to be compiled on a RHEL3 machine with gcc 3.2.3-42. However, the closest I can come to that is a RHEL3 machine with gcc 3.2.3-46:

# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 3 (Taroon)
[root@p630 root]# gcc --version
gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-46)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# gcc a.c -o a -lstdc++
# scp a [ to RHEL5 machine ]

But it does not run on the RHEL5 machine:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5 (Tikanga)
# ./a
./a: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory
#

So I compiled it without libstdc++ on the RHEL3 machine, but then it runs OK on the RHEL5 machine.
Perhaps without libstdc++, the libraries are not even close to the stack, since they seem to be moved to 64-bit space:

10000
pid: 14058
00100000-00120000 r-xp 00100000 00:00 0 [vdso]
10000000-10010000 r-xp 00000000 fd:00 6520866 /root/testdir-3.2.3/a
10010000-10020000 rw-p 00000000 fd:00 6520866 /root/testdir-3.2.3/a
80eb5a0000-80eb5d0000 r-xp 00000000 fd:00 2031917 /lib64/ld-2.5.so
80eb5d0000-80eb5e0000 r--p 00020000 fd:00 2031917 /lib64/ld-2.5.so
80eb5e0000-80eb5f0000 rw-p 00030000 fd:00 2031917 /lib64/ld-2.5.so
80eb5f0000-80eb770000 r-xp 00000000 fd:00 2031918 /lib64/libc-2.5.so
80eb770000-80eb780000 r--p 00180000 fd:00 2031918 /lib64/libc-2.5.so
80eb780000-80eb790000 rw-p 00190000 fd:00 2031918 /lib64/libc-2.5.so
80eb790000-80eb7a0000 rw-p 80eb790000 00:00 0
ffffc000000-ffffc150000 rw-p ffffc000000 00:00 0 [stack]

Maybe you could attach your "a" binary if by some chance it's different than the one I'm creating? Although I don't see how it will get around the "libstdc++.so.5" error. I tried creating a symbolic link from the current libstdc++ version to libstdc++.so.5 like so:

# cd /usr/lib
# ls -l libstdc*
lrwxrwxrwx 1 root root 18 May 3 11:59 libstdc++.so.5 -> libstdc++.so.6.0.8
lrwxrwxrwx 1 root root 18 May 3 08:42 libstdc++.so.6 -> libstdc++.so.6.0.8
-rwxr-xr-x 1 root root 1187328 Jan 17 20:24 libstdc++.so.6.0.8
#

But I still get the same error. (???) How did you get it all to work in your environment?
You need to install the compat-libstdc++ package on your RHEL5 machine: compat-libstdc++-33-3.2.3-61
Ok, I first installed compat-libstdc++-33-3.2.3-61.ppc.rpm, but "a" still fails with "error while loading shared libraries: libstdc++.so.5". So I installed compat-libstdc++-33-3.2.3-61.ppc64.rpm as well, and now "a" loads, but it runs fine, presumably because its libraries are in 64-bit space:

10000
pid: 14841
00100000-00120000 r-xp 00100000 00:00 0 [vdso]
10000000-10010000 r-xp 00000000 fd:00 6520867 /root/testdir-3.2.3/a
10010000-10020000 rw-p 00000000 fd:00 6520867 /root/testdir-3.2.3/a
80eb5a0000-80eb5d0000 r-xp 00000000 fd:00 2031917 /lib64/ld-2.5.so
80eb5d0000-80eb5e0000 r--p 00020000 fd:00 2031917 /lib64/ld-2.5.so
80eb5e0000-80eb5f0000 rw-p 00030000 fd:00 2031917 /lib64/ld-2.5.so
80eb5f0000-80eb770000 r-xp 00000000 fd:00 2031918 /lib64/libc-2.5.so
80eb770000-80eb780000 r--p 00180000 fd:00 2031918 /lib64/libc-2.5.so
80eb780000-80eb790000 rw-p 00190000 fd:00 2031918 /lib64/libc-2.5.so
80eb790000-80eb7a0000 rw-p 80eb790000 00:00 0
80eb7d0000-80eb890000 r-xp 00000000 fd:00 2031697 /lib64/libm-2.5.so
80eb890000-80eb8a0000 r--p 000b0000 fd:00 2031697 /lib64/libm-2.5.so
80eb8a0000-80eb8b0000 rw-p 000c0000 fd:00 2031697 /lib64/libm-2.5.so
80eb9d0000-80eb9f0000 r-xp 00000000 fd:00 2031926 /lib64/libgcc_s-4.1.1-20070105.so.1
80eb9f0000-80eba00000 rw-p 00010000 fd:00 2031926 /lib64/libgcc_s-4.1.1-20070105.so.1
40000010000-40000120000 r-xp 00000000 fd:00 4574268 /usr/lib64/libstdc++.so.5.0.7
40000120000-40000140000 rw-p 00110000 fd:00 4574268 /usr/lib64/libstdc++.so.5.0.7
40000140000-40000150000 rw-p 40000140000 00:00 0
ffffab80000-ffffacd0000 rw-p ffffab80000 00:00 0
#

So, can you confirm that it should use the "ppc" package, and also attach your "a" executable?
> So, can you confirm that it should use the "ppc" package,
> and also attach your "a" executable?

Although the "ppc" package doesn't seem to make sense, because the executable I built is looking in /usr/lib64:

# ldd /usr/tmp/a
linux-vdso64.so.1 => (0x0000000000100000)
libstdc++.so.5 => /usr/lib64/libstdc++.so.5 (0x0000040000010000)
libc.so.6 => /lib64/libc.so.6 (0x00000080eb5f0000)
libm.so.6 => /lib64/libm.so.6 (0x00000080eb7d0000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000080eb9d0000)
/lib64/ld64.so.1 (0x00000080eb5a0000)
#
# rpm2cpio compat-libstdc++-33-3.2.3-61.ppc64.rpm | cpio -t
./usr/lib64/libstdc++.so.5
./usr/lib64/libstdc++.so.5.0.7
#
# rpm2cpio compat-libstdc++-33-3.2.3-61.ppc.rpm | cpio -t
./usr/lib/libstdc++.so.5
./usr/lib/libstdc++.so.5.0.7
#
Ok, I re-compiled it on the RHEL3 machine: gcc -m32 a.c -o a -lstdc++ and now I can get it to core dump...
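Since the whole detour above came down to whether "a" was a 32-bit or a 64-bit executable, here is a rough sketch of how that can be checked from the first bytes of the file. It relies only on the ELF identification layout (magic bytes 0x7f 'E' 'L' 'F', then the class byte at offset 4, where 1 means 32-bit and 2 means 64-bit). The function name elf_word_size is made up for this example, and reading the bytes from the file is left to the caller.

```c
#include <stddef.h>

/* Offsets and values from the ELF identification bytes (e_ident). */
enum { ELF_IDENT_CLASS = 4, ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/*
 * Hypothetical helper: given the first `len` bytes of a file, return
 * 32 or 64 for an ELF image of that class, or 0 if the bytes are not
 * a recognisable ELF header.
 */
static int elf_word_size(const unsigned char *hdr, size_t len)
{
    if (len <= ELF_IDENT_CLASS)
        return 0;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return 0;

    switch (hdr[ELF_IDENT_CLASS]) {
    case ELF_CLASS_32:
        return 32;
    case ELF_CLASS_64:
        return 64;
    default:
        return 0;
    }
}
```

A binary that resolves its libraries under /lib64 and /usr/lib64, as in the ldd output above, would be expected to report 64 here, and the -m32 rebuild to report 32.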
Just for documentation purposes, here's an example of a failure:

...
10000
pid: 21750
00100000-00120000 r-xp 00100000 00:00 0 [vdso]
0fba0000-0fbc0000 r-xp 00000000 fd:00 8883491 /lib/libgcc_s-4.1.1-20070105.so.1
0fbc0000-0fbd0000 rw-p 00010000 fd:00 8883491 /lib/libgcc_s-4.1.1-20070105.so.1
0fc90000-0fd50000 r-xp 00000000 fd:00 8883490 /lib/libm-2.5.so
0fd50000-0fd60000 r--p 000b0000 fd:00 8883490 /lib/libm-2.5.so
0fd60000-0fd70000 rw-p 000c0000 fd:00 8883490 /lib/libm-2.5.so
0fee0000-0ffa0000 r-xp 00000000 fd:00 4574266 /usr/lib/libstdc++.so.5.0.7
0ffa0000-0ffb0000 rwxp 000c0000 fd:00 4574266 /usr/lib/libstdc++.so.5.0.7
0ffc0000-0ffe0000 r-xp 00000000 fd:00 8883484 /lib/ld-2.5.so
0ffe0000-0fff0000 r--p 00010000 fd:00 8883484 /lib/ld-2.5.so
0fff0000-10000000 rw-p 00020000 fd:00 8883484 /lib/ld-2.5.so
10000000-10010000 r-xp 00000000 fd:00 6520871 /root/testdir-3.2.3/a
10010000-10020000 rwxp 00000000 fd:00 6520871 /root/testdir-3.2.3/a
f7e60000-f7fc0000 r-xp 00000000 fd:00 8883485 /lib/libc-2.5.so
f7fc0000-f7fd0000 r--p 00160000 fd:00 8883485 /lib/libc-2.5.so
f7fd0000-f7fe0000 rw-p 00170000 fd:00 8883485 /lib/libc-2.5.so
f8230000-f8380000 rw-p f8230000 00:00 0 [stack]
limit 10
limit 9
limit 8
limit 7
limit 6
limit 5
limit 4
./a.sh: line 7: 21750 Segmentation fault (core dumped) $prog
ERROR

The task ran with "ulimit -s 10000", so the stack could conceivably grow from its top at f8380000 down to f79bc000. That would put it well into the no-man's land between the /root/testdir-3.2.3/a data region and the first region used by /lib/libc-2.5.so. It never got that far, however: the DAR register shows 00000000F7FAEE50, which lies in the non-writable libc-2.5.so segment at f7e60000-f7fc0000, causing the segmentation violation:

# dmesg
a/21750: potentially unexpected fatal signal 11.
NIP: 00000000100004C0 LR: 0000000010000534 CTR: 00000000F7ED6380
REGS: c00000003cf6bea0 TRAP: 0300 Not tainted (2.6.18-8.el5)
MSR: 000000000000D032 <EE,PR,ME,IR,DR> CR: 40000482 XER: 00000000
DAR: 00000000F7FAEE50, DSISR: 000000000A000000
TASK = c00000003a565ae0[21750] 'a' THREAD: c00000003cf68000 CPU: 1
GPR00: FFFFFFFFFFF85EC0 00000000F8028F90 000000000FFF9710 0000000000000008
GPR04: 00000000F8026948 0000000000000008 0000000000000000 0000000000000000
GPR08: 0000000000008000 0000000000000003 0000000000000000 0000000010010000
GPR12: 00000000F8028F90 0000000010018A7C 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 000000000FFCEB40 00000000F837FAE0 00000000F837FAF4 0000000000000001
GPR28: 0000000000000000 000000000FFEF6D8 00000000F7FCFFF4 00000000F8028F90
NIP [00000000100004C0] 0x100004c0
LR [0000000010000534] 0x10000534
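The addresses quoted in this failure can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the helper names are made up, the inputs are the values from the maps listing and register dump above, and everything is done modulo 2^32 because the failing task is a 32-bit process.

```c
#include <stdint.h>

/*
 * Lowest address the stack may grow to: the top of the [stack] mapping
 * minus the rlimit, where "ulimit -s" is given in units of 1024 bytes.
 */
static uint32_t stack_floor(uint32_t stack_top, uint32_t ulimit_kb)
{
    return stack_top - ulimit_kb * 1024u;
}

/*
 * Address written by "stwux r1,r1,r0": the old r1 plus the (negative)
 * frame offset held in r0, wrapping at 32 bits.
 */
static uint32_t stwux_target(uint32_t r1, uint32_t r0)
{
    return r1 + r0;
}

/* Non-zero when addr falls inside the half-open mapping [start, end). */
static int in_mapping(uint32_t addr, uint32_t start, uint32_t end)
{
    return addr >= start && addr < end;
}
```

stack_floor(0xf8380000, 10000) gives 0xf79bc000, the limit quoted above. stwux_target(0xF8028F90, 0xFFF85EC0), using the low 32 bits of GPR01 and GPR00, gives 0xF7FAEE50, which is exactly the DAR value. That address is still above the rlimit floor, yet in_mapping() places it inside the read-only libc-2.5.so text mapping f7e60000-f7fc0000. The distance from the end of the libc mappings (f7fe0000) to the stack top (f8380000) is only 0x3a0000 bytes, about 3.6MB, far less than the 10000KB the rlimit allows.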
The patch is in 2.6.18-19.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html