Bug 231312

Summary: reproducible stack overflow with trivial test program
Product: Red Hat Enterprise Linux 5
Reporter: Paul Clements <paul.clements>
Component: kernel
Assignee: Dave Anderson <anderson>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: medium
Version: 5.0
CC: dzickus, james.bottomley, smoser
Hardware: ppc64
OS: Linux
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 19:43:03 UTC
Attachments:
  test harness (shell script)
  test program (C source)

Description Paul Clements 2007-03-07 17:05:21 UTC
Description of problem: A stack overflow can be easily reproduced with a trivial
test program. The stack overflow appears to be erroneous, as the program's stack
size at the time is well under the process stack size limit (set with ulimit
-s). We suspect the stack randomization code may be the culprit.

Version-Release number of selected component (if applicable):

test system (RHEL5 rc1 ppc64):
-----------------------------
# rpm -q redhat-release
redhat-release-5Server-5.0.0.7
# uname -a
Linux trumpkin 2.6.18-8.el5 #1 SMP Fri Jan 26 14:19:36 EST 2007 ppc64 ppc64
ppc64 GNU/Linux


gcc used to build test program (RHEL3 ppc64): 
--------------------------------------------
$ gcc --version
gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-42)


How reproducible: 1 in every 10 or so runs of the test program. The problem
seems to occur when the stack randomization sets the stack starting point very
low (near the libraries that are mapped in under the stack).


Steps to Reproduce:

1. build test program:

gcc a.c -o a -lstdc++

(The -lstdc++ was added simply to make sure there were libraries mapped in under
the stack, as the problem does not occur otherwise. Also, -g and -O options can
be used and the problem still occurs.)

2. run test program in a loop:

ulimit -s 10000; while :; do ./a.sh "./a"; if [ $? -ne 0 ]; then echo ERROR;
break; fi; done

(a.sh and a.c source are attached, the 10000 can be changed to a higher value
and the problem still occurs)

  
Actual results:

core dump is produced -- running gdb on core produces this:

Core was generated by `./a'.
Program terminated with signal 11, Segmentation fault.
#0  0x100004c0 in func () at a.c:4
4               char buf[500000];
(gdb) disassemble func
Dump of assembler code for function func:
0x100004b4 <func+0>:    mr      r12,r1
0x100004b8 <func+4>:    lis     r0,-8
0x100004bc <func+8>:    ori     r0,r0,24256
0x100004c0 <func+12>:   stwux   r1,r1,r0

So when the stack size is increased (stwux into r1), the program crashes.
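
As a cross-check of the disassembly above, the lis/ori pair can be decoded by
hand. The sketch below is illustrative only: the helper names are made up for
this note and do not come from any toolchain. It models the two instructions as
they behave for a 32-bit process and shows that r0 ends up as -500032, i.e. the
500000-byte buf plus 32 further bytes of frame, which stwux then applies to r1
in a single step.

```c
#include <stdint.h>

/* lis rD,imm: load the signed 16-bit immediate into the upper halfword,
 * giving imm << 16 (hypothetical helper, 32-bit view of the register). */
int32_t ppc_lis(int16_t imm)
{
	return (int32_t)((uint32_t)(uint16_t)imm << 16);
}

/* ori rA,rS,uimm: OR a zero-extended 16-bit immediate into the register. */
int32_t ppc_ori(int32_t rs, uint16_t uimm)
{
	return (int32_t)((uint32_t)rs | uimm);
}

/* Value left in r0 by "lis r0,-8; ori r0,r0,24256" in func(). */
int32_t func_frame_adjust(void)
{
	return ppc_ori(ppc_lis(-8), 24256);
}
```

Because the whole frame is allocated by one store-with-update, the first access
lands roughly 500KB below the current stack pointer rather than creeping down
page by page.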


Expected results:

No crash.

Additional info: sources attached

Comment 1 Paul Clements 2007-03-07 17:05:21 UTC
Created attachment 149467 [details]
test harness (shell script)

Comment 2 Paul Clements 2007-03-07 17:07:05 UTC
Created attachment 149469 [details]
test program (C source)

compile with: gcc a.c -o a -lstdc++

Comment 3 James Bottomley 2007-03-16 18:26:08 UTC
The cause of this is the RHEL ppc64 kernel having 64k pages on by default.  The
problem is in fs/binfmt_elf.c:randomize_stack_top() which has this code:

#ifndef STACK_RND_MASK
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))	/* 8MB of VA */
#endif

static unsigned long randomize_stack_top(unsigned long stack_top)
{
	unsigned int random_variable = 0;

	if ((current->flags & PF_RANDOMIZE) &&
		!(current->personality & ADDR_NO_RANDOMIZE)) {
		random_variable = get_random_int() & STACK_RND_MASK;
		random_variable <<= PAGE_SHIFT;
	}
#ifdef CONFIG_STACK_GROWSUP
	return PAGE_ALIGN(stack_top) + random_variable;
#else
	return PAGE_ALIGN(stack_top) - random_variable;
#endif
}

If you have 64k pages, this makes your randomization 128MB.  Coincidentally, in
the new binary format, only 128MB is left between the top of process memory and
the first mapping, so for a stack rlimit of < 128MB you stand a non-zero chance
of randomizing your stack base away entirely and thus producing random crashes.

Comment 4 James Bottomley 2007-03-16 18:28:20 UTC
Sorry, that's code from the proposed fix on lkml.  The true define is

#define STACK_RND_MASK 0x7ff           /* with 4K pages 8MB of VA */
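
The arithmetic behind Comments 3 and 4 can be checked with a small user-space
sketch. This is not kernel code: the function names and the OLD_STACK_RND_MASK
macro are invented here for illustration, and the page shift is passed in as a
parameter (12 for 4K pages, 16 for 64K pages) instead of coming from PAGE_SHIFT.

```c
/* Original define: fixed mask, sized for 4K pages. */
#define OLD_STACK_RND_MASK 0x7ffUL

/* Mask from the proposed fix, parameterised on the page shift. */
unsigned long new_stack_rnd_mask(unsigned int page_shift)
{
	return 0x7ffUL >> (page_shift - 12);
}

/* Largest offset randomize_stack_top() can subtract from the stack top:
 * the biggest masked random value, shifted left by the page shift. */
unsigned long max_stack_offset(unsigned long mask, unsigned int page_shift)
{
	return mask << page_shift;
}
```

With the original mask, max_stack_offset(OLD_STACK_RND_MASK, 12) is 0x7ff000
(just under 8MB), but max_stack_offset(OLD_STACK_RND_MASK, 16) is 0x7ff0000
(just under 128MB). With the fixed mask, 64K pages give
max_stack_offset(new_stack_rnd_mask(16), 16) = 0x7f0000, which is back to
roughly 8MB.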


Comment 7 James Bottomley 2007-03-17 18:51:36 UTC
The fix is now committed to mainline as

commit d1cabd63262707ad5d6bb730f25b7a2852734595
Author: James Bottomley <James.Bottomley>
Date:   Fri Mar 16 13:38:35 2007 -0800

    [PATCH] fix process crash caused by randomisation and 64k pages


Comment 8 RHEL Program Management 2007-04-25 20:50:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Dave Anderson 2007-04-30 20:43:43 UTC
Patch as put into 2.6.21-rc4-git2:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 51db118..a2fceba 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -507,7 +507,7 @@ out:
 #define INTERPRETER_ELF 2

 #ifndef STACK_RND_MASK
-#define STACK_RND_MASK 0x7ff           /* with 4K pages 8MB of VA */
+#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))    /* 8MB of VA */
 #endif

 static unsigned long randomize_stack_top(unsigned long stack_top)


Comment 11 Dave Anderson 2007-05-03 16:04:15 UTC
I agree that the patch should be applied, but I cannot reproduce this.

If compiled natively on the RHEL5 machine with gcc 4.1.1, it runs with
no problem.  

But the test directions indicate that the test program is to be compiled
on a RHEL3 machine with gcc 3.2.3-42.

However, the closest I can come to that is a RHEL3 machine with gcc 3.2.3-46:

  # cat /etc/redhat-release
  Red Hat Enterprise Linux AS release 3 (Taroon)
  [root@p630 root]# gcc --version
  gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-46)
  Copyright (C) 2002 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

  # gcc a.c -o a -lstdc++
  # scp a [ to RHEL5 machine ]

But it does not run on the RHEL5 machine:

  # cat /etc/redhat-release
  Red Hat Enterprise Linux Server release 5 (Tikanga)
  # ./a
  ./a: error while loading shared libraries: libstdc++.so.5: cannot open shared
  object file: No such file or directory
#

So I compiled it without the libstdc++ on the RHEL3 machine, but then
it runs OK on the RHEL5 machine.

Perhaps without the libstdc++, the libraries are not even close
to the stack since they seem to be moved to 64-bit space:

10000
pid: 14058
00100000-00120000 r-xp 00100000 00:00 0        [vdso]
10000000-10010000 r-xp 00000000 fd:00 6520866  /root/testdir-3.2.3/a
10010000-10020000 rw-p 00000000 fd:00 6520866  /root/testdir-3.2.3/a
80eb5a0000-80eb5d0000 r-xp 00000000 fd:00 2031917  /lib64/ld-2.5.so
80eb5d0000-80eb5e0000 r--p 00020000 fd:00 2031917  /lib64/ld-2.5.so
80eb5e0000-80eb5f0000 rw-p 00030000 fd:00 2031917  /lib64/ld-2.5.so
80eb5f0000-80eb770000 r-xp 00000000 fd:00 2031918  /lib64/libc-2.5.so
80eb770000-80eb780000 r--p 00180000 fd:00 2031918  /lib64/libc-2.5.so
80eb780000-80eb790000 rw-p 00190000 fd:00 2031918  /lib64/libc-2.5.so
80eb790000-80eb7a0000 rw-p 80eb790000 00:00 0
ffffc000000-ffffc150000 rw-p ffffc000000 00:00 0   [stack]

Maybe you could attach your "a" binary if by some chance it's
different than the one I'm creating?  Although I don't see how
it will get around the "libstdc++.so.5" error.  I tried creating
a symbolic link from the current libstdc version to libstdc++.so.5
like so:  

# cd /usr/lib
# ls -l libstdc*
lrwxrwxrwx 1 root root      18 May  3 11:59 libstdc++.so.5 -> libstdc++.so.6.0.8
lrwxrwxrwx 1 root root      18 May  3 08:42 libstdc++.so.6 -> libstdc++.so.6.0.8
-rwxr-xr-x 1 root root 1187328 Jan 17 20:24 libstdc++.so.6.0.8
# 

But I still get the same error.  (???)

How did you get it all to work in your environment?



Comment 12 Paul Clements 2007-05-03 17:27:11 UTC
You need to install the compat-libstdc++ package on your RHEL5 machine:

compat-libstdc++-33-3.2.3-61






Comment 13 Dave Anderson 2007-05-03 18:16:45 UTC
Ok, I first installed compat-libstdc++-33-3.2.3-61.ppc.rpm, but "a" still
fails with the "error while loading shared libraries: libstdc++.so.5".

So I installed compat-libstdc++-33-3.2.3-61.ppc64.rpm as well, and now "a"
loads and runs fine, presumably because it's in 64-bit space:

10000
pid: 14841
00100000-00120000 r-xp 00100000 00:00 0        [vdso]
10000000-10010000 r-xp 00000000 fd:00 6520867  /root/testdir-3.2.3/a
10010000-10020000 rw-p 00000000 fd:00 6520867  /root/testdir-3.2.3/a
80eb5a0000-80eb5d0000 r-xp 00000000 fd:00 2031917  /lib64/ld-2.5.so
80eb5d0000-80eb5e0000 r--p 00020000 fd:00 2031917  /lib64/ld-2.5.so
80eb5e0000-80eb5f0000 rw-p 00030000 fd:00 2031917  /lib64/ld-2.5.so
80eb5f0000-80eb770000 r-xp 00000000 fd:00 2031918  /lib64/libc-2.5.so
80eb770000-80eb780000 r--p 00180000 fd:00 2031918  /lib64/libc-2.5.so
80eb780000-80eb790000 rw-p 00190000 fd:00 2031918  /lib64/libc-2.5.so
80eb790000-80eb7a0000 rw-p 80eb790000 00:00 0
80eb7d0000-80eb890000 r-xp 00000000 fd:00 2031697  /lib64/libm-2.5.so
80eb890000-80eb8a0000 r--p 000b0000 fd:00 2031697  /lib64/libm-2.5.so
80eb8a0000-80eb8b0000 rw-p 000c0000 fd:00 2031697  /lib64/libm-2.5.so
80eb9d0000-80eb9f0000 r-xp 00000000 fd:00 2031926  /lib64/libgcc_s-4.1.1-20070105.so.1
80eb9f0000-80eba00000 rw-p 00010000 fd:00 2031926  /lib64/libgcc_s-4.1.1-20070105.so.1
40000010000-40000120000 r-xp 00000000 fd:00 4574268  /usr/lib64/libstdc++.so.5.0.7
40000120000-40000140000 rw-p 00110000 fd:00 4574268  /usr/lib64/libstdc++.so.5.0.7
40000140000-40000150000 rw-p 40000140000 00:00 0
ffffab80000-ffffacd0000 rw-p ffffab80000 00:00 0
#

So, can you confirm that it should use the "ppc" package,
and also attach your "a" executable?
  

Comment 14 Dave Anderson 2007-05-03 18:24:30 UTC
> So, can you confirm that it should use the "ppc" package,
> and also attach your "a" executable?

Although the "ppc" package doesn't seem to make sense, because the
executable I built is looking in /usr/lib64:

# ldd /usr/tmp/a
        linux-vdso64.so.1 =>  (0x0000000000100000)
        libstdc++.so.5 => /usr/lib64/libstdc++.so.5 (0x0000040000010000)
        libc.so.6 => /lib64/libc.so.6 (0x00000080eb5f0000)
        libm.so.6 => /lib64/libm.so.6 (0x00000080eb7d0000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000080eb9d0000)
        /lib64/ld64.so.1 (0x00000080eb5a0000)
#


# rpm2cpio compat-libstdc++-33-3.2.3-61.ppc64.rpm | cpio -t
./usr/lib64/libstdc++.so.5
./usr/lib64/libstdc++.so.5.0.7
#

# rpm2cpio compat-libstdc++-33-3.2.3-61.ppc.rpm | cpio -t
./usr/lib/libstdc++.so.5
./usr/lib/libstdc++.so.5.0.7
#



Comment 15 Dave Anderson 2007-05-03 19:43:10 UTC
Ok, I re-compiled it on the RHEL3 machine: gcc -m32 a.c -o a -lstdc++

and now I can get it to core dump...
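
The difference between the earlier builds and this one is the ELF class of the
binary: without -m32 the RHEL3 gcc evidently produced a 64-bit executable (it
resolved libraries from /lib64), and only the 32-bit build places its libraries
just under the stack. A minimal sketch of how the class can be read from the
first bytes of a file follows. The function name is hypothetical; the offsets
and values are those of the ELF identification header (EI_CLASS at byte 4,
1 for 32-bit, 2 for 64-bit).

```c
#include <stddef.h>

/* Byte offset and values of the class field in the ELF identification. */
enum { ELF_CLASS_OFFSET = 4, ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/* Returns 32 or 64 for a recognisable ELF header, 0 otherwise.
 * ident points at the first len bytes of the file. */
int elf_bits(const unsigned char *ident, size_t len)
{
	if (len <= ELF_CLASS_OFFSET)
		return 0;
	if (ident[0] != 0x7f || ident[1] != 'E' ||
	    ident[2] != 'L' || ident[3] != 'F')
		return 0;
	if (ident[ELF_CLASS_OFFSET] == ELF_CLASS_32)
		return 32;
	if (ident[ELF_CLASS_OFFSET] == ELF_CLASS_64)
		return 64;
	return 0;
}
```

In practice "file a" reports the same information; the sketch only shows where
it comes from.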


Comment 16 Dave Anderson 2007-05-03 20:53:07 UTC
Just for documentation purposes, here's an example of a failure:

...
10000
pid: 21750
00100000-00120000 r-xp 00100000 00:00 0        [vdso]
0fba0000-0fbc0000 r-xp 00000000 fd:00 8883491  /lib/libgcc_s-4.1.1-20070105.so.1
0fbc0000-0fbd0000 rw-p 00010000 fd:00 8883491  /lib/libgcc_s-4.1.1-20070105.so.1
0fc90000-0fd50000 r-xp 00000000 fd:00 8883490  /lib/libm-2.5.so
0fd50000-0fd60000 r--p 000b0000 fd:00 8883490  /lib/libm-2.5.so
0fd60000-0fd70000 rw-p 000c0000 fd:00 8883490  /lib/libm-2.5.so
0fee0000-0ffa0000 r-xp 00000000 fd:00 4574266  /usr/lib/libstdc++.so.5.0.7
0ffa0000-0ffb0000 rwxp 000c0000 fd:00 4574266  /usr/lib/libstdc++.so.5.0.7
0ffc0000-0ffe0000 r-xp 00000000 fd:00 8883484  /lib/ld-2.5.so
0ffe0000-0fff0000 r--p 00010000 fd:00 8883484  /lib/ld-2.5.so
0fff0000-10000000 rw-p 00020000 fd:00 8883484  /lib/ld-2.5.so
10000000-10010000 r-xp 00000000 fd:00 6520871  /root/testdir-3.2.3/a
10010000-10020000 rwxp 00000000 fd:00 6520871  /root/testdir-3.2.3/a
f7e60000-f7fc0000 r-xp 00000000 fd:00 8883485  /lib/libc-2.5.so
f7fc0000-f7fd0000 r--p 00160000 fd:00 8883485  /lib/libc-2.5.so
f7fd0000-f7fe0000 rw-p 00170000 fd:00 8883485  /lib/libc-2.5.so
f8230000-f8380000 rw-p f8230000 00:00 0        [stack]
limit 10
limit 9
limit 8
limit 7
limit 6
limit 5
limit 4
./a.sh: line 7: 21750 Segmentation fault      (core dumped) $prog
ERROR

The task ran with "ulimit -s 10000", so the stack could conceivably
grow from its top at f8380000 down to f79bc000.  That would put it
way down into the no-man's land between the /root/testdir-3.2.3/a
data region and the first region used by /lib/libc-2.5.so.  It never
made it that far, however: the DAR register shows 00000000F7FAEE50,
which falls in the non-writable libc-2.5.so segment at
f7e60000-f7fc0000, causing the segmentation violation:

# dmesg
a/21750: potentially unexpected fatal signal 11.

NIP: 00000000100004C0 LR: 0000000010000534 CTR: 00000000F7ED6380
REGS: c00000003cf6bea0 TRAP: 0300   Not tainted  (2.6.18-8.el5)
MSR: 000000000000D032 <EE,PR,ME,IR,DR>  CR: 40000482  XER: 00000000
DAR: 00000000F7FAEE50, DSISR: 000000000A000000
TASK = c00000003a565ae0[21750] 'a' THREAD: c00000003cf68000 CPU: 1
GPR00: FFFFFFFFFFF85EC0 00000000F8028F90 000000000FFF9710 0000000000000008
GPR04: 00000000F8026948 0000000000000008 0000000000000000 0000000000000000
GPR08: 0000000000008000 0000000000000003 0000000000000000 0000000010010000
GPR12: 00000000F8028F90 0000000010018A7C 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 000000000FFCEB40 00000000F837FAE0 00000000F837FAF4 0000000000000001
GPR28: 0000000000000000 000000000FFEF6D8 00000000F7FCFFF4 00000000F8028F90
NIP [00000000100004C0] 0x100004c0
LR [0000000010000534] 0x10000534
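
The numbers in this comment can be reproduced from the map and the register
dump. The sketch below is only an illustration: the struct and function names
are invented for it, and the table copies just the upper mappings from the
failing run above. It computes the lowest address the rlimit would allow,
applies the -500032 frame adjustment held in GPR00 to the stack pointer in
GPR01, and looks up which mapping the result falls in.

```c
#include <stddef.h>
#include <stdint.h>

struct mapping {
	uint32_t start, end;	/* [start, end), as in /proc/<pid>/maps */
	int writable;
	const char *name;
};

/* Upper mappings of pid 21750, copied from the failing run. */
static const struct mapping maps[] = {
	{ 0x10010000u, 0x10020000u, 1, "/root/testdir-3.2.3/a" },
	{ 0xf7e60000u, 0xf7fc0000u, 0, "/lib/libc-2.5.so" },
	{ 0xf7fc0000u, 0xf7fd0000u, 0, "/lib/libc-2.5.so" },
	{ 0xf7fd0000u, 0xf7fe0000u, 1, "/lib/libc-2.5.so" },
	{ 0xf8230000u, 0xf8380000u, 1, "[stack]" },
};

/* Lowest stack address permitted by "ulimit -s <kb>" (units of 1024 bytes). */
uint32_t stack_floor(uint32_t stack_top, uint32_t rlimit_kb)
{
	return stack_top - rlimit_kb * 1024u;
}

/* Address written by stwux r1,r1,r0 in a 32-bit process. */
uint32_t fault_address(uint32_t r1, int32_t r0)
{
	return r1 + (uint32_t)r0;
}

/* Mapping containing addr, or NULL if addr is unmapped. */
const struct mapping *find_mapping(uint32_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
		if (addr >= maps[i].start && addr < maps[i].end)
			return &maps[i];
	return NULL;
}
```

stack_floor(0xf8380000, 10000) gives f79bc000, which is unmapped, while
fault_address(0xF8028F90, -500032) gives F7FAEE50, inside the read-only
f7e60000-f7fc0000 libc text mapping. Note also that the stack top at f8380000
sits only 0x3a0000 bytes (about 3.6MB) above the end of libc at f7fe0000, far
less than the 10000KB the rlimit promises.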



Comment 18 Don Zickus 2007-05-11 22:05:53 UTC
Fixed in 2.6.18-19.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 21 errata-xmlrpc 2007-11-07 19:43:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html