Bug 126297

Summary: Segmentation fault when stack size is less than 2Mbytes
Product: Red Hat Enterprise Linux 3
Reporter: Hui Huang <hui.huang>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: darin242, hongjiu.lu, lwoodman, peterm, petrides, riel, tao
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-12-20 20:55:21 UTC
Attachments:
A patch against 2.4.21-18.EL (flags: none)

Description Hui Huang 2004-06-18 18:06:50 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.3) Gecko/20030314

Description of problem:
Same problem as bug 104688: frequent crashes when the stack size is
smaller than 2 MB. It was fixed for x86 but has now reappeared on x86_64.

bash-2.05b$ ulimit -s 512
bash-2.05b$ ls
Segmentation fault

This happens in RHEL3-U2 (kernel-2.4.21-15.EL); kernel-2.4.21-9
works fine.


Version-Release number of selected component (if applicable):
kernel-2.4.21-15.EL

How reproducible:
Always

Steps to Reproduce:
1. ulimit -s 512
2. ls

Actual Results:  Segmentation fault

Expected Results:  no crash
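
The same reproduction can be scripted without a shell. The following is a minimal sketch, not part of the original report: run_with_stack_limit() is a hypothetical helper name, and it assumes the program is found via PATH and that the hard stack limit is at least the requested size.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` (looked up via PATH) in a child whose
 * soft stack limit is `kb` kilobytes. Returns the wait status, or -1 if
 * fork() or waitpid() fails. */
int run_with_stack_limit(unsigned long kb, const char *prog)
{
    pid_t pid;
    int status = 0;

    assert(prog != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) != 0)
            _exit(126);
        rl.rlim_cur = (rlim_t)kb * 1024;    /* lower only the soft limit */
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            _exit(126);
        execlp(prog, prog, (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling run_with_stack_limit(512, "ls") corresponds to the two steps above. On an affected kernel the status would be expected to satisfy WIFSIGNALED(), with the terminating signal being SIGSEGV; on a working kernel WIFEXITED() should hold.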

Additional info:

Comment 1 Rik van Riel 2004-06-18 18:20:36 UTC
Why do you think this is a kernel bug ?

Could you please try to strace and/or ltrace the ls call to find out
exactly what it's doing that needs more than 512 kB of stack ?

Comment 2 Rik van Riel 2004-06-18 18:24:14 UTC
Ummm n/m, I missed the part where you said that kernel-2.4.21-9 works
fine, but kernel-2.4.21-15.EL is broken...

Jim, any ideas ?

Comment 3 Hui Huang 2004-06-18 18:42:37 UTC
ls does not need more than 512kB of stack. It appears to me that
2.4.21-15.EL now randomizes the initial SP. However, in the primordial
thread it is set so low that an app can start out below or near its
stack limit.


Comment 4 Jim Paradis 2004-06-18 18:47:46 UTC
A quick investigation suggests that something in the kernel exec path
might be at fault.  Doing a strace -f of a bash session yields, in
part, the following:

...
30106 rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN}, 8) = 0
30106 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x42fb50, [], 0x4000000}, 8) = 0
30106 execve("/bin/ls", ["ls", "--color=tty"], [/* 28 vars */]) = 0
30106 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
...

Your stack-start theory has some merit; will check it out.




Comment 5 Jim Paradis 2004-06-29 22:39:58 UTC
On x86_64, the stack-randomization algorithm can conceivably shift the
stack base by as much as 1M (64K * 16):

    unsigned long arch_align_stack(unsigned long sp)
    {
        return sp - ((get_random_int() % 65536) << 4);
    }

On top of that, setup_arg_pages() allocates 128K on the stack for
holding command-line arguments and environment strings.

If we're going to mess with the stack base like this, we'd best find a
way to have it not count against the user's stack rlimit.  I'll look
into this.
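
The arithmetic behind those two figures can be checked with a short sketch. It assumes, as the comment above suggests, that both the randomization offset and the 128K argument area count against the stack rlimit; the function and macro names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>   /* for the sanity checks mentioned in the note below */

#define RANDOM_SLOTS   65536UL          /* modulus used in arch_align_stack() */
#define SLOT_SHIFT     4                /* each slot is 16 bytes */
#define ARG_AREA_BYTES (128UL * 1024)   /* figure quoted for setup_arg_pages() */

/* Largest offset the randomization can subtract from sp. */
unsigned long max_random_shift(void)
{
    return (RANDOM_SLOTS - 1) << SLOT_SHIFT;
}

/* Worst-case stack already consumed before the program runs any code,
 * under the assumption stated in the lead-in. */
unsigned long worst_case_startup_usage(void)
{
    return max_random_shift() + ARG_AREA_BYTES;
}

/* Nonzero if a stack rlimit of `rlimit_bytes` can be exceeded at startup. */
int limit_can_be_exceeded(unsigned long rlimit_bytes)
{
    return worst_case_startup_usage() > rlimit_bytes;
}
```

With these numbers the worst case is 1048560 + 131072 = 1179632 bytes, so assert(limit_can_be_exceeded(512UL * 1024)) holds, while a 2 MB limit leaves headroom.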


Comment 6 Ernie Petrides 2004-08-03 22:30:04 UTC
*** Bug 128892 has been marked as a duplicate of this bug. ***

Comment 7 H.J. Lu 2004-08-04 06:24:19 UTC
Created attachment 102414
A patch against 2.4.21-18.EL

This is a patch against 2.4.21-18.EL, backported from 2.6.7-1.503.

Comment 8 Hui Huang 2004-08-28 00:05:20 UTC
How do I figure out stack size with this proposed patch?

I need the stack size to set up the guard page properly so the Java
VM can detect the overflow and throw StackOverflowError. It used to
be a simple getrlimit() call: I would find the stack top from
/proc/self/stat, align it using /proc/self/maps, and then put the
guard page at stack_top - getrlimit_result.

Now that the kernel has this 2M EXEC_STACK_BIAS, the actual stack size
is 2M + getrlimit. But I can't use that as the stack size, because
it's a property hidden inside the kernel, and if I run on kernels
where the stack limit is still determined by getrlimit alone, I could
crash the app by setting up the guard page 2M below the actual limit.
If I stick to the getrlimit result and ignore EXEC_STACK_BIAS, then
I would put the stack guard so high that I'd run out of stack
space (or even crash) very early.
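
For reference, the getrlimit-based placement described above might look roughly like the sketch below. It is an illustration only, not the Java VM's actual code: the helper names are hypothetical, the stack top is taken as a parameter rather than parsed from /proc/self/stat, and rounding up to a page boundary stands in for the /proc/self/maps alignment step.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Round addr up to a multiple of page (page must be a power of two). */
uintptr_t page_align_up(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (addr + page - 1) & ~(page - 1);
}

/* Guard page address = aligned stack top minus the stack limit.
 * Returns 0 if the limit does not fit below the stack top. */
uintptr_t guard_from_limit(uintptr_t stack_top, uintptr_t limit, uintptr_t page)
{
    uintptr_t top = page_align_up(stack_top, page);

    if (limit > top)
        return 0;
    return top - limit;
}

/* Same computation using the current soft RLIMIT_STACK.
 * Returns 0 if the limit is unavailable or unlimited. */
uintptr_t guard_from_rlimit(uintptr_t stack_top, uintptr_t page)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return guard_from_limit(stack_top, (uintptr_t)rl.rlim_cur, page);
}
```

The concern raised here is that guard_from_rlimit() gives the wrong address as soon as the kernel's effective limit is getrlimit plus an offset that user space cannot query.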

Why can't arch_align_stack() use the rlimit to decide how far it
can randomize the stack pointer? Something like:

sp - (min(get_random_int() % (rlim.rlim_cur >> 7), 65536) << 4)

If I choose to use a small stack so that address space can be saved
for the heap or other things, it doesn't seem right for the kernel to
still randomize as if there were no limit.
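
A user-space sketch of that proposal is shown below, with the random value and the limit passed in as plain parameters. This only illustrates the suggestion in this comment; it is not the fix that was eventually shipped, and the function names are hypothetical. The guard against a zero range is an addition, since a limit under 128 bytes would otherwise cause a division by zero.

```c
#include <assert.h>

#define MAX_SLOTS 65536UL   /* cap taken from the expression above */

/* Offset to subtract from sp, bounded by the stack limit.
 * rlim_cur >> 7 slots of 16 bytes each keeps the shift under rlim_cur / 8. */
unsigned long bounded_stack_shift(unsigned long rnd, unsigned long rlim_cur)
{
    unsigned long range = rlim_cur >> 7;
    unsigned long slots;
    unsigned long shift;

    if (range == 0)
        return 0;                   /* limit too small to randomize at all */

    slots = rnd % range;
    if (slots > MAX_SLOTS)
        slots = MAX_SLOTS;          /* never exceed the 1M maximum */

    shift = slots << 4;
    assert(shift <= rlim_cur / 8);  /* randomization uses at most 1/8 of the limit */
    return shift;
}

/* Randomized stack pointer under the bounded scheme. */
unsigned long align_stack_bounded(unsigned long sp, unsigned long rnd,
                                  unsigned long rlim_cur)
{
    return sp - bounded_stack_shift(rnd, rlim_cur);
}
```

With a 512 kB limit the shift stays below 64 kB, so most of the limit remains usable; with a very large limit the shift is still capped at 1M.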


Comment 9 Ernie Petrides 2004-08-31 04:18:16 UTC
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.1.EL).


Comment 10 John Flanagan 2004-12-20 20:55:21 UTC
An erratum has been issued that should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html


Comment 11 Ernie Petrides 2005-01-05 21:03:14 UTC
*** Bug 144299 has been marked as a duplicate of this bug. ***