Bug 126297
| Summary: | Segmentation fault when stack size is less than 2Mbytes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Hui Huang <hui.huang> |
| Component: | kernel | Assignee: | Jim Paradis <jparadis> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Priority: | medium |
| Version: | 3.0 | CC: | darin242, hongjiu.lu, lwoodman, peterm, petrides, riel, tao |
| Hardware: | x86_64 | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2004-12-20 20:55:21 UTC |

Description
Hui Huang
2004-06-18 18:06:50 UTC
Why do you think this is a kernel bug? Could you please try to strace and/or ltrace the ls call to find out exactly what it's doing that needs more than 512 kB of stack?

Ummm n/m, I missed the part where you said that kernel-2.4.21-9 works fine, but kernel-2.4.21-15.EL is broken... Jim, any ideas?

ls does not need more than 512 kB of stack. It appears to me that 2.4.21-15.EL now randomizes the initial SP. However, it is being set so low in the primordial thread that an app is started below or near its stack limit.

A quick investigation suggests that something in the kernel exec path might be at fault. Doing a strace -f of a bash session yields, in part, the following:

    ...
    30106 rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN}, 8) = 0
    30106 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x42fb50, [], 0x4000000}, 8) = 0
    30106 execve("/bin/ls", ["ls", "--color=tty"], [/* 28 vars */]) = 0
    30106 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
    ...

Your stack-start theory has some merit; will check it out.

On x86_64, the stack-randomization algorithm can conceivably shift the stack base by as much as 1M (64K * 16):

    unsigned long arch_align_stack(unsigned long sp)
    {
        return sp - ((get_random_int() % 65536) << 4);
    }

On top of that, setup_arg_pages() allocates 128K on the stack for holding command-line arguments and environment strings. If we're going to mess with the stack base like this, we'd best find a way to have it not count against the user's stack rlimit. I'll look into this.

*** Bug 128892 has been marked as a duplicate of this bug. ***

Created attachment 102414
A patch against 2.4.21-18.EL
This is a patch against 2.4.21-18.EL, backported from 2.6.7-1.503.
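
The arithmetic behind the diagnosis earlier in this thread can be worked through in a few lines of C. This is only a rough user-space sketch: the constants come from the comments above (a random value modulo 65536 shifted left by 4, plus 128K reserved by setup_arg_pages()), while the macro and function names are made up for illustration and do not exist in the kernel. It also assumes, as the comment suggests, that both amounts count against the stack rlimit.

```c
#include <assert.h>

/* Constants quoted in the bug discussion; the names are illustrative only. */
#define RANDOM_RANGE   65536UL         /* get_random_int() % 65536 */
#define SHIFT_BITS     4               /* the result is shifted left by 4 */
#define ARG_PAGES_SIZE (128UL * 1024)  /* setup_arg_pages() reservation */

/* Largest downward shift arch_align_stack() could apply, in bytes. */
unsigned long max_random_shift(void)
{
    return (RANDOM_RANGE - 1) << SHIFT_BITS;
}

/* Worst-case stack space used up before the program itself runs,
 * assuming both amounts are charged to the stack rlimit. */
unsigned long worst_case_initial_usage(void)
{
    return max_random_shift() + ARG_PAGES_SIZE;
}

/* Returns 1 if the worst case alone reaches the given stack rlimit. */
int limit_can_be_exceeded(unsigned long rlim_cur)
{
    assert(rlim_cur > 0); /* a zero limit is outside this simple model */
    return worst_case_initial_usage() >= rlim_cur;
}
```

Under these assumptions the worst case is 1,179,632 bytes, so `limit_can_be_exceeded(512UL * 1024)` returns 1, while a 2 MB limit leaves headroom and returns 0.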
How do I figure out the stack size with this proposed patch? I need the stack size to properly set up a guard page so the Java VM can detect overflow and throw StackOverflowError. It used to be a simple getrlimit() call: I would find the stack top from /proc/self/stat, align it using /proc/self/maps, and then put the guard page at stack_top - getrlimit_result.

Now that the kernel has this 2M EXEC_STACK_BIAS, the actual stack size is 2M + the getrlimit() result. But I can't use that as the stack size, because it's a property hidden inside the kernel, and if I run on kernels where the stack limit is still determined by getrlimit() I could crash the app by setting up the guard page 2M below the actual limit. If I stick to the getrlimit() result and ignore EXEC_STACK_BIAS, then I would put the stack guard so high that I'll run out of stack space (or even crash) very early.

Why can't arch_align_stack() use the rlimit to decide how far it can randomize the stack pointer? Something like:

    sp - (min(get_random_int() % (rlim.rlim_cur >> 7), 65536) << 4)

If I choose to use a small stack so the address space can be saved for heap or other stuff, it doesn't seem right for the kernel to still randomize as if there's no limit.

A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.1.EL).

An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html

*** Bug 144299 has been marked as a duplicate of this bug. ***
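
The rlimit-scaled randomization suggested in the comment above can be modelled in user space as follows. This is a hypothetical sketch of that one-line proposal, not the fix that shipped in the erratum: `scaled_align_stack` is an invented name, and the random value and the soft limit are passed in as plain parameters, standing in for get_random_int() and rlim.rlim_cur.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the proposal:
 *     sp - (min(random % (rlim_cur >> 7), 65536) << 4)
 * random_value stands in for get_random_int(); rlim_cur for the
 * soft RLIMIT_STACK value. Neither is read from the system here.
 */
unsigned long scaled_align_stack(unsigned long sp,
                                 unsigned long random_value,
                                 unsigned long rlim_cur)
{
    unsigned long range = rlim_cur >> 7;  /* 1/128 of the limit */
    unsigned long slots;

    assert(range > 0);                    /* limit must be at least 128 bytes */

    slots = random_value % range;
    if (slots > 65536UL)                  /* cap for very large limits */
        slots = 65536UL;

    return sp - (slots << 4);             /* each slot is 16 bytes */
}
```

With this scaling the shift always stays below one eighth of the limit: for a 512 kB limit the range is 4096 slots, so the largest possible shift is 65,520 bytes, and for very large limits the cap of 65536 slots keeps the shift at 1M, as in the original code.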