The following has be reported by IBM LTC: Large external array causes SIGILL in 32-bitPlease fill in each of the sections below. Hardware Environment: 4-way POWER4 Software Environment RHEL3 Steps to Reproduce: /home/xxue/work> cat bigdata.c int x[3000][3000][30]; int main() { return 10; } /home/xxue/work> gcc bigdata.c /home/xxue/work> a.out Illegal instruction (core dumped) /home/xxue/work> /opt/cross/bin/powerpc64-linux-gcc bigdata.c /home/xxue/work> a.out Actual Results: SIGILL Expected Results: clean pass Additional Information: The test cases passes in 64-bit mode. Hardware Environment: Software Environment: This bug report was opened specifically to submit the patch included in LTC Bug 1631 to Red Hat. Please refer to LTC Bug 1631 for more details.Here's the patch for RHEL3 (kernel 2.4.21-4.EL): --- linux-2.4.21-4.EL/fs/binfmt_elf.c 2003-11-11 17:45:13.000000000 +0530 +++ linux-2.4.21-4.EL.patched/fs/binfmt_elf.c 2003-11-11 17:46:49.000000000 +0530 @@ -80,13 +80,14 @@ #define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE) -static void set_brk(unsigned long start, unsigned long end) +static unsigned set_brk(unsigned long start, unsigned long end) { + int retval=0; start = ELF_PAGEALIGN(start); end = ELF_PAGEALIGN(end); if (end <= start) return; - do_brk(start, end - start); + retval = do_brk(start, end - start); } @@ -367,7 +368,7 @@ unsigned long text_data, elf_entry = ~0UL; char * addr; loff_t offset; - + int retval; current->mm->end_code = interp_ex->a_text; text_data = interp_ex->a_text + interp_ex->a_data; current->mm->end_data = text_data; @@ -387,7 +388,9 @@ goto out; } - do_brk(0, text_data); + retval = do_brk(0, text_data); + if(BAD_ADDR(retval)) + goto out; if (!interpreter->f_op || !interpreter->f_op->read) goto out; if (interpreter->f_op->read(interpreter, addr, text_data, &offset) < 0) @@ -395,8 +398,10 @@ flush_icache_range((unsigned long)addr, (unsigned long)addr + text_data); - do_brk(ELF_PAGESTART(text_data + ELF_MIN_ALIGN - 1), + retval = do_brk(ELF_PAGESTART(text_data + ELF_MIN_ALIGN - 1), interp_ex->a_bss); + if(BAD_ADDR(retval)) + goto out; elf_entry = interp_ex->a_entry; out: @@ -720,7 +725,26 @@ end_code += load_bias; start_data += load_bias; end_data += load_bias; + + current->mm->end_code = end_code; + current->mm->start_code = start_code; + current->mm->start_data = start_data; + current->mm->end_data = end_data; + current->mm->start_stack = bprm->p; + + /* Calling set_brk effectively mmaps the pages that we need + * for the bss and break sections. + * We must do this now before we load the interpreter + * in case we have a huge bss region. + */ + retval = set_brk(elf_bss, elf_brk); + if(retval < 0) { + printk(KERN_ERR "unable to map bss section\n"); + send_sig(SIGSEGV, current, 0); + goto out_free_dentry; + } + padzero(elf_bss); if (elf_interpreter) { if (interpreter_type == INTERPRETER_AOUT) elf_entry = load_aout_interp(&interp_ex, @@ -764,20 +788,7 @@ /* N.B. passed_fileno might not be initialized? */ if (interpreter_type == INTERPRETER_AOUT) current->mm->arg_start += strlen(passed_fileno) + 1; - current->mm->start_brk = current->mm->brk = elf_brk; - current->mm->end_code = end_code; - current->mm->start_code = start_code; - current->mm->start_data = start_data; - current->mm->end_data = end_data; - current->mm->start_stack = bprm->p; - - /* Calling set_brk effectively mmaps the pages that we need - * for the bss and break sections - */ - set_brk(elf_bss, elf_brk); - - padzero(elf_bss); - + #if 0 printk("(start_brk) %lx\n" , (long) current->mm->start_brk); printk("(end_code) %lx\n" , (long) current->mm->end_code); Glen/Greg - please submit the patch above to Red Hat. Thanks.
------ Additional Comments From salina.com 2003-20-11 11:58 ------- The original problem 1631 has now become a ship issue, do we need to raise the severity of this problem to RedHat too ?
In reviewing the patch, I was wondering how this could ever work correctly. The first hunk changes set_brk() to now supposedly return a value. However, a value is never returned by the function. Still, the caller does check for a return value so who knows what data it is using. Please give an exact problem description and how this patch purports to solve it. Thanks.
Created attachment 96214 [details] Patch to handle programs w/ large bss sections Modified patch supplied by Roland for this problem. I have tested it here, after first reproducing the problem. This patch fixes the problem and is the least invasive fix. Would like IBM to also test with it and approve it if they are happy with it.
------ Additional Comments From srikrishnan.com 2003-26-11 13:50 ------- opps....I had attached a previous/incomplete version of the patch.. The bug seems to be that the bss area of the app is so large, that it overlaps where the kernel loaded ld.so, overwritting (clearing) the early ld.so code which leads to a SIGILL when the kernel hands control over to it. The problem is that we don't reserve the bss region for the app (via a set_brk/do_brk call) until after we've loaded the loader so they get mapped to overlapping memory locations. The fix is to move the update to current->mm.* and the set_brk/do_brk call to before the point we call load_elf_interp(). Changing set_brk to return unsigned long instead of void, was to check if do_brk succeeds and take appropriate action on failure. (Which in our case is to print a message (Unable to map bss region) and terminate with a induced Segmentation fault signal. ) Here's the patch: --- linux-2.4.21-4.EL/fs/binfmt_elf.c 2003-11-21 08:55:51.000000000 +0530 +++ linux-2.4.21-4.EL.patched/fs/binfmt_elf.c 2003-11-21 08:55:08.000000000 +0530 @@ -80,13 +80,13 @@ #define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE) -static void set_brk(unsigned long start, unsigned long end) +static unsigned set_brk(unsigned long start, unsigned long end) { start = ELF_PAGEALIGN(start); end = ELF_PAGEALIGN(end); if (end <= start) - return; - do_brk(start, end - start); + return 0; + return do_brk(start, end - start); } @@ -367,7 +367,7 @@ unsigned long text_data, elf_entry = ~0UL; char * addr; loff_t offset; - + int retval; current->mm->end_code = interp_ex->a_text; text_data = interp_ex->a_text + interp_ex->a_data; current->mm->end_data = text_data; @@ -387,7 +387,9 @@ goto out; } - do_brk(0, text_data); + retval = do_brk(0, text_data); + if(BAD_ADDR(retval)) + goto out; if (!interpreter->f_op || !interpreter->f_op->read) goto out; if (interpreter->f_op->read(interpreter, addr, text_data, &offset) < 0) @@ -395,8 +397,10 @@ flush_icache_range((uns igned long)addr, (unsigned long)addr + text_data); - do_brk(ELF_PAGESTART(text_data + ELF_MIN_ALIGN - 1), + retval = do_brk(ELF_PAGESTART(text_data + ELF_MIN_ALIGN - 1), interp_ex->a_bss); + if(BAD_ADDR(retval)) + goto out; elf_entry = interp_ex->a_entry; out: @@ -720,7 +724,26 @@ end_code += load_bias; start_data += load_bias; end_data += load_bias; + + current->mm->end_code = end_code; + current->mm->start_code = start_code; + current->mm->start_data = start_data; + current->mm->end_data = end_data; + current->mm->start_stack = bprm->p; + + /* Calling set_brk effectively mmaps the pages that we need + * for the bss and break sections. + * We must do this now before we load the interpreter + * in case we have a huge bss region. + */ + retval = set_brk(elf_bss, elf_brk); + if(retval < 0) { + printk(KERN_ERR "unable to map bss section "); + send_sig(SIGSEGV, current, 0); + goto out_free_dentry; + } + padzero(elf_ bss); if (elf_interpreter) { if (interpreter_type == INTERPRETER_AOUT) elf_entry = load_aout_interp(&interp_ex, @@ -764,20 +787,7 @@ /* N.B. passed_fileno might not be initialized? */ if (interpreter_type == INTERPRETER_AOUT) current->mm->arg_start += strlen(passed_fileno) + 1; - current->mm->start_brk = current->mm->brk = elf_brk; - current->mm->end_code = end_code; - current->mm->start_code = start_code; - current->mm->start_data = start_data; - current->mm->end_data = end_data; - current->mm->start_stack = bprm->p; - - /* Calling set_brk effectively mmaps the pages that we need - * for the bss and break sections - */ - set_brk(elf_bss, elf_brk); - - padzero(elf_bss); - + #if 0 printk("(start_brk) %lx " , (long) current->mm->start_brk); printk("(end_code) %lx " , (long) current->mm->end_code);
Can you try the patch supplied by Red Hat to see if it works for you? Thanks
------ Additional Comments From srikrishnan.com 2003-26-11 14:33 ------- The latest update to the corresponding RH bugzilla bug, has a patch by Roland which is a shorter modified version of the current patch. While it _works_ for the given test case, what is happenning is that set_brk fails, Hence, no memory is actually allocated for the array. If we assign values to the array, we'll get segmentation fault. We can verify by declaring an external array: x[4000][4000][2] assigning to x[0][0][0] and x[3999][3999][1] gives a seg fault. We can verify the behavior by looking at /proc/<pid>/maps So the the crux of the problem is check if set_brk succeeds and also avoid overwriting ld.so by the aloocation to bss. I'll explore if we can get s smaller patch.
------ Additional Comments From srikrishnan.com 2003-26-11 14:45 ------- A correction to my previous post: We can verify by declaring an external array: x[4000][4000][30]. Assigning values to x[0][0][0] and x[3999][3999][29] gives a seg fault. ( The previous values didn't exceed the limit and hence would run fine. )
Created attachment 96261 [details] test patch RHEL3 test patch The bug seems to be that the bss area of the app is so large, that it overlaps where the kernel loaded ld.so, overwritting (clearing) the early ld.so code which leads to a SIGILL when the kernel hands control over to it. The problem is that we don't reserve the bss region for the app (via a set_brk/do_brk call) until after we've loaded the loader so they get mapped to overlapping memory locations. The fix suggested is to move the update to current->mm.* and the set_brk/do_brk call to before the point we call load_elf_interp(). Changing set_brk to return unsigned long instead of void, was to check if do_brk succeeds and take appropriate action on failure. (Which in our case is to print a message (Unable to map bss region) and terminate with a induced Segmentation fault signal. ) (Sorry for the formatting problem with the previous update. Hope the attached patch would be readable)
------ Additional Comments From srikrishnan.com 2003-02-12 07:44 ------- I tested Roland's patch. The test case succeeds. However, the test case makes an attempt to only allocate a static array. It does not use the array by assigning values. Upon closer investigation, I found that set_brk *does not* allocate all the memory requested. We can verify by having a look at /proc/<pid>/maps while running the testcase with a smaller sized external array and repeat by running the testcase with aarger sized external array like x[4000][4000][30]. This can be further verified by assigning values to the array (Especially the first and last location). Assigning values to the array gives a Segmentation Fault, indicating as though we are accessing array index out of bounds. Solution: We can made a modification to Roland's patch, which is to check if the return value of set_brk is a BAD_ADDR. If so, we need to print an appropriate message, send a SIGSEGV signal and call the clean up (goto out_free_dentry). Apart from this change, set_brk must be made to return unsigned long as return value (what ever do_brk returns). I'll post this patch shortly.
Created attachment 96399 [details] rhel3final.patch
------ Additional Comments From srikrishnan.com 2003-08-12 05:08 ------- Test Patch for RHEL3 Please apply this patch and let us know your review comments.
I tried the patch "rhel3final.patch" along with the following test program: #define SIZE 4100 volatile int x[SIZE][SIZE][30]; main() { int i, j; printf("%p..%p\n", x, &x[SIZE][SIZE][30]); system("cat /proc/$PPID/maps"); for (i=0; i<SIZE; i++) for (j=0; j<SIZE; j++) x[i][j][29] = i*j; printf("x[first] = %d, x[last] = %d\n",x[0][0][0], x[SIZE-1][SIZE-1][29]); } I built this program as a 32-bit elf program and ran it. The program ran to completion, much to my amazement. Here is the output: ./bigdata 0x10010b34..0x8844950c 000000000fe9b000-000000000ffd9000 r-xp 0000000000000000 08:03 6832132 /lib/tls/libc-2.3.2.so 000000000ffd9000-000000000ffdb000 ---p 000000000013e000 08:03 6832132 /lib/tls/libc-2.3.2.so 000000000ffdb000-000000000ffed000 rwxp 0000000000130000 08:03 6832132 /lib/tls/libc-2.3.2.so 000000000ffed000-000000000fff0000 rwxp 0000000000000000 00:00 0 0000000010000000-0000000010001000 r-xp 0000000000000000 00:0b 10551348 /home/boston/jdewand/test/bigdata 0000000010010000-0000000010011000 rwxp 0000000000000000 00:0b 10551348 /home/boston/jdewand/test/bigdata 0000000010011000-00000000883d2000 rwxp 0000000000000000 00:00 0 00000000883d2000-00000000883e9000 r-xp 0000000000000000 08:03 524292 /lib/ld-2.3.2.so 00000000883e9000-00000000883eb000 rw-p 0000000000000000 00:00 0 00000000883f8000-00000000883fa000 rwxp 0000000000016000 08:03 524292 /lib/ld-2.3.2.so 00000000ffffd000-00000000fffff000 rwxp fffffffffffff000 00:00 0 x[first] = 0, x[last] = 16801801 Notice that the huge array overlaps the shared libraries. How is this possible?
------ Additional Comments From srikrishnan.com 2003-10-12 06:25 ------- Are you trying out on a IA-64 machine? I too got similar results on IA64/RHEL3 using the stock kernel(Without applying my patch). The problem was reported on PPC64 and the patch addresses it although code changes are in arch-independent part of the kernel tree. Please test it on PPC64.
The results I reported were obtained from a pSeries 630 using the RHEL 3 Update 1 candidate. I compiled the app as a 32-bit app: # file bigdata bigdata: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
------ Additional Comments From srikrishnan.com 2003-12-11 10:33 ------- Strange. I'll check if any changes have gone into RHEL 3 Update 1 candidate which might be responsible for this observation. (I'm sure you might have observed the problem on RHEL3 - stock kernel).
------ Additional Comments From srikrishnan.com 2003-12-15 09:34 ------- In case "Update 1 candidate" comes with 2.4.21-4.0.1.EL kernel, I could not locate the source rpm for ppc/ppc64 at rhn.redhat.com (I found found for x86, ia-64, x86-64). Can some one point me to where I could find that?
------ Additional Comments From srikrishnan.com 2003-12-17 09:07 ------- I tried with 2.4.21-6.EL kernel without making any changes. I get a SIGILL for values of 'SIZE' ranging from 2900 to 3490. I got a SEGFAULT for values of 'SIZE' from 3495 to 4200. There has been no changes in the fs/binfmt_elf.c file compared to 2.4.21-4.EL When I do a "file bigdata" I got the same values: # file bigdata bigdata: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped. [root@plinux root]# uname -a Linux plinux 2.4.21-6.EL #2 SMP Wed Dec 17 16:49:33 IST 2003 ppc64 ppc64 ppc64 GNU/Linux I posted the patch for 2.6 kernel to LKML. Andrew Morton got back with a request to review a similar patch submitted by Roland which is a modified version of the previous patch posted here (Checking for return value of set_brk etc) ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0- test10/2.6.0-test10-mm1/broken-out/fix-ELF-exec-with-huge-bss.patch
I was planning it for Update 2. I tried both the original patch from IBM and Roland's patch. I had a different result than expected and wanted to spend some time trying to see what the deal was. But I've been busy with other issues in the meantime so didn't get to this in time for U1.
------ Additional Comments From khoa.com 2004-01-09 12:18 ------- We push this bug to RHEL3 Update 2 list.
Have been looking into this problem. There are additional problems in the same vein as the original complaint which are currently being addressed. The supplied patch only fixes some of the issues. I expect to post an updated patch by week's end to this bugzilla for verification from IBM. Stay tuned....
Julie, can you make the patch available to IBM for test also at the end of the week ?
Once we have a patch agreed upon between Roland and myself, I will post it to this bugzilla for IBM to verify. We are currently working out the details but are close to the final version.
Created attachment 97055 [details] Proposed patch for this and other binfmt_elf issues The attached patch is one that I plan to propose to our kernel team here. I'd like to ask IBM to test it and review it as well. It adds code to deal with several different issues: (1) a program with a bss that is too large for a 32-bit application (2) a program that fits but then allows insufficient room for the elf interpreter (3) a large program plus an elf interpreter that appears to fit however the stack starts in the middle of the interpreter's bss segment.
----- Additional Comments From khoa.com 2004-01-16 11:07 ------- Xing - please test the patch above from Red Hat. Thanks.
----- Additional Comments From khoa.com 2004-01-16 11:08 ------- The patch above is Red Hat's fix for LTC Bug 1631. Thanks.
----- Additional Comments From xingxue.com 2004-01-16 11:41 ------- I am not familiar with building the kernel. Can I ask someone at LTC to test it, please? The test case can be found in bug 1631. Thanks.
----- Additional Comments From khoa.com 2004-02-02 01:59 ------- Sachin - can you test the patch from Red Hat ? Thanks in advance.
----- Additional Comments From ssant.com 2004-02-03 23:14 ------- khoa , i applied the patch and rebuilt the kernel. I ran the testcase from bug #1631 and the program ran to completion without any problem.
This patch was committed to our tree on Feb 9th and will appear in Update 2.
----- Additional Comments From khoa.com 2004-03-28 17:26 ------- Red Hat confirmed that this fix would be in RHEL3 U2. Please re-open this bug report if that is not the case. Thanks.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html