Bug 251043
Summary: | [RHEL5 U1] [ia64] Kernel test failing under limited memory | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Burke <jburke> |
Component: | kernel | Assignee: | Aron Griffis <agriffis> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 5.1 | CC: | dchapman, ddomingo, dwalsh, dzickus, lwang, mgahagan, prarit, sgrubb, tyamamot, yuchen |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | ia64 | ||
OS: | Linux | ||
URL: | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/kernel/misc/ktst_msg&package_filter=kernel-xen&package_arch=ia64&package_version=2.6.18&package_release=38.el5&type=KernelTier1&type=KernelTier2&result=Fail&all_packages=0 | ||
Whiteboard: | GSSApproved | ||
Fixed In Version: | RHBA-2008-0314 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-05-21 14:48:45 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 222082, 412091, 425461 |
Description
Jeff Burke
2007-08-06 18:16:26 UTC
I have done the investigation.

[Summary]
This is not an issue specific to the ia64 xen guest; it is generic to ia64. I think there are three possible ways to resolve this issue:
1) Increase the memory size
2) Create a swap device whose size is more than 2GB
3) Run 'ulimit -Hs unlimited' before running the program

[Details]
The failure occurred only on the ia64 xen guest because the guest met all three of these conditions:
1) the memory size for the guest was 512MB (Per my quick test, about 700MB or less is required.)
2) the swap size for the guest was less than 2GB
3) the default "Hard limit" for processes on the guest was 2097152k (not 'unlimited')
(If the three conditions are met, the failures also occur on other arches.)

The first two can apply to all arches, but the third one is ia64 specific (it applies to linux, dom0 and guest). The default value of "Hard limit" for processes is somehow 2097152k on ia64. ('unlimited' is set for other arches.) So, in that sense, this can be considered an ia64 generic issue.

When a 'make' command is invoked, it sets its "Soft limit" to the same value as the "Hard limit" (maybe that's the spec of the make command), and that causes the failures. pthread_create() tries to allocate (mmap) memory of the same size as the "Soft limit", and there is not enough, so it fails.

However, it seems that the allocation completes successfully if there is more than 2GB of swap, which is larger than the "Hard limit" size. I do not have any clues about this swap behavior yet.

Thanks to Kei for the comprehensive investigation. Looking at kernel.org, it appears that the default stack hard limit was removed earlier this year:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d826393cdebe340b3716002bfb1298ab19b57e83
This patch brings ia64 into alignment with x86 and x86_64, neither of which has a hard stack limit by default.
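As a rough illustration of the investigation above (not code from this bug or from the RHTS test), the three conditions and the soft/hard stack limit relationship could be sketched in Python as follows. The function names and threshold constants are assumptions made for this sketch; the thresholds only restate the approximate figures quoted in the report ("about 700MB or less", "less than 2GB"). Note that 2097152k is exactly 2 GiB (2097152 * 1024 = 2^31 bytes).

```python
import resource

KIB_PER_GIB = 1024 * 1024
UNLIMITED = resource.RLIM_INFINITY

# Approximate thresholds restated from the report (assumptions, not exact cut-offs).
MEMORY_THRESHOLD_KIB = 700 * 1024      # "about 700MB or less is required"
SWAP_THRESHOLD_KIB = 2 * KIB_PER_GIB   # "swap size ... less than 2GB"


def conditions_met(memory_kib, swap_kib, stack_hard_limit_kib):
    """Return True when all three conditions described in the report hold."""
    low_memory = memory_kib <= MEMORY_THRESHOLD_KIB
    small_swap = swap_kib < SWAP_THRESHOLD_KIB
    finite_hard_limit = stack_hard_limit_kib != UNLIMITED
    return low_memory and small_swap and finite_hard_limit


def current_stack_limits_kib():
    """Return the (soft, hard) stack limits of this process in KiB.

    An unlimited value is returned unchanged as resource.RLIM_INFINITY.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_kib(value):
        return value if value == UNLIMITED else value // 1024

    return to_kib(soft), to_kib(hard)


def raise_soft_to_hard():
    """Set the stack soft limit equal to the hard limit.

    This mimics the behavior the report attributes to 'make'. It affects
    the calling process and any children it starts afterwards.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
```

For example, `conditions_met(512 * 1024, 1024 * 1024, 2097152)` describes a guest with 512MB of memory, a hypothetical 1GB swap device, and the ia64 default hard limit, and evaluates to `True`; passing `UNLIMITED` as the hard limit (the effect of 'ulimit -Hs unlimited') makes it `False`.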
Also, Debian stable runs 2.6.18 with this patch, so it appears to work as expected on the 2.6.18 kernel (we're using Debian stable on our lab server at HP).

I'm building kernels tonight with this modification and will make them available for testing in the morning.

Test kernels available at http://free.linux.hp.com/~agriffis/rhel5/bz251043/

I posted the patch for review on rhkernel-list. Requested help in that message getting this submitted to RHTS since I haven't done that before. When that's done, I'll change the status to POST.

With help from Kei, I was able to run the RHTS test on dom0 on my rx6600.
2.6.18-45.el5xen (unpatched) -- FAIL
2.6.18-47.el5.bz251043.agriffis.1xen (patched) -- PASS

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in 2.6.18-58.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0314.html

fv_* tests all failed on 2.6.18-227.el5xen.
OS: RHEL5.6-20101014.0
Kernel: 2.6.18-227.el5xen
V7: 1.2-25.el5

#virsh console v7ia64

Output for fv_core testing:
----------------------------
...
+----------ia64 CPU info end----------+
Checking clock jitter ...
Single CPU detected. No clock jitter testing necessary.
clock direction test: start time 1288605280, stop time 1288605340, sleeptime 60, delta 0 PASSED
audispd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
Call Trace:
 [<a000000100013ba0>] show_stack+0x40/0xa0 sp=e00000000206f5c0 bsp=e000000002069468
 [<a000000100013c30>] dump_stack+0x30/0x60 sp=e00000000206f790 bsp=e000000002069450
 [<a000000100113a50>] out_of_memory+0xf0/0x780 sp=e00000000206f790 bsp=e000000002069418
 [<a00000010011a2c0>] __alloc_pages+0x420/0x540 sp=e00000000206f820 bsp=e0000000020693a0
 [<a000000100152290>] alloc_page_vma+0x150/0x180 sp=e00000000206f830 bsp=e000000002069368
 [<a000000100145f10>] read_swap_cache_async+0x70/0x220 sp=e00000000206f830 bsp=e000000002069320
 [<a00000010012cfa0>] swapin_readahead+0xa0/0x240 sp=e00000000206f830 bsp=e0000000020692d8
 [<a0000001001331a0>] __handle_mm_fault+0x1400/0x1d00 sp=e00000000206f840 bsp=e000000002069260
 [<a000000100652e20>] ia64_do_page_fault+0x240/0xa40 sp=e00000000206f850 bsp=e000000002069210
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280 sp=e00000000206f900 bsp=e000000002069210
 [<a0000001001a1e30>] do_sys_poll+0x590/0x740 sp=e00000000206fad0 bsp=e000000002069180
 [<a0000001001a2680>] sys_poll+0x80/0xc0 sp=e00000000206fe20 bsp=e000000002069128
 [<a00000010000bea0>] ia64_ret_from_syscall+0x0/0x40 sp=e00000000206fe30 bsp=e000000002069128
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e000000002070000 bsp=e000000002069128
...
Swap cache: add 45947, delete 45947, find 158/264, race 0+0
Free swap = 0kB
Total swap = 720864kB
Out of memory: Killed process 2278 (stress).
stress: page allocation failure. order:0, mode:0x280d2
...

Output for fv_memory testing:
----------------------------------
...
Starting Threaded Memory Test running for more than free memory at 195 MB for 60 sec.
automount invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Call Trace:
 [<a000000100013ba0>] show_stack+0x40/0xa0 sp=e00000000331fa70 bsp=e000000003319500
 [<a000000100013c30>] dump_stack+0x30/0x60 sp=e00000000331fc40 bsp=e0000000033194e8
 [<a000000100113a50>] out_of_memory+0xf0/0x780 sp=e00000000331fc40 bsp=e0000000033194b0
 [<a00000010011a2c0>] __alloc_pages+0x420/0x540 sp=e00000000331fcd0 bsp=e000000003319440
 [<a000000100152110>] alloc_pages_current+0x170/0x1a0 sp=e00000000331fce0 bsp=e000000003319410
 [<a00000010010b7c0>] page_cache_alloc_cold+0x1a0/0x1c0 sp=e00000000331fce0 bsp=e0000000033193e8
 [<a00000010011dec0>] __do_page_cache_readahead+0x120/0x460 sp=e00000000331fce0 bsp=e000000003319388
 [<a00000010011eb00>] do_page_cache_readahead+0xe0/0x120 sp=e00000000331fd70 bsp=e000000003319350
 [<a000000100111820>] filemap_nopage+0x280/0x7c0 sp=e00000000331fd70 bsp=e0000000033192e8
 [<a000000100132170>] __handle_mm_fault+0x3d0/0x1d00 sp=e00000000331fd70 bsp=e000000003319270
 [<a000000100652e20>] ia64_do_page_fault+0x240/0xa40 sp=e00000000331fd80 bsp=e000000003319220
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280 sp=e00000000331fe30 bsp=e000000003319220
...
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 0kB
Total swap = 0kB
Out of memory: Killed process 2270 (threaded_memtes).
done.
...
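When comparing OOM reports like the two above, it can help to pull out just the swap figures and the killed process. The following is a minimal sketch, assuming the log lines have the shape shown in this comment; the function name `summarize_oom` and the returned dictionary keys are hypothetical, chosen only for this illustration.

```python
import re

# Patterns modeled on the log lines quoted in this report.
SWAP_RE = re.compile(r"Free swap\s*=\s*(\d+)kB\s+Total swap\s*=\s*(\d+)kB")
KILLED_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def summarize_oom(log_text):
    """Extract swap totals and the killed process from one OOM report.

    Fields that cannot be found in the text are returned as None.
    """
    swap = SWAP_RE.search(log_text)
    killed = KILLED_RE.search(log_text)
    return {
        "free_swap_kb": int(swap.group(1)) if swap else None,
        "total_swap_kb": int(swap.group(2)) if swap else None,
        "killed_pid": int(killed.group(1)) if killed else None,
        "killed_name": killed.group(2) if killed else None,
    }
```

Applied to the two reports above, this would show that the fv_core run had exhausted its 720864kB of swap, while the fv_memory run had no swap configured at all.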