Description of problem:
-----------------------
During a compilation of OpenAFS, I see memory allocation errors some way through the compile. Depending on how the allocation failed, subsequent invocations of make will fail, with most of the applications complaining of insufficient memory.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Linux saias83 2.4.21-3.EL #1 SMP Fri Sep 19 13:59:46 EDT 2003 ia64 ia64 ia64 GNU/Linux

How reproducible:
-----------------
Set memory overcommit to strict, then compile something (like OpenAFS).

Steps to Reproduce:
-------------------
1. sysctl -w vm.overcommit_memory=2
2. ./configure --with-afs-sysname=ia64_linux24 --enable-transarc-paths --with-linux-kernel-headers=/usr/src/linux-2.4.21-3.EL
3. make

Actual results:
---------------
bash: fork: Cannot allocate memory

Expected results:
-----------------
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...

Additional info:
----------------
It seems that this was reported and fixed on the i386 and x86_64 platforms. I looked at Bugzilla bugs 106010, 98413 and 104172. To work around this, we currently set vm.overcommit_memory to 0, but strict overcommit is what we would like to use.
I forgot to add that the system had plenty of free memory and no swap had been used yet. It has 8GB of RAM and 2GB of swap, and I think about 6GB was still free.
What overcommit percentage have you configured in /proc?
(Note that this is after a reboot.)

saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -a | grep overc
vm.overcommit_ratio = 90
vm.overcommit_memory = 2
saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -w vm.overcommit_memory=0
vm.overcommit_memory = 0
saias83 /ms/dev/openafs/core/1.2.10-1/src# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  8498495488 1002586112 7495909376 0 85590016 410075136
Swap: 2089172992 0 2089172992
MemTotal: 8299312 kB
MemFree: 7320224 kB
MemShared: 0 kB
Buffers: 83584 kB
Cached: 400464 kB
SwapCached: 0 kB
Active: 416032 kB
ActiveAnon: 226016 kB
ActiveCache: 190016 kB
Inact_dirty: 294432 kB
Inact_laundry: 0 kB
Inact_clean: 0 kB
Inact_target: 142080 kB
HighTotal: 6290720 kB
HighFree: 5629488 kB
LowTotal: 2008592 kB
LowFree: 1690736 kB
SwapTotal: 2040208 kB
SwapFree: 2040208 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 262144 kB
This behavior still appears to be exhibited in the release kernel.
...and in the Update 1 beta kernel as well, on both x86_64 and ia64. Is any progress being made on this?
And still in Update 1 release kernel, in case anyone was wondering.
Sorry for the long delay on this issue. When overcommit_memory=2, virtual memory allocations will fail if the total size of all allocations is greater than swap space plus 90% of physical memory (the configured overcommit_ratio). It is quite possible that the total allocations have exceeded 9GB, which would legitimately fail in this case. For starters, please attach "ps aux" output captured when the allocation failures occur, so we can try to sum up all of the anonymous memory allocated so far. Also, if I spin up a new test kernel that prints out the statistics when the allocations fail, will you give it a test run?
Larry
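Larry's limit can be checked against the /proc/meminfo numbers posted earlier (MemTotal 8299312 kB, SwapTotal 2040208 kB, vm.overcommit_ratio = 90). The following is a minimal sketch, assuming the strict-mode formula used by upstream kernels (swap plus overcommit_ratio percent of RAM); whether the RHEL3 2.4.21 kernel computes it exactly this way is an assumption, and the helper names are hypothetical.

```python
import re


def parse_meminfo(text):
    """Collect every 'Name: <number> kB' field from a /proc/meminfo dump (values in kB)."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s+(\d+) kB", text)}


def strict_commit_limit_kb(meminfo, overcommit_ratio):
    """Commit limit for vm.overcommit_memory=2.

    ASSUMPTION: upstream-style formula, limit = SwapTotal + MemTotal * ratio / 100.
    """
    return meminfo["SwapTotal"] + meminfo["MemTotal"] * overcommit_ratio // 100


# Values copied from the ia64 meminfo dump earlier in this report.
sample = """
MemTotal:      8299312 kB
MemFree:       7320224 kB
SwapTotal:     2040208 kB
SwapFree:      2040208 kB
"""

limit_kb = strict_commit_limit_kb(parse_meminfo(sample), overcommit_ratio=90)
print(f"commit limit: {limit_kb} kB (~{limit_kb / 1024**2:.2f} GiB)")
# prints: commit limit: 9509588 kB (~9.07 GiB)
```

Under that assumption the limit comes out at roughly 9.07 GiB, consistent with the ~9GB figure above: a build would need about that much address space committed at the same time for the failure to be legitimate.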
I will try to replicate the failure and record it. I can certainly run a test kernel myself, and have the client run it as well.
Here's another amd64 machine having the same problem. Right after the failure, I ran "cat /proc/meminfo":

        total:    used:    free:  shared: buffers:  cached:
Mem:  5666713600 5282353152 384360448 0 380510208 4268195840
Swap: 2146787328 0 2146787328
MemTotal: 5533900 kB
MemFree: 375352 kB
MemShared: 0 kB
Buffers: 371592 kB
Cached: 4168160 kB
SwapCached: 0 kB
Active: 1708980 kB
ActiveAnon: 38592 kB
ActiveCache: 1670388 kB
Inact_dirty: 2734296 kB
Inact_laundry: 73256 kB
Inact_clean: 74296 kB
Inact_target: 918164 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 5533900 kB
LowFree: 375352 kB
SwapTotal: 2096472 kB
SwapFree: 2096472 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

I'll attach the ps output as well.
Created attachment 97885: ps output of the machine complaining that it can't allocate any more memory
This problem has been found and a fix is pending. This is an IA64-only problem: ia64_brk() was accounting for the memory allocation twice, but the unmap logic only accounts for it once, thereby causing a global virtual address space leak. I'll make the U2 kernel with the fix available once it has been officially built.
Larry
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html
Why is this still open? The problem was fixed long ago. It was due to ia64_brk() calling vm_enough_memory() before calling do_brk(), which ended up making two calls to vm_enough_memory() and double-accounting for the virtual address space. I fixed the problem by removing the call to vm_enough_memory() from ia64_brk().
Larry Woodman
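The double accounting explains why strict mode failed partway through a long build even with gigabytes free: every heap extension leaked one extra charge into a global counter. The toy model below is a sketch of that mechanism only. The class, the function names and the page counts are hypothetical simplifications, not RHEL3 kernel code; only the call pattern (an extra vm_enough_memory() before do_brk(), a single release on unmap) follows the description above.

```python
class CommitCounter:
    """Toy stand-in for the kernel's global committed-address-space counter.

    Hypothetical: names and bookkeeping are simplified, not RHEL3 code.
    """

    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.committed = 0

    def vm_enough_memory(self, pages):
        # Charge `pages` if it fits under the strict limit; refuse otherwise.
        if self.committed + pages > self.limit:
            return False
        self.committed += pages
        return True

    def unmap(self, pages):
        # Releasing the mapping returns the charge exactly once.
        self.committed -= pages


def do_brk(acct, pages):
    # do_brk() performs its own accounting of the new heap pages.
    return acct.vm_enough_memory(pages)


def ia64_brk_buggy(acct, pages):
    # Pre-fix path: an extra vm_enough_memory() call before do_brk(),
    # so every heap extension is charged twice.
    return acct.vm_enough_memory(pages) and do_brk(acct, pages)


def ia64_brk_fixed(acct, pages):
    # Fixed path: the extra call is removed, so do_brk() charges once.
    return do_brk(acct, pages)


def cycles_until_failure(brk_impl, limit_pages, pages_per_cycle, max_cycles=1000):
    """Repeatedly grow a heap and then release it, like many short-lived
    processes during a build. Return the cycle on which brk first fails,
    or None if it never fails within max_cycles."""
    acct = CommitCounter(limit_pages)
    for cycle in range(1, max_cycles + 1):
        if not brk_impl(acct, pages_per_cycle):
            return cycle
        acct.unmap(pages_per_cycle)  # process exits: charge released once
    return None


print(cycles_until_failure(ia64_brk_buggy, limit_pages=100, pages_per_cycle=10))  # 10
print(cycles_until_failure(ia64_brk_fixed, limit_pages=100, pages_per_cycle=10))  # None
```

In the buggy model the counter grows by one extension per cycle and nothing ever gives that leaked charge back, so once it reaches the limit every later allocation fails regardless of how much memory is actually free; with the extra call removed, the counter returns to zero after each cycle.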
Created attachment 110887: Test program
It's broken on x86_64 as well; it always was and was never fixed.
I ran the pig.C program on an x86_64 system running RHEL3-U5 and I didn't get the BUG that was reported. Also, I cannot reproduce the overcommit_memory problem on either ia64 or x86_64 anymore. If it is still broken, can someone help me find a reproducer on the RHEL3-U5 kernel?
Thanks, Larry Woodman
Sorry about the confusion. You are correct: the system I tested on was running RHEL3 U5. It was x86_64, and the kernel arch is ia32e (x86_64 RHEL3). The hardware platform is a Dell PowerEdge 1800 with 4GB of RAM and dual 3.6GHz Xeon processors. I believe the following is true: "ia32e kernel stops allocating memory too early when overcommit_memory is set to strict". I have a system here that Larry W will be able to use to look at the issue. Larry W is presenting at the Red Hat Summit and will be back next week.
*** Bug 159330 has been marked as a duplicate of this bug. ***
This bug was against ia64 and was resolved in U2. Bug 159330 is against x86_64 and is the result of a different problem, so I'm reclosing this one and undoing the dependency.