Bug 106503
| Summary: | ia64 kernel stops allocating memory too early when overcommit_memory set to strict | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Warren Yenson <warren.yenson> | ||||||
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | |||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 3.0 | CC: | jburke, kkruzich, petrides, riel, tao | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | ia64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2005-06-01 20:54:44 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 107562 | ||||||||
| Attachments: |
|
||||||||
I forgot to add that the system had plenty of memory free, and no swap had been used yet. It has 8GB of RAM and 2GB of swap, and I think that 6GB were still free.

What is the overcommit percentage you configured in /proc?

(Note that this is after a reboot)
saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -a | grep overc
vm.overcommit_ratio = 90
vm.overcommit_memory = 2
saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -w vm.overcommit_memory=0
vm.overcommit_memory = 0
saias83 /ms/dev/openafs/core/1.2.10-1/src# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 8498495488 1002586112 7495909376 0 85590016 410075136
Swap: 2089172992 0 2089172992
MemTotal: 8299312 kB
MemFree: 7320224 kB
MemShared: 0 kB
Buffers: 83584 kB
Cached: 400464 kB
SwapCached: 0 kB
Active: 416032 kB
ActiveAnon: 226016 kB
ActiveCache: 190016 kB
Inact_dirty: 294432 kB
Inact_laundry: 0 kB
Inact_clean: 0 kB
Inact_target: 142080 kB
HighTotal: 6290720 kB
HighFree: 5629488 kB
LowTotal: 2008592 kB
LowFree: 1690736 kB
SwapTotal: 2040208 kB
SwapFree: 2040208 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 262144 kB
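
As a rough cross-check of the figures above, the sketch below estimates the ceiling that strict mode (vm.overcommit_memory = 2) would place on total allocations for this machine. It assumes the limit is overcommit_ratio percent of MemTotal plus all of SwapTotal, which is how mainline kernel documentation describes it; the exact arithmetic in this RHEL 3 kernel is not confirmed here. The function name is made up for illustration.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Estimated strict-overcommit ceiling in kB.

    Assumed formula (not taken from the RHEL 3 source):
    overcommit_ratio percent of RAM, plus all of swap.
    """
    return mem_total_kb * overcommit_ratio // 100 + swap_total_kb


# MemTotal and SwapTotal from the /proc/meminfo dump above,
# vm.overcommit_ratio from the sysctl output above.
limit = commit_limit_kb(8299312, 2040208, 90)
print(limit, "kB")
print(round(limit / 1024 ** 2, 2), "GiB")
```

With the values above the estimate is 9509588 kB, a little over 9 GiB.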
This behavior still appears to be exhibited in the release kernel.

...and in the Update 1 beta kernel for both x86_64 and ia64 as well. Any progress being made on this?

And still in Update 1 release kernel, in case anyone was wondering.

Sorry for the long delay on this issue. When overcommit_memory=2, virtual memory allocations will fail if the total size of all allocations is greater than 90% of physical memory + swap space. It is highly possible that the total allocations have exceeded 9GB, which would legitimately fail in this case. Please attach a "ps aux" output for starters when the allocation failures occur so we can try to sum up all of the anonymous memory allocated so far. Also, if I spin up a new test kernel that prints out the statistics when the allocations fail, will you give it a test run?

Larry

Will try to replicate and record it. Also, I can certainly run and have the client run a test kernel.

Here's another amd64 machine having the same problem. Right after the failure, I did a "cat /proc/meminfo":
total: used: free: shared: buffers: cached:
Mem: 5666713600 5282353152 384360448 0 380510208 4268195840
Swap: 2146787328 0 2146787328
MemTotal: 5533900 kB
MemFree: 375352 kB
MemShared: 0 kB
Buffers: 371592 kB
Cached: 4168160 kB
SwapCached: 0 kB
Active: 1708980 kB
ActiveAnon: 38592 kB
ActiveCache: 1670388 kB
Inact_dirty: 2734296 kB
Inact_laundry: 73256 kB
Inact_clean: 74296 kB
Inact_target: 918164 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 5533900 kB
LowFree: 375352 kB
SwapTotal: 2096472 kB
SwapFree: 2096472 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
And I'll attach the ps output.
Created attachment 97885 [details]
ps output of machine complaining that it can't allocate any more memory
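
A "ps aux" capture like the one attached can be totalled with a few lines of script. The sketch below sums the VSZ column, which is only an upper-bound stand-in for the anonymous memory Larry asks about, since VSZ also counts file-backed and shared mappings. The function name and the two-process sample are invented for illustration and are not taken from the attachment.

```python
def total_vsz_kb(ps_aux_text):
    """Sum the VSZ column (kB) of `ps aux` output given as a string."""
    lines = ps_aux_text.strip().splitlines()
    header = lines[0].split()
    vsz_col = header.index("VSZ")
    total = 0
    for line in lines[1:]:
        # Limit the split so a COMMAND containing spaces stays in one field.
        fields = line.split(None, len(header) - 1)
        if len(fields) > vsz_col and fields[vsz_col].isdigit():
            total += int(fields[vsz_col])
    return total


# Hypothetical sample, NOT the contents of attachment 97885.
SAMPLE = """\
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1500  500 ?        S    10:00   0:04 init
user      1234  0.5  1.0 20000 9000 pts/0    S    10:05   0:01 make -j 4
"""

print(total_vsz_kb(SAMPLE), "kB")
```

Comparing such a total against 90% of MemTotal plus SwapTotal gives a first indication of whether the allocations could plausibly have reached the strict limit.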
This problem has been found and a fix is pending. This is an IA64-only problem: ia64_brk() was accounting for the memory allocation twice, but the unmap logic only considers it once, thereby causing a global virtual address space leak. I'll make the U2 kernel with the fix available once it has been officially built.

Larry

An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html

Why is this still open? The problem was fixed long ago. It was due to ia64_brk() calling vm_enough_memory() before calling do_brk(), which ended up making two calls to vm_enough_memory() and double-accounting the virtual address space. I fixed the problem by removing the call to vm_enough_memory() from ia64_brk().

Larry Woodman

Created attachment 110887 [details]
Test program
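
Separately from the attached test program, the double-accounting leak in the ia64_brk() diagnosis can be illustrated with a toy model. This is not kernel code: the class, the function names, and the page counts are invented, and the model only assumes what the diagnosis states, namely that growing the heap charged the global counter twice while releasing it uncharged once.

```python
class CommitAccounting:
    """Toy global counter of committed pages with a strict limit."""

    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.committed = 0

    def charge(self, pages):
        # Refuse the charge if it would exceed the strict limit.
        if self.committed + pages > self.limit:
            return False
        self.committed += pages
        return True

    def uncharge(self, pages):
        self.committed -= pages


def brk_grow(acct, pages, double_account):
    # Buggy path: the wrapper charges first, then the inner call charges again.
    if double_account and not acct.charge(pages):
        return False
    if not acct.charge(pages):
        if double_account:
            acct.uncharge(pages)  # roll back the wrapper's charge
        return False
    return True


def brk_shrink(acct, pages):
    # The release path only ever uncharges once.
    acct.uncharge(pages)


def cycles_until_failure(double_account, limit_pages=1000, pages=10,
                         max_cycles=10000):
    """Grow and shrink repeatedly; return the cycle that fails, or None."""
    acct = CommitAccounting(limit_pages)
    for cycle in range(max_cycles):
        if not brk_grow(acct, pages, double_account):
            return cycle
        brk_shrink(acct, pages)
    return None


print("double accounting:", cycles_until_failure(True))
print("single accounting:", cycles_until_failure(False))
```

In the double-accounting case every grow/shrink cycle leaves the counter higher than it started, so allocations eventually fail even though nothing is actually held, which matches the symptom of failures on a machine with most of its memory free.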
It's broken in x86_64 as well - always was and was never fixed.

I ran the pig.C program on an x86_64 system running RHEL3-U5 and I didn't get the BUG that was reported. Also, I cannot reproduce the overcommit_memory problem on either ia64 or x86_64 any more. Can someone help me find a reproducer on the RHEL3-U5 kernel if it is still broken?

Thanks, Larry Woodman

Sorry about the confusion. You are correct: the system I tested on was RHEL3 U5. It was x86_64; the kernel arch is ia32e (x86_64 RHEL3). The hardware platform is a Dell PowerEdge 1800 with 4 GB of RAM and dual 3.6GHz Xeon processors. I believe the following is true: "ia32e kernel stops allocating memory too early when overcommit_memory set to strict". I have a system here that Larry W will be able to use to look at the issue.

Larry W is presenting at the Red Hat Summit. He will be back next week.

*** Bug 159330 has been marked as a duplicate of this bug. ***

This bug was against ia64 and was resolved in U2. Bug 159330 is against x86_64 and is the result of some different problem. So, I'm reclosing this one and undoing the dependency. |
Description of problem:
-----------------------
During a compilation of OpenAFS, I see memory allocation errors some way through the compile. Depending on how the allocation failed, subsequent invocations of make will fail, with most of the applications complaining of insufficient memory.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Linux saias83 2.4.21-3.EL #1 SMP Fri Sep 19 13:59:46 EDT 2003 ia64 ia64 ia64 GNU/Linux

How reproducible:
-----------------
Set memory overcommit to strict
Compile something (like OpenAFS)

Steps to Reproduce:
-------------------
1. sysctl -w vm.overcommit_memory=2
2. ./configure --with-afs-sysname=ia64_linux24 --enable-transarc-paths --with-linux-kernel-headers=/usr/src/linux-2.4.21-3.EL
3. make

Actual results:
---------------
bash: fork: Cannot allocate memory

Expected results:
-----------------
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...

Additional info:
----------------
It seems that this was reported and fixed on i386 and x86_64 platforms. I looked at BugZilla bugs: 106010, 98413 and 104172. To avoid this, we currently set vm.overcommit_memory to 0, but strict overcommit is what we would like to use.