Bug 106503

Summary: ia64 kernel stops allocating memory too early when overcommit_memory set to strict
Product: Red Hat Enterprise Linux 3
Reporter: Warren Yenson <warren.yenson>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 3.0
CC: jburke, kkruzich, petrides, riel, tao
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-06-01 20:54:44 UTC
Bug Blocks: 107562    
Attachments:
Description                                                               Flags
ps output of machine complaining that it can't allocate any more memory   none
Test program                                                              none

Description Warren Yenson 2003-10-07 21:06:33 UTC
Description of problem:
-----------------------
During a compilation of OpenAFS, I see memory allocation errors some way through
the compile.  Depending on how the allocation failed, subsequent invocations of
make will fail, with most of the applications complaining of insufficient memory.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Linux saias83 2.4.21-3.EL #1 SMP Fri Sep 19 13:59:46 EDT 2003 ia64 ia64 ia64
GNU/Linux

How reproducible:
-----------------
Set memory overcommit to strict
Compile something (like OpenAFS)

Steps to Reproduce:
-------------------
1. sysctl -w vm.overcommit_memory=2
2. ./configure --with-afs-sysname=ia64_linux24 --enable-transarc-paths \
     --with-linux-kernel-headers=/usr/src/linux-2.4.21-3.EL
3. make
    
Actual results:
---------------
bash: fork: Cannot allocate memory

Expected results:
-----------------
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...

Additional info:
----------------
It seems that this was reported and fixed on i386 and x86_64 platforms.  I
looked at BugZilla bugs: 106010, 98413 and 104172.

To avoid this, we currently set vm.overcommit_memory to 0, but strict overcommit
is what we would like to use.

Comment 1 Warren Yenson 2003-10-07 21:31:12 UTC
I forgot to add that the system had plenty of memory free, and no swap had been
used yet.  It has 8GB of RAM and 2GB of swap, and I think that 6GB were still
free.

Comment 2 Rik van Riel 2003-10-07 21:40:45 UTC
What is the overcommit percentage you configured in /proc ?

Comment 3 Warren Yenson 2003-10-07 22:07:08 UTC
(Note that this is after a reboot)

saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -a | grep overc
vm.overcommit_ratio = 90
vm.overcommit_memory = 2

saias83 /usr/src/redhat/SOURCES/openafs-1.2.10# sysctl -w vm.overcommit_memory=0
vm.overcommit_memory = 0

saias83 /ms/dev/openafs/core/1.2.10-1/src# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  8498495488 1002586112 7495909376        0 85590016 410075136
Swap: 2089172992        0 2089172992
MemTotal:      8299312 kB
MemFree:       7320224 kB
MemShared:           0 kB
Buffers:         83584 kB
Cached:         400464 kB
SwapCached:          0 kB
Active:         416032 kB
ActiveAnon:     226016 kB
ActiveCache:    190016 kB
Inact_dirty:    294432 kB
Inact_laundry:       0 kB
Inact_clean:         0 kB
Inact_target:   142080 kB
HighTotal:     6290720 kB
HighFree:      5629488 kB
LowTotal:      2008592 kB
LowFree:       1690736 kB
SwapTotal:     2040208 kB
SwapFree:      2040208 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:    262144 kB


Comment 4 Frank Hirtz 2003-10-27 19:34:44 UTC
This behavior still appears to be exhibited in the release kernel. 

Comment 6 Eric Hagberg 2004-01-06 15:45:19 UTC
...and in the Update 1 beta kernel for both x86_64 and ia64 as well.

Any progress being made on this?

Comment 9 Eric Hagberg 2004-01-20 21:21:16 UTC
And still in Update 1 release kernel, in case anyone was wondering.

Comment 10 Larry Woodman 2004-02-20 15:35:03 UTC
Sorry for the long delay on this issue.  When overcommit_memory=2,
virtual memory allocations will fail if the total size of all
allocations is greater than 90% of physical memory + swap space.  It
is quite possible that the total allocations have exceeded 9GB,
which would legitimately fail in this case.  Please attach "ps aux"
output for starters when the allocation failures occur so we can try
to sum up all of the anonymous memory allocated so far.

Also, if I spin up a new test kernel that prints out the statistics
when the allocations fail, will you give it a test run?

Larry
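
For reference, the limit described above can be worked out from the figures posted in Comment 3. This is only a sketch: the first function reads the description literally (the ratio applied to physical memory plus swap), while the second is an alternative reading, assumed here for comparison, in which the ratio applies to physical memory only and swap is added in full. Which of the two the 2.4.21 kernel actually used is not stated in this report, and the function names are made up for illustration.

```python
# Values copied from the /proc/meminfo and sysctl output in Comment 3.
MEM_TOTAL_KB = 8299312    # MemTotal
SWAP_TOTAL_KB = 2040208   # SwapTotal
OVERCOMMIT_RATIO = 90     # vm.overcommit_ratio


def limit_as_described(mem_kb, swap_kb, ratio):
    """Literal reading: ratio% of (physical memory + swap)."""
    return (mem_kb + swap_kb) * ratio // 100


def limit_ram_only_ratio(mem_kb, swap_kb, ratio):
    """Alternative reading (assumption): swap + ratio% of physical memory."""
    return swap_kb + mem_kb * ratio // 100


if __name__ == "__main__":
    for fn in (limit_as_described, limit_ram_only_ratio):
        kb = fn(MEM_TOTAL_KB, SWAP_TOTAL_KB, OVERCOMMIT_RATIO)
        print(f"{fn.__name__}: {kb} kB ({kb / 2**20:.2f} GiB)")
```

Either reading puts the limit at roughly 9GB for this machine, so the useful question is whether the sum of committed allocations really got that high.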


Comment 11 Frank Hirtz 2004-02-20 15:48:25 UTC
Will try to replicate and record it. Also, I can certainly run and
have the client run a test kernel.

Comment 12 Eric Hagberg 2004-02-20 20:21:51 UTC
Here's another amd64 machine having the same problem. Right after the
failure, I did a "cat /proc/meminfo"

        total:    used:    free:  shared: buffers:  cached:
Mem:  5666713600 5282353152 384360448        0 380510208 4268195840
Swap: 2146787328        0 2146787328
MemTotal:      5533900 kB
MemFree:        375352 kB
MemShared:           0 kB
Buffers:        371592 kB
Cached:        4168160 kB
SwapCached:          0 kB
Active:        1708980 kB
ActiveAnon:      38592 kB
ActiveCache:   1670388 kB
Inact_dirty:   2734296 kB
Inact_laundry:   73256 kB
Inact_clean:     74296 kB
Inact_target:   918164 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      5533900 kB
LowFree:        375352 kB
SwapTotal:     2096472 kB
SwapFree:      2096472 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

And I'll attach the ps output.

Comment 13 Eric Hagberg 2004-02-20 20:23:03 UTC
Created attachment 97885 [details]
ps output of machine complaining that it can't allocate any more memory

Comment 15 Larry Woodman 2004-02-25 16:52:45 UTC
This problem has been found and a fix is pending.  This is an IA64-only
problem: ia64_brk() was accounting for the memory allocation
twice but the unmap logic only considers it once, thereby causing a
global virtual address space leak.

I'll make the U2 kernel with the fix available once it has been
officially built.

Larry


Comment 16 John Flanagan 2004-05-12 01:07:40 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-188.html


Comment 21 Larry Woodman 2005-02-08 12:15:23 UTC
Why is this still open?  The problem was fixed long ago; it was due to
ia64_brk() calling vm_enough_memory() before calling do_brk(), which
ended up making 2 calls to vm_enough_memory() and double-accounting for
the virtual address space.

I fixed the problem by removing the call to vm_enough_memory() from
ia64_brk().


Larry Woodman
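
The effect of the double call can be illustrated with a toy model. This is not kernel code: only the names vm_enough_memory, do_brk and ia64_brk come from the comment above, and the class, the uncharge helper, the page counts and the limit are all hypothetical. The sketch charges an allocation twice on the buggy path and once on the fixed path, and releases it once on unmap in both cases.

```python
class CommitAccounting:
    """Toy stand-in for the global committed-memory counter."""

    def __init__(self, limit_pages):
        self.limit = limit_pages
        self.committed = 0

    def vm_enough_memory(self, pages):
        # Charge the request if it fits under the limit, else refuse.
        if self.committed + pages > self.limit:
            return False
        self.committed += pages
        return True

    def uncharge(self, pages):
        # Hypothetical helper: what the unmap path gives back (once).
        self.committed -= pages


def do_brk(acct, pages):
    # The generic path does its own accounting check.
    return acct.vm_enough_memory(pages)


def buggy_ia64_brk(acct, pages):
    # Extra check before do_brk(): the same pages are charged twice.
    return acct.vm_enough_memory(pages) and do_brk(acct, pages)


def fixed_ia64_brk(acct, pages):
    # Extra check removed: do_brk() alone does the accounting.
    return do_brk(acct, pages)


def run_cycles(brk, cycles, pages, limit):
    """Grow and release the heap repeatedly; return (cycles done, committed)."""
    acct = CommitAccounting(limit)
    for done in range(cycles):
        if not brk(acct, pages):
            return done, acct.committed
        acct.uncharge(pages)
    return cycles, acct.committed


if __name__ == "__main__":
    print("buggy:", run_cycles(buggy_ia64_brk, 50, 10, 100))
    print("fixed:", run_cycles(fixed_ia64_brk, 50, 10, 100))
```

In the buggy variant every grow/release cycle leaves one charge behind, so the counter eventually reaches the limit even though nothing is really allocated, which matches the symptom of allocations failing on a machine with plenty of free memory.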

Comment 24 Frank Hirtz 2005-02-09 19:12:17 UTC
Created attachment 110887 [details]
Test program

Comment 26 Eric Hagberg 2005-02-09 20:50:37 UTC
It's broken in x86_64 as well - always was and was never fixed.

Comment 27 Larry Woodman 2005-04-27 13:43:25 UTC
I ran the pig.C program on an x86_64 system running RHEL3-U5 and I didn't get the
BUG that was reported.  Also, I cannot reproduce the overcommit_memory problem
on either ia64 or x86_64 any more.  Can someone help me find a reproducer on the
RHEL3-U5 kernel if it is still broken?

Thanks, Larry Woodman


Comment 31 Jeff Burke 2005-06-01 18:41:17 UTC
 Sorry about the confusion. You are correct, the system I tested on was RHEL3
U5. It was x86_64; the kernel arch is ia32e (x86_64 RHEL3). The hardware
platform is a Dell PowerEdge 1800 with 4 GB of RAM and dual 3.6GHz Xeon processors.

 I believe the following is true: "ia32e kernel stops allocating memory too early
  when overcommit_memory set to strict"

 I have a system here that Larry W will be able to use to look at the issue.
Larry W is presenting at the Red Hat Summit. He will be back next week.


Comment 32 Kevin Kruzich 2005-06-01 18:56:15 UTC
*** Bug 159330 has been marked as a duplicate of this bug. ***

Comment 33 Ernie Petrides 2005-06-01 20:54:44 UTC
This bug was against ia64 and was resolved in U2.  Bug 159330 is against
x86_64 and is the result of some different problem.  So, I'm reclosing
this one and undoing the dependency.