Bug 246086

Summary: Is accounting of Committed memory correct?
Product: Red Hat Enterprise Linux 4
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 4.0
CC: clalance, jbaron, matthias.schroder
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 16:29:36 UTC
Attachments:
Description: Backport of upstream patch to fix Committed_AS leakage (Flags: none)

Description Issue Tracker 2007-06-28 13:45:38 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2007-06-28 13:45:41 UTC
Description of problem:
We have a problem understanding the value of Committed_AS in /proc/meminfo, and we see malloc failing despite lots of free memory.
 
How reproducible:

Unknown.

Steps to Reproduce:

Actual results:

The value for Committed_AS appears to be very high even for a machine doing nothing. We are not able to determine which process commits the memory. 

Expected results:

The value of Committed_AS should match memory usage (?), and we should be able to determine which process commits the memory.

Additional info:

We have seen frequent OOM kills and, in an attempt to reduce them, have set /proc/sys/vm/overcommit_memory to 2 and /proc/sys/vm/overcommit_ratio to 50. To our surprise, we ended up with huge amounts of free memory while processes were unable to allocate memory. We observe very high values of Committed_AS, which we cannot attribute to the processes on the machine or even to file cache usage. Can you please let us know:

- whether you are aware of a known issue with the accounting of committed memory;

- how we can determine which processes are responsible for the huge memory commits we observe;

- which value for the overcommit_ratio is reasonable for machines with 8GB RAM and 8GB swap.
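
For reference on the last question: in strict mode (overcommit_memory = 2) the kernel's overcommit documentation describes the commit limit as swap plus overcommit_ratio percent of physical RAM. A minimal sketch of that arithmetic follows. The function name and the exact GiB sizes are illustrative assumptions, and kernel-side refinements (such as a small reserve kept back for root) are ignored.

```python
GIB_KB = 1024 * 1024  # kB per GiB


def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int) -> int:
    """Approximate commit limit (kB) for overcommit_memory = 2.

    Simplified rule: swap + overcommit_ratio percent of RAM.
    Kernel-internal adjustments are deliberately left out.
    """
    return ram_kb * overcommit_ratio // 100 + swap_kb


if __name__ == "__main__":
    # Assumed sizes for the machines in question: 8 GiB RAM, 8 GiB swap.
    for ratio in (50, 80, 100):
        limit = commit_limit_kb(8 * GIB_KB, 8 * GIB_KB, ratio)
        print(f"overcommit_ratio={ratio}: limit = {limit} kB "
              f"({limit / GIB_KB:.1f} GiB)")
```

Under these assumptions a ratio of 50 gives 12582912 kB (12 GiB). Allocations start failing once Committed_AS reaches that value, regardless of how much memory is actually free, which is why a leaking counter shows up as failed mallocs on an idle machine.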

This event sent from IssueTracker by clalance  [Support Engineering Group]
 issue 124630

Comment 2 Issue Tracker 2007-06-28 13:45:43 UTC
It seems that other people have also come across this problem, and even found a
fix for it. Can you have a look at http://bugs.centos.org/view.php?id=1608
and check whether the proposed fix
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2fd4ef85e0db9ed75c98e13953257a967ea55e03)
can be applied to the current 4.5 kernel?

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech


Comment 3 Issue Tracker 2007-06-28 13:45:46 UTC
Customer seems to have hit a Committed_AS counter leakage which has been
addressed with the following upstream patch:

 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2fd4ef85e0db9ed75c98e13953257a967ea55e03

It doesn't seem to be included in the latest 2.6.9-55.0.2 kernel. Escalating to
request inclusion in RHEL4 (RHEL5 already has it).

-- Navid


Issue escalated to Support Engineering Group by: navid.
Internal Status set to 'Waiting on SEG'


Comment 4 Chris Lalancette 2007-06-28 13:48:27 UTC
Created attachment 158124 [details]
Backport of upstream patch to fix Committed_AS leakage

This is a quick backport of the named upstream patch.  It should fix
Committed_AS leakage, although I haven't confirmed that 100% yet.

Chris Lalancette

Comment 7 Matthias Schroder 2007-06-29 14:51:46 UTC
I noticed that you have not included the changes to arch/x86_64/ia32/syscall32.c
in your version of the patch. Is that part not applicable?  

Comment 8 Chris Lalancette 2007-07-02 19:53:26 UTC
Hm, no, that part should still be applicable; I have no idea how it was missed
in the patch I did.  I'll redo the patch with that part back in.

Chris Lalancette

Comment 9 Matthias Schroder 2007-07-04 14:33:51 UTC
I did some more tests using a small Python scriptlet running under a 32-bit
Python (as this is what the users who suffered most from this issue run). There
I saw the following Committed_AS leaks:

kernel 2.6.9-55.EL.cernsmp   : 56 kB per run
kernel 2.6.9-55.EL.cern.2smp : 40 kB per run

So there are other areas where Committed_AS leaks that are not fixed by the patch...

The amount of leakage did not change whether the scriptlet allocated a little
or a lot of memory.
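
The scriptlet itself is not included in this report. As a rough sketch of how such a per-run figure could be measured, the following reads Committed_AS from /proc/meminfo before and after running a command repeatedly. All function names are hypothetical, and the result is only meaningful on an otherwise idle machine, since every other process also moves the counter.

```python
import re
import subprocess
from typing import Sequence


def committed_as_kb(meminfo_text: str) -> int:
    """Extract the Committed_AS value (kB) from /proc/meminfo contents."""
    match = re.search(r"^Committed_AS:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("Committed_AS not found")
    return int(match.group(1))


def per_run_leak_kb(before_kb: int, after_kb: int, runs: int) -> float:
    """Average growth of Committed_AS per run."""
    return (after_kb - before_kb) / runs


def measure_leak_kb(command: Sequence[str], runs: int = 100,
                    meminfo_path: str = "/proc/meminfo") -> float:
    """Run `command` `runs` times and report the mean Committed_AS growth."""
    with open(meminfo_path) as f:
        before = committed_as_kb(f.read())
    for _ in range(runs):
        # Output is discarded; only the side effect on the counter matters.
        subprocess.run(list(command), stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    with open(meminfo_path) as f:
        after = committed_as_kb(f.read())
    return per_run_leak_kb(before, after, runs)
```

A call such as measure_leak_kb(["python", "scriptlet.py"]) (the script name is a placeholder) would then return the mean leak in kB per run.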



Comment 10 Matthias Schroder 2007-07-05 15:37:35 UTC
It seems I am getting closer to a test case that shows the (or yet another?)
problem nicely: a small C program (talking compiler here, not style) with
recursion. Built on an i386 system, run on x86_64. Committed_AS leakage per run:

2.6.9-42.0.10.ELsmp   : 9768 kB / run
2.6.9-55.EL.cernsmp   : 9840 kB / run
2.6.9-55.EL.cern.2smp : 9768 kB / run

This amount of leakage does get serious...

The code:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */

/* Recursive factorial; every call adds one stack frame. */
double fact(int num);

int main(int argc, char *argv[]){
  int num;
  if ( argc != 2 ){
    printf("Please specify one integer number as argument.\n");
    return (1);
  }
  num = atoi(argv[1]);
  if ( num > -1 )
    printf ("factorial of %d is about %lf\n", num, fact(num));
  else
    printf ("factorial of %d is not defined.\n", num);
  return (0);
}

double fact(int num){
  if ( num == 0 )
    return (1.);
  return ( num * fact( num-1 ) );
}

To build:
cc -o fact -g fact.c

To run:
./fact 250000

To see effect:

#!/bin/csh
#
date
@ outer = 100
@ inner = 100
@ i = 0
@ j = 0
@ starting = `awk '/Committed_AS/ {print $2}' /proc/meminfo`
@ last = $starting
while ( $i < $outer ) 
    @ i ++
    echo "iteration $i"
    grep Committed /proc/meminfo
    while ( $j < $inner )
        @ j ++
        ./fact 250000 >& /dev/null
    end
    @ now = `awk '/Committed_AS/ {print $2}' /proc/meminfo`
    @ loss = `expr $now - $last`
    echo "leak since last outer loop: $loss"
    @ mean = `expr $loss / $inner`
    echo "leak per call since last outer loop: $mean"
    @ loss = `expr $now - $starting`
    echo "total leak since start: $loss"
    @ mean = `expr $loss / $inner / $i` 
    echo "leak per call since start: $mean"
    @ j = 0
    @ last = $now
end
echo "Done."
grep Committed /proc/meminfo
date
exit
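
To put the per-run figures above in perspective, here is a back-of-the-envelope sketch of how many runs of the reproducer it would take for the leaked Committed_AS alone to reach the commit limit. The 12 GiB limit is an assumption derived from the settings in the original report (8GB RAM, 8GB swap, overcommit_ratio 50, treated as exact GiB values), and the function name is hypothetical.

```python
def runs_until_limit(commit_limit_kb: int, leak_per_run_kb: int,
                     baseline_committed_kb: int = 0) -> int:
    """Number of complete runs before leaked Committed_AS reaches the limit."""
    if leak_per_run_kb <= 0:
        raise ValueError("leak_per_run_kb must be positive")
    headroom_kb = max(commit_limit_kb - baseline_committed_kb, 0)
    return headroom_kb // leak_per_run_kb


if __name__ == "__main__":
    # Assumed limit: 8 GiB swap + 50% of 8 GiB RAM = 12 GiB, expressed in kB.
    limit_kb = 12 * 1024 * 1024
    # 9768 kB / run is the leak reported above for 2.6.9-42.0.10.ELsmp.
    print(runs_until_limit(limit_kb, 9768))  # prints 1288
```

Under these assumptions, roughly 1300 invocations are enough to exhaust the commit limit even if nothing else on the machine has committed any memory; any real baseline lowers that number further.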


Comment 11 Issue Tracker 2007-07-06 11:06:56 UTC
Matthias,

Thank you very much for that reproducer. I was able to obtain the same
kind of results using 2.6.9-55.0.2.ELsmp on an x86_64 system.

Regards,

-- Navid


This event sent from IssueTracker by navid 
 issue 125299

Comment 13 RHEL Program Management 2007-07-27 01:24:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Chris Lalancette 2007-07-27 14:26:30 UTC
Two things here:

1)  The original patch I had was actually the right patch; we already had the
bit in syscall32.c from another patch.

2)  This patch was committed to the 4.6 tree.  The maintainer should be updating
this with MODIFIED pretty soon.

Chris Lalancette

Comment 15 Jason Baron 2007-07-27 17:00:24 UTC
Committed in stream U6, build 55.24. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 19 errata-xmlrpc 2007-11-15 16:29:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html