Bug 246086
Summary: | Is accounting of Committed memory correct?
---|---
Product: | Red Hat Enterprise Linux 4
Component: | kernel
Version: | 4.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Issue Tracker <tao>
Assignee: | Larry Woodman <lwoodman>
QA Contact: | Martin Jenner <mjenner>
CC: | clalance, jbaron, matthias.schroder
Fixed In Version: | RHBA-2007-0791
Doc Type: | Bug Fix
Last Closed: | 2007-11-15 16:29:36 UTC

Description
Issue Tracker
2007-06-28 13:45:38 UTC
Description of problem:
We have a problem understanding the value of Committed_AS in /proc/meminfo, and we see malloc failing despite lots of free memory.

How reproducible:
Unknown.

Steps to Reproduce:

Actual results:
The value of Committed_AS appears to be very high even on a machine doing nothing. We are not able to determine which process commits the memory.

Expected results:
The value of Committed_AS should match memory usage (?), and we should be able to determine which process commits the memory.

Additional info:
We have seen frequent OOM kills, and in an attempt to reduce these we set /proc/sys/vm/overcommit_memory to 2 and /proc/sys/vm/overcommit_ratio to 50. To our surprise we ended up with huge amounts of free memory and processes that were unable to allocate memory. We observe very high values of Committed_AS, which we cannot attribute to the processes on the machine or even to use as file cache.

Can you please let us know
- whether you are aware of a known issue with the accounting of committed memory;
- how we can determine which processes are responsible for the huge memory commits we observe;
- which value for overcommit_ratio is reasonable for machines with 8GB RAM and 8GB swap.

This event sent from IssueTracker by clalance [Support Engineering Group]
issue 124630

It seems as if other people have also come across this problem, and even found a fix for it. Can you have a look at http://bugs.centos.org/view.php?id=1608 and check whether the proposed fix (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2fd4ef85e0db9ed75c98e13953257a967ea55e03) can be applied to the current 4.5 kernel?

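For context on the settings mentioned in the report: with overcommit_memory set to 2, the kernel refuses allocations once Committed_AS would exceed a commit limit, which the kernel's overcommit documentation describes as swap plus overcommit_ratio percent of RAM. The sketch below only works through that arithmetic for the reporter's example of 8GB RAM, 8GB swap and a ratio of 50. The function name `commit_limit_kb` is hypothetical, and the calculation ignores huge pages and any kernel reserves, so the real limit on a given machine may differ somewhat.

```c
#include <stdint.h>

/*
 * Approximate commit limit in kB for overcommit_memory = 2.
 * Assumption: limit = swap + RAM * overcommit_ratio / 100,
 * ignoring huge pages and kernel-internal reserves.
 * commit_limit_kb is an illustrative name, not a kernel API.
 */
uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t swap_kb,
                         unsigned ratio_percent)
{
    return swap_kb + ram_kb * ratio_percent / 100;
}
```

Under these assumptions, 8388608 kB of RAM and 8388608 kB of swap with a ratio of 50 give a limit of 12582912 kB (12GB). A Committed_AS counter that only ever grows would eventually reach that limit regardless of how much memory is actually free, which is consistent with the symptom described above.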
Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by clalance [Support Engineering Group]
issue 124630

Customer seems to have hit a Committed_AS counter leak, which has been addressed by the following upstream patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2fd4ef85e0db9ed75c98e13953257a967ea55e03
It doesn't seem to be included in the latest 2.6.9-55.0.2. Escalating to request inclusion in RHEL4 (RHEL5 already has it).
-- Navid

Issue escalated to Support Engineering Group by: navid.
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by clalance [Support Engineering Group]
issue 124630

Created attachment 158124
Backport of upstream patch to fix Committed_AS leakage
This is a quick backport of the named upstream patch. It should fix
Committed_AS leakage, although I haven't confirmed that 100% yet.
Chris Lalancette

I noticed that you have not included the changes to arch/x86_64/ia32/syscall32.c in your version of the patch. Is that part not applicable?

Hm, no, that part should still be applicable; I have no idea how it was missed in the patch I did. I'll redo the patch with that part back in.
Chris Lalancette

I did some more tests using a little Python scriptlet running under a 32-bit Python (as this is what the users who suffered most from this issue do). There I saw the following Committed_AS leaks:

kernel                  leak
2.6.9-55.EL.cernsmp     56 kB per run
2.6.9-55.EL.cern.2smp   40 kB per run

So there are other areas where Committed_AS leaks that are not fixed by the patch. The amount of leakage did not change whether the scriptlet allocated a little or a lot of memory.

It seems I am getting closer to a test case that shows the (or yet another?) problem nicely: a small C program (talking compiler here, not style) with recursion. Built on an i386 system, run on x86_64.

Committed_AS leakage per run:

kernel                  leak
2.6.9-42.0.10.ELsmp     9768 kB / run
2.6.9-55.EL.cernsmp     9840 kB / run
2.6.9-55.EL.cern.2smp   9768 kB / run

This amount of leakage does get serious...

The code:

    #include <stdio.h>
    #include <stdlib.h>

    double fact(int num);

    int main(int argc, char *argv[]){
        int num;

        if ( argc != 2 ){
            printf("Please specify one integer number as argument.\n");
            return (1);
        }
        num = atoi(argv[1]);
        if ( num > -1 )
            printf ("factorial of %d is about %lf\n", num, fact(num));
        else
            printf ("factorial of %d is not defined.\n", num);
        return (0);
    }

    /* Recursion depth equals num, so a large argument grows the stack. */
    double fact(int num){
        if ( num == 0 ) return (1.);
        return ( num * fact( num-1 ) );
    }

To build:

    cc -o fact -g fact.c

To run:

    ./fact 250000

To see effect:

    #!/bin/csh
    #
    date
    @ outer = 100
    @ inner = 100
    @ i = 0
    @ j = 0
    @ starting = `awk '/Committed_AS/ {print $2}' /proc/meminfo`
    @ last = $starting
    while ( $i < $outer )
        @ i ++
        echo "iteration $i"
        grep Committed /proc/meminfo
        while ( $j < $inner )
            @ j ++
            ./fact 250000 >& /dev/null
        end
        @ now = `awk '/Committed_AS/ {print $2}' /proc/meminfo`
        @ loss = `expr $now - $last`
        echo "leak since last outer loop: $loss"
        @ mean = `expr $loss / $inner`
        echo "leak per call since last outer loop: $mean"
        @ loss = `expr $now - $starting`
        echo "total leak since start: $loss"
        @ mean = `expr $loss / $inner / $i`
        echo "leak per call since start: $mean"
        @ j = 0
        @ last = $now
    end
    echo "Done."
    grep Committed /proc/meminfo
    date
    exit

Matthias, thank you very much for that reproducer. I was able to obtain the same kind of results using 2.6.9-55.0.2.ELsmp on an x86_64 system.
Regards,
-- Navid

This event sent from IssueTracker by navid
issue 125299

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

2 things here:
1) The original patch I had actually was the right patch; we already had the bit in syscall32.c from another patch.
2) This patch was committed to the 4.6 tree.
The maintainer should be updating this with MODIFIED pretty soon.
Chris Lalancette

Committed in stream U6 build 55.24. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
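
For anyone verifying the fixed kernel, the measurement used by the csh reproducer earlier in this report comes down to reading Committed_AS from /proc/meminfo before and after a batch of runs and dividing the difference by the number of runs. The following is a minimal C sketch of that calculation, not part of the original reproducer. The function names `committed_as_kb` and `leak_per_run_kb` are hypothetical, and the parser assumes the usual "Committed_AS: <number> kB" line format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the Committed_AS value in kB from /proc/meminfo-style text,
 * or -1 if no such line is found. Illustrative helper, assuming the
 * usual "Committed_AS: <number> kB" line format.
 */
long committed_as_kb(const char *meminfo_text)
{
    const char *line = meminfo_text;

    while (line != NULL && *line != '\0') {
        long kb;

        /* Only a line that starts with the field name matches. */
        if (sscanf(line, "Committed_AS: %ld", &kb) == 1)
            return kb;

        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Average growth in kB per run between two readings (hypothetical helper). */
long leak_per_run_kb(long before_kb, long after_kb, long runs)
{
    return runs > 0 ? (after_kb - before_kb) / runs : 0;
}
```

To use it, read the contents of /proc/meminfo into a buffer (for example with fopen and fread) once before and once after running the reproducer a known number of times, then pass the two readings to `leak_per_run_kb`. On a quiet machine with the fix applied, the result would be expected to stay close to zero, whereas the unpatched kernels in this report showed roughly 9768 kB per run.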