Bug 472002

Summary: Memory management is broken, causing OOM kills despite plenty of available memory
Product: Red Hat Enterprise Linux 4
Reporter: Need Real Name <christopher>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Martin Jenner <mjenner>
Severity: urgent
Priority: medium
Version: 4.7
CC: christopher
Target Milestone: rc
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-11-02 20:46:51 UTC

Description Need Real Name 2008-11-18 04:23:56 UTC
Description of problem:

User-level applications that use a lot of memory (e.g. 15% of RAM on an 8 GB server) trigger the OOM killer, which in turn "takes out" arbitrary other processes that don't necessarily belong to the user.

Version-Release number of selected component (if applicable):

Bug has been present in all AS4 releases, and still exists.

How reproducible:

Always

Steps to Reproduce:

1. Run some user-level processes that use a lot of memory
2. Wait a while
3. Watch the OOM killer go wild.

A fast way to watch this is as follows:
A) Install AS4u7 with the "everything" option.
B) Install the VMware Server 1.0.8 application. (The bug is not related to the application you choose, so you can pick something else if you prefer, but VMware quickly demonstrates the problem.)
C) Install AS4u7 on a virtual machine
D) Copy the virtual machine a few times
E) Boot 3 or 4 VMs at once

*) Note: the bug seems to depend on the amount of disk activity that has taken place before the RAM usage, suggesting the bug is somewhere in the disk cache code. Specifically, the OOM is more likely to occur the longer the host has been running (e.g. days).
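
One way to check the disk-cache suspicion above would be to record how much memory is free versus held in the page cache as uptime grows. The following is only a rough sketch, not something taken from this report: the function names parse_meminfo and summarize are hypothetical, it assumes the usual "Field:   value kB" layout of /proc/meminfo, and it assumes a Python 3 interpreter (which may have to live on another machine that is fed a saved copy of the file).

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}.

    Assumption: fields of interest look like "MemFree:   123456 kB".
    Lines in any other shape are silently skipped.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s*kB\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def summarize(values):
    """Return free and page-cache memory as fractions of MemTotal."""
    total = values["MemTotal"]
    return {
        "free_fraction": values["MemFree"] / total,
        "cached_fraction": values["Cached"] / total,
    }
```

Calling summarize(parse_meminfo(open("/proc/meminfo").read())) at intervals, and once more shortly before an OOM kill, would show whether the cached fraction really does grow while the free fraction shrinks.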
  
Actual results:

Random processes killed - usually resulting in a complete system crash

Expected results:

I expect the same results as I get under SuSE 10.3 - that is - it all works fine, without any bugs, errors, crashing, OOM issues, or anything else.

Additional info:

SuSE 10.3 works fine.
Red Hat AS4 and AS5 both crash.
A few other Linux distributions also crash with OOM problems - I forget the names (sorry) - I stopped testing when I found SuSE to be 100% stable.

Comment 1 Larry Woodman 2009-01-21 13:13:48 UTC
Please attach the show_mem() output that was printed to the console and is available via dmesg when the OOM kill occurs.  This information will tell us exactly where the memory is and why the kernel decided to OOM kill.

Larry Woodman
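
A minimal sketch of how the requested output could be pulled out of a saved dmesg capture (for example one written with "dmesg > dmesg.txt"). Everything here is an assumption rather than something stated in this report: the function name extract_oom_reports is hypothetical, the marker strings "oom-killer" and "out of memory" are a guess at how the kernel words its OOM messages, and the 60-line window is an arbitrary guess at how much memory-state output accompanies each event.

```python
def extract_oom_reports(log_text, context_lines=60):
    """Return one text block per OOM event found in a dmesg capture.

    Assumptions: an event starts at a line containing one of the marker
    strings below (matched case-insensitively), and the memory dump sits
    within `context_lines` lines of that marker, usually after it.
    """
    markers = ("oom-killer", "out of memory")
    lines = log_text.splitlines()
    reports = []
    skip_until = 0  # avoid emitting overlapping blocks for one event
    for index, line in enumerate(lines):
        if index < skip_until:
            continue
        if any(marker in line.lower() for marker in markers):
            end = min(index + 1 + context_lines, len(lines))
            reports.append("\n".join(lines[index:end]))
            skip_until = end
    return reports
```

Each returned block can be pasted into a comment or attached as a file. If a block looks cut off, or the memory dump appears before the marker line instead of after it, attaching the full dmesg output is the safer choice.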