Bug 472002 - Memory management is broken, causing OOM kills despite plenty of available memory
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: i686 Linux
Priority: medium
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Larry Woodman
QA Contact: Martin Jenner
Depends On:
Blocks:
Reported: 2008-11-17 23:23 EST by Need Real Name
Modified: 2010-11-02 16:46 EDT

Doc Type: Bug Fix
Last Closed: 2010-11-02 16:46:51 EDT
Attachments: None
Description Need Real Name 2008-11-17 23:23:56 EST
Description of problem:

User-level applications that use a lot of memory (e.g. 15% of RAM on an 8 GB server) trigger the OOM killer, which in turn "takes out" arbitrary other processes that do not necessarily belong to the user.

Version-Release number of selected component (if applicable):

Bug has been present in all AS4 releases, and still exists.

How reproducible:

Always

Steps to Reproduce:

1. Run some user-level processes that use a lot of memory
2. Wait a while
3. Watch the OOM killer go wild.

A fast way to reproduce this is as follows:
A) Install AS4u7 with the "everything" option.
B) Install VMware Server 1.0.8. The bug is not related to the application you choose, so any other memory-hungry application will do, but VMware demonstrates the problem quickly.
C) Install AS4u7 in a virtual machine.
D) Copy the virtual machine a few times.
E) Boot 3 or 4 VMs at once.

*) Note: the bug seems to depend on the amount of disk activity that has taken place before the RAM usage, which suggests the bug is somewhere in the disk cache code. Specifically, the OOM kill is more likely to occur the longer the host has been running (e.g. days).
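
One way to check the disk-cache observation in the note above is to record how memory is split between free pages and cache before and after the workload. The following is a minimal sketch, not part of the original report. It assumes Python 3 is available on the host (AS4 does not ship it) and that /proc/meminfo uses the usual "Name: value kB" layout. The helper names parse_meminfo, summarize and read_meminfo are hypothetical. LowTotal and LowFree are only reported by 32-bit kernels built with highmem support, so they are skipped when absent. Whether low memory plays any role in this bug is not established by the report.

```python
import re

# Matches lines such as "MemTotal:  8000 kB" or "HugePages_Total: 0".
_LINE = re.compile(r"^(\w+(?:\([^)]*\))?):\s+(\d+)(?:\s+kB)?\s*$")

# Fields of interest for this report; the selection is an assumption.
DEFAULT_KEYS = ("MemTotal", "MemFree", "Buffers", "Cached",
                "LowTotal", "LowFree")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def summarize(fields, keys=DEFAULT_KEYS):
    """Keep only the requested fields that are actually present."""
    return {key: fields[key] for key in keys if key in fields}


def read_meminfo(path="/proc/meminfo"):
    """Read and summarize the live file (Linux only)."""
    with open(path) as handle:
        return summarize(parse_meminfo(handle.read()))
```

Calling read_meminfo() once after boot and again shortly before the OOM kill would show whether Cached has grown while MemFree (and LowFree, where present) has shrunk.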
  
Actual results:

Random processes killed - usually resulting in a complete system crash

Expected results:

I expect the same results as I get under SuSE 10.3 - that is - it all works fine, without any bugs, errors, crashing, OOM issues, or anything else.

Additional info:

SuSE 10.3 works fine.
Red Hat AS4 and AS5 both crash.
A few other Linux distributions also crash with OOM problems - I forget the names (sorry) - I stopped testing when I found SuSE to be 100% stable.
Comment 1 Larry Woodman 2009-01-21 08:13:48 EST
Please attach the show_mem() output that was printed to the console and is available via dmesg when the OOM kill occurs.  This information will tell us exactly where the memory is and why the kernel decided to OOM kill.

Larry Woodman
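
The output requested in Comment 1 can be pulled out of a saved kernel log with a small filter. The following is a minimal sketch under stated assumptions, not a Red Hat tool: it assumes the log was saved first (for example with dmesg > dmesg.txt), that OOM-related lines contain "oom-killer" or "Out of Memory" (the exact wording varies between kernel versions, so the pattern may need adjusting), and that the memory report follows within a fixed number of lines. The function name extract_oom_context and the default of 60 trailing lines are hypothetical choices.

```python
import re

# Assumed markers for OOM-related kernel messages; adjust for your kernel.
OOM_MARKER = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def extract_oom_context(log_text, before=0, after=60):
    """Return a list of line blocks, one per OOM event found in log_text.

    Each block starts `before` lines ahead of a marker line and runs for
    `after` lines past it. Marker lines that fall inside a block already
    collected do not start a new block.
    """
    lines = log_text.splitlines()
    blocks = []
    covered_until = -1
    for index, line in enumerate(lines):
        if index <= covered_until or not OOM_MARKER.search(line):
            continue
        start = max(0, index - before)
        end = min(len(lines), index + after + 1)
        blocks.append(lines[start:end])
        covered_until = end - 1
    return blocks
```

Reading dmesg.txt, passing its contents to extract_oom_context, and attaching the resulting blocks to the bug would provide the memory breakdown the comment asks for. Note that the kernel ring buffer is limited in size, so on a long-running host the relevant lines may only survive in /var/log/messages.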
