Bug 472002 - Memory management is broken, causing OOM kills despite plenty of available memory
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: i686
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Larry Woodman
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-18 04:23 UTC by Need Real Name
Modified: 2010-11-02 20:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-02 20:46:51 UTC
Target Upstream Version:
Embargoed:


Description Need Real Name 2008-11-18 04:23:56 UTC
Description of problem:

User-level applications that use a lot of memory (e.g. 15% of RAM on an 8 GB server) trigger the OOM killer, which in turn "takes out" arbitrary other processes that don't necessarily belong to the user.

Version-Release number of selected component (if applicable):

Bug has been present in all AS4 releases, and still exists.

How reproducible:

Always

Steps to Reproduce:

1. Run some user-level processes that use a lot of memory
2. Wait a while
3. Watch the OOM killer go wild.

A fast way to watch this is as follows:
A) Install AS4u7 with the "everything" option.
B) Install the VMware Server 1.0.8 application. Any other memory-hungry application will do, since this bug is not related to the application you choose, but VMware demonstrates the problem quickly.
C) Install AS4u7 on a virtual machine
D) Copy the virtual machine a few times
E) Boot 3 or 4 VMs at once

*) Note - the bug seems to depend on the amount of disk activity that has taken place before the RAM usage, which suggests the bug is somewhere in the disk cache code. Specifically, the OOM is more likely to occur the longer the host has been running (e.g. days).
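
To check the suspected link with the disk cache, it may help to record how free and cached memory change while the host runs. The following is only a minimal sketch in Python, not part of the original report: the helper names `parse_meminfo` and `cache_summary` are hypothetical, and the default field list is an assumption (`LowFree` is only reported by 32-bit kernels built with highmem support).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer value} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; ignore anything else.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_summary(text, fields=("MemTotal", "MemFree", "Cached", "LowFree")):
    """Return the subset of fields of interest that are present in the text.

    The default field list is an assumption made for this sketch.
    """
    info = parse_meminfo(text)
    return {field: info[field] for field in fields if field in info}
```

A possible use is to call `cache_summary(open("/proc/meminfo").read())` at intervals (for example from cron) and compare the values recorded shortly before an OOM kill with those from a freshly booted host.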
  
Actual results:

Random processes killed - usually resulting in a complete system crash

Expected results:

I expect the same results as I get under SuSE 10.3 - that is - it all works fine, without any bugs, errors, crashing, OOM issues, or anything else.

Additional info:

SuSE 10.3 works fine.
RedHat AS4 and AS5 both crash.
A few other Linux distributions also crash with OOM problems - I forget the names (sorry) - I stopped testing when I found SuSE to be 100% stable.

Comment 1 Larry Woodman 2009-01-21 13:13:48 UTC
Please attach the show_mem() output that was printed to the console and is available via dmesg when the OOM kill occurs.  This information will tell us exactly where the memory is and why the kernel decided to OOM kill.

Larry Woodman
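
One possible way to collect the requested output is to filter the kernel log for the lines around each OOM event. The Python sketch below is illustrative only: the helper names are hypothetical, the marker strings in `OOM_PATTERN` are assumptions (the exact kernel wording differs between releases), and the context window sizes are arbitrary. If the kernel ring buffer has already wrapped, the same filter can be applied to a saved log file instead.

```python
import re
import subprocess

# Assumed markers; adjust to whatever your kernel actually prints.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def extract_oom_context(log_text, before=5, after=60):
    """Return the log lines surrounding each line that matches OOM_PATTERN."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if OOM_PATTERN.search(line):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    # Emit the kept lines once each, in their original order.
    return [lines[i] for i in sorted(keep)]


def read_dmesg():
    """Return the current kernel ring buffer as text (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `print("\n".join(extract_oom_context(read_dmesg())))` right after an OOM kill should print the memory report that accompanies it, which can then be attached to this bug.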

