Red Hat Bugzilla – Bug 472002
Memory management is broken, causing OOM kills despite plenty of available memory
Last modified: 2010-11-02 16:46:51 EDT
Description of problem:
User-level applications that use a lot of memory (e.g. 15% of RAM on an 8 GB server) trigger the OOM killer, which in turn "takes out" arbitrary other processes, including ones that do not belong to the user.
Version-Release number of selected component (if applicable):
Bug has been present in all AS4 releases, and still exists.
Steps to Reproduce:
1. Run some user-level processes that use a lot of memory
2. Wait a while
3. Watch the OOM killer go wild.
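As a stand-in for step 1, a minimal sketch of a user-level process that holds a chosen amount of memory could look like the following. This is an illustration only, not part of the original report: the function name `touch_memory` is hypothetical, the 4096-byte page size is an assumption, and the script has not been shown to trigger the OOM kills described here.

```python
import time


def touch_memory(n_bytes, page_size=4096):
    """Allocate n_bytes and write one byte per page so the pages are really backed by RAM.

    page_size=4096 is an assumption; adjust it for the platform under test.
    """
    buf = bytearray(n_bytes)  # zero-filled allocation
    for offset in range(0, n_bytes, page_size):
        buf[offset] = 1  # touching the page forces the kernel to back it
    return buf


def hold_memory(n_bytes, seconds):
    """Keep n_bytes resident for the given number of seconds, then release it."""
    buf = touch_memory(n_bytes)
    time.sleep(seconds)
    return len(buf)
```

For the figures in the description (15% of an 8 GB server), the call would be roughly `hold_memory(int(0.15 * 8 * 1024**3), 3600)`, run as an ordinary user.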
A fast way to watch this is as follows:
A) Install AS4u7 with the "everything" option.
B) Install the VMware Server 1.0.8 application (any other memory-hungry application will do - the bug is not related to the application you choose, but VMware demonstrates the problem quickly)
C) Install AS4u7 on a virtual machine
D) Copy the virtual machine a few times
E) Boot 3 or 4 VMs at once
*) Note - the bug seems to depend on the amount of disk activity that has taken place before the RAM usage, suggesting the bug is somewhere in the disk cache code. Specifically, the OOM is more likely to occur the longer your host has been running (e.g. days).
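One way to check the disk cache suspicion is to record how much RAM is free versus held in buffers and page cache as the host's uptime grows. The sketch below parses `/proc/meminfo`-style text using the standard `MemTotal`, `MemFree`, `Buffers` and `Cached` fields; the function names `parse_meminfo` and `cache_summary` are hypothetical, and the script only reports numbers, it does not diagnose the bug.

```python
def parse_meminfo(text):
    """Return {field_name: value_in_kB} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" style lines with a numeric value.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_summary(info):
    """Express free memory and buffers + page cache as percentages of total RAM."""
    total = info["MemTotal"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "cache_pct": 100.0 * cache / total,
    }


def read_summary(path="/proc/meminfo"):
    """Read the live values from the running host."""
    with open(path) as handle:
        return cache_summary(parse_meminfo(handle.read()))
```

Calling `read_summary()` periodically (for example from cron) and logging the result would show whether the cache share keeps growing in the days before an OOM kill.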
Actual results:
Random processes are killed, usually resulting in a complete system crash.
Expected results:
I expect the same results as I get under SuSE 10.3: it all works fine, without any bugs, errors, crashes, OOM issues, or anything else.
Additional info:
SuSE 10.3 works fine.
RedHat AS4 and AS5 both crash.
A few other Linux distributions also crash with OOM problems - I forget the names (sorry) - I stopped testing when I found SuSE to be 100% stable.
Please attach the show_mem() output that was printed to the console and is available via dmesg when the OOM kill occurs. This information will tell us exactly where the memory is and why the kernel decided to OOM kill.
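To pull the relevant part out of a long log before attaching it, a small filter along these lines may help. It is a sketch under assumptions: the search pattern (`out of memory` or `oom-killer`, case-insensitive) is a guess at how the kernel labels these events and may need adjusting for the kernel in question, and `extract_oom_context` is a hypothetical name. It keeps a window of lines around each match so the memory dump printed next to the kill message is included.

```python
import re

# Assumed markers for OOM events; adjust to match the actual kernel messages.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def extract_oom_context(log_text, before=5, after=40):
    """Return the log lines within a window around every line matching OOM_PATTERN."""
    lines = log_text.splitlines()
    keep = set()
    for index, line in enumerate(lines):
        if OOM_PATTERN.search(line):
            start = max(0, index - before)
            stop = min(len(lines), index + after + 1)
            keep.update(range(start, stop))
    # Emit each kept line once, in original order, even if windows overlap.
    return [lines[i] for i in sorted(keep)]
```

Typical use would be to save the console output with `dmesg > dmesg.txt`, then print `"\n".join(extract_oom_context(open("dmesg.txt").read()))` and attach the result along with the full log.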