Bug 124561

Summary: Kernel seems to leak memory after killing applications through "Out of Memory"
Product: Red Hat Enterprise Linux 3 Reporter: Robert Scheck <redhat-bugzilla>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: jbaron, petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2005-02-06 12:06:01 UTC Type: ---

Description Robert Scheck 2004-05-27 14:42:25 UTC
Description of problem:
I'm using a HP Integrity rx1600 Server with 1 GB RAM for the test.
After booting up, we've got this:

--- snipp ---
[root@rx1600 root]# free
             total       used       free     shared    buffers     cached
Mem:       1005968     151568     854400          0      22464      50736
-/+ buffers/cache:      78368     927600
Swap:      2040208          0    2040208
[root@rx1600 root]#
--- snapp ---

For this benchmark I use iozone <http://www.iozone.org/>. The
application itself needs 100-300 MB of RAM (depending on the
parameters given to iozone). Iozone does lots of I/O (as the name
says), and this needs page cache, which is ultimately taken from RAM.

If too much RAM is required (this happens sometimes during the tests),
the Linux kernel kills applications (I think this is the normal kernel
2.4 behaviour):

--- snipp ---
May 27 14:51:19 rx1600 kernel: Out of Memory: Killed process 9424 (sendmail).
May 27 14:51:45 rx1600 kernel: Out of Memory: Killed process 9461 (xfs).
May 27 14:51:56 rx1600 kernel: Out of Memory: Killed process 9983 (bash).
May 27 14:51:57 rx1600 sshd(pam_unix)[9978]: session closed for user root
May 27 14:54:40 rx1600 kernel: Out of Memory: Killed process 10142 (bash).
May 27 14:54:40 rx1600 sshd(pam_unix)[10140]: session closed for user root
--- snapp ---
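
For readers who want to pull the OOM kills out of a longer syslog, here
is a minimal sketch in Python. The log text is copied from the excerpt
above; the regular expression and the function name oom_kills are
illustrative assumptions based only on the message format shown here,
not on any kernel documentation.

```python
import re

# Log lines copied verbatim from the excerpt above.
LOG = """\
May 27 14:51:19 rx1600 kernel: Out of Memory: Killed process 9424 (sendmail).
May 27 14:51:45 rx1600 kernel: Out of Memory: Killed process 9461 (xfs).
May 27 14:51:56 rx1600 kernel: Out of Memory: Killed process 9983 (bash).
May 27 14:51:57 rx1600 sshd(pam_unix)[9978]: session closed for user root
May 27 14:54:40 rx1600 kernel: Out of Memory: Killed process 10142 (bash).
May 27 14:54:40 rx1600 sshd(pam_unix)[10140]: session closed for user root
"""

# Assumed pattern: matches only the message format seen in this report.
OOM_RE = re.compile(r"Out of Memory: Killed process (\d+) \((.+?)\)\.")


def oom_kills(text):
    """Return (pid, command) pairs for every OOM kill message in text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_RE.finditer(text)]


kills = oom_kills(LOG)
for pid, command in kills:
    print(pid, command)
```

The sshd lines are skipped because they do not contain the kill
message, so only the four kernel entries are listed.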

PID 10142 contained iozone, so it was also killed. But if I log in
again and check the memory again:

--- snipp ---
[root@rx1600 root]# free
             total       used       free     shared    buffers     cached
Mem:       1005968     987232      18736          0     546928     293984
-/+ buffers/cache:     146320     859648
Swap:      2040208          0    2040208
[root@rx1600 root]#
--- snapp ---

We've got only 18 MB free... I tried to start iozone with the same
parameters again, but it was killed ~30 seconds after start (out of
memory, too).

So my conclusion is that the kernel seems to leak memory after killing
applications through "Out of Memory", because the normal behaviour
would (or should) be that the page cache is freed after the
out-of-memory kill, which yields more free RAM... but that isn't the
case here :-(

And iozone is really killed (ps -aux | grep iozone returned nothing).

Version-Release number of selected component (if applicable):
kernel-2.4.21-9.EL and newer

How reproducible & Steps to Reproduce:
Every time, see above.

Actual results:
Well, a reboot for example gives me the leaked memory back.

Expected results:
When the kernel kills applications because of "Out of Memory"
problems, the used page cache should be freed, which leads to more
free RAM.

Comment 1 Larry Woodman 2004-07-29 15:53:08 UTC
Robert, can you see if the unexpected OOM kills are still a problem
with the latest RHEL3 IA64 kernel?  It is located at:

 http://people.redhat.com/~lwoodman/IA64/

Also, FYI, the pagecache memory that was mapped into a process which
was OOM-killed will not get freed up immediately.  Instead, the
reference count will be decremented accordingly and the pages will
remain in the pagecache until they are either reused by some other
process or the system deems it necessary to reclaim and free them.  The
system is not leaking memory, it's just holding on to cached filesystem
data pages.  If that results in premature OOM kills then that's a
separate problem, not a memory leak.
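
To illustrate the point with the numbers from the description, here is
a minimal sketch in Python that reproduces the "-/+ buffers/cache" row
of the two `free` outputs. The function name effective_memory is
hypothetical, and the arithmetic assumes the old procps layout shown in
this report, where buffers and cached are counted inside "used".

```python
def effective_memory(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row from the 'Mem:' row (values in KiB)."""
    used_by_apps = used - buffers - cached   # memory not holding cache data
    reclaimable = free + buffers + cached    # free plus cache the kernel can reclaim
    return used_by_apps, reclaimable


# 'Mem:' rows copied from the two `free` outputs in the description.
after_boot = effective_memory(used=151568, free=854400, buffers=22464, cached=50736)
after_oom = effective_memory(used=987232, free=18736, buffers=546928, cached=293984)

print(after_boot)  # (78368, 927600)
print(after_oom)   # (146320, 859648)
```

Only about 18 MB is reported as free after the OOM kills, yet roughly
840 MB is buffers and page cache, and just 146320 KiB falls outside
them, which is consistent with cached data being held rather than
leaked.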

We have made several changes to the kernel since 2.4.21-9.EL that
will delay and/or eliminate OOM kills, and that is what I want you to
test for us.

Larry Woodman



Comment 2 Robert Scheck 2005-02-06 12:06:01 UTC
At least 2.4.21-27.EL solves this issue for me on RHEL3. RHEL4
already doesn't have this problem. Thank you, Larry :)

Comment 3 Ernie Petrides 2005-02-07 22:52:59 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html