Bug 130012 - OOM Kill kicks in on 64 Gig Bull Nova system
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2004-08-16 11:58 EDT by Bill Peck
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-12-20 15:55:54 EST

Attachments
buffers.patch (1.40 KB, text/plain)
2004-08-17 17:22 EDT, Larry Woodman

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2004:550
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 00:00:00 EST

Description Bill Peck 2004-08-16 11:58:59 EDT
Description of problem:
When running I/O against a qla2300 controller on a 16-way Bull system,
the OOM killer kicks in.

Larry Woodman is looking at this.  It looks like buffer heads are
not getting reclaimed, so we run out of memory.

Version-Release number of selected component (if applicable):
happens with both 

How reproducible:
run I/O for less than an hour against qla2300 controller

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Larry has built a new kernel, which I am trying now.
Comment 1 Larry Woodman 2004-08-16 17:26:08 EDT
There are two separate problems that cause the OOM killer to attack
the processes on this machine:

1.) The fancyIOtlb.patch for IA64 systems without hardware IOMMUs
causes all kernel data structures (kmem_cache_alloc and kmalloc) to be
allocated out of the relatively small (2GB) DMA zone.  So it doesn't
take very long before the DMA zone is totally consumed by the slab and
the system starts OOM killing.

2.) The try_to_reclaim_buffers() routine, which is responsible for
reclaiming all buffer headers on RHEL3, is only called from kswapd and
not from other tasks via __alloc_pages.  This means that on a machine
with more than 10 processors it's possible for the OOM killer to be
invoked more than 10 times in a short timeframe without an intervening
success from kswapd.  This can result in erroneous OOM kills as well
as really lousy performance when lowmem gets consumed by buffer headers
via the slab.
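
The second problem can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the RHEL3 kernel logic: the
threshold of 10, the "reset on successful reclaim" rule, and the names
OOM_THRESHOLD and simulate() are hypothetical and inferred from the
description above. Each round, every CPU hits one allocation failure
before kswapd gets a chance to run and reclaim buffer headers.

```python
# Toy model only: threshold and reset semantics are assumptions taken
# from the comment text, not from the 2.4.21 kernel source.
OOM_THRESHOLD = 10  # failures tolerated without an intervening reclaim success


def simulate(num_cpus, rounds, direct_reclaim):
    """Return the number of OOM kills triggered in the toy model.

    num_cpus       -- tasks that fail an allocation before kswapd runs
    rounds         -- how many times the burst of failures repeats
    direct_reclaim -- True if allocating tasks may reclaim buffer
                      headers themselves (the behaviour the fix adds)
    """
    failures = 0  # failures since the last successful reclaim
    kills = 0
    for _ in range(rounds):
        for _ in range(num_cpus):
            if direct_reclaim:
                # The allocating task frees buffer headers itself, so
                # the allocation succeeds and the counter is cleared.
                failures = 0
                continue
            failures += 1
            if failures > OOM_THRESHOLD:
                # More than 10 failures with no success: a process is killed.
                kills += 1
                failures = 0
        # kswapd finally runs and reclaims buffer headers.
        failures = 0
    return kills


if __name__ == "__main__":
    for cpus in (8, 10, 16):
        print(cpus, "CPUs, kswapd-only reclaim:", simulate(cpus, 3, False))
    print(16, "CPUs, direct reclaim:", simulate(16, 3, True))
```

In this model a 16-way machine crosses the threshold on every burst even
though the memory is reclaimable, while a machine with 10 or fewer
processors never does. Letting the allocating task reclaim buffer
headers itself removes the erroneous kills.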

I am working on separate fixes for both problems.
Comment 2 Larry Woodman 2004-08-17 17:22:20 EDT
Created attachment 102811
Comment 3 Larry Woodman 2004-08-17 17:23:43 EDT
The patch above fixes both problems described in comment 1.  It has been
submitted to rhkernel-list for comments and RHEL3-U4 consideration.

Comment 4 Ernie Petrides 2004-09-14 20:08:09 EDT
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.6.EL).
Comment 5 Ernie Petrides 2004-09-18 01:57:37 EDT
The fix to the fix has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.7.EL).
Comment 6 John Flanagan 2004-12-20 15:55:54 EST
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

