Bug 189404 - Prevention of OOM kills and page allocation failures under heavy load
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Depends On:
Blocks: 176344
Reported: 2006-04-19 15:08 EDT by Linda Wang
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 11:14:02 EST

Attachments
Patch to allow heavier RHEL4 system loads without encountering OOMkills (1.01 KB, patch)
2006-07-28 13:19 EDT, Larry Woodman

Description Linda Wang 2006-04-19 15:08:17 EDT
Description of problem:

We continue to see OOM kills and page allocation failures running RHEL4 under
heavy workloads. When this happens I am seeing the zone->all_unreclaimable flag
set, and when this flag is set neither kswapd nor try_to_free_pages() will
attempt to reclaim memory. The zone->all_unreclaimable flag is set by kswapd
when it scans 4 times the number of the zone's active + inactive pages without
freeing a single page to the buddy allocator via a call to free_pages_bulk().
The zone->all_unreclaimable flag is cleared by free_pages_bulk() every time it
frees a page to that zone.
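
For readers who want the flag's life cycle spelled out, here is a minimal
userspace sketch of the behavior described above. It is a toy model, not the
RHEL4 kernel source and not the attached patch: struct zone_model, its
pages_scanned counter, and the model_* functions are names made up for
illustration, and the 4x threshold is taken from the description in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one zone. Field names loosely follow the report;
 * this is NOT the kernel's struct zone. */
struct zone_model {
    unsigned long nr_active;
    unsigned long nr_inactive;
    unsigned long pages_scanned;   /* assumed: pages scanned since the last free */
    unsigned long free_pages;
    bool all_unreclaimable;
};

/* kswapd scans nr pages without freeing any of them. Once the running
 * total reaches 4 x (active + inactive), the zone is marked unreclaimable. */
void model_kswapd_scan(struct zone_model *z, unsigned long nr)
{
    z->pages_scanned += nr;
    if (z->pages_scanned >= 4 * (z->nr_active + z->nr_inactive))
        z->all_unreclaimable = true;
}

/* Stand-in for free_pages_bulk(): returning pages to the buddy
 * allocator is the only event that clears the flag. */
void model_free_pages_bulk(struct zone_model *z, unsigned long count)
{
    if (count == 0)
        return;
    z->free_pages += count;
    z->pages_scanned = 0;
    z->all_unreclaimable = false;
}

/* Both kswapd and try_to_free_pages() skip a zone while the flag is set. */
bool model_reclaim_allowed(const struct zone_model *z)
{
    return !z->all_unreclaimable;
}
```

In this model, once model_kswapd_scan() sets the flag, reclaim stays disabled
until something calls model_free_pages_bulk(), which is the dependency the rest
of this report takes issue with.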

There are 2 serious flaws with this logic:

1.) There is a per-cpu cache of free pages that all pages are freed to and
allocated from. It is designed to prevent free_pages_bulk() from being called
too frequently, but freeing a page into this per-cpu cache does not clear the
zone->all_unreclaimable flag, so the freeing of pages does not even allow
kswapd or try_to_free_pages() to continue running!

2.) The zone->all_unreclaimable flag should be cleared not just when a page
gets freed but also when a page writeback operation completes. After all, it is
kswapd and try_to_free_pages() that are responsible for freeing the writeback
pages when the IO completes, and not clearing zone->all_unreclaimable properly
prevents these two functions from even running!
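
The two flaws can be illustrated with another small userspace sketch. Again
this is only a toy model under stated assumptions, not kernel code and not the
contents of the attached patch: struct zone_sim, PCP_BATCH, and the sim_*
functions are hypothetical, and the per-cpu cache is simplified to drain
completely once it holds PCP_BATCH pages.

```c
#include <assert.h>
#include <stdbool.h>

#define PCP_BATCH 16   /* hypothetical drain threshold for this sketch */

struct zone_sim {
    unsigned long free_pages;      /* pages in the buddy allocator */
    unsigned long pcp_count;       /* pages parked in the per-cpu cache */
    bool all_unreclaimable;
};

/* Flaw 1: a freed page lands in the per-cpu cache first. The flag is
 * only cleared when the cache drains (the free_pages_bulk() stand-in),
 * so individual frees leave the zone marked unreclaimable. */
void sim_free_page_to_pcp(struct zone_sim *z)
{
    z->pcp_count++;
    if (z->pcp_count >= PCP_BATCH) {
        z->free_pages += z->pcp_count;
        z->pcp_count = 0;
        z->all_unreclaimable = false;
    }
}

/* Flaw 2, behavior as described in the report: writeback completion
 * does not touch the flag, so reclaim is never re-enabled by it. */
void sim_end_writeback_current(struct zone_sim *z)
{
    (void)z;
}

/* Behavior the report asks for: completing writeback also clears the
 * flag so kswapd and try_to_free_pages() can free the now-clean page. */
void sim_end_writeback_proposed(struct zone_sim *z)
{
    z->all_unreclaimable = false;
}
```

With the flag set, calling sim_free_page_to_pcp() fewer than PCP_BATCH times or
calling sim_end_writeback_current() leaves reclaim disabled in this model, while
sim_end_writeback_proposed() re-enables it immediately.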

Version-Release number of selected component (if applicable):
2.6.9-X

How reproducible:
install the RHEL4 kernel

Comment 3 Kevin Krafthefer 2006-06-28 14:08:13 EDT
Linda, can you please post the patch to this BZ?
Comment 4 Larry Woodman 2006-07-28 13:19:03 EDT
Created attachment 133250
Patch to allow heavier RHEL4 system loads without encountering OOMkills


This is the patch that allows the system to run with heavier loads before OOM
kills are encountered.
Comment 5 Buck Huppmann 2006-11-02 12:31:49 EST
nice. can we get a test kernel with this? i've got RH tech support request
no. 1081734 open and the support tech pointed me to this BZ and it seems
to jibe with what we're looking at
Comment 6 Buck Huppmann 2006-11-21 09:40:53 EST
nevermind me. our problem is seemingly unrelated
Comment 7 RHEL Product and Program Management 2006-12-11 15:36:26 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 Jay Turner 2007-01-02 08:42:48 EST
QE ack for RHEL4.5.
Comment 11 RHEL Product and Program Management 2007-03-21 17:25:20 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 RHEL Product and Program Management 2007-04-18 19:00:17 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 13 Larry Woodman 2007-09-24 12:01:51 EDT
The patch that was posted for this BZ is linux-2.6.9-vm-balance.patch

Larry Woodman
Comment 14 Jason Baron 2007-09-24 15:18:39 EDT
committed in stream U6 build 55.1. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 17 errata-xmlrpc 2007-11-15 11:14:02 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
