Description of problem:
We continue to see OOM kills and page allocation failures when running RHEL4 under heavy workloads. When this happens I see the zone->all_unreclaimable flag set, and while this flag is set neither kswapd nor try_to_free_pages() will attempt to reclaim memory.

kswapd sets zone->all_unreclaimable when it scans 4 times the number of the zone's active + inactive pages without freeing a single page to the buddy allocator via a call to free_pages_bulk(). free_pages_bulk() clears zone->all_unreclaimable every time it frees a page to that zone.

There are 2 serious flaws with this logic:

1.) All pages are freed to and allocated from a per-cpu cache of free pages, which is designed to prevent free_pages_bulk() from being called too frequently. Freeing a page to this per-cpu cache does not clear the zone->all_unreclaimable flag, so freeing pages doesn't even allow kswapd or try_to_free_pages() to continue running!

2.) The zone->all_unreclaimable flag should be cleared not just when a page gets freed but also when a page writeback operation completes. After all, it is kswapd and try_to_free_pages() that are responsible for freeing the writeback pages when the IO completes, and not clearing zone->all_unreclaimable at that point prevents these two functions from even running!

Version-Release number of selected component (if applicable):
2.6.9-X

How reproducible:
install the RHEL4 kernel

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
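To make the two flaws easier to follow, here is a minimal user-space sketch of the flag logic described above. It is a toy model, not the RHEL4 source: only zone->all_unreclaimable, free_pages_bulk(), kswapd and the 4x scan threshold come from this report. The struct layout, the nr_writeback and pcp_pages fields, and the helpers scan_without_freeing(), free_to_pcp(), writeback_done(), clear_unreclaimable(), reclaim_will_run() and run_demo() are all assumptions made for illustration. In particular, clear_unreclaimable() only shows one possible kind of remedy and should not be read as the contents of the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one memory zone (layout is an assumption, not kernel code). */
struct zone {
    unsigned long nr_active;
    unsigned long nr_inactive;
    unsigned long nr_writeback;   /* hypothetical: pages still under IO */
    unsigned long pages_scanned;  /* scanned since the last bulk free */
    unsigned long free_pages;     /* pages in the buddy allocator */
    unsigned long pcp_pages;      /* pages parked in the per-cpu cache */
    bool all_unreclaimable;
};

/* kswapd-style scan pass that frees nothing; sets the flag at 4x the LRU size. */
static void scan_without_freeing(struct zone *z, unsigned long scanned)
{
    z->pages_scanned += scanned;
    if (z->pages_scanned >= 4 * (z->nr_active + z->nr_inactive))
        z->all_unreclaimable = true;
}

/* Free to the buddy allocator: the only path in this model that clears the flag. */
static void free_pages_bulk(struct zone *z, unsigned long count)
{
    z->free_pages += count;
    z->pages_scanned = 0;
    z->all_unreclaimable = false;
}

/* Flaw 1: a free that lands in the per-cpu cache leaves the flag untouched. */
static void free_to_pcp(struct zone *z, unsigned long count)
{
    z->pcp_pages += count;
}

/* Flaw 2: writeback completion makes pages freeable but leaves the flag set. */
static void writeback_done(struct zone *z, unsigned long count)
{
    if (count > z->nr_writeback)
        count = z->nr_writeback;
    z->nr_writeback -= count;
}

/* Hypothetical remedy: call this from the two paths above as well. */
static void clear_unreclaimable(struct zone *z)
{
    z->pages_scanned = 0;
    z->all_unreclaimable = false;
}

/* Neither kswapd nor try_to_free_pages() reclaims while the flag is set. */
static bool reclaim_will_run(const struct zone *z)
{
    return !z->all_unreclaimable;
}

/* Walks through the stuck state; returns 0 if reclaim can run at the end. */
int run_demo(void)
{
    struct zone z = { .nr_active = 100, .nr_inactive = 100, .nr_writeback = 50 };

    scan_without_freeing(&z, 800);            /* 4 * (100 + 100) pages scanned */
    printf("after scan:           reclaim runs = %d\n", reclaim_will_run(&z));

    free_to_pcp(&z, 10);                      /* flaw 1 */
    writeback_done(&z, 50);                   /* flaw 2 */
    printf("after pcp + writeback: reclaim runs = %d\n", reclaim_will_run(&z));
    assert(!reclaim_will_run(&z));            /* still stuck */

    clear_unreclaimable(&z);                  /* hypothetical remedy */
    printf("after clearing flag:  reclaim runs = %d\n", reclaim_will_run(&z));

    scan_without_freeing(&z, 800);            /* stuck again ... */
    free_pages_bulk(&z, 1);                   /* ... until a bulk free happens */
    return reclaim_will_run(&z) ? 0 : 1;
}
```

Calling run_demo() from a main() walks through the sequence: once the scan threshold is hit, frees to the per-cpu cache and completed writeback leave the zone marked unreclaimable, and only a free_pages_bulk() call (or the hypothetical clear_unreclaimable()) lets reclaim run again.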
Linda, can you please post the patch to this BZ?
Created attachment 133250
Patch to allow heavier RHEL4 system loads without encountering OOMkills

This is the patch that allows the system to run with heavier loads before OOM kills are encountered.
nice. can we get a test kernel with this? i've got RH tech support request no. 1081734 open and the support tech pointed me to this BZ and it seems to jibe with what we're looking at
nevermind me. our problem is seemingly unrelated
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
QE ack for RHEL4.5.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
The patch that was posted for this BZ is linux-2.6.9-vm-balance.patch.

Larry Woodman
committed in stream U6 build 55.1. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html