Bug 159991 - [taroon patch] fix for indefinite postponement under __alloc_pages()
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
All Linux
Priority: medium  Severity: medium
Assigned To: Ernie Petrides
Brian Brock
Blocks: 156320
Reported: 2005-06-09 18:03 EDT by Tim Burke
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Last Closed: 2005-09-28 11:20:45 EDT

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:663 qe-ready SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6 2005-09-28 00:00:00 EDT

Description Tim Burke 2005-06-09 18:03:08 EDT
From: Ernie Petrides <petrides>
Date: Thu, 12 May 2005 21:02:45 -0400
Subject: [taroon patch] fix for indefinite postponement under __alloc_pages()
Tracking: 0978.petrides.rebal-laundry-zone.patch
Archives: 2005-May/msg00253.html
Status: committed in -32.5.EL


While trying to develop a reproducer for the repeated-OOM-kill problem,
I ran into a VM problem that effectively caused my test system to hang
(or more specifically, it made no visible progress and did not allow ^C
via ssh sessions to kill runaway processes).

The scenario and reproducer are documented in the RHKL archives here:


After much consultation with Larry, it was determined that the two-process
test program managed to get the two-cpu test system into a condition of
"indefinite postponement" in concurrent loops of the following functions:


with the innermost function continually returning a non-zero "work done"
value.  This behavior comes from 0777.lwoodman.incorrect-oom-kill.patch
(committed to U5), which was a fix for inappropriate OOM killing when
progress was actually being made.  That fix makes rebalance_laundry_zone()
save a zone's inactive-laundry-page count before releasing a lock on the
zone.  Then, after reacquiring the lock, if the current count value differs
from the saved value, it is assumed that some progress has been made, and
a "work done" indicator is incremented.  This ultimately results in the
allocating process staying in the outermost loop to try again, and more
importantly, preventing do_try_to_free_pages() from calling out_of_memory().
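
The heuristic described above can be sketched in user-space C.  This is
an illustrative model only, not the kernel code: laundry_changed(),
next_step() and the enum are hypothetical names that stand in for the
count comparison and for the choice between retrying the allocation and
calling out_of_memory().

```c
#include <assert.h>

/* Hypothetical stand-ins for the two outcomes described in the text. */
enum next_action { RETRY_ALLOCATION, CALL_OUT_OF_MEMORY };

/*
 * Model of the U5 heuristic: any difference between the saved and the
 * re-read inactive-laundry count is treated as progress.
 */
int laundry_changed(unsigned long saved, unsigned long now)
{
	return saved != now;
}

/* Nonzero "work done" keeps the allocator retrying; zero allows OOM handling. */
enum next_action next_step(int work_done)
{
	return work_done ? RETRY_ALLOCATION : CALL_OUT_OF_MEMORY;
}

/* Self-check: a count that grew is still reported as progress. */
void laundry_changed_self_check(void)
{
	assert(next_step(laundry_changed(10, 11)) == RETRY_ALLOCATION);
	assert(next_step(laundry_changed(10, 9)) == RETRY_ALLOCATION);
	assert(next_step(laundry_changed(10, 10)) == CALL_OUT_OF_MEMORY);
}
```

In this model, a count that only ever grows while the lock is released
would keep the caller in the retry path on every pass.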

The patch below fixes this problem by only bumping the "work done" value
if the current count has been reduced from the saved count.  It also
moves the last of the three tests under the zone lock (where it belongs).
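
A user-space sketch of the corrected test, under the assumption that the
count is sampled once before the lock is released and once after it is
reacquired.  The names laundry_reduced() and credited_passes() and the
sample values are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/*
 * Model of the fixed test: only a reduction in the inactive-laundry
 * count since it was saved counts as work done.
 */
int laundry_reduced(unsigned long saved, unsigned long now)
{
	return now < saved;
}

/*
 * Each samples[i] holds a (saved, now) pair taken around one lock
 * release.  Returns how many passes are credited as progress.
 */
unsigned int credited_passes(const unsigned long (*samples)[2], unsigned int n)
{
	unsigned int i, work_done = 0;

	for (i = 0; i < n; i++)
		if (laundry_reduced(samples[i][0], samples[i][1]))
			work_done++;
	return work_done;
}

void laundry_reduced_self_check(void)
{
	/* Hypothetical samples: the count grows twice, then drops once. */
	const unsigned long samples[3][2] = { {10, 11}, {11, 12}, {12, 11} };

	assert(credited_passes(samples, 3) == 1);
}
```

With a "!=" comparison all three sample pairs would be credited; with
"<" only the pass in which the count actually dropped is.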

Without this fix, the reproducer repeatedly "hung" my test system for
over an hour.  With this fix, the reproducer would be OOM-killed in 2-3

Please review/ack/nak as you see fit.

Thanks.  -ernie

--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -847,27 +847,27 @@ int rebalance_laundry_zone(struct zone_s
                        if ((gfp_mask & __GFP_WAIT) && (work_done < max_work)) {
                               int timed_out;
                                /* Page is being freed, waiting on lru lock */
+                               local_count = zone->inactive_laundry_pages;
                                if (!atomic_inc_if_nonzero(&page->count)) {
-                                       local_count = zone->inactive_laundry_pages;
-                                       if (local_count !=
+                                       if (zone->inactive_laundry_pages <
+                                           local_count)
                                /* move page to tail so every caller won't wait on it */
                                list_add(&page->lru, &zone->inactive_laundry_list);
-                               local_count = zone->inactive_laundry_pages;
                                timed_out = wait_on_page_timeout(page, 5 * HZ);
-                               if (local_count != zone->inactive_laundry_pages)
+                               if (zone->inactive_laundry_pages < local_count)
                                 * If we timed out and the page has been in
@@ -902,10 +902,10 @@ int rebalance_laundry_zone(struct zone_s
                        try_to_release_page(page, 0);
-                       if (local_count != zone->inactive_laundry_pages)
-                               work_done++;
+                       if (zone->inactive_laundry_pages < local_count)
+                               work_done++;
                        if (unlikely((page->buffers != NULL)) &&
                                        PageInactiveLaundry(page)) {
Comment 2 Ernie Petrides 2005-06-10 00:07:43 EDT
A fix for this problem was committed to the RHEL3 U6 patch pool
on 26-May-2005 (in kernel version 2.4.21-32.5.EL).
Comment 7 Red Hat Bugzilla 2005-09-28 11:20:46 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

