Bug 159991

Summary: [taroon patch] fix for indefinite postponement under __alloc_pages()
Product: Red Hat Enterprise Linux 3
Reporter: Tim Burke <tburke>
Component: kernel
Assignee: Ernie Petrides <petrides>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.0
CC: petrides
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-09-28 15:20:45 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 156320

Description Tim Burke 2005-06-09 22:03:08 UTC
From: Ernie Petrides <petrides>
Date: Thu, 12 May 2005 21:02:45 -0400
Subject: [taroon patch] fix for indefinite postponement under __alloc_pages()
Tracking: 0978.petrides.rebal-laundry-zone.patch
Archives: 2005-May/msg00253.html
Status: committed in -32.5.EL

------------------------------------------------------------------------

While trying to develop a reproducer for the repeated-OOM-kill problem,
I ran into a VM problem that effectively caused my test system to hang
(more specifically, it made no visible progress and would not let ^C via
ssh sessions kill runaway processes).

The scenario and reproducer are documented in the RHKL archives here:

  http://post-office.corp.redhat.com/archives/rhkernel-list/2005-May/msg00123.html

After much consultation with Larry, it was determined that the two-process
test program managed to get the two-cpu test system into a condition of
"indefinite postponement" in concurrent loops of the following functions:

        __alloc_pages()
          try_to_free_pages()
            do_try_to_free_pages()
              rebalance_laundry_zone()

with the innermost function continually returning a non-zero "work done"
value.  This behavior comes from 0777.lwoodman.incorrect-oom-kill.patch
(committed to U5), which was a fix for inappropriate OOM killing when
progress was actually being made.  That fix makes rebalance_laundry_zone()
save a zone's inactive-laundry-page count before releasing a lock on the
zone.  Then, after reacquiring the lock, if the current count value differs
from the saved value, it is assumed that some progress has been made, and
a "work done" indicator is incremented.  This ultimately results in the
allocating process staying in the outermost loop to try again, and more
importantly, preventing do_try_to_free_pages() from calling out_of_memory().

The patch below fixes this problem by only bumping the "work done" value
if the current count has been reduced from the saved count.  It also
moves the last of the three tests under the zone lock (where it belongs).

Without this fix, the reproducer repeatedly "hung" my test system for
over an hour.  With this fix, the reproducer would be OOM-killed in 2-3
minutes.

Please review/ack/nak as you see fit.

Thanks.  -ernie



--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -847,27 +847,27 @@ int rebalance_laundry_zone(struct zone_s
                         */
                        if ((gfp_mask & __GFP_WAIT) && (work_done < max_work)) {
                               int timed_out;
-
+
                                /* Page is being freed, waiting on lru lock */
+                               local_count = zone->inactive_laundry_pages;
                                if (!atomic_inc_if_nonzero(&page->count)) {
-                                       local_count = zone->inactive_laundry_pages;
                                        lru_unlock(zone);
                                        cpu_relax();
                                        lru_lock(zone);
-                                       if (local_count != zone->inactive_laundry_pages)
+                                       if (zone->inactive_laundry_pages <
+                                           local_count)
                                                work_done++;
                                        continue;
                                }
                                /* move page to tail so every caller won't wait on it */
                                list_del(&page->lru);
                                list_add(&page->lru, &zone->inactive_laundry_list);
-                               local_count = zone->inactive_laundry_pages;
                                lru_unlock(zone);
                                run_task_queue(&tq_disk);
                                timed_out = wait_on_page_timeout(page, 5 * HZ);
                                page_cache_release(page);
                                lru_lock(zone);
-                               if (local_count != zone->inactive_laundry_pages)
+                               if (zone->inactive_laundry_pages < local_count)
                                        work_done++;
                                /*
                                 * If we timed out and the page has been in
@@ -902,10 +902,10 @@ int rebalance_laundry_zone(struct zone_s
                        lru_unlock(zone);
                        try_to_release_page(page, 0);
                        UnlockPage(page);
-                       if (local_count != zone->inactive_laundry_pages)
-                               work_done++;
                        page_cache_release(page);
                        lru_lock(zone);
+                       if (zone->inactive_laundry_pages < local_count)
+                               work_done++;
                        if (unlikely((page->buffers != NULL)) &&
                                        PageInactiveLaundry(page)) {
                                del_page_from_inactive_laundry_list(page);

Comment 2 Ernie Petrides 2005-06-10 04:07:43 UTC
A fix for this problem was committed to the RHEL3 U6 patch pool
on 26-May-2005 (in kernel version 2.4.21-32.5.EL).

Comment 7 Red Hat Bugzilla 2005-09-28 15:20:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html