Red Hat Bugzilla – Bug 159991
[taroon patch] fix for indefinite postponement under __alloc_pages()
Last modified: 2007-11-30 17:07:07 EST
From: Ernie Petrides <petrides>
Date: Thu, 12 May 2005 21:02:45 -0400
Subject: [taroon patch] fix for indefinite postponement under __alloc_pages()
Tracking: 0978.petrides.rebal-laundry-zone.patch
Archives: 2005-May/msg00253.html
Status: committed in -32.5.EL
------------------------------------------------------------------------

While trying to develop a reproducer for the repeated-OOM-kill problem, I ran into a VM problem that effectively caused my test system to hang (or more specifically, to make no visible progress nor allow ^C via ssh sessions to kill run-away processes). The scenario and reproducer are documented in the RHKL archives here:

http://post-office.corp.redhat.com/archives/rhkernel-list/2005-May/msg00123.html

After much consultation with Larry, it was determined that the two-process test program managed to get the two-cpu test system into a condition of "indefinite postponement" in concurrent loops of the following functions:

    __alloc_pages()
    try_to_free_pages()
    do_try_to_free_pages()
    rebalance_laundry_zone()

with the innermost function continually returning a non-zero "work done" value.

This behavior comes from 0777.lwoodman.incorrect-oom-kill.patch (committed to U5), which was a fix for inappropriate OOM killing when progress was actually being made. That fix makes rebalance_laundry_zone() save a zone's inactive-laundry-page count before releasing a lock on the zone. Then, after reacquiring the lock, if the current count differs from the saved value, it is assumed that some progress has been made, and a "work done" indicator is incremented. This ultimately results in the allocating process staying in the outermost loop to try again and, more importantly, prevents do_try_to_free_pages() from calling out_of_memory().

The patch below fixes this problem by bumping the "work done" value only if the current count has been reduced from the saved count. It also moves the last of the three tests under the zone lock (where it belongs).

Without this fix, the reproducer repeatedly "hung" my test system for over an hour. With this fix, the reproducer would be OOM-killed in 2-3 minutes.

Please review/ack/nak as you see fit.  Thanks.

-ernie
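To make the failure mode concrete, here is a minimal user-space sketch of the livelock (not kernel code, and not part of the patch): a pthread stands in for another CPU, a mutex for the zone's lru lock, and the names laundry_count, pressure(), and rebalance_pass() are illustrative only. One thread keeps adding pages to the laundry count, as other CPUs would under memory pressure, while the reclaimer frees nothing; the pre-fix "!=" test still reports progress on almost every pass.

    /* build with: gcc -pthread livelock_sketch.c */
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
    static long laundry_count = 0;
    static atomic_int stop;

    /* Simulates other CPUs adding pages to the laundry list. */
    static void *pressure(void *unused)
    {
        (void)unused;
        while (!atomic_load(&stop)) {
            pthread_mutex_lock(&lru_lock);
            laundry_count++;
            pthread_mutex_unlock(&lru_lock);
            sched_yield();
        }
        return NULL;
    }

    /* One snapshot/recheck cycle; returns nonzero if "work was done". */
    static int rebalance_pass(int fixed)
    {
        pthread_mutex_lock(&lru_lock);
        long snapshot = laundry_count;
        pthread_mutex_unlock(&lru_lock);   /* lock dropped; count drifts */
        sched_yield();                     /* let the other "CPU" run   */
        pthread_mutex_lock(&lru_lock);
        int work = fixed ? (laundry_count < snapshot)    /* fixed: shrank  */
                         : (laundry_count != snapshot);  /* broken: changed */
        pthread_mutex_unlock(&lru_lock);
        return work;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, pressure, NULL);
        for (int fixed = 0; fixed <= 1; fixed++) {
            int work = 0;
            for (int i = 0; i < 10000; i++)
                work += rebalance_pass(fixed);
            printf("%s test: \"work done\" on %d of 10000 passes\n",
                   fixed ? "fixed (<) " : "broken (!=)", work);
        }
        atomic_store(&stop, 1);
        pthread_join(t, NULL);
        return 0;
    }

With the "!=" test, nearly every pass counts as work done (any increase from the pressure thread qualifies), so the caller loops forever and out_of_memory() is never reached. With the "<" test, an only-growing count yields zero work, and the OOM path stays reachable.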
--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -847,27 +847,27 @@ int rebalance_laundry_zone(struct zone_s
 		 */
 		if ((gfp_mask & __GFP_WAIT) && (work_done < max_work)) {
 			int timed_out;
-
+			/* Page is being freed, waiting on lru lock */
+			local_count = zone->inactive_laundry_pages;
 			if (!atomic_inc_if_nonzero(&page->count)) {
-				local_count = zone->inactive_laundry_pages;
 				lru_unlock(zone);
 				cpu_relax();
 				lru_lock(zone);
-				if (local_count != zone->inactive_laundry_pages)
+				if (zone->inactive_laundry_pages <
+								local_count)
 					work_done++;
 				continue;
 			}
 			/* move page to tail so every caller won't wait on it */
 			list_del(&page->lru);
 			list_add(&page->lru, &zone->inactive_laundry_list);
-			local_count = zone->inactive_laundry_pages;
 			lru_unlock(zone);
 			run_task_queue(&tq_disk);
 			timed_out = wait_on_page_timeout(page, 5 * HZ);
 			page_cache_release(page);
 			lru_lock(zone);
-			if (local_count != zone->inactive_laundry_pages)
+			if (zone->inactive_laundry_pages < local_count)
 				work_done++;
 			/*
 			 * If we timed out and the page has been in
@@ -902,10 +902,10 @@ int rebalance_laundry_zone(struct zone_s
 			lru_unlock(zone);
 			try_to_release_page(page, 0);
 			UnlockPage(page);
-			if (local_count != zone->inactive_laundry_pages)
-				work_done++;
 			page_cache_release(page);
 			lru_lock(zone);
+			if (zone->inactive_laundry_pages < local_count)
+				work_done++;
 			if (unlikely((page->buffers != NULL)) &&
 					PageInactiveLaundry(page)) {
 				del_page_from_inactive_laundry_list(page);
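As an aside for readers of the first hunk: atomic_inc_if_nonzero(&page->count) takes a page reference only if the page still has one, i.e. is not already on its way to being freed. A user-space analogue using C11 atomics, under the assumption that the refcount lives in an atomic_int (get_ref_if_live is an illustrative name, not a kernel API):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Take a reference only if the object still has one; classic
     * compare-and-swap retry loop. */
    static bool get_ref_if_live(atomic_int *refcount)
    {
        int old = atomic_load(refcount);
        while (old != 0) {
            if (atomic_compare_exchange_weak(refcount, &old, old + 1))
                return true;    /* reference taken */
            /* CAS failure reloaded 'old'; retry, or fall out if zero */
        }
        return false;           /* mid-free: leave the page alone */
    }

    int main(void)
    {
        atomic_int live = 1, dying = 0;
        printf("live page:  %s\n", get_ref_if_live(&live)  ? "ref taken" : "skipped");
        printf("dying page: %s\n", get_ref_if_live(&dying) ? "ref taken" : "skipped");
        return 0;
    }

This is why the failure branch of that test must not touch the page at all: the page may be freed the instant the lru lock is dropped, so the code merely yields the lock and rechecks the laundry count.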
A fix for this problem was committed to the RHEL3 U6 patch pool on 26-May-2005 (in kernel version 2.4.21-32.5.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html