Bug 591283 - __alloc_pages_nodemask might schedule even if __GFP_WAIT not set in gfp_mask, leading to deadlock
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Larry Woodman
QA Contact: Qian Cai
Depends On:
Blocks:
Reported: 2010-05-11 15:38 EDT by Dan Hecht
Modified: 2010-11-11 11:13 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2010-11-11 11:13:42 EST


Attachments: None
Description Dan Hecht 2010-05-11 15:38:00 EDT
Description of problem:

__alloc_pages_nodemask can call schedule() even if __GFP_WAIT is not set in gfp_mask, leading to deadlock.

The path __alloc_pages_nodemask -> __alloc_pages_slowpath -> get_page_from_freelist -> cpuset_zone_allowed_softwall can schedule even if __GFP_WAIT is not set.  The problem appears to have been introduced by the patch quoted below.

Prior to this patch, the alloc_flags computed by the gfp_to_alloc_flags call in __alloc_pages_slowpath would have cleared ALLOC_CPUSET whenever __GFP_WAIT was not set.  That prevented get_page_from_freelist from calling cpuset_zone_allowed_softwall, which can schedule when __GFP_HARDWALL is not set (and it is not set when called from this slowpath).
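
For reference, here is a condensed sketch of the pre-patch block in gfp_to_alloc_flags(), based on the upstream 2.6.32 source (the else branch for realtime tasks is omitted; the diff quoted below cuts off just before the ALLOC_CPUSET line):

        /* Condensed from mm/page_alloc.c, pre-patch (2.6.32); rt-task
         * else branch omitted. */
        if (!wait) {
                alloc_flags |= ALLOC_HARDER;
                /*
                 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
                 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
                 */
                alloc_flags &= ~ALLOC_CPUSET;
        }

With ALLOC_CPUSET cleared, get_page_from_freelist skips the cpuset check entirely for atomic allocations.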

After the patch, ALLOC_CPUSET is cleared only if __GFP_WAIT is not set and __GFP_NOMEMALLOC is also not set.  So, when __GFP_NOMEMALLOC is set, the allocation can now go down a path that might schedule even though __GFP_WAIT is clear.
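
To see where the sleep comes from, here is a condensed sketch of the softwall check, based on kernel/cpuset.c in 2.6.32-era kernels (cpuset_zone_allowed_softwall() is a thin inline wrapper that ends up here; several checks, e.g. for TIF_MEMDIE and PF_EXITING, are omitted).  Note that mempool_alloc(), one of the cases named in the patch description, produces exactly the problematic combination on its first attempt: it sets __GFP_NOMEMALLOC and masks off __GFP_WAIT.

/* Condensed from kernel/cpuset.c (2.6.32 era); some checks omitted. */
int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
{
        const struct cpuset *cs;
        int allowed;

        if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
                return 1;
        might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
        if (node_isset(node, current->mems_allowed))
                return 1;
        if (gfp_mask & __GFP_HARDWALL)  /* If hardwall request, stop here */
                return 0;

        /* Not hardwall and node outside mems_allowed: scan up cpusets */
        mutex_lock(&callback_mutex);    /* mutex_lock() may schedule */
        task_lock(current);
        cs = nearest_hardwall_ancestor(task_cs(current));
        task_unlock(current);
        allowed = node_isset(node, cs->mems_allowed);
        mutex_unlock(&callback_mutex);
        return allowed;
}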

This is the patch that seems to have introduced the problem:

From: Andrea Arcangeli <aarcange@redhat.com>
Date: Mon, 1 Feb 2010 15:17:24 -0500
Subject: [mm] dont alloc harder for gfp nomemalloc even if nowait
Message-id: <20100201152040.198156184@redhat.com>
Patchwork-id: 23035
O-Subject: [RHEL6 27/37] dont alloc harder for gfp nomemalloc even if nowait
Bugzilla: 556572
RH-Acked-by: Larry Woodman <lwoodman@redhat.com>

From: Andrea Arcangeli <aarcange@redhat.com>

Not worth throwing away the precious reserved free memory pool for allocations
that can fail gracefully (either through mempool or because they're transhuge
allocations later falling back to 4k allocations).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ec9b70d..86aa0af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1762,7 +1762,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
         */
        alloc_flags |= (gfp_mask & __GFP_HIGH);

-        if (!wait) {
+        /*
+         * Not worth trying to allocate harder for __GFP_NOMEMALLOC
+         * even if it can't schedule.
+         */
+        if (!wait && !(gfp_mask & __GFP_NOMEMALLOC)) {
                alloc_flags |= ALLOC_HARDER;
                /*
                 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.


Version-Release number of selected component (if applicable): 2.6.32-19.el6

How reproducible: 100%


Steps to Reproduce:
1. Run a disk-intensive workload for a few hours.

See the attached core file for an example deadlock caused by this bug.
  
Actual results: Host hangs due to this deadlock.


Expected results: Host does not hang.


Additional info:
Comment 1 Dan Hecht 2010-05-11 15:41:27 EDT
The gzip'ed core file was rejected as an attachment because it was too large (74MB).  If you want the core, let me know where you'd like it sent.
Comment 3 RHEL Product and Program Management 2010-05-11 17:56:59 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 4 Larry Woodman 2010-05-20 11:15:45 EDT
This patch is being removed in RHEL6-Beta2 as part of a total replacement of the Transparent Hugepage patch set.

----------------------------------------------------------------------------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ec9b70d..86aa0af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1762,7 +1762,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
         */
        alloc_flags |= (gfp_mask & __GFP_HIGH);

-        if (!wait) {
+        /*
+         * Not worth trying to allocate harder for __GFP_NOMEMALLOC
+         * even if it can't schedule.
+         */
+        if (!wait && !(gfp_mask & __GFP_NOMEMALLOC)) {
                alloc_flags |= ALLOC_HARDER;
                /*
                 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
-----------------------------------------------------------------------------

Larry Woodman
Comment 7 Aristeu Rozanski 2010-05-25 13:43:55 EDT
Patch(es) available on kernel-2.6.32-29.el6
Comment 10 Alok Kataria 2010-06-08 19:37:28 EDT
Aristeu, where are the RPMs available for this kernel?
Comment 12 Subhendu Ghosh 2010-07-21 22:45:02 EDT
Alok, this should be covered in the public Beta 2 refresh released today.
Comment 13 Alok Kataria 2010-08-02 13:09:14 EDT
Yep, this seems to be fixed with the Beta 2 release. Thanks.

Please feel free to close it.
Comment 14 releng-rhel@redhat.com 2010-11-11 11:13:42 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
