Bug 463504
Summary: | Hang in shrink_zone during swap pressure, due to direct reclaim threads | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | John Sobecki <john.sobecki>
Component: | kernel | Assignee: | Larry Woodman <lwoodman>
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 5.2 | CC: | chris.mason, greg.marsden, john.sobecki, lwoodman, mdavis, riel
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-06-02 13:23:51 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
John Sobecki
2008-09-23 18:54:56 UTC
Created attachment 317511 [details]
2.6.18 proposed patch (backported from mainline)
Created attachment 317531 [details]
Run N+1 threads (where N=#CPUs) using ./bigmalloc 2000 &
Created attachment 319367 [details]
2.6.9 patch
Tested hard with both Oracle DB and non-DB stress loads, and is now in production on Oracle Global Email.
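
The reproducer attachment is not shown in this report, so the following is only a minimal sketch of the kind of program the description implies. It assumes that the argument to ./bigmalloc is a size in megabytes and that the program writes to every page so the memory is really faulted in. The function name touch_megabytes is hypothetical and is not taken from the attachment.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the attached bigmalloc test program.
 * Assumption: the command-line argument is a size in megabytes.
 * Allocates the buffer and writes one byte per page so that every page
 * is actually faulted in. Returns the number of pages touched, or 0 if
 * the allocation fails.
 */
static size_t touch_megabytes(size_t mb)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t len = mb * 1024 * 1024;
    volatile char *buf = malloc(len);
    size_t touched = 0;

    if (buf == NULL)
        return 0;

    /* One write per page is enough to force the kernel to back it. */
    for (size_t off = 0; off < len; off += (size_t)page) {
        buf[off] = 1;
        touched++;
    }

    free((void *)buf);
    return touched;
}
```

A real stress run would presumably keep the memory allocated and keep touching it rather than freeing it right away. Starting N+1 such processes on an N-CPU machine, each sized so that the total exceeds RAM, is what pushes every process into direct reclaim at the same time.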
Thanks John, will take care of this. One quick question: the comment says "direct reclaiming for contiguous pages", does this mean order > 0?

+	/*
+	 * If we are direct reclaiming for contiguous pages and we do
+	 * not reclaim everything in the list, try again and wait
+	 * for IO to complete. This will stall high-order allocations
+	 * but that should be acceptable to the caller
+	 */

Larry

Hi Larry,

No check for order, just for nr_freed < nr_taken. It seems a similar change was backported from mainline in BZ 495442. Check that out and see whether a mainline patch similar to the one I posted in https://bugzilla.redhat.com/attachment.cgi?id=317511 is being used.

Thanks,
John

2.6.18 does not have lumpy reclaim, so the comment in the patch makes little sense.

John, what exactly are you trying to achieve with this patch? Also, how does the patch achieve it?

Also, why are you forcefully deactivating pages that were activated by shrink_page_list? FIFO page replacement was proven to be a bad idea in the 1960s, and it is not a mistake to repeat 40 years later.

Obviously your patch fixes something and achieves it in some way. Let's get to the bottom of what it really does, so we can get the bug fixed without the bad side effects.

Btw, I suspect the bug may already have disappeared in RHEL 5.4, due to never reclaiming more than 32 pages in direct reclaim. That should get the worst excesses of parallel direct reclaim out of the picture altogether.

5.4 beta 2.6.18-155 is doing:

	if (nr_reclaimed > swap_cluster_max && priority < DEF_PRIORITY && !current_is_kswapd())
		break;

The break doesn't relieve the machine from experiencing "scheduling brownouts" as seen by other (timing-sensitive) software. My patch was a backport from mainline, where they call congestion_wait to force the direct reclaim threads to come up for air and hence let other processes get a bit of CPU time. The box is already under heavy mem/swap pressure, so punishing the direct reclaimers seemed a fair way to keep the machine somewhat responsive and avoid clusterware evictions.

From 2.6.30.1:

	if (nr_freed < nr_taken && !current_is_kswapd() &&
	    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
		congestion_wait(WRITE, HZ/10);

Not perfect, by any means.

Thanks,
John

Your backport is not only "not perfect", it is also totally unacceptable, because it would cause the code to fall through to always reclaiming any page (regardless of whether it is recently referenced), not just for higher-order allocations like upstream.

If doing just the congestion_wait helps things, that is something worth considering for a RHEL backport. However, I believe that such a congestion wait should only be done if we are already at priority < DEF_PRIORITY, because it is normal that not all pages are reclaimed: we _want_ recently referenced pages to be retained, not reclaimed.

This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL5.11 development phase, Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

This is already closed, and this fix most likely addressed the contention problem:
- [mm] vmscan: bail out of direct reclaim after max pages (Rik van Riel) [495442]
Fixed in 2.6.18-371 and higher.
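
The conditions discussed in this thread are easier to compare side by side. The following is a userspace sketch only, not kernel code: the struct reclaim_model and all four function names are hypothetical, and the "suggested" variant is one possible reading of the priority < DEF_PRIORITY proposal, not a patch that appears in this bug. The constants use the values found in mainline kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY            12  /* initial scan priority in mainline */
#define PAGE_ALLOC_COSTLY_ORDER 3   /* orders above this are "costly" */
#define SWAP_CLUSTER_MAX        32  /* the "32 pages" mentioned in the thread */

/* Hypothetical snapshot of the values the quoted conditions look at. */
struct reclaim_model {
    unsigned long nr_taken;      /* pages isolated from the LRU */
    unsigned long nr_freed;      /* pages actually freed from that batch */
    unsigned long nr_reclaimed;  /* running total for this reclaim pass */
    int order;                   /* allocation order being satisfied */
    int priority;                /* DEF_PRIORITY at start, lower = more pressure */
    bool is_kswapd;              /* false for a direct reclaimer */
};

/* Mainline 2.6.30.1 as quoted: wait only for costly high-order requests. */
static bool upstream_should_wait(const struct reclaim_model *m)
{
    return m->nr_freed < m->nr_taken && !m->is_kswapd &&
           m->order > PAGE_ALLOC_COSTLY_ORDER;
}

/*
 * The backport as described in the thread: "no check for order, just for
 * nr_freed < nr_taken". The attachment is not shown, so any further
 * conditions it may contain are unknown and are not modeled here.
 */
static bool backport_should_wait(const struct reclaim_model *m)
{
    return m->nr_freed < m->nr_taken;
}

/* Assumed reading of the suggestion: wait only once priority has dropped. */
static bool suggested_should_wait(const struct reclaim_model *m)
{
    return m->nr_freed < m->nr_taken && !m->is_kswapd &&
           m->priority < DEF_PRIORITY;
}

/* RHEL 5.4 beta (2.6.18-155) bail-out as quoted in the thread. */
static bool rhel54_should_bail(const struct reclaim_model *m)
{
    assert(m->priority <= DEF_PRIORITY);
    return m->nr_reclaimed > SWAP_CLUSTER_MAX &&
           m->priority < DEF_PRIORITY && !m->is_kswapd;
}
```

For an order-0 direct reclaimer at the initial priority that freed 10 of 32 isolated pages, only the backport variant waits. That is the case the objection in the thread is about: at that point a partial result is normal, because recently referenced pages are meant to be kept.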