Bug 234278 - CFQ I/O scheduler causes excessive slab usage/system hangs
Summary: CFQ I/O scheduler causes excessive slab usage/system hangs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 158636 232553
Depends On:
Blocks:
 
Reported: 2007-03-28 03:22 UTC by John Sobecki
Modified: 2018-10-19 23:38 UTC
CC List: 8 users

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:23:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch from Jens Axboe attached (802 bytes, patch)
2007-03-28 03:26 UTC, John Sobecki


Links
Red Hat Product Errata RHBA-2007:0791 (SHIPPED_LIVE): Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6, last updated 2007-11-14 18:25:55 UTC

Description John Sobecki 2007-03-28 03:22:12 UTC
Description of problem:

The current setting of nr_requests at 8192 in the cfq I/O scheduler
has caused system hangs for many Oracle database customers due to
excessive slab usage.

This large slab usage degrades the machine into a VM-induced hang, which
can also result in NMI panics in shrink_zone if nmi_watchdog=1.

NMI Watchdog detected LOCKUP, CPU=7

Call Trace:<ffffffff80165863>{shrink_zone+3970}
 <ffffffff8015a44a>{find_get_page+65}
 <ffffffff80165ae2>{try_to_free_pages+303}
 <ffffffff8015e177>{__alloc_pages+527}
 <ffffffff8017424e>{alloc_page_vma+268}
 <ffffffff80169888>{do_no_page+651}
 <ffffffff80169e49>{handle_mm_fault+373}
 <ffffffff8015a44a>{find_get_page+65}
 <ffffffff80123e87>{do_page_fault+514} 
 <ffffffff801909f5>{__d_path+180}
 <ffffffff80110da9>{error_exit+0}
 <ffffffff801f3d22>{copy_user_generic_c+8}

Version-Release number of selected component (if applicable):

RHEL4 U4

How reproducible:

Every time. With 8 concurrent dd test jobs, Slab usage exceeds 3.5GB
on a 16GB machine, e.g.:

  dd if=/dev/sdcb1 of=/dev/sdm1 bs=1048576 count=24000

Steps to Reproduce:
1.  Fire up 8 concurrent dd jobs (see the sketch after this list)
2.  Wait 20-30 minutes, monitoring slab usage in /proc/meminfo
3.  Box will hang or NMI panic
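
A minimal reproduction sketch of the above (whether the original test ran
all 8 jobs against the same device pair or 8 distinct pairs is unclear, see
comment 18 below; the device names and counts are placeholders):

  # Fire up 8 concurrent dd jobs; device names are placeholders.
  for i in $(seq 1 8); do
      dd if=/dev/sdcb1 of=/dev/sdm1 bs=1048576 count=24000 &
  done

  # Watch slab growth while the jobs run; Slab: should climb past 3.5GB.
  watch -n 30 'grep Slab /proc/meminfo'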
  
Actual results:

Box hangs or NMI panics.

Expected results:

Box completes the I/O without hanging or panicking.

Additional info:

Per discussion with Jens Axboe and Chris Mason, cfq should be de-tuned
to reduce the VM pressure caused by the cfq nr_requests setting of 8192.

Comment 1 John Sobecki 2007-03-28 03:26:37 UTC
Created attachment 151091 [details]
Patch from Jens Axboe attached

Comment 2 John Sobecki 2007-04-11 20:22:47 UTC
Customers experiencing the problem:

  - Amazon
  - Oracle global email

Thanks, John

Comment 4 RHEL Program Management 2007-05-09 05:16:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Eric Sandeen 2007-06-18 22:58:54 UTC
I have the test kernels built, but I'd like to look at the proposed fix a bit
more; a couple of things seem odd to me, and I need to sort them out.

Comment 14 Eric Sandeen 2007-06-19 15:09:40 UTC
x86_64 test kernels at:

http://people.redhat.com/esandeen/bz234278/

If you need src.rpm, debuginfo, or anything else, just shout.

Thanks,
-Eric

Comment 16 Eric Sandeen 2007-06-19 18:00:20 UTC
FWIW, for those hitting this problem... 

echo 128 > /sys/block/<$DEV>/queue/nr_requests

should have pretty much the same effect without needing a patch.  It may still
make sense to change the default, though...
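
For example, a minimal sketch that applies this workaround to every block
device (the loop itself is illustrative, not from this report):

  # Lower nr_requests on all block devices that expose the knob.
  for q in /sys/block/*/queue/nr_requests; do
      echo 128 > "$q"
  done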

Comment 18 Eric Sandeen 2007-06-19 20:00:13 UTC
John, was your "dd" testcase 8 instances of dd to/from the same block devices,
or was it to/from 8 different pairs of block devices?

Thanks,
-Eric

Comment 19 Larry Woodman 2007-06-20 20:50:28 UTC
Eric, this morning I looked again at the throttling that does exist in RHEL4
and compared it to upstream and RHEL5.  I think the real RHEL4 issue with 32-bit
devices and bounce buffer limits is the setting of
"q->nr_requests = 8192" in cfq_init() in drivers/block/cfq-iosched.c.

This number multiplied by q->max_sectors(128)*1024 is the maximum number of
bytes that can be on the IO queue (1073741824 bytes, or 262144 pages) before
__make_request() blocks and starts throttling.  Basically, __make_request() can
try to allocate 262144 bounce buffers and queue them to the device before it
blocks, and since this is larger than Lowmem we exhaust it and potentially
start OOM killing.

The upstream kernel, including RHEL5, changed this by setting
"q->nr_requests = 128" in blk_queue_make_request(), thereby limiting the
maximum number of bytes that can be on the IO queue to 16777216, or 4096 pages,
which is much less than Lowmem.  This is what prevents bounce buffers from
exhausting Lowmem in RHEL5 and other upstream kernels.

The good news is that both nr_requests and max_sectors are tunable on a
per-device basis via sysfs.  /sys/block/<device>/queue/nr_requests is set to
8192 on all devices by default in RHEL4.  If you lower that value to 512 via
"echo 512 > /sys/block/<device>/queue/nr_requests", it will provide the same
functionality on a stock RHEL4 kernel as the bounce buffer patch does when it
limits the bounce buffers to 64MB.  This should be done for all 32-bit devices
since they need bounce buffers and could exhaust Lowmem.  Can you try this?

Larry Woodman
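
Working through the arithmetic above in shell (constants taken verbatim from
the comment; 4096-byte pages assumed):

  # RHEL4 default: nr_requests(8192) * max_sectors(128) * 1024
  echo $(( 8192 * 128 * 1024 ))          # 1073741824 bytes queueable
  echo $(( 8192 * 128 * 1024 / 4096 ))   # 262144 pages, more than 32-bit Lowmem

  # Upstream/RHEL5 default: nr_requests(128)
  echo $(( 128 * 128 * 1024 ))           # 16777216 bytes queueable
  echo $(( 128 * 128 * 1024 / 4096 ))    # 4096 pages, well under Lowmem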

Comment 20 Eric Sandeen 2007-06-20 21:02:02 UTC
I sent a note to Jens about the proposed patch:

It seems that really 2 things are going on here: nr_requests is left at
the default of 128 thanks to the last hunk, and the upper limit on
"limit" in cfq_may_queue is no longer enforced thanks to the 2nd hunk;
however, limit can still be no larger than nr_requests:

    int limit = (q->nr_requests - cfqd->cfq_queued) / cfqd->busy_queues;

Before the change, though, the upper limit was max_queued, or 128
(thanks to the assignment from the default nr_requests).  Post-change,
it seems that if nr_requests gets tuned up high, "limit" could be
equally high, and I'm wondering if this was an intended change in
behavior...
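
To illustrate the concern with hypothetical numbers (the busy_queues and
cfq_queued values below are invented for the example):

  # Post-change, with nr_requests tuned up to 8192, cfq_queued=0, busy_queues=4:
  echo $(( (8192 - 0) / 4 ))   # limit = 2048 requests per process
  # Pre-change, limit was capped at max_queued = 128 regardless of tuning.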

Comment 21 Eric Sandeen 2007-06-21 15:57:57 UTC
Hm.  So far I've not been able to recreate this on a handful of boxes.

Comment 26 Eric Sandeen 2007-07-12 15:39:53 UTC
Some clarification on the patch from a conversation with Jens:

>>> Before the change, though, the upper limit was max_queued, or 128
>>> (thanks to the assignment from the default nr_requests).  Post-change,
>>> it seems that if nr_requests gets tuned up high, "limit" could be
>>> equally high, and I'm wondering if this was an intended change in
>>> behavior...
>>
>> It's intended, yes. It'll work strangely if we don't also expose a
>> per-process queue limit in sysfs for cfq.
>
> But that limit is potentially much higher after the change, right?

Yes, I'd argue that the behaviour before was a bug  :-)

Comment 29 Larry Woodman 2007-07-17 15:42:56 UTC
*** Bug 232553 has been marked as a duplicate of this bug. ***

Comment 31 Eric Sandeen 2007-07-18 05:32:33 UTC
Yes, I think this is a reasonable workaround for now; thanks for the info too.

Comment 32 Jason Baron 2007-07-25 17:54:18 UTC
*** Bug 158636 has been marked as a duplicate of this bug. ***

Comment 33 Jason Baron 2007-07-26 14:18:44 UTC
Committed in stream U6 build 55.23.  A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 37 Eric Sandeen 2007-08-08 14:02:43 UTC
It is set to 128 because that matches what is used in the upstream kernels.  Is
there a documented performance regression with 128 vs. 512?  And if the
regression only affects certain workloads, the system can always be tuned to
512 if preferred.

Comment 40 John Poelstra 2007-08-29 04:22:51 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 41 John Poelstra 2007-09-05 22:23:18 UTC
A fix for this issue should have been included in the packages contained in 
the RHEL4.6-Snapshot1 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed to ensure that it is 
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, 
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent 
symptoms of the problem you are having and change the status of the bug to 
FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test 
results to Issue Tracker.  If you need assistance accessing 
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 42 John Poelstra 2007-09-12 00:43:35 UTC
A fix for this issue should be included in RHEL4.6-Snapshot2, available soon on
partners.redhat.com.

Please verify that your issue is fixed to ensure that it is included in this
update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test
results to Issue Tracker.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 47 errata-xmlrpc 2007-11-15 16:23:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


