Description of problem:
The current setting of nr_requests at 8192 in the cfq I/O scheduler has caused system hangs for many Oracle database customers, due to excessive slab usage. The large slab usage degrades the machine into a VM-induced hang, which can also result in NMI panics in shrink_zone if nmi_watchdog=1.

NMI Watchdog detected LOCKUP, CPU=7
Call Trace:
<ffffffff80165863>{shrink_zone+3970} <ffffffff8015a44a>{find_get_page+65}
<ffffffff80165ae2>{try_to_free_pages+303} <ffffffff8015e177>{__alloc_pages+527}
<ffffffff8017424e>{alloc_page_vma+268} <ffffffff80169888>{do_no_page+651}
<ffffffff80169e49>{handle_mm_fault+373} <ffffffff8015a44a>{find_get_page+65}
<ffffffff80123e87>{do_page_fault+514} <ffffffff801909f5>{__d_path+180}
<ffffffff80110da9>{error_exit+0} <ffffffff801f3d22>{copy_user_generic_c+8}

Version-Release number of selected component (if applicable):
RHEL4 U4

How reproducible:
Every time. Using 8 concurrent dd test jobs, slab usage exceeds 3.5GB on a 16GB machine, e.g.:
dd if=/dev/sdcb1 of=/dev/sdm1 bs=1048576 count=24000

Steps to Reproduce:
1. Fire up the dd jobs
2. Wait 20-30 minutes, monitoring slab usage in /proc/meminfo
3. Box will hang or NMI panic

Actual results:
Box hangs or NMI panics

Expected results:
No hang or panic; the dd jobs complete normally

Additional info:
Per discussion with Jens Axboe and Chris Mason, cfq should be de-tuned to reduce the VM pressure caused by the cfq nr_requests setting of 8192.
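For reference, a minimal sketch of the reproduction and monitoring described above. The device names are the ones from the report; the slab_kb helper is a hypothetical convenience for watching /proc/meminfo, not part of the original testcase.

```shell
#!/bin/sh
# Parse the Slab line (in kB) from a meminfo-format file.
slab_kb() {
    awk '/^Slab:/ {print $2}' "$1"
}

# Reproduction sketch (run only on a disposable test box -- it will hang it):
#   for i in 1 2 3 4 5 6 7 8; do
#       dd if=/dev/sdcb1 of=/dev/sdm1 bs=1048576 count=24000 &
#   done
#   # Poll every 60s; the report saw Slab exceed 3.5GB within 20-30 minutes.
#   while sleep 60; do echo "Slab: $(slab_kb /proc/meminfo) kB"; done
```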
Created attachment 151091 [details] Patch from Jens Axboe attached
Customers experiencing the problem:
- Amazon
- Oracle (global email)

Thanks,
John
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I have the test kernels built, but I'd like to look at the proposed fix a bit more; a couple of things seem odd to me and need sorting out.
x86_64 test kernels at:
http://people.redhat.com/esandeen/bz234278/

If you need src.rpm, debuginfo, or anything else, just shout.

Thanks,
-Eric
FWIW, for those hitting this problem...

echo 128 > /sys/block/<$DEV>/queue/nr_requests

should have pretty much the same effect w/o needing a patch. It may still make sense to change the default, though...
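A sketch of applying that workaround to every block device at once. The function form and the overridable sysfs root are assumptions added here so the loop can be exercised against a scratch tree first; on a live box it would be run as root with the default /sys.

```shell
#!/bin/sh
# Write a nr_requests value into every block device's queue under a
# sysfs tree (default /sys). Devices whose attribute is not writable
# (e.g. when not running as root) are skipped.
set_nr_requests() {
    nr="$1"; sysfs="${2:-/sys}"
    for q in "$sysfs"/block/*/queue/nr_requests; do
        [ -w "$q" ] || continue
        dev="${q%/queue/nr_requests}"; dev="${dev##*/}"
        echo "$nr" > "$q"
        echo "set $dev nr_requests=$nr"
    done
}

# On a live RHEL4 box (as root):
#   set_nr_requests 128
```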
John, was your "dd" testcase 8 instances of dd to/from the same block devices, or was it to/from 8 different pairs of block devices? Thanks, -Eric
Eric,

This morning I looked again at the throttling that does exist in RHEL4 and compared it to upstream and RHEL5. I think the real RHEL4 issue with 32-bit devices and bounce buffer limits is the setting of "q->nr_requests = 8192" in cfq_init() in drivers/block/cfq-iosched.c. This number multiplied by q->max_sectors(128)*1024 is the maximum number of bytes that can be on the IO queue (1073741824 bytes, or 262144 pages) before __make_request() blocks and starts throttling. Basically, __make_request() can try to allocate 262144 bounce buffers and queue them to the device before it blocks, and since this is larger than Lowmem we exhaust it and potentially start OOM killing.

The upstream kernel, including RHEL5, changed this by setting "q->nr_requests = 128" in blk_queue_make_request(), thereby limiting the maximum number of bytes that can be on the IO queue to 16777216 bytes, or 4096 pages, which is much less than Lowmem. This is what prevents bounce buffers from exhausting Lowmem in RHEL5 and other upstream kernels.

The good news is that both nr_requests and max_sectors are tunable on a per-device basis via sysfs. /sys/block/<device>/queue/nr_requests is set to 8192 on all devices by default in RHEL4. If you lower that value to 512 via "echo 512 > /sys/block/<device>/queue/nr_requests", it will provide the same functionality on a stock RHEL4 kernel as the bounce buffer patch does when it limits the bounce buffers to 64MB. This should be done for all 32-bit devices, since they need bounce buffers and could exhaust Lowmem.

Can you try this?

Larry Woodman
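The arithmetic above can be checked directly in the shell (following the comment's formula of nr_requests * max_sectors(128) * 1024 bytes, with 4096-byte pages):

```shell
#!/bin/sh
# Maximum bytes on the IO queue = nr_requests * 128 * 1024, per the
# analysis above; pages assume a 4096-byte page size.
queue_bytes() { echo $(( $1 * 128 * 1024 )); }
queue_pages() { echo $(( $(queue_bytes "$1") / 4096 )); }

echo "RHEL4 default (8192):  $(queue_bytes 8192) bytes = $(queue_pages 8192) pages"
# -> 1073741824 bytes = 262144 pages
echo "Upstream/RHEL5 (128):  $(queue_bytes 128) bytes = $(queue_pages 128) pages"
# -> 16777216 bytes = 4096 pages
```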
I sent a note to Jens about the proposed patch:

It seems that really two things are going on here: nr_requests is left at the default of 128 thanks to the last hunk, and the upper limit on "limit" in cfq_may_queue() is no longer enforced thanks to the second hunk. However, limit can still be no larger than nr_requests:

int limit = (q->nr_requests - cfqd->cfq_queued) / cfqd->busy_queues;

Before the change, though, the upper limit was max_queued, or 128 (thanks to the assignment from the default nr_requests). Post-change, it seems that if nr_requests gets tuned up high, "limit" could be equally high, and I'm wondering if this was an intended change in behavior...
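To illustrate the concern, a small sketch of that limit expression. The cfq_queued and busy_queues values below are made up; the point is only that once the 128 cap is gone, "limit" scales directly with nr_requests.

```shell
#!/bin/sh
# Post-patch per-process limit from cfq_may_queue:
#   limit = (nr_requests - cfqd->cfq_queued) / cfqd->busy_queues
# Pre-patch, this was additionally capped at max_queued (128).
cfq_limit() { # args: nr_requests cfq_queued busy_queues
    echo $(( ($1 - $2) / $3 ))
}

# Hypothetical numbers: one busy queue, nothing queued yet.
echo "nr_requests=128:  limit=$(cfq_limit 128 0 1)"    # -> 128
echo "nr_requests=8192: limit=$(cfq_limit 8192 0 1)"   # -> 8192, no longer capped
```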
Hm. So far I've not been able to recreate this on a handful of boxes.
Some clarification on the patch from a conversation with Jens:

>>> Before the change, though, the upper limit was max_queued, or 128
>>> (thanks to the assignment from the default nr_requests). Post-change,
>>> it seems that if nr_requests gets tuned up high, "limit" could be
>>> equally high, and I'm wondering if this was an intended change in
>>> behavior...
>>
>> It's intended, yes. It'll work strangely if we don't also expose a
>> per-process queue limit in sysfs for cfq.
>
> But that limit is potentially much higher after the change, right?

Yes, I'd argue that the behaviour before was a bug :-)
*** Bug 232553 has been marked as a duplicate of this bug. ***
Yes, I think this is a reasonable workaround for now; thanks for the info, too.
*** Bug 158636 has been marked as a duplicate of this bug. ***
Committed in stream U6 build 55.23. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
It is set to 128 because that matches what is used in the upstream kernels. Is there a documented performance regression with 128 vs. 512? And if it is only for some various workloads, the system can always be tuned to 512 if preferred.
A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com).

Requested action: Please verify that your issue is fixed to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified).

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot1 on partners.redhat.com.

Requested action: Please verify that your issue is fixed to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified).

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in RHEL4.6-Snapshot2, available soon on partners.redhat.com.

Please verify that your issue is fixed to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified).

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html