Bug 456181 - Read speed of /sbin/dump command is critically slow with CFQ I/O scheduler [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Jeffrey Moyer
QA Contact: Red Hat Kernel QE team
URL: https://enterprise.redhat.com/issue-t...
Keywords: Regression
Depends On:
Blocks: 499522 533192 483701 485920 525215 533932 5.5TechNotes-Updates 570814
Reported: 2008-07-21 19:34 EDT by Pankaj Saraf
Modified: 2010-11-22 18:14 EST
CC List: 26 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Some applications (e.g. dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads. However, when using the Completely Fair Queuing (CFQ) I/O scheduler, this application design negatively affected I/O performance. In Red Hat Enterprise Linux 5.5, the kernel can now detect and merge cooperating queues. Additionally, the kernel can detect when the queues stop cooperating and split them apart again.
Story Points: ---
Clone Of:
Clones: 533932
Environment:
Last Closed: 2010-03-30 03:19:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
jmoyer: needinfo? (psaraf)
cward: needinfo? (psaraf)


Attachments
Implement support for interleaving requests between multiple processes (7.93 KB, patch)
2008-10-09 16:53 EDT, Jeffrey Moyer
read-test2.tar.gz (1.48 KB, application/x-gzip)
2009-02-02 20:32 EST, Moritoshi Oshiro

Description Pankaj Saraf 2008-07-21 19:34:00 EDT
+++ This bug was initially created as a clone of Bug #427709 +++

Description of Problem:
This is a regression: the problem did not occur on RHEL4.

With the CFQ I/O scheduler, the read speed of the /sbin/dump command on RHEL5
is critically slower than on RHEL4.6.

The average transfer rate on RHEL4.6 is 30,571 kB/s when the /sbin/dump
command reads 70 GB of data and writes it to /dev/null with the cfq I/O
scheduler. However, the average transfer rate on RHEL5.1 is only 5,786 kB/s
for the same test.

We wrote a test program that reads data in the same way as the /sbin/dump
command, and we measured the read speed with each I/O scheduler.
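The access pattern being simulated is several processes reading one device in turn, each taking every Nth chunk. The attached read-test.c is not reproduced in this report, so the following is only a hypothetical sketch of such a pattern. interleaved_offset() and read_interleaved() are illustrative names, and the chunk size and number of readers are parameters chosen by the caller, not values taken from the test program.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Byte offset of the k-th chunk read by worker w when n workers take turns:
 * worker 0 reads chunks 0, n, 2n, ...; worker 1 reads chunks 1, n+1, ... */
static off_t interleaved_offset(int w, int n, long k, size_t chunk)
{
    return (off_t)(k * n + w) * (off_t)chunk;
}

/* Fork n reader processes that together cover [0, total) in interleaved
 * chunks. Returns 0 if every reader finished cleanly, -1 otherwise. */
static int read_interleaved(const char *path, int n, size_t chunk, off_t total)
{
    assert(n > 0 && chunk > 0);
    int started = 0, rc = 0;

    for (int w = 0; w < n; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            rc = -1;
            break;
        }
        if (pid == 0) {
            /* Each reader opens the file itself, so it has its own
             * open file description and file position. */
            int fd = open(path, O_RDONLY);
            char *buf = malloc(chunk);
            if (fd < 0 || buf == NULL)
                _exit(1);
            for (long k = 0;; k++) {
                off_t off = interleaved_offset(w, n, k, chunk);
                if (off >= total)
                    break;
                if (pread(fd, buf, chunk, off) < 0)
                    _exit(1);
            }
            _exit(0);
        }
        started++;
    }

    /* Reap every reader that was actually started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```

From the disk's point of view, the combined stream is sequential. However, each process on its own issues I/O with gaps between its requests.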

The read speeds on RHEL4.6 and RHEL5.1 are as follows:

              |     Read speed (kB/s)
--------------------------------------
I/O scheduler |  RHEL4.6  |  RHEL5.1
======================================
cfq           |   81,685  |    7,404
deadline      |   81,305  |   87,199
as            |   63,044  |   87,393
noop          |   67,133  |   89,405
--------------------------------------

From these results, we conclude the following.
o The read speed of the cfq I/O scheduler on RHEL5.1 is critically slower than on RHEL4.6.
o The read speeds of the other I/O schedulers on RHEL5.1 are similar to or better
 than on RHEL4.6.

We therefore believe that this is a regression in the cfq I/O scheduler.

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: 5
Release Number: 1
Architecture:
Kernel Version: 2.6.18-53.el5
Related Package Version:
Related Middleware / Application:

Drivers or hardware or architecture dependency:
None

How reproducible:
Always

Steps to Reproduce:
1. Extract the archive file (read-test.tar.gz).
  The following files are extracted:
  Makefile
  read-test.c
2. Compile the test program.
  $ make
3. Switch to the root user.
  $ su
4. Run the test program (sdX is the device file name).
  # ./read-test /dev/sdX

Actual Results:
The read speed of the cfq I/O scheduler is critically slower than on RHEL4.6.

Expected Results:
The read speed of the cfq I/O scheduler is no slower than on RHEL4.6.

Summary of actions taken to resolve issue:
None.

Location of diagnostic data:
None.

Hardware configuration:
Model: PRIMERGY TX200 S3
CPU: 8
Memory: 4G
Hardware Component Information: None
Configuration info: None
Guest Configuration Info: None

Business Impact:
Errata Request: 5.2.z

Our customer uses the /sbin/dump command to back up their data during
maintenance windows. This performance regression of the cfq I/O scheduler on
RHEL5.1 has made the maintenance window ten times longer than on RHEL4.6,
which is severely affecting the customer's system operations.

Additional Info:
I have attached the test program and a description of the system environment.

We confirmed that this problem also occurs on RHEL5.2.

========================================================================


Dump and other large file operations are up to 7 times slower than with
the previous RHEL 4.x kernels. This is due to a bug in the kernel I/O
scheduler's "cfq" implementation.

Please reference the following two Kernel.org bugs for details on this issue:

Dump of ext3 runs very slowly:
http://bugzilla.kernel.org/show_bug.cgi?id=8636

Unusable system (ie slow) when copying large files:
http://bugzilla.kernel.org/show_bug.cgi?id=7372

Until our heroes at Kernel.org correct this problem,
would you consider implementing the following workaround in
our kernels?

modify the kernel's .config
from:

   CONFIG_DEFAULT_IOSCHED="cfq"

to:

   CONFIG_DEFAULT_IOSCHED="anticipatory"


Many thanks,
-T

-- Additional comment from toddandmargo@verizon.net on 2008-05-17 20:49 EST --
Hi,

   Found an easier workaround.   Just add "elevator=as" to the end of
the grub.conf "kernel" line.  For example: 

	kernel /boot/vmlinuz-2.6.18-53.1.14.el5 ro root=LABEL=/1 rhgb quiet elevator=as

Sped up my "dump" backups by a factor of four.

This suggestion should be a lot easier to implement than recompiling the kernel.

Many thanks,
-T
Comment 1 RHEL Product and Program Management 2008-07-21 19:45:04 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 2 Jeffrey Moyer 2008-07-22 11:28:32 EDT
I'll take a look at this.
Comment 5 Jeffrey Moyer 2008-08-06 13:36:12 EDT
I talked with the CFQ author about this (Jens Axboe), and he is aware of the problem and willing to help out.  We'll update the bugzilla when we have test patches or packages.
Comment 10 Jeffrey Moyer 2008-08-22 17:16:28 EDT
A workaround, for the time being, is to set slice_idle to 0 during backups.  I would restore it to its default value after backups are complete, though.  You can tune this value by echoing numbers to /sys/block/<blockdev>/queue/iosched/slice_idle.  For example, if your device is /dev/sdb, you would do the following:

echo 0 > /sys/block/sdb/queue/iosched/slice_idle
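The save, set, and restore sequence suggested in this comment can be wrapped in a small helper that always captures the original value before changing it. This is a minimal sketch that assumes the sysfs path layout given above. swap_tunable() and slice_idle_path() are hypothetical names, and writing the real sysfs file requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the current contents of a sysfs-style tunable into old (with the
 * trailing newline stripped), then write newval to it. Returns 0 on success. */
static int swap_tunable(const char *path, const char *newval,
                        char *old, size_t oldlen)
{
    assert(old != NULL && oldlen > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(old, (int)oldlen, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    old[strcspn(old, "\n")] = '\0';

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fprintf(f, "%s\n", newval) < 0) ? -1 : 0;
    /* The write is buffered, so errors may only surface at fclose(). */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Build /sys/block/<dev>/queue/iosched/slice_idle for a name such as "sdb". */
static int slice_idle_path(const char *dev, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/block/%s/queue/iosched/slice_idle", dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Before the backup, call swap_tunable(path, "0", saved, sizeof saved). Afterwards, call it again with saved as the new value to put the previous setting back.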
Comment 18 Tony Fu 2008-09-09 20:19:52 EDT
User psaraf@redhat.com's account has been closed
Comment 19 Jeffrey Moyer 2008-10-09 16:53:12 EDT
Created attachment 319934 [details]
Implement support for interleaving requests between multiple processes

This patch is a backport of some of the close_cooperator changes that were introduced to (and later removed from) the upstream kernel's cfq I/O scheduler implementation.  The intent is to detect multiple processes interleaving sequential file I/O.  This patch is still preliminary.  I have tested it with good results against both the read-test reproducer and the dump(8) command.  I am currently working with Jens Axboe to come up with a similar patch for the upstream sources (so that this will not regress again in RHEL 6).
Comment 21 Ric Wheeler 2008-11-25 13:34:17 EST
Moving to 5.4
Comment 23 Alan Matsuoka 2008-12-03 09:08:29 EST
Could you get this customer to try a test package including Jeff's patch?

  * https://bugzilla.redhat.com/attachment.cgi?id=319934
  * https://bugzilla.redhat.com/show_bug.cgi?id=456181#c19
Comment 25 Moritoshi Oshiro 2009-02-02 20:32:31 EST
Created attachment 330700 [details]
read-test2.tar.gz

From FJ:
---
Hi Oshiro-san

I tested CFQ read performance with your kernel
(kernel-PAE-2.6.18-53.1.21) on my machines. The read performance,
measured with a new test program (read-test2.tar.gz) whose output
differs from the previous one (read-test.tar.gz), is as follows.

            | Read performance (MB/s)
-------------------------------------
            | deadline  |    cfq    |
-------------------------------------
RHEL5.1     |   74.10   |  8 or 24  |
test kernel |   74.24   |  24 or 74 |
-------------------------------------

CFQ read performance with the test kernel is better than with RHEL5.1.
However, it is still sometimes critically slow (24 MB/s) on your kernel,
so I think this problem has not been completely fixed yet. Additionally,
the same intermittent degradation also occurs on RHEL5.1. Below is how
to use the new test program.

$ tar zxvf read-test2.tar.gz
read-test2/
read-test2/test.sh
read-test2/Makefile
read-test2/read-test.c
$ cd read-test2
$ make
gcc -g -Wall -lrt -D _GNU_SOURCE -o read-test2 read-test2.c
$ su
# ./test.sh /dev/sda
***Total Ave 24.687157 MB/sec ***
***Total Ave 74.124758 MB/sec ***
***Total Ave 24.377011 MB/sec ***
***Total Ave 73.683626 MB/sec ***
***Total Ave 24.467749 MB/sec ***
***Total Ave 24.414345 MB/sec ***
***Total Ave 24.389293 MB/sec ***
***Total Ave 74.885984 MB/sec ***
***Total Ave 74.780804 MB/sec ***
***Total Ave 24.365709 MB/sec ***
***Total Ave 74.885984 MB/sec ***
***Total Ave 74.780804 MB/sec ***
...

I do not understand why this performance degradation occurred.
Do you have any information related to this performance degradation?

We need to clarify and fix this degradation. Please investigate it.
I'll continue to test, too.

I'll attach the new test program.
---
Comment 26 RHEL Product and Program Management 2009-02-16 10:07:00 EST
Updating PM score.
Comment 28 Veaceslav Falico 2009-04-02 09:04:43 EDT
Hello,

Any news about this bug?

Thank you!
Comment 29 Jeffrey Moyer 2009-04-02 09:16:51 EDT
The current plan is to backport the iocontext sharing code from upstream and to patch dump to share I/O contexts.
Comment 34 Jeffrey Moyer 2009-05-01 20:19:18 EDT
This work will not make the 5.4 release.

When the problem was initially reported, I talked to Jens Axboe about it, and he seemed receptive to the idea of adding some code to CFQ to detect processes interleaving I/Os.  When I came up with a first patch for this, he then suggested that we would be better off solving the problem in the applications themselves, by having the applications explicitly share I/O contexts (using sys_clone and the CLONE_IO flag*).  I wrote a patch for dump to do this very thing, and it did solve the problem.  However, the list of applications suffering from this kept growing.  The applications I know of that perform interleaved reads between multiple processes include:

dump
nfsd
qemu's posix aio backend
one of the iSCSI target mode implementations
a third-party volume manager

This is evidently not an uncommon programming paradigm, so Jens decided to take the close cooperator patch set into 2.6.30.  However, the implementation he merged was not quite ready, as it can starve some processes.  I've been working with him to fix the problem properly while preserving fairness.  In the end, the solution may involve a combination of detecting cooperating processes and automatically sharing I/O contexts between them.

This issue is my number one priority, and I will keep this bugzilla updated as progress is made.

* Note that shared I/O contexts (and the CLONE_IO flag) are not supported in RHEL 5, otherwise I would have made that fix available for the 5.4 release.
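The application-side fix described above creates each reader with clone(2) and the CLONE_IO flag. Parent and child then share one I/O context, so CFQ can schedule their interleaved reads as a single stream. The following is a minimal sketch under those assumptions. run_reader_sharing_io() and reader_child() are hypothetical, this is not the dump patch mentioned in the comment, and CLONE_IO requires Linux 2.6.25 or later (as noted, it is not available in RHEL 5).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef CLONE_IO
#define CLONE_IO 0x80000000 /* "clone io context", from <linux/sched.h> */
#endif

/* Hypothetical reader body: a real reader would issue its share of the
 * interleaved reads here. It just returns the code it was given. */
static int reader_child(void *arg)
{
    int *code = arg;
    return *code;
}

/* Start one reader that shares the caller's I/O context and wait for it.
 * Returns the child's exit status, or -1 on error. */
static int run_reader_sharing_io(int code)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    assert(stack != NULL);

    /* No CLONE_VM: the child gets a copy of memory, as with fork(), but
     * shares the I/O context. SIGCHLD lets waitpid() reap it normally.
     * The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(reader_child, stack + stack_size,
                      CLONE_IO | SIGCHLD, &code);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```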
Comment 39 Jeffrey Moyer 2009-10-30 17:21:02 EDT
I put together another test kernel that implements close cooperator detection logic and merges the cfq_queues associated with cooperating processes.  The result is a good speedup.  In 100 runs of the read-test2 program (written to simulate the I/O pattern of the dump utility), these are the throughput numbers in MB/s:

Deadline:
Avg:       101.26907
Std. Dev.:  17.59767

CFQ:
Avg:       100.14914
Std. Dev.:  17.42747
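The Avg and Std. Dev. figures above summarize 100 runs. Below is a minimal sketch of that arithmetic. It assumes population (divide-by-n) statistics because the comment does not say which form was used. throughput_stats() is a hypothetical name, and it returns the variance, whose square root is the standard deviation.

```c
#include <assert.h>
#include <stddef.h>

/* Mean and population variance of n throughput samples in MB/s.
 * The standard deviation is the square root of the variance. */
static void throughput_stats(const double *mbps, size_t n,
                             double *mean, double *variance)
{
    assert(mbps != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mbps[i];
    *mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = mbps[i] - *mean;
        sq += d * d;
    }
    *variance = sq / (double)n;
}
```

This shows why a few slow outliers matter. With three runs at 105 MB/s and one at 29 MB/s, the mean is 86 MB/s and the variance is 1083 (a standard deviation of about 32.9 MB/s), even though most runs are fast.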

Most of the runs saw 105 MB/s, but there were some outliers in the 28-30 MB/s range.  I looked into those cases and found that the processes were being scheduled in just the wrong order, which introduced seeks into the workload.  Unfortunately, I haven't come up with a good solution for that particular problem, though I'll note that it affects other I/O schedulers as well.  Upstream does not exhibit this behaviour; I believe that may be due to the rewritten readahead code, but I can't be certain without further investigation.

Without the patch set applied, the numbers for cfq were in the 7-10MB/s range.

I wasn't able to test NFS server performance because my test lab was experiencing a networking issue.  I'll get that testing underway once the problem is resolved.

I've uploaded a test kernel here:
  http://people.redhat.com/jmoyer/cfq-cc/

Please take it for a spin and report your results.  If you'd like to test on an architecture other than x86_64, just let me know and I'll kick off a build for whatever architecture is required.
Comment 41 Jeffrey Moyer 2009-11-09 10:25:06 EST
I've kicked off a build for i686 and will update this bug when that build is complete.  In the meantime, I've uploaded the SRPM to the location listed above.
Comment 42 Jeffrey Moyer 2009-11-09 15:17:47 EST
An i686 kernel rpm is now available at:
  http://people.redhat.com/jmoyer/cfq-cc/

Happy testing!
Comment 43 Don Zickus 2009-11-10 11:50:26 EST
in kernel-2.6.18-173.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 45 Jeffrey Moyer 2009-11-25 15:14:24 EST
I posted one additional patch for this to rhkernel-list for review.
Comment 47 Don Zickus 2009-12-04 13:58:49 EST
in kernel-2.6.18-177.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 49 Jeffrey Moyer 2010-01-11 09:44:44 EST
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Some applications (including dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads.  When using the CFQ I/O scheduler, this application design actually hurt performance, as the I/O scheduler would try to provide fairness between the processes or threads.  This kernel contains a fix for this problem by detecting cooperating queues and merging them together.  If the queues stop issuing requests close to one another, then they are broken apart again.
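The note above describes merging cooperating queues and splitting them again when they stop issuing requests close to one another. The following is a simplified, hypothetical sketch of that decision, not the RHEL 5.5 code. The struct ioq type, the CLOSE_THR threshold of 8 * 1024 sectors, and the rule of splitting as soon as requests stop landing close together are all assumptions for illustration. The real scheduler tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* mirrors the kernel's type name */

/* Hypothetical threshold: requests within this many sectors of where another
 * queue left off are treated as cooperating. */
#define CLOSE_THR ((sector_t)8 * 1024)

struct ioq {
    sector_t last_pos;  /* sector just past this queue's last request */
    struct ioq *merged; /* queue this one has been merged into, or NULL */
};

static sector_t dist(sector_t a, sector_t b)
{
    return a > b ? a - b : b - a;
}

/* Does a request starting at other_next land close to where q left off? */
static bool is_close_cooperator(const struct ioq *q, sector_t other_next)
{
    return dist(q->last_pos, other_next) <= CLOSE_THR;
}

/* Merge other into q while their requests interleave closely; split it
 * back out once its next request is no longer close to q. */
static void update_cooperation(struct ioq *q, struct ioq *other,
                               sector_t other_next)
{
    assert(q != other);
    if (is_close_cooperator(q, other_next))
        other->merged = q;
    else if (other->merged == q)
        other->merged = NULL;
}
```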
Comment 51 Ryan Lerch 2010-02-01 23:49:46 EST
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Some applications (including dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads.  When using the CFQ I/O scheduler, this application design actually hurt performance, as the I/O scheduler would try to provide fairness between the processes or threads.  This kernel contains a fix for this problem by detecting cooperating queues and merging them together.  If the queues stop issuing requests close to one another, then they are broken apart again.
+Some applications (e.g. dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads. However, when using the Completely Fair Queuing (CFQ) I/O scheduler, this application design negatively affected I/O performance. In Red Hat Enterprise Linux 5.5, the kernel can now detect and merge cooperating queues, Additionally, the kernel can also detect if the queues stop cooperating, and split them apart again.
Comment 52 Chris Ward 2010-02-11 05:21:29 EST
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
Comment 62 errata-xmlrpc 2010-03-30 03:19:03 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
