+++ This bug was initially created as a clone of Bug #427709 +++

Description of Problem:
This is a REGRESSION; the problem did not occur on RHEL4. The read speed of the /sbin/dump command on RHEL5 is critically slower than on RHEL4.6 when using the CFQ I/O scheduler. When /sbin/dump reads 70GB of data and writes it to /dev/null with the cfq I/O scheduler, the average transfer rate on RHEL4.6 is 30,571 kB/s; on RHEL5.1 the same test achieves only 5,786 kB/s.

We wrote a test program that reads data in the same way as the /sbin/dump command, and measured the read speed under each I/O scheduler. The read speeds on RHEL4.6 and RHEL5.1 are as follows:

                |    read speed (kB/s)
 --------------------------------------
  I/O scheduler | RHEL4.6 | RHEL5.1
 ======================================
  cfq           |  81,685 |  7,404
  deadline      |  81,305 | 87,199
  as            |  63,044 | 87,393
  noop          |  67,133 | 89,405
 --------------------------------------

From these results we conclude the following:
 o The read speed of the cfq I/O scheduler on RHEL5.1 is critically slower than on RHEL4.6.
 o The read speeds of the other I/O schedulers on RHEL5.1 are similar to RHEL4.6 or better.

We therefore believe this is a regression in the cfq I/O scheduler.

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: 5
Release Number: 1
Architecture:
Kernel Version: 2.6.18-53.el5
Related Package Version:
Related Middleware / Application:
Drivers or hardware or architecture dependency: None

How reproducible:
Always

Steps to Reproduce:
1. Extract the archive file (read-test.tar.gz). The following files are extracted:
     Makefile
     read-test.c
2. Compile the test program.
     $ make
3. Change to the root user.
     $ su
4. Run the test program (sdX is the device file name).
     # ./read-test /dev/sdX

Actual Results:
The read speed of the cfq I/O scheduler is critically slower than on RHEL4.6.

Expected Results:
The read speed of the cfq I/O scheduler is not slower than on RHEL4.6.

Summary of actions taken to resolve issue: None.
Location of diagnostic data: None.

Hardware configuration:
Model: PRIMERGY TX200 S3
CPU: 8
Memory: 4G
Hardware Component Information: None
Configuration info: None
Guest Configuration Info: None

Business Impact:
Errata Request: 5.2.z
Our customer uses the /sbin/dump command to back up their data during maintenance windows. This performance regression of the cfq I/O scheduler on RHEL5.1 has made the maintenance window 10 times longer than on RHEL4.6, so our customer is suffering badly in their system operations.

Additional Info:
I have attached the test program and system environment. We confirmed that this problem occurs on RHEL5.2, too.

========================================================================

Dump and other large file operations are up to 7 times slower than with previous kernels in RHEL4.x. This is due to a bug in the kernel I/O scheduler's "cfq" implementation. Please reference the following two Kernel.org bugs for details on this issue:

Dump of ext3 runs very slowly:
http://bugzilla.kernel.org/show_bug.cgi?id=8636

Unusable system (i.e. slow) when copying large files:
http://bugzilla.kernel.org/show_bug.cgi?id=7372

Until our heroes at Kernel.org correct this problem, would you consider implementing the following workaround in our kernels? Modify the kernel's .config from:

    CONFIG_DEFAULT_IOSCHED="cfq"

to:

    CONFIG_DEFAULT_IOSCHED="anticipatory"

Many thanks,
-T

-- Additional comment from toddandmargo on 2008-05-17 20:49 EST --

Hi,

Found an easier workaround. Just add "elevator=as" to the end of the grub.conf "kernel" line. For example:

    kernel /boot/vmlinuz-2.6.18-53.1.14.el5 ro root=LABEL=/1 rhgb quiet elevator=as

This sped up my "dump" backups by a factor of four. This suggestion should be a lot easier to implement than recompiling the kernel.

Many thanks,
-T
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
I'll take a look at this.
I talked with the CFQ author about this (Jens Axboe), and he is aware of the problem and willing to help out. We'll update the bugzilla when we have test patches or packages.
A workaround, for the time being, is to set slice_idle to 0 during backups. I would restore it to its default value after backups are complete, though. You can tune this value by echoing numbers to /sys/block/<blockdev>/queue/iosched/slice_idle. For example, if your device is /dev/sdb, you would do the following:

    echo 0 > /sys/block/sdb/queue/iosched/slice_idle
Created attachment 319934 [details]
Implement support for interleaving requests between multiple processes

This patch is a backport of some of the close_cooperator changes that were introduced to (and later removed from) the upstream kernel's cfq I/O scheduler implementation. The intent is to detect multiple processes interleaving sequential file I/O. This patch is still preliminary. I have tested it with good results against both the read-test reproducer and the dump(8) command. I am currently working with Jens Axboe to come up with a similar patch for the upstream sources (so that this will not regress again in RHEL 6).
Moving to 5.4
Could you get this customer to try a test package including Jeff's patch? * https://bugzilla.redhat.com/attachment.cgi?id=319934 * https://bugzilla.redhat.com/show_bug.cgi?id=456181#c19
Created attachment 330700 [details]
read-test2.tar.gz

From FJ:
---
Hi Oshiro-san

I tested CFQ read performance with your kernel (kernel-PAE-2.6.18-53.1.21) on my machines. The read performance, measured with a new test program (read-test2.tar.gz) whose output differs from the previous one (read-test.tar.gz), is as follows:

              | read performance (MB/s)
 -------------------------------------
              | deadline |   cfq    |
 -------------------------------------
  RHEL5.1     |  74.10   | 8 or 24  |
  test kernel |  74.24   | 24 or 74 |
 -------------------------------------

The CFQ read performance of the test kernel is faster than RHEL5.1. However, the performance is sometimes critically slow: 24 MB/s on your kernel. Therefore, I think this problem has not been completely fixed yet. Additionally, this performance degradation occurs on RHEL5.1, too.

Below is the usage of the new test program:

$ tar zxvf read-test2.tar.gz
read-test2/
read-test2/test.sh
read-test2/Makefile
read-test2/read-test.c
$ cd read-test2
$ make
gcc -g -Wall -lrt -D _GNU_SOURCE -o read-test2 read-test2.c
$ su
# ./test.sh /dev/sda
***Total Ave 24.687157 MB/sec ***
***Total Ave 74.124758 MB/sec ***
***Total Ave 24.377011 MB/sec ***
***Total Ave 73.683626 MB/sec ***
***Total Ave 24.467749 MB/sec ***
***Total Ave 24.414345 MB/sec ***
***Total Ave 24.389293 MB/sec ***
***Total Ave 74.885984 MB/sec ***
***Total Ave 74.780804 MB/sec ***
***Total Ave 24.365709 MB/sec ***
***Total Ave 74.885984 MB/sec ***
***Total Ave 74.780804 MB/sec ***
...

I do not understand why this performance degradation occurred. Do you have any information related to it? We need to clarify and fix this degradation. Please investigate it. I'll continue to test, too. I'll attach the new test program.
---
Hello, Any news about this bug? Thank you!
The current plan is to backport the iocontext sharing code from upstream and to patch dump to share I/O contexts.
This work will not make the 5.4 release. When the problem was initially reported, I talked to Jens Axboe about it, and he seemed receptive to the idea of adding some code to CFQ to detect processes interleaving I/Os. When I came up with a first patch for this, he suggested that we would be better off solving the problem in the applications themselves, by having the applications explicitly share I/O contexts (using sys_clone and the CLONE_IO flag*). I wrote a patch for dump to do this very thing, and it did solve the problem. However, the list of applications suffering from this kept growing. The applications I know of that perform interleaved reads between multiple processes include:

  - dump
  - nfsd
  - qemu's posix aio backend
  - one of the iSCSI target mode implementations
  - a third-party volume manager

It is evident that this is not too uncommon a programming paradigm, so Jens decided to take the close cooperator patch set into 2.6.30. However, the implementation he merged was not quite ready, as it can cause some processes to be starved. I've been working with him to fix the problem properly while preserving fairness. In the end, the solution may involve a combination of detecting cooperating processes and sharing I/O contexts between them automatically. This issue is my number one priority, and I will keep this bugzilla updated as progress is made.

* Note that shared I/O contexts (and the CLONE_IO flag) are not supported in RHEL 5; otherwise I would have made that fix available for the 5.4 release.
I put together another test kernel that implements the close cooperator detection logic and merges the cfq_queues associated with cooperating processes. The result is a good speedup. In 100 runs of the read-test2 program (written to simulate the I/O pattern of the dump utility), these are the throughput numbers in MB/s:

  Deadline:  Avg: 101.26907   Std. Dev.: 17.59767
  CFQ:       Avg: 100.14914   Std. Dev.: 17.42747

Most of the runs saw 105 MB/s, but there were some outliers in the 28-30 MB/s range. I looked into those cases and found that the cause was processes being scheduled in just the wrong order, introducing seeks into the workload. Unfortunately, I haven't come up with a good solution for that particular problem, though I'll note that it affects the other I/O schedulers as well. Upstream does not exhibit this behaviour, and I believe that may be due to the rewritten readahead code, but I can't be certain without further investigation. Without the patch set applied, the numbers for cfq were in the 7-10 MB/s range.

I wasn't able to test NFS server performance as my test lab was experiencing some networking issues. I'll get that testing underway once that problem is resolved.

I've uploaded a test kernel here:
http://people.redhat.com/jmoyer/cfq-cc/

Please take it for a spin and report your results. If you'd like to test on an architecture other than x86_64, just let me know and I'll kick off a build for whatever architecture is required.
I've kicked off a build for i686 and will update this bug when that build is complete. In the meantime, I've uploaded the srpm to the location listed above.
An i686 kernel rpm is now available at: http://people.redhat.com/jmoyer/cfq-cc/ Happy testing!
in kernel-2.6.18-173.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
I posted one additional patch for this to rhkernel-list for review.
in kernel-2.6.18-177.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Some applications (including dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads. When using the CFQ I/O scheduler, this application design actually hurt performance, as the I/O scheduler would try to provide fairness between the processes or threads. This kernel contains a fix for this problem by detecting cooperating queues and merging them together. If the queues stop issuing requests close to one another, then they are broken apart again.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Some applications (including dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads. When using the CFQ I/O scheduler, this application design actually hurt performance, as the I/O scheduler would try to provide fairness between the processes or threads. This kernel contains a fix for this problem by detecting cooperating queues and merging them together. If the queues stop issuing requests close to one another, then they are broken apart again.
+Some applications (e.g. dump and nfsd) try to improve disk I/O performance by distributing I/O requests to multiple processes or threads. However, when using the Completely Fair Queuing (CFQ) I/O scheduler, this application design negatively affected I/O performance. In Red Hat Enterprise Linux 5.5, the kernel can now detect and merge cooperating queues. Additionally, the kernel can also detect if the queues stop cooperating, and split them apart again.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html