Created attachment 398832 [details]
Full output of iozone outcache results

Description of problem:
Comparing iozone results between RHEL 5.4 and the RHEL 5.5 -191 kernel, the outcache tests show a greater than 5% regression on the read tests.

Version-Release number of selected component (if applicable):
2.6.18-191.el5 kernel

How reproducible:
This happens on both Intel and AMD platforms running the -191 kernel.

Steps to Reproduce:
1. Run the iozone test.
2. Analyze the results, comparing against RHEL 5.4.
3. See read regressions on the outcache reads.

Actual results:
[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              314   335   319   522   509     90     33    271  3257    125   286   328   566   561
2  ALL              292   335   313   438   441     88     34    279  3243    116   286   303   448   447
   ALL             -6.7     .     . -16.1 -13.4      .   +2.8   +3.1     .   -7.0     .  -7.7 -20.9 -20.3

Expected results:
Within 5% of the known-good kernel (RHEL 5.4).

Additional info:
Hi,

I was able to pin down the regression to the following kernels:

2.6.18-172.el5 - no regression
2.6.18-173.el5 - first kernel showing the regression

As suggested by Eric Sandeen <sandeen>, the reported regression is probably due to changes to the cfq scheduler.

==========================================================================
* Mon Nov 09 2009 Don Zickus <dzickus> [2.6.18-173.el5]
- [acpi] thinkpad_acpi: disable ecnvram brightness on some (Matthew Garrett ) [522745]
- [block] cfq-iosched: don't delay queue kick for merged req (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: fix idling interfering with plugging (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: separate merged cfqqs if they stop cooperating (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: change the meaning of the cfqq_coop flag (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: merge cooperating cfq_queues (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: calc seek_mean per cfq_queue not per cfq_io_context (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: cache prio_tree root in cfqq->p_root (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: fix aliased req & cooperation detect (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: default seek when not enough samples (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: make seek_mean converge more quick (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: add close cooperator code (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: development update (Jeff Moyer ) [456181 448130 427709]
==============================================================================

I have repeated the iozone runs with the noop scheduler, and there is in fact no regression there. See below. I'm going to upload all the log files (perf7.tar.bz2).
============ noop ====================
iozone_2.6.18-172.el5_2010-Mar-12_07h13m_noop.log:
      KB  reclen   write rewrite    read  reread
 4194304      64  127240  128754  116527  124276

iozone_2.6.18-173.el5_2010-Mar-12_07h31m_noop.log:
      KB  reclen   write rewrite    read  reread
 4194304      64  127325  128609  116471  125859
======================================

=================== cfq ==============
iozone_2.6.18-172.el5_2010-Mar-11_17h30m.log:
      KB  reclen   write rewrite    read  reread
 4194304      64  175221  181384  123930  124992

iozone_2.6.18-173.el5_2010-Mar-11_17h03m.log:
      KB  reclen   write rewrite    read  reread
 4194304      64  174467  179755  108241  122371
======================================

Thanks,
Jirka
Created attachment 399644 [details]
iozone results showing that the regression occurred with the cfq scheduler between the 172 and 173 kernels (2.6.18)
Jeff, is this something you can take a look at?
Sure thing, I'll take a look.
Unfortunately, the system I was testing with (which is remote) just went out to lunch. However, I do have the following data to share:

cat 2.6.18-164.15.1.el5-iozone.out | grep "[0-9]" | grep -v "[a-zA-Z]" | awk '{print $3 " " $4}'
 READ  REREAD
-------------
62925  59813
58799  60518
60741  58243
59084  59904
59785  59040

cat 2.6.18-191.el5.jcwt3-iozone.out | grep "[0-9]" | grep -v "[a-zA-Z]" | awk '{print $3 " " $4}'
 READ  REREAD
-------------
88317  84000
89304  86522
85475  87300
86514  85240
87710  87566

So, as you can see, the latest kernel is actually 40% faster than the 5.4 kernel in my testing. The command line I used was:

iozone -s 2g -r 64k -f /mnt/test/iozone.0 -i 1 -e -c -w

The test script creates an ext3 file system, then creates the iozone file, then does a loop of:

sync
echo 3 > /proc/sys/vm/drop_caches
iozone <options>

As you can see, I ran the loop 5 times for each kernel version. Here are the statistics:

READ (KB/s):    mean     stddev
-164          60266.8   1664.07
-191          87464     1502.69

Tom volunteered to run the -192 kernel through the tests over the weekend.
This is disturbing. My results show exactly the opposite of Jirka's: I show that the 173 kernel introduced a big performance win, not a regression. So, this must be a function of the underlying storage. My storage, as you can see from the results, is rather slow. I'm going to need some faster hardware in order to reproduce this and figure out what's going on.
Jeff,

You are correct: there is no real difference between the 191 and 192 kernels. Iozone completed and did not show any real difference. The machine I ran on is perf7.lab.bos.redhat.com (root/100yard-). You're free to play around with it on Monday, as I want to confirm this on the Intel platform. Below are the outcache results comparing the RHEL 5.4 164 kernel to the RHEL 5.5 192 kernel, and the RHEL 5.5 191 kernel vs. the RHEL 5.5 192 kernel.

Tom

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel54_164kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel54_164kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              188   180   175   261   258     80     33    186  1636    102   164   174   272   271
2  ALL              183   186   186   229   229     74     37    194  1613     99   165   177   235   235
   ALL                .  +3.0  +6.4 -12.4 -11.5   -7.0  +13.0   +4.1     .      .     .     . -13.8 -13.2

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel55_191kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel55_191kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              182   188   186   227   228     74     37    189  1660     97   166   184   228   230
2  ALL              183   186   186   229   229     74     37    194  1613     99   165   177   235   235
   ALL                .     .     .     .     .      .      .   +2.2     .      .     .     .  +3.1  +2.1
The Intel platform still confirms the outcache regression. I am going to confirm by rerunning RHEL 5.4, to be sure there were no issues with the previous testing and that the storage has not changed. Comparing the 191 and 192 kernels, though, does not show this regression.

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              314   335   319   522   509     90     33    271  3257    125   286   328   566   561
2  ALL              298   338   307   468   455     89     34    284  3260    121   289   296   476   471
   ALL                .     .     . -10.4 -10.7      .   +2.1   +4.9     .      .     .  -9.8 -15.8 -16.1

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              292   335   313   438   441     88     34    279  3243    116   286   303   448   447
2  ALL              298   338   307   468   455     89     34    284  3260    121   289   296   476   471
   ALL                .     .     .  +6.7  +3.1      .      .      .     .   +4.4     .     .  +6.4  +5.2
Created attachment 400205 [details] re-run fresh install of RHEL54 testing iozone
Ran a fresh install of RHEL 5.4 on AMD and compared the results with the RHEL 5.5 192 kernel. I still see inconsistencies with the outcache F-Read and FRE-Read. I then compared the results from the RHEL 5.4 run of 1/2/10 to yesterday's runs and see inconsistencies there too. See the attachment for results. Going to rerun RHEL 5.4 on Intel to see whether we get the same inconsistencies.
I have blktrace results for the READ tests. Here is the summary for the -164 kernel:

Total (sdd):
 Reads Queued:     200,656,   16,793MiB   Writes Queued:          83,      332KiB
 Read Dispatches:  139,237,   16,793MiB   Write Dispatches:       56,      332KiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:  139,237,   16,793MiB   Writes Completed:       56,      332KiB
 Read Merges:       61,419,  257,648KiB   Write Merges:           27,      108KiB
 IO unplugs:       131,223                Timer unplugs:           0

And here is the summary for the -192 kernel:

Total (sdd):
 Reads Queued:     200,671,   16,793MiB   Writes Queued:         100,      400KiB
 Read Dispatches:   72,282,   16,793MiB   Write Dispatches:       67,      400KiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:   72,282,   16,793MiB   Writes Completed:       67,      400KiB
 Read Merges:      128,389,    8,325MiB   Write Merges:           33,      132KiB
 IO unplugs:        72,217                Timer unplugs:           0

As you can see, the 192 kernel does a better job of merging requests. This means larger requests are sent to the underlying storage, but at the cost of higher latency per request. Given that this storage is so fast, it is more beneficial to send the requests immediately instead of queueing them.

The patch that introduced this regression was from the following upstream commit:

commit b029195dda0129b427c6e579a3bb3ae752da3a93
Author: Jens Axboe <jens.axboe>
Date:   Tue Apr 7 11:38:31 2009 +0200

    cfq-iosched: don't let idling interfere with plugging

    When CFQ is waiting for a new request from a process, currently it'll
    immediately restart queuing when it sees such a request. This doesn't
    work very well with streamed IO, since we then end up splitting IO
    that would otherwise have been merged nicely. For a simple dd test,
    this causes 10x as many requests to be issued as we should have.
    Normally this goes unnoticed due to the low overhead of requests
    at the device side, but some hardware is very sensitive to request
    sizes and there it can cause big slow downs.
    Signed-off-by: Jens Axboe <jens.axboe>

This was included as part of the cfq close cooperator patch set, most likely because delaying the submission of requests cut down on the number of back seeks for applications interleaving I/O between multiple threads. I'll see what I can do to fix this for both cases.
Looking at the distribution of I/O sizes submitted from the file system, I found that most requests were larger than a single page (as you'd expect). Thus, that commit isn't the problem. The problem is that the code was looking at request->data_len instead of request->nr_sectors. request->data_len is used to store the length of a packet command. Thus, for regular I/O, that field is zero. Looking at the right field fixes the performance regression for me on this faster hardware. I will ensure that the dump(8) workload does not regress.
Created attachment 400325 [details]
Fix a mistake in the backport of a patch.

When backporting commit b029195dda0129b427c6e579a3bb3ae752da3a93:

    cfq-iosched: don't let idling interfere with plugging

I mistakenly translated request->__data_len to request->data_len. The former gets filled in by the upstream kernel for regular I/O requests; the latter gets filled in for PC (packet command) requests in RHEL 5. The right field in the RHEL 5 struct request is nr_sectors, translated to bytes.

In my testing (using perf7), this fixes the problem. I also verified that the fix does not regress the performance improvements for dump(8).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-194.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
Testing the 194 kernel shows no regression in the outcache testing. To confirm, I will run the full suite of iozone tests; results tomorrow AM.

Tom

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel54_164kernel_4gb_ext3_rebaseline/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_194kernel_4gb_ext3_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel54_164kernel_4gb_ext3_rebaseline/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_194kernel_4gb_ext3_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE: SUMMARY of ALL FILE and RECORD SIZES    Results in MB/sec

FILE & REC          ALL  INIT    RE          RE RANDOM RANDOM BACKWD RECRE STRIDE     F   FRE     F   FRE
FILE SIZES (KB)     IOS WRITE WRITE  READ  READ   READ  WRITE   READ WRITE   READ WRITE WRITE  READ  READ
=====----------------------------------------------------------------------------------------------------
1  ALL              185   191   185   234   233     72     36    181  1590     98   168   181   268   263
2  ALL              190   187   182   242   240     72     37    196  1657    109   164   182   268   265
   ALL             +2.2     .     .  +3.4  +3.2      .   +3.7   +8.7  +4.2  +10.9     .     .     .     .
Created attachment 401237 [details] 194 kernel iozone results run on AMD
Ran the full suite of iozone tests on the 194 kernel on AMD and do not see the regression previously seen on earlier kernels. mmap shows -7.3% on recreating writes, but I view that as run-to-run variability. Running next on Intel to confirm this kernel.

Tom
Ran another suite of tests on the Intel platform and do not see the regression from previous kernels. See the latest attachment for details; the outcache results look good.

Tom
Created attachment 401765 [details] Iozone results running 194 kernel on Intel platform
(In reply to comment #23)
> ran full suite of iozone on 194 kernel on AMD and do not see the regression
> previously seen on earlier kernels. mmap shows -7.3% on recreating writes but I
> will view that as run to run variability. Running next on intel to confirm this
> kernel.
>
> Tom

Hi Tom,

If you run your tests on another AMD machine, do we still see the mmap performance regression on recreating writes? If it's gone there, I think this bug can be set to VERIFIED.
We did not see any regression on the Intel platform with mmap. I believe it is run-to-run variability (we have seen this before). I will have to re-baseline another AMD machine with RHEL 5.4 and then test the 194 kernel. Let me see if I have a system available and try to retest the AMD platform.

Tom
Even if you do reproduce an mmap issue, that's a different bugzilla. Let's not conflate the two issues, please. Thanks.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html