Created attachment 503495 [details]
Full analysis of iozone

Description of problem:
Iozone outcache testing shows over a 12% degradation in performance compared to RHEL55 on an ext3 filesystem.

Version-Release number of selected component (if applicable):
kernel 2.6.18-238.el5

How reproducible:
Every run shows this degradation, as well as DirectIO with initial writes down 7.7%.

Steps to Reproduce:
1. build rhel57 system
2.
3.

Actual results:
[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_rhel57_2.6.18-264.el5/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.4

FILE 1: results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_rhel57_2.6.18-264.el5/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES          Results in MB/sec

FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE      F    FRE      F    FRE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
1 ALL           299    336    328    456    455     88     39    279   3064    121    278    311    468    466
2 ALL           286    294    276    461    450     88     40    279   3033    121    242    273    458    453
ALL               .  -12.5  -15.9      .      .      .   +3.7      .      .      .  -12.8  -12.3      .      .

Expected results:
Should be within 5% of RHEL55.

Additional info:
Attached is the full report of iozone.
It's important to note that ttracy also tested this against the RHEL5.6 -238 kernel. Out of cache results were virtually identical to -194, so this is a RHEL5.7 regression.

We have begun bisecting kernels.

Barry
(In reply to comment #1)
> We have begun bisecting kernels.

thank you! (so it's between 194 and 238?)
Is this being tested with barriers on or off (defaults are off ...)?
This is ext3 with barriers off
Barry says barriers are off, but just an FYI: Bug #667673 fixed an issue where synchronous writes were losing their barriers. Fixing that will slow some things down, and this commit:

48fd4f93a00eac844678629f2f00518e146ed30d
block: submit_bh() inadvertently discards barrier flag on a sync write

fixed it. It also means that barriers will actually work again, which will slow things down when they're in use. IOW, I would expect a pretty big slowdown on ext4 (defaults) or ext3 (with barriers).

Just a note, for future testing. Doesn't seem to explain a regression with ext3 defaults, though.

-Eric
Created attachment 503880 [details] outcache comparison with 246,248,251 kernels
Eric,

The fix for your bugzilla was added in the 253 kernel. I have triaged the kernels and have narrowed it down to the 249 (not available) or the 250 kernel (testing now). I have attached tests from the following:

246 kernel  no regression
248 kernel  no regression
251 kernel  regression

I have attached the results and you can see where the regression is. Barry is going to build me a 249 kernel to try to narrow it down further.

Tom
The regression is in the 250 kernel and is not in the 248 kernel. Looking at the changelog. Waiting for the 249 kernel to be built.

Tom

Comparing the 248 and 250 kernels with outcache results:

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_rhel57_2.6.18-248.el5_outcache/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_rhel57_2.6.18-250.el5_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.4

FILE 1: results_perf4_rhel57_2.6.18-248.el5_outcache/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_rhel57_2.6.18-250.el5_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES          Results in MB/sec

FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE      F    FRE      F    FRE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
1 ALL           301    330    313    474    474     88     38    277   3068    124    280    323    469    473
2 ALL           283    280    263    462    464     88     36    275   3125    126    233    266    465    475
ALL            -5.9  -14.9  -15.8      .      .      .      .      .      .      .  -16.9  -17.6      .      .
Created attachment 504098 [details] Full outcache report comparing 248 and 250 kernels
Tom, thanks for doing all the bisecting! Do you guys plan to test 249 as well, or should we start digging through the 248->250 delta? Thanks, -Eric
Eric,

Barry built a 249 kernel and, just from watching the beginning of the test, we could see the regression is in this kernel. Looking at the changelog, the only patch that could affect the performance was the following:

[mm] writeback: fix queue handling in blk_congestion_wait (Jeff Layton) [516490]

Sure enough, Barry pulled out this patch and the outcache regression is gone. Attached is the output (for better viewing). Going to run the full suite of tests, but it looks like we found the culprit.

Tom

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_rhel57_2.6.18-249.el5.NOblkcongwait_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.4

FILE 1: results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_rhel57_2.6.18-249.el5.NOblkcongwait_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES          Results in MB/sec

FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE      F    FRE      F    FRE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
1 ALL           299    336    328    456    455     88     39    279   3064    121    278    311    468    466
2 ALL           303    332    312    468    473     88     39    277   3122    128    281    313    480    486
ALL               .      .      .   +2.8   +3.8      .      .      .      .   +5.7      .      .   +2.6   +4.2
Created attachment 504433 [details] Full analysis with noblkcongwait patch
Jeff, please see comment #11 ...
Thanks for the bug report. I've cc'ed Kosuke on this bug to see if he would like to comment. For now, I suggest that we pull this patch from the kernel since the benefit to NFS is not worth the performance regression elsewhere. We can revisit it for 5.8 in the event that NEC comes up with a fix.
Kosuke -- would you like to comment or offer thoughts on why this causes that workload to regress? I confess I don't have a strong feel for this area of the code... At this point, I'm going to propose that we pull this patch from 5.7. We can revisit it for 5.8 if you come up with a fix for the performance problems it introduces.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Did a complete iozone run and the removal of the suspected patch removes all regressions. Barry is building a 267 kernel without the following patch:

[mm] writeback: fix queue handling in blk_congestion_wait (Jeff Layton) [516490]

See the report in the attached file.

Tom
Created attachment 504544 [details] Comparing RHEL55 194 kernel with RHEL57 249 kernel with NOblkcongwait patch
> Kosuke -- would you like to comment or offer thoughts on why this causes that
> workload to regress? I confess I don't have a strong feel for this area of the
> code...
>
> At this point, I'm going to propose that we pull this patch from 5.7. We can
> revisit it for 5.8 if you come up with a fix for the performance problems it
> introduces.

The patch tries to prevent NFS writes from generating too many dirty pages by adding a 100ms delay in balance_dirty_pages() when the backing device is congested. This delay may be slowing down writes to local disks.

The delay in the upstream code starts at 1ms and has an exponential backoff (up to 100ms). However, the upstream balance_dirty_pages() is pretty different because of per-BDI accounting.

I think it would be better if the delay could be introduced for NFS backing devices only, but I couldn't find a good way to distinguish NFS backing devices in the current code. Jeff, do you have any good ideas?

If we remove this patch, the number of dirty and unstable pages will grow large under a heavy NFS write workload (because the NFS congestion control limits the number of writeback pages), and the system will have poor responsiveness under that workload.
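[Editorial note: for readers unfamiliar with this code path, here is a rough illustrative sketch, in the style of the 2.6.18-era writeback code, of the kind of congestion wait being discussed. It is NOT the actual RHEL patch, and the helper name is hypothetical; it shows the upstream-style exponential backoff (1ms up to 100ms) rather than the flat 100ms wait the backported patch used.]

#include <linux/backing-dev.h>  /* bdi_write_congested() */
#include <linux/blkdev.h>       /* blk_congestion_wait() in 2.6.18-era trees */
#include <linux/fs.h>           /* WRITE */
#include <linux/jiffies.h>      /* msecs_to_jiffies() */
#include <linux/kernel.h>       /* min() */

/*
 * Hypothetical helper, for illustration only: wait for the backing device
 * to become uncongested before dirtying more pages, backing off
 * exponentially from 1ms up to 100ms.  The RHEL5 backport instead waited
 * a flat 100ms whenever the bdi was congested, which is what penalized
 * ordinary local-disk writes in this workload.
 */
static void throttle_on_congestion(struct backing_dev_info *bdi)
{
	unsigned long pause = msecs_to_jiffies(1);
	unsigned long max_pause = msecs_to_jiffies(100);

	while (bdi_write_congested(bdi)) {
		/* sleeps until the queue is uncongested or the timeout expires */
		blk_congestion_wait(WRITE, pause);
		pause = min(pause << 1, max_pause);
	}
}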
(In reply to comment #19)
> I think it would be better if the delay could be introduced for NFS
> backing devices only, but I couldn't find a good way to distinguish NFS
> backing devices in the current code. Jeff, do you have any good ideas?

Well...I suppose we could define a new bdi->capabilities flag, and set that only on NFS BDIs. Then you could use that to determine whether to use your new code or not. I'll see if I can come up with a patch...
Created attachment 504654 [details]
possible patch -- set a flag in the BDI to indicate whether to do the new code

Here's a possible patch -- it adds a new flag to the bdi->capabilities that we can use to indicate whether to do the new code in balance_dirty_pages(). Tom & Barry -- could you run this through a test and let me know whether it also fixes the issue?
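[Editorial note: the attached patch itself is not reproduced here. The following is a minimal sketch of the approach described above -- a new bdi->capabilities bit that only NFS sets, checked before applying the congestion wait. The flag name, its bit value, and the helper name are made up for illustration and are not necessarily what the real patch uses.]

#include <linux/backing-dev.h>
#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/jiffies.h>

/*
 * Sketch only -- not the actual attachment.  A real patch would pick a bit
 * that does not collide with the existing BDI_CAP_* flags in this tree.
 */
#define BDI_CAP_NFS_CONGESTION_WAIT	0x00000100

/*
 * NFS would opt in when it sets up its backing_dev_info, e.g. in the
 * fs/nfs client setup code:
 *
 *	server->backing_dev_info.capabilities |= BDI_CAP_NFS_CONGESTION_WAIT;
 *
 * balance_dirty_pages() then only pays the congestion-wait penalty for
 * BDIs that asked for it, leaving local block devices alone:
 */
static void maybe_wait_for_congestion(struct backing_dev_info *bdi)
{
	if ((bdi->capabilities & BDI_CAP_NFS_CONGESTION_WAIT) &&
	    bdi_write_congested(bdi))
		blk_congestion_wait(WRITE, HZ / 10);	/* ~100ms */
}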
Jeff,

If you can build a kernel (latest build) with this patch, then I can test it out. It does not take long to see the regression.

Tom
Hi Tom,

Here are some of my results.

DIRECTIO RESULTS BETWEEN 5.6 and RHEL5.7-Server-20110601.0
================================================================================
FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ
=====----------------------------------------------------------------------------------
1 ALL          65.3   75.2   70.8   59.2   98.8   60.6   58.6   44.4   74.6   59.2
2 ALL          64.5   73.6   71.1   55.2   99.4   60.5   57.7   43.8   74.6   58.9
ALL               .      .      .   -6.7      .      .      .      .      .      .

The -6.7 I see is on read.

Now I am running IOZONE on 2.6.18-265.el5 and ext3. I already have results for the incache test; they look stable. As soon as I have the outcache and directio results for this build I will post them here.
Ok, I've placed a kernel with the patch in comment #23 on my people page:

http://people.redhat.com/jlayton/

Kosuke, if you could also test that kernel (or a recent beta kernel with the patch), that would be helpful.
With the 2.6.18-267.el5.jtltest.139 kernel, we do not see the outcache regression seen with the 249 kernels and higher. All the tests are in line, but DirectIO initial writes drop 7.5%, which should not be affected by this patch. Attached are the reports of the tests.

Tom
Created attachment 504859 [details] rhel55 194 kernel with RHEL57 2.6.18-267.el5.jtltest.139 comparisons
(In reply to comment #27)
> With the 2.6.18-267.el5.jtltest.139 kernel, we do not see the Outcache
> regression seen with 249 kernels and higher. All the tests are in line but
> DirectIo initial writes drop -7.5% which should not be affected by this patch.
> Attached are the reports of the tests.

So if I'm parsing this correctly, this patch seems to fix the issue for you, but you have an unexplained 7.5% drop in initial DIO writes? Is that reproducible over multiple runs, or did you just see it once?
Jeff,

Rerunning the DirectIO portion of iozone to see if there is run-to-run variability. Will know more in a few hours.

Tom
Jeff,

After a power cycle I re-ran the DirectIO portion of iozone and the initial writes came back. Call it run-to-run variability. Everything looks good with this patch.

Tom

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_directio_default_analysis+rawdata.log results_perf4_rhel57_2.6.18-267.el5.jtltest_dio_1/analysis/iozone_directio_default_analysis+rawdata.log

IOZONE Analysis tool V1.4

FILE 1: results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_directio_default_analysis+rawdata.log
FILE 2: results_perf4_rhel57_2.6.18-267.el5.jtltest_dio_1/analysis/iozone_directio_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES          Results in MB/sec

FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE      F    FRE      F    FRE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
1 ALL           342    161    153    235    250    191    128    214    113    200    552   1019   2925   3127
2 ALL           342    156    146    233    255    191    127    210    109    202    563   1047   3004   3202
ALL               .      .      .      .   +2.1      .      .      .      .      .   +2.0   +2.7   +2.7   +2.4
Thanks for rerunning the test. I'll go ahead and propose this patch.
Jeff,

I've measured the NFS performance on kernel-2.6.18-267.el5.jtltest.139. The performance is almost the same as with the old kernel. I think it's OK.

NFS client and NFS server running kernel-2.6.18-267.el5.jtltest.139
  real     user   sys    commit  write
  214.48   0.01   6.05   10      262171
  199.26   0.01   6.54   10      262164
  205.34   0.02   6.85   11      262175

NFS client and NFS server running kernel-2.6.18-264.el5
  real     user   sys    commit  write
  208.74   0.00   6.46   10      262175
  212.57   0.01   7.07   11      262177
  205.84   0.01   6.47   11      262188
Patch(es) available in kernel-2.6.18-269.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Basic test is done, result is PASS:

kernel: 2.6.18-269.el5
distro: RHEL5.7-Server-20110623.0.n
machine: intel-d3x1311-01.rhts.eng.bos.redhat.com
test case: /kernel/filesystems/nfs/connectathon
Hi Tom, Where could I get "IOZONE Analysis tool V1.4"? Thanks.
Hi,

I am running the iozone outcache test on file sizes between 512MB - 2048MB. I don't see a regression there. This is my configuration:

Test = /performance/iozone_devel_with_library/certification
Date = Mon Jun 20 14:40:58 CEST 2011
Hostname = dell-per210-01.lab.eng.brq.redhat.com
Arch = x86_64
Distro = RHEL5.7-Server-20110615.0.n
Kernel = 2.6.18-267.el5
SElinux mode = Enforcing
CPU count : speeds MHz = 4 : 4 @ 2395.000
CPU model name = Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
CPU cache size = 8192 KB
BIOS Information Version : Date = 1.2.1 : 01/28/2010
Total Memory = 7972 (MB)
Tuned profile = default

RHTS information
RHTS JOBID = 99821
RHTS RECIPEID = 203269
RHTS RECIPETESTID = 2210875

FILESYSTEM configuration
Filesystems to test (requested) = ext3
Filesystems to test (actual) = ext3
Mount point of filesystem under test = /RHTSspareLUN1
LUN for filesystem under test = /dev/sda1
readahead for LUN above = 128 KB
Speed test: hdparm -tT /dev/sda1
  Timing buffered disk reads => 105.60 MB/sec
  Timing cached reads => 12863.82 MB/sec
Free Diskspace on /RHTSspareLUN1 = 59GB
I/O scheduler on testing device = cfq
IOZONE version = 3.327
Page Size [smallest file to work on] = 4 (KB)
90% of Free disk space available = 50641 (MB)
Command line options = -r 5 --rsync --eatmem2 --swap --iozone_umount --iozone_rec_size --incache --directio --outofcache -f ext3
In Cache test maximum file size = 4096 (MB)
Out of Cache test minimum file size = 512 (MB)
Out of Cache test maximum file size = 2048 (MB)
Free memory after running eatmem = 512 (MB)
Direct I/O test maximum file size = 4096 (MB)
Number of sequential runs = 5

Results are attached and compared to RHEL56GA.
Created attachment 510044 [details] RHEL6GAvsRHEL5.7-20110615 OUTofCACHEresults
(In reply to comment #39)
> Hi Tom,
>
> Where could I get "IOZONE Analysis tool V1.4"? Thanks.

Hi,

You can download this script from CVS:

http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/performance/iozone_devel_with_library/

Have a nice day!
Kamil
Thank you, Kamil.
Tested the 269 kernel running iozone and the regression has been removed from the outcache testing. In fact, RECRE Write and FRE Write have improved 2.6% and 6.9% respectively. InCache testing has improved 3.4% overall, and Fsync testing has improved 2.7%. Below is a summary of the outcache test results; a full analysis of the report is attached.

Tom

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_rhel57_2.6.18-269.el5/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.4

FILE 1: results_perf4_rhel55_2.6.18-194.el5_rerun/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_rhel57_2.6.18-269.el5/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES          Results in MB/sec

FILE & REC      ALL   INIT     RE            RE RANDOM RANDOM BACKWD  RECRE STRIDE      F    FRE      F    FRE
SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
1 ALL           299    336    328    456    455     88     39    279   3064    121    278    311    468    466
2 ALL           301    334    333    456    454     88     38    279   3144    120    281    332    466    462
ALL               .      .      .      .      .      .      .      .   +2.6      .      .   +6.9      .      .
Created attachment 510271 [details] Full report of Iozone with 269 kernel
fs regression/stress tests passed. Tested on the -270 and -272 kernels.
Based on the test results above (nfs regression, performance verification, vmm regression, fs regression), I am setting this bug to VERIFIED.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html