Bug 571818 - Iozone Outcache testing has a greater than 5 % performance regression on reads
Summary: Iozone Outcache testing has a greater than 5 % performance regression on reads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Jeff Moyer
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-03-09 16:14 UTC by Tom Tracy
Modified: 2010-03-30 07:35 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:35:18 UTC
Target Upstream Version:
Embargoed:


Attachments
Full output of iozone outcache results (8.13 KB, text/plain)
2010-03-09 16:14 UTC, Tom Tracy
iozone results showing that the regression occurred with the cfq scheduler between the 172 and 173 kernels (2.6.18) (2.18 KB, application/x-bzip2)
2010-03-12 13:09 UTC, Jiri Hladky
re-run fresh install of RHEL54 testing iozone (12.23 KB, text/plain)
2010-03-15 12:38 UTC, Tom Tracy
Fix a mistake in the backport of a patch. (784 bytes, patch)
2010-03-15 23:02 UTC, Jeff Moyer
194 kernel iozone results run on AMD (6.09 KB, text/plain)
2010-03-19 12:40 UTC, Tom Tracy
Iozone results running 194 kernel on Intel platform (6.19 KB, text/plain)
2010-03-22 13:45 UTC, Tom Tracy


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description Tom Tracy 2010-03-09 16:14:06 UTC
Created attachment 398832 [details]
Full output of iozone outcache results

Description of problem:
Comparing iozone results between rhel54 and the rhel55 191 kernel, the outcache tests show a greater than 5% regression on read tests.

Version-Release number of selected component (if applicable):
2.6.18-191.el5 kernel

How reproducible:
This happens on both Intel and AMD platforms running the 191 kernel

Steps to Reproduce:
1. run iozone test
2. analyze results, comparing to rhel54
3. see read regressions on outcache reads
  
Actual results:
[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE 
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       314    335    319    522    509     90     33    271   3257    125    286    328    566    561 
 2         ALL       292    335    313    438    441     88     34    279   3243    116    286    303    448    447 
           ALL      -6.7      .      .  -16.1  -13.4      .   +2.8   +3.1      .   -7.0      .   -7.7  -20.9  -20.3 


Expected results:
Within 5% of the known-good kernel (rhel54)

Additional info:

Comment 1 Jiri Hladky 2010-03-12 13:06:26 UTC
Hi,

I was able to pin down the regression to between the following kernels:

2.6.18-172.el5 - no regression
First regression on kernel 2.6.18-173.el5

As suggested by Eric Sandeen <sandeen>, the reported regression is probably due to changes to the cfq scheduler.

==========================================================================
* Mon Nov 09 2009 Don Zickus <dzickus> [2.6.18-173.el5]
- [acpi] thinkpad_acpi: disable ecnvram brightness on some (Matthew Garrett ) [522745]
- [block] cfq-iosched: don't delay queue kick for merged req (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: fix idling interfering with plugging (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: separate merged cfqqs if they stop cooperating (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: change the meaning of the cfqq_coop flag (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: merge cooperating cfq_queues (Jeff Moyer ) [456181 448130 427709]
- [block] cfq: calc seek_mean per cfq_queue not per cfq_io_context (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: cache prio_tree root in cfqq->p_root (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: fix aliased req & cooperation detect (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: default seek when not enough samples (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: make seek_mean converge more quick (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: add close cooperator code (Jeff Moyer ) [456181 448130 427709]
- [block] cfq-iosched: development update (Jeff Moyer ) [456181 448130 427709]
==============================================================================

I have repeated the iozone runs with the noop scheduler and in fact there is no regression; see below (a sketch of the runtime scheduler switch follows the cfq numbers). I'm going to upload all log files (perf7.tar.bz2).

============ noop ====================
iozone_2.6.18-172.el5_2010-Mar-12_07h13m_noop.log:
     KB  reclen   write rewrite    read    reread
4194304      64  127240  128754   116527   124276

iozone_2.6.18-173.el5_2010-Mar-12_07h31m_noop.log:
     KB  reclen   write rewrite    read    reread
4194304      64  127325  128609   116471   125859
===================================

=================== cfq ==============
iozone_2.6.18-172.el5_2010-Mar-11_17h30m.log:
     KB  reclen   write rewrite    read    reread
4194304      64  175221  181384   123930   124992
iozone_2.6.18-173.el5_2010-Mar-11_17h03m.log:
     KB  reclen   write rewrite    read    reread
4194304      64  174467  179755   108241   122371
=====================================
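
For reference, a comparison like the one above does not require a reboot; the elevator can be switched per block device at runtime. A minimal sketch, assuming the test device is sdd and a 4 GB / 64 KB iozone run like the ones in the logs (device name, mount point and file path are placeholders):

cat /sys/block/sdd/queue/scheduler            # active scheduler is shown in brackets, e.g. [cfq]
echo noop > /sys/block/sdd/queue/scheduler    # switch this device to noop at runtime
iozone -s 4g -r 64k -i 0 -i 1 -f /mnt/test/iozone.tmp   # write/rewrite/read/reread, as in the logs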

Thanks
Jirka

Comment 2 Jiri Hladky 2010-03-12 13:09:29 UTC
Created attachment 399644 [details]
iozone results showing that the regression occurred with the cfq scheduler between the 172 and 173 kernels (2.6.18)

Comment 3 Ric Wheeler 2010-03-12 14:05:27 UTC
Jeff, is this something you can take a look at?

Comment 4 Jeff Moyer 2010-03-12 14:37:52 UTC
Sure thing, I'll take a look.

Comment 5 Jeff Moyer 2010-03-12 19:16:20 UTC
Unfortunately, the system I was testing with (which is remote) just went out to lunch.  However, I do have the following data to share:

cat 2.6.18-164.15.1.el5-iozone.out  | grep "[0-9]" | grep -v "[a-zA-Z]" | awk '{print $3 " " $4}'
READ  REREAD
------------
62925 59813
58799 60518
60741 58243
59084 59904
59785 59040

cat 2.6.18-191.el5.jcwt3-iozone.out | grep "[0-9]" | grep -v "[a-zA-Z]" | awk '{print $3 " " $4}'
READ  REREAD
------------
88317 84000
89304 86522
85475 87300
86514 85240
87710 87566

So, as you can see, the latest kernel is actually 40% faster than the 5.4 kernel in my testing.  The command line I used was:

iozone -s 2g -r 64k -f /mnt/test/iozone.0 -i 1 -e -c -w

The test script creates an ext3 file system, then creates the iozone file, then does a loop of:

sync
echo 3 > /proc/sys/vm/drop_caches
iozone <options>
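
Spelled out with the command line from above, the harness is roughly the following (a sketch, not the exact test script; the device, mount point and iteration count are assumptions, and the flags mean: -s file size, -r record size, -i 1 read/re-read test, -e include fsync in the timing, -c include close() in the timing, -w keep the test file between runs):

mkfs.ext3 /dev/sdd1 && mount /dev/sdd1 /mnt/test             # placeholder device and mount point
iozone -s 2g -r 64k -f /mnt/test/iozone.0 -i 0 -e -c -w      # create the test file once
for i in 1 2 3 4 5; do
    sync
    echo 3 > /proc/sys/vm/drop_caches
    iozone -s 2g -r 64k -f /mnt/test/iozone.0 -i 1 -e -c -w
done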

As you can see, I ran the loop 5 times for each kernel version.  Here are the statistics:

READ (KB/s):
       mean    stddev
-164  60266.8  1664.07  
-191  87464    1502.69
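
Those numbers can be recomputed straight from the raw iozone output using the same grep filters plus a little awk; a sketch, where $3 is the READ column as in the extraction above and the spread is the sample standard deviation:

cat 2.6.18-164.15.1.el5-iozone.out | grep "[0-9]" | grep -v "[a-zA-Z]" | \
    awk '{ s += $3; ss += $3*$3; n++ }
         END { m = s/n; printf "mean %.1f  stddev %.2f\n", m, sqrt((ss - s*s/n)/(n-1)) }'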

Tom volunteered to run the -192 kernel through the tests over the weekend.

Comment 6 Jeff Moyer 2010-03-12 20:12:36 UTC
This is disturbing.  My results show exactly the opposite of Jirka's.  I show that the 173 kernel introduced a big performance win, not a regression.  So, this must be a function of the underlying storage.  My storage, as you can see from the results, is rather slow.  I'm going to need some faster hardware in order to reproduce this and figure out what's going on.

Comment 7 Tom Tracy 2010-03-13 15:28:03 UTC
Jeff
        You are correct, as there is no real difference between the 191 and 192 kernels. iozone completed and did not show any real difference. The machine I ran on is perf7.lab.bos.redhat.com root/100yard-. You're free to play around with it on Monday, as I want to confirm this on the Intel platform. Below are the outcache results comparing the RHEL54 164 kernel to the RHEL55 192 kernel, and the RHEL55 191 kernel to the RHEL55 192 kernel.

Tom

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel54_164kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel54_164kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       188    180    175    261    258     80     33    186   1636    102    164    174    272    271
 2         ALL       183    186    186    229    229     74     37    194   1613     99    165    177    235    235
           ALL         .   +3.0   +6.4  -12.4  -11.5   -7.0  +13.0   +4.1      .      .      .      .  -13.8  -13.2

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel55_191kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel55_191kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_192kernel_4gb_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       182    188    186    227    228     74     37    189   1660     97    166    184    228    230
 2         ALL       183    186    186    229    229     74     37    194   1613     99    165    177    235    235
           ALL         .      .      .      .      .      .      .   +2.2      .      .      .      .   +3.1   +2.1

Comment 8 Tom Tracy 2010-03-14 16:06:32 UTC
The Intel platform still confirms the outcache regression. I am going to confirm by rerunning RHEL54, to be sure there are no issues with the previous testing and that the storage has not changed. Comparing the 191 and 192 kernels, though, does not show this regression.

[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel54_164kernel_4GB/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE 
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       314    335    319    522    509     90     33    271   3257    125    286    328    566    561 
 2         ALL       298    338    307    468    455     89     34    284   3260    121    289    296    476    471 
           ALL         .      .      .  -10.4  -10.7      .   +2.1   +4.9      .      .      .   -9.8  -15.8  -16.1 




[root@perf4 iozone]# ./analysis-iozone.pl -a results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf4_ext3_rhel55_191kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf4_ext3_rhel55_192kernel_4GB_ext3/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE 
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       292    335    313    438    441     88     34    279   3243    116    286    303    448    447 
 2         ALL       298    338    307    468    455     89     34    284   3260    121    289    296    476    471 
           ALL         .      .      .   +6.7   +3.1      .      .      .      .   +4.4      .      .   +6.4   +5.2

Comment 9 Tom Tracy 2010-03-15 12:38:46 UTC
Created attachment 400205 [details]
re-run fresh install of RHEL54  testing iozone

Comment 10 Tom Tracy 2010-03-15 12:44:03 UTC
Ran a fresh install of RHEL54 on AMD and compared these results with the RHEL55 192 kernel. Still see inconsistencies with the outcache F-Read and FRE-Read. Then compared results from running RHEL54 on 1/2/10 to yesterday's runs and see inconsistencies there too. See the attachment for results. Going to rerun rhel54 on Intel to see if we see the same inconsistencies.

Comment 11 Jeff Moyer 2010-03-15 18:54:33 UTC
I have blktrace results for the READ tests.  Here is the summary for the -164 kernel:

Total (sdd):
 Reads Queued:     200,656,   16,793MiB	 Writes Queued:          83,      332KiB
 Read Dispatches:  139,237,   16,793MiB	 Write Dispatches:       56,      332KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:  139,237,   16,793MiB	 Writes Completed:       56,      332KiB
 Read Merges:       61,419,  257,648KiB	 Write Merges:           27,      108KiB
 IO unplugs:       131,223        	 Timer unplugs:           0

And here is the summary for the -192 kernel:

Total (sdd):
 Reads Queued:     200,671,   16,793MiB	 Writes Queued:         100,      400KiB
 Read Dispatches:   72,282,   16,793MiB	 Write Dispatches:       67,      400KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:   72,282,   16,793MiB	 Writes Completed:       67,      400KiB
 Read Merges:      128,389,    8,325MiB	 Write Merges:           33,      132KiB
 IO unplugs:        72,217        	 Timer unplugs:           0
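
For anyone repeating this, a per-device summary like the two above can be captured by wrapping the read test in blktrace and blkparse; a minimal sketch, with the device name and iozone arguments as assumptions:

blktrace -d /dev/sdd -o trace &                            # trace the test device in the background
BTPID=$!
iozone -s 2g -r 64k -f /mnt/test/iozone.0 -i 1 -e -c -w
kill -INT $BTPID; wait $BTPID                              # stop the trace cleanly
blkparse -i trace | tail -20                               # the "Total (sdd)" section is printed at the end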

As you can see, the 192 kernel does a better job of merging requests.  This means larger requests are sent to the underlying storage, but at the cost of higher latency per request.  Given that this storage is so fast, it is more beneficial to send the requests immediately instead of queueing them.  The patch that introduced this regression was from the following upstream commit:

    commit b029195dda0129b427c6e579a3bb3ae752da3a93
    Author: Jens Axboe <jens.axboe>
    Date:   Tue Apr 7 11:38:31 2009 +0200
    
        cfq-iosched: don't let idling interfere with plugging
    
        When CFQ is waiting for a new request from a process, currently it'll
        immediately restart queuing when it sees such a request. This doesn't
        work very well with streamed IO, since we then end up splitting IO
        that would otherwise have been merged nicely. For a simple dd test,
        this causes 10x as many requests to be issued as we should have.
        Normally this goes unnoticed due to the low overhead of requests
        at the device side, but some hardware is very sensitive to request
        sizes and there it can cause big slow downs.
    
        Signed-off-by: Jens Axboe <jens.axboe>

This was included as part of the cfq close cooperator patch set, most likely because delaying the submission of requests cut down on the number of back seeks for applications interleaving I/O between multiple threads.

I'll see what I can do to fix this for both cases.

Comment 12 Jeff Moyer 2010-03-15 20:54:41 UTC
Looking at the distribution of I/O sizes submitted from the file system, I found that most requests were larger than a single page (as you'd expect).  Thus, that commit isn't the problem.  The problem is that the code was looking at request->data_len instead of request->nr_sectors.  request->data_len is used to store the length of a packet command.  Thus, for regular I/O, that field is zero.  Looking at the right field fixes the performance regression for me on this faster hardware.  I will ensure that the dump(8) workload does not regress.
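
(One way to eyeball that size distribution from a blktrace capture is to pull the queued-request sizes out of the blkparse output; a rough sketch, assuming blkparse's default output format where the action is in column 6, the rwbs flags in column 7 and the size, in 512-byte sectors, in column 10, so 8 sectors equals one 4 KB page:)

blkparse -i trace | awk '$6 == "Q" && $7 ~ /^R/ { print $10 }' | sort -n | uniq -c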

Comment 13 Jeff Moyer 2010-03-15 23:02:12 UTC
Created attachment 400325 [details]
Fix a mistake in the backport of a patch.

When backporting commit b029195dda0129b427c6e579a3bb3ae752da3a93:
        cfq-iosched: don't let idling interfere with plugging
I mistakenly translated request->__data_len to request->data_len.  The former gets filled in by the upstream kernel for regular I/O requests, the latter gets filled in for PC (packet command) requests in RHEL 5.  The right field in the struct request is nr_sectors translated to bytes.

In my testing (using perf7), this fixes the problem.  I also verified that the fix does not regress the performance improvements for dump(8).

Comment 16 RHEL Program Management 2010-03-15 23:30:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 19 Jarod Wilson 2010-03-17 15:53:25 UTC
in kernel-2.6.18-194.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 21 Tom Tracy 2010-03-18 12:45:47 UTC
Testing the 194 kernel shows no regression with the outcache testing. To confirm, I will run the full suite of iozone tests and will have results tomorrow AM.

Tom

[root@perf7 iozone]# ./analysis-iozone.pl -a results_perf7_rhel54_164kernel_4gb_ext3_rebaseline/analysis/iozone_outcache_default_analysis+rawdata.log results_perf7_rhel55_194kernel_4gb_ext3_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

IOZONE Analysis tool V1.3

FILE 1: results_perf7_rhel54_164kernel_4gb_ext3_rebaseline/analysis/iozone_outcache_default_analysis+rawdata.log
FILE 2: results_perf7_rhel55_194kernel_4gb_ext3_outcache/analysis/iozone_outcache_default_analysis+rawdata.log

TABLE:  SUMMARY of ALL FILE and RECORD SIZES                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM BACKWD  RECRE STRIDE  F      FRE     F      FRE 
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE   READ  WRITE   READ  WRITE  WRITE   READ   READ
=====--------------------------------------------------------------------------------------------------------------
 1         ALL       185    191    185    234    233     72     36    181   1590     98    168    181    268    263 
 2         ALL       190    187    182    242    240     72     37    196   1657    109    164    182    268    265 
           ALL      +2.2      .      .   +3.4   +3.2      .   +3.7   +8.7   +4.2  +10.9      .      .      .      .

Comment 22 Tom Tracy 2010-03-19 12:40:46 UTC
Created attachment 401237 [details]
194 kernel iozone results run on AMD

Comment 23 Tom Tracy 2010-03-19 12:43:10 UTC
Ran the full suite of iozone tests on the 194 kernel on AMD and do not see the regression previously seen on earlier kernels. mmap shows -7.3% on recreate writes, but I will view that as run-to-run variability. Running next on Intel to confirm this kernel.

Tom

Comment 24 Tom Tracy 2010-03-22 13:43:52 UTC
Ran another suite of tests on the Intel platform and do not see the regression from previous kernels. Look at the latest attachment for details; the outcache results look good.

Tom

Comment 25 Tom Tracy 2010-03-22 13:45:55 UTC
Created attachment 401765 [details]
Iozone results running 194 kernel on Intel platform

Comment 26 Igor Zhang 2010-03-22 17:20:49 UTC
(In reply to comment #23)
> ran full suite of iozone on 194 kernel on AMD and do not see the regression
> previously seen on earlier kernels. mmap shows -7.3% on recreating writes but I
> will view that as run to run variability. Running next on intel to confirm this
> kernel. 
> 
> Tom    

Hi Tom,
If you run your tests on another AMD machine, do we still see the mmap performance degradation on recreate writes? If it's gone there, I think this bug can be set to VERIFIED.

Comment 27 Tom Tracy 2010-03-22 17:33:35 UTC
We did not see any regression on the Intel platform with mmap. I believe it is run-to-run variability (we have seen this before). I will have to re-baseline another AMD machine with RHEL54 and then test the 194 kernel. Let me see if I have a system available and try to retest the AMD platform.

Tom

Comment 28 Jeff Moyer 2010-03-22 17:52:14 UTC
Even if you do reproduce an mmap issue, that's a different bugzilla.  Let's not conflate the two issues, please.

Thanks.

Comment 31 errata-xmlrpc 2010-03-30 07:35:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

