Bug 688927 - Iozone incache testing shows a regression on all tested file systems: ext4, ext3, xfs
Keywords:
Status: CLOSED DUPLICATE of bug 714180
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Feeney
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-03-18 14:42 UTC by Kamil Kolakowski
Modified: 2013-01-10 08:17 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-14 15:00:13 UTC
Target Upstream Version:


Attachments
IOZONE results compared between rhel6.0GA and rhel6.1-20110311.3 (14.39 KB, text/plain)
2011-03-18 14:56 UTC, Kamil Kolakowski
POSTMARK results compared between rhel6.0GA and rhel6.1-20110311.3 (2.63 KB, text/plain)
2011-03-18 14:58 UTC, Kamil Kolakowski
IOZONE ext3 results compared between rhel6.0GA and rhel6.1-20110311.3 (14.53 KB, text/plain)
2011-03-18 15:57 UTC, Kamil Kolakowski
IOZONE xfs results compared between rhel6.0GA and rhel6.1-20110311.3 (14.76 KB, text/plain)
2011-03-18 15:58 UTC, Kamil Kolakowski
Result between -95 and -96 kernel (14.46 KB, text/plain)
2011-03-24 14:48 UTC, Kamil Kolakowski
Comparison -70vs-95 (14.47 KB, text/plain)
2011-03-28 06:17 UTC, Kamil Kolakowski
Comparison -82 and -83 kernel (ext4, cfq) (14.47 KB, text/plain)
2011-03-31 08:30 UTC, Kamil Kolakowski
-82 iozone_incache_default.iozone (22.94 KB, application/octet-stream)
2011-03-31 18:55 UTC, Kamil Kolakowski

Description Kamil Kolakowski 2011-03-18 14:42:34 UTC
I see a ~6% regression in iozone incache testing on the ext4 file system.

Baseline: RHEL6.0 GA
Tested version: RHEL6.1-20110311.3

Testing machine:
Hostname                                  = ibm-x3650m3-01.lab.eng.brq.redhat.com
Arch                                      = x86_64
Distro                                    = RHEL6.1-20110311.3
Kernel                                    = 2.6.32-122.el6.x86_64
SElinux mode                              = Permissive

CPU count : speeds MHz                    = 12 : 12 @ 2793.000
CPU model name                            = Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CPU cache size                            = 12288 KB
BIOS Information   Version : Date         = -D6E145FUS-1.07- : 04/26/2010
Total Memory                              = 12029 (MB)
NUMA is Enabled.  # of nodes              = 1 nodes (0)

Tuned profile                             = default

I/O scheduler on testing device           = cfq
   SPEED TEST: hdparm -tT /dev/sdb1
Timing buffered disk reads                = 249.33 MB/sec
Timing cached reads                       = 8704.62 MB/sec
    Free Diskspace on /RHTSspareLUN1      = 20GB
Type of HDD                               = SSD

I am able to reproduce these results. I used two benchmarks: iozone and postmark.
The system has two drives: one is used for the OS, the second (an SSD drive) for testing.

There is no LVM and no RAID.

Results are attached.
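
For reference, a minimal postmark invocation of the kind used here might look like the following; the workload parameters are hypothetical, since the exact configuration is not recorded in this bug:

  # contents of pmconfig (parameters are illustrative only):
  set location /RHTSspareLUN1
  set number 10000
  set transactions 20000
  run
  quit

  # run it:
  postmark pmconfig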

Comment 2 Kamil Kolakowski 2011-03-18 14:56:50 UTC
Created attachment 486256 [details]
IOZONE results compared between rhel6.0GA and rhel6.1-20110311.3

Comment 3 Kamil Kolakowski 2011-03-18 14:58:25 UTC
Created attachment 486259 [details]
POSTMARK results compared between rhel6.0GA and rhel6.1-20110311.3

Comment 4 Eric Sandeen 2011-03-18 15:09:58 UTC
Sorry, jumped the gun on the summary edit :)

Comment 5 Eric Sandeen 2011-03-18 15:10:34 UTC
Were other filesystems tested?

Comment 6 Kamil Kolakowski 2011-03-18 15:55:06 UTC
Hi Eric,

Yes, I tested the ext4, ext3, xfs, and ext2 file systems. All of them show a regression in the incache results.

I'm going to update the bug description and post the rest of the results here.

Comment 7 Kamil Kolakowski 2011-03-18 15:57:37 UTC
Created attachment 486276 [details]
IOZONE ext3 results compared between rhel6.0GA and rhel6.1-20110311.3

Comment 8 Kamil Kolakowski 2011-03-18 15:58:16 UTC
Created attachment 486278 [details]
IOZONE xfs results compared between rhel6.0GA and rhel6.1-20110311.3

Comment 9 Eric Sandeen 2011-03-21 15:57:07 UTC
In my testing, I think I may have narrowed down at least some of this regression to some scheduler changes between -95 and -96.  I'm doing brew builds of those now, since they got garbage collected; if you could retest those & compare results it'd be great.  I'll let you know when the builds are available.

Thanks,
-Eric

Comment 10 Kamil Kolakowski 2011-03-21 16:34:30 UTC
I will retest as soon as you have the builds ready.

Thanks

Kamil

Comment 13 Eric Sandeen 2011-03-23 14:30:37 UTC
Hm, after a bit more careful testing here, averaging 6 iozone runs and comparing them (iozone -a -y 4k -q 16384k -n 4k -g 1g -f /mnt/test/testfile) I'm not seeing much regression other than FWRITE:

IOZONE Analysis tool V1.4

FILE 1: ./iozone-2.6.32-71.el6.x86_64-avg-ext4.txt
FILE 2: ./iozone-2.6.32-122.el6.x86_64-avg-ext4.txt

TABLE:  SUMMARY of ALL FILE and RECORD SIZES
                        Results in MB/sec

     FILE & REC      ALL  INIT   RE             RE   RANDOM RANDOM
FILE SIZES (KB)      IOS  WRITE  WRITE   READ   READ   READ  WRITE
=====-------------------------------------------------------------
 1         ALL      1967    770   1085   3237   3400   3200   1438 
 2         ALL      1949    773   1077   3187   3377   3214   1445
           ALL         .      .      .      .      .      .      .

 BACKWD  RECRE STRIDE  F      FRE     F      FRE 
   READ  WRITE   READ  WRITE  WRITE   READ   READ
-------------------------------------------------
   2931   2217   3144    763   1037   3000   3216
   2941   2173   3102    735   1016   2959   3216 
      .      .      .   -3.7      .      .      .
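
For anyone reproducing this methodology, the averaged runs can be collected along these lines (a sketch; the output file names are illustrative, and the averaging/diff step uses the iozone analysis tool whose exact invocation is not recorded in this bug):

  for i in 1 2 3 4 5 6; do
      iozone -a -y 4k -q 16384k -n 4k -g 1g -f /mnt/test/testfile \
          > iozone-$(uname -r)-run$i.txt
  done
  # average the six runs per kernel, then compare the two kernels'
  # averages with the iozone analysis tool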

Comment 14 Eric Sandeen 2011-03-23 14:38:26 UTC
This was tested on an intel SSD, INTEL SSDSA2M160 (entire disk formatted), 2G 2CPU x86_64 box, and all default mount options, cfq, etc.

Comment 15 Kamil Kolakowski 2011-03-24 08:59:59 UTC
Hi Eric,

I have results for -95 and -96 from runs on ibm-x3650m3-01.lab.eng.brq.redhat.com.
I don't see a regression between those two kernels.

I ran iozone with these parameters:

iozone -U /RHTSspareLUN1 -a -f /RHTSspareLUN1/ext4 -n 4k -g 4096m

Results are attached.

Comment 16 Kamil Kolakowski 2011-03-24 14:48:18 UTC
Created attachment 487359 [details]
Result between -95 and -96 kernel

Comment 17 Eric Sandeen 2011-03-24 15:19:00 UTC
Can you compare -71 to -96 as well?

Then at least we'll know if your regression is before or after -96.

Since I can't reproduce this, I wonder if you can bisect a little and spot-check some other kernels prior to -122?

Thanks,
-Eric
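
Spot-checking an intermediate kernel can be done roughly like this (a sketch; the package name and version below are illustrative, and each candidate needs its own reboot cycle):

  rpm -ivh kernel-2.6.32-96.el6.x86_64.rpm        # install the candidate build
  grubby --set-default=/boot/vmlinuz-2.6.32-96.el6.x86_64
  reboot
  # after boot:
  uname -r                                        # confirm the running kernel
  iozone -U /RHTSspareLUN1 -a -f /RHTSspareLUN1/ext4 -n 4k -g 4096m \
      > iozone-$(uname -r).txt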

Comment 18 Kamil Kolakowski 2011-03-28 06:13:01 UTC
Hi Eric,

I attached a comparison between -70 and -95.
There is a significant regression here. I'm now running the -80 kernel to track down where the regression first occurred.

Comment 19 Kamil Kolakowski 2011-03-28 06:17:37 UTC
Created attachment 488083 [details]
Comparison -70vs-95

Comment 20 Kamil Kolakowski 2011-03-28 15:12:11 UTC
I have an initial 3 iozone runs on the -80 kernel.
I see a regression between the -80 and -96 kernels.
I will continue by testing -90.

Comment 21 Eric Sandeen 2011-03-28 15:23:27 UTC
Thanks for narrowing it down!  Sorry, if I could reproduce it, I could help :(

Comment 22 Kamil Kolakowski 2011-03-29 08:08:52 UTC
The results are stable between -90 and -95.
The regression must start between -80 and -90.
Trying -85.

Comment 23 Kamil Kolakowski 2011-03-30 17:08:20 UTC
From the first run on the -82 kernel, it looks like the regression first appeared in -83.

I will confirm this once I have an average of 3 results on the -82 kernel.

Comment 24 Eric Sandeen 2011-03-30 18:32:37 UTC
Thanks!  Hm, only 92 patches now ;)

I don't see anything terribly obvious in fs/ mm/ drivers/block/ block/ ....

-Eric

Comment 25 Kamil Kolakowski 2011-03-31 08:26:24 UTC
Eric,

Two further runs have confirmed that the regression is between -82 and -83.
The iozone result is attached.

Next I'm going to run all file systems through both the IOZONE and POSTMARK tests. I will publish the results here as I get them.

If you want other reports, e.g. iostat or "echo w > /proc/sysrq-trigger" output, please let me know.

Thanks!

-Kamil
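
Collecting those reports alongside a run could look like this (a sketch; the log file names are illustrative):

  iostat -x 5 > iostat-$(uname -r).log &          # extended per-device stats
  IOSTAT_PID=$!
  iozone -U /RHTSspareLUN1 -a -f /RHTSspareLUN1/ext4 -n 4k -g 4096m
  kill $IOSTAT_PID
  echo w > /proc/sysrq-trigger                    # dump blocked tasks to the kernel log
  dmesg > sysrq-w-$(uname -r).log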

Comment 26 Kamil Kolakowski 2011-03-31 08:30:06 UTC
Created attachment 489002 [details]
Comparison -82 and -83 kernel (ext4, cfq)

Comment 27 Eric Sandeen 2011-03-31 14:36:45 UTC
Wow, thanks for narrowing that down!  That's a huge help.

Poring over the 90 or so changes now ...

Comment 28 Eric Sandeen 2011-03-31 15:15:16 UTC
Kamil, is there any chance I could access the box and do a proper git bisect between -82 and -83?  or is it running tests overnight for you as well?

I have a couple of suspect changes, but nothing is obvious.

Thanks,
-Eric

Comment 29 Kamil Kolakowski 2011-03-31 15:24:39 UTC
Eric, 

I started a big Beaker job across all file systems, running both benchmarks on those two kernels. But since this test takes a few days, I can stop it so you can use the machine for your investigation.

Please let me know if I should cancel the job.

Thanks a lot

Kamil

Comment 30 Kamil Kolakowski 2011-03-31 18:55:47 UTC
Created attachment 489197 [details]
-82 iozone_incache_default.iozone

Comment 31 Eric Sandeen 2011-03-31 22:48:07 UTC
Matthew, a bisect narrowed the regression between -82 and -83 down to:

 [x86] Add native Intel cpuidle driver

... we saw ondemand governor regressions too; Arjan added some sort of hook to that to make IO look not-idle ... maybe a similar issue here?
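
One way to confirm the driver's involvement from userspace (a sketch; the sysfs paths are as exposed by RHEL 6 kernels):

  cat /sys/devices/system/cpu/cpuidle/current_driver   # intel_idle vs. acpi_idle
  # per-state residency on one CPU:
  grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
  grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time
  # then boot with the driver disabled and rerun the benchmark:
  #   intel_idle.max_cstate=0      (the kernel falls back to acpi_idle)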

Comment 33 Kamil Kolakowski 2011-04-19 07:31:43 UTC
Because of comment 32 and the absence of a fix, moving this to RHEL 6.2.

Comment 36 Matthew Garrett 2011-05-18 15:32:35 UTC
The intel_idle driver means that we'll be getting into deeper C states than we were previously. The "simple fix" would be to avoid dropping into deep states if we're in iowait, but that would have a strong impact on power consumption.
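
That hypothesis can be tested without a kernel change by holding a PM QoS request that caps C-state exit latency for the duration of a run (a sketch; the 2.6.32 interface expects a binary s32, and the request holds only while the file descriptor stays open):

  exec 3> /dev/cpu_dma_latency
  printf '\x00\x00\x00\x00' >&3      # request 0 us latency: shallowest C-state only
  iozone -U /RHTSspareLUN1 -a -f /RHTSspareLUN1/ext4 -n 4k -g 4096m
  exec 3>&-                          # closing the fd releases the request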

Comment 39 Matthew Garrett 2011-07-14 15:00:13 UTC

*** This bug has been marked as a duplicate of bug 714180 ***

