Bug 203634

Summary: EXT3 random write performance using IOZONE off by 100X on large in-cache files
Product: Red Hat Enterprise Linux 4 Reporter: Barry Marson <bmarson>
Component: kernel Assignee: Tom Coughlan <coughlan>
Status: CLOSED WONTFIX QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4 CC: bmarson, dshaks, duck, k.georgiou, ksorensen, laurent.jean-rigaud, ltroan, lwoodman, sccarlson
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 16:07:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
iozone output file    none
iozone incache RHEL3-U8 on pe6800-01.rhts.boston.redhat.com    none

Description Barry Marson 2006-08-22 19:45:44 UTC
Description of problem:

Using the latest RHTS IOZONE test suite to establish baselines for RHEL5
performance regressions, I've noticed random write performance off by two orders
of magnitude on large in-cache files.

Before IOZONE is run, the test scopes out disk space and memory and defines an
in-cache test that never creates files larger than half of RAM.  Presently the
test creates the files (2 at most) on the local filesystem.  When the file size
gets to 512MB or higher, random writes plummet for the smallest I/Os (1KB,
2KB).  As the file gets larger, larger I/O requests tank as well.

I know there's concern that the local filesystem can't handle the I/O rate being
generated, but the I/O almost seems synchronous and, after all, it handled the
previous file size (half the size) fine.

To rule out the system disk capability, I kicked off the test on a Dell PE6800
with 16 CPUs (4-socket Xeon dual core/HT), 16GB of RAM, and a PERC4-presented
system disk made up of 8 72GB disks presented as RAID 0 and SCSI port optimized.
In other words, this system should not have storage bottlenecks for the test.
Sure enough, the output below shows what happens.  Below are some of the
results.  Note the effect on this system starts happening at 2GB (see ** ) and
gets even worse at 8GB.

        Iozone: Performance Test of File I/O
                Version $Revision: 3.263 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million,
                     Jean-Marc Zucconi, Jeff Blomberg,
                     Erik Habbinga, Kris Strecker, Walter Wong.

        Run began: Tue Aug 15 15:23:12 2006

        Auto Mode
        Cross over of record size disabled.
        Using minimum file size of 524288 kilobytes.
        Using maximum file size of 8388608 kilobytes.
        Using Minimum Record Size 1 KB
        Using Maximum Record Size 1024 KB
        Command line used: /mnt/tests/performance/iozone/iozone3_263/src/current/iozone -az -n 512m -g 8192m -y 1k -q 1m
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                               random   random     bkwd   record   stride
              KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
          524288       1   161816   355924   776404   783623   410197   226920   530322   398333   463528   152415   332390   738608   747426
          524288       2   220937   506627  1105533  1084337   694349   377896   838656   754976   769535   210603   480571  1084647  1078399
          524288       4   289103   672113  1446510  1487772  1104764   591754  1296725  1342383  1170178   272121   625331  1416538  1451524
          524288       8   302203   716605  1587755  1624057  1308949   680782  1454675  1525916  1354235   288994   664993  1563130  1596214
          524288      16   311328   745770  1719400  1761555  1542295   752667  1649560  1671309  1576132   300986   704942  1705695  1753976
          524288      32   316786   761907  1798912  1842225  1721884   797919  1777186  1797889  1744265   306475   710694  1783656  1835369
          524288      64   318846   769750  1833155  1890054  1830129   820339  1865688  1858129  1838717   309977   719981  1847704  1901958
          524288     128   321850   773871  1872397  1925490  1888692   834055  1901619  1878487  1890095   311956   721992  1866963  1923365
          524288     256   321292   771392  1865925  1927422  1910607   834154  1916453  1896529  1909988   306190   697124  1861156  1929176
          524288     512   318705   757256  1793425  1851351  1839103   818448  1845292  1838504  1840144   288915   617132  1804350  1866351
          524288    1024   293529   624967  1062298  1073687  1070904   669577  1075231   855646  1070941   254150   471242  1062813  1072907
         1048576       1   146600   349419   769539   770169   394533   220754   526721   397955   458261   149019   324796   735469   738452
         1048576       2   218778   511789  1101896  1108535   692452   375308   832188   749623   768968   209760   478827  1087661  1082577
         1048576       4   285816   670647  1447370  1484215  1088706   585942  1282196  1344140  1160101   271094   622181  1345722  1382473
         1048576       8   301216   712151  1570244  1610718  1279643   673291  1455060  1512489  1345756   288337   666188  1563219  1604819
         1048576      16   310361   745963  1708223  1764337  1533999   749859  1652156  1681154  1565506   298072   695292  1699673  1751936
         1048576      32   316054   762336  1790551  1848120  1720443   794360  1761214  1787289  1720732   303224   707079  1775995  1815804
         1048576      64   317601   764902  1797165  1859279  1795162   808250  1831490  1815870  1821928   307894   714805  1828268  1879994
         1048576     128   319674   769266  1868806  1929576  1888581   830189  1910813  1878485  1895912   311419   724229  1868091  1928004
         1048576     256   322362   772303  1863979  1929778  1906175   834468  1895322  1888849  1915529   308046   704568  1869622  1930883
         1048576     512   318025   751117  1800059  1859982  1841302   815470  1852971  1604590  1847043   283378   593150  1803595  1863966
         1048576    1024   289695   629830  1054972  1067481  1065390   666425  1068451   882821  1062990   251868   464539  1059666  1071049
**       2097152       1   143498   346291   774483   775223   390962    10522   515667   399434   460586   145935   318604   747498   747155
**       2097152       2   215274   475187  1100778  1091117   675208    24199   809861   755177   760342   198126   456326  1074120  1077363
         2097152       4   281147   638620  1439251  1484486  1069935   578250  1296567  1325326  1169319   268474   597618  1384469  1404337
         2097152       8   277511   684002  1567316  1617415  1264203   669682  1447106  1506214  1350098   283587   634407  1533377  1590222
         2097152      16   304768   713644  1681890  1744013  1489938   736803  1611214  1678906  1531677   292558   668430  1682610  1733565
         2097152      32   311748   731624  1781067  1841671  1698775   788853  1748967  1753519  1709426   298691   674719  1769358  1825707
         2097152      64   315570   732603  1810585  1872296  1792895   806577  1861920  1808667  1831749   304310   688080  1846304  1895567
         2097152     128   300747   739309  1834088  1899207  1851609   815329  1886032  1846075  1869392   306942   698956  1819380  1905995
         2097152     256   316021   736568  1856290  1928898  1904901   828616  1903480  1886603  1892950   306032   671820  1873487  1930833
         2097152     512   311142   734494  1783290  1833262  1811927   805214  1846573  1828855  1844391   282024   587642  1807875  1866441
         2097152    1024   283727   610109  1056033  1073939  1070623   669233  1066840   868495  1059748   241102   457193  1057899  1071695
**       4194304       1   140029   324044   763930   764992   376546     7230   520873   398254   457500   131249   310429   737602   735872
**       4194304       2   197642   485494  1067197  1072562   654258    14285   792375   746114   747317   180617   444653  1021724  1013801
**       4194304       4   263575   607577  1337525  1380703  1008990    36895  1151248  1357750  1106155   255593   571425  1308052  1325162
**       4194304       8   282543   643060  1486679  1535631  1185034    63081  1351363  1516368  1287977   271062   606484  1466228  1492143
**       4194304      16   289634   671150  1601754  1652854  1443944   107928  1500616  1671919  1445376   279035   625404  1571899  1644741
         4194304      32   296259   662934  1662924  1700351  1596037   162926  1635428  1749558  1616877   289229   637020  1658544  1696663
         4194304      64   297443   665205  1670205  1724712  1685243   218818  1707047  1815963  1706467   288097   638324  1667847  1727600
         4194304     128   296723   685144  1711559  1762996  1732092   205742  1736753  1827816  1704242   292286   643655  1672997  1728259
         4194304     256   298172   678041  1705020  1743127  1734568   222374  1713448  1864348  1716796   288771   630674  1680606  1713100
         4194304     512   294511   670336  1593550  1640235  1613165   233211  1658245  1586382  1657034   269473   542614  1561760  1650082
         4194304    1024   271260   497901   992021  1010458  1000938   262476  1002051   845351  1006542   233879   430839   966425   977441
         8388608       1   129318   300792   730181   731593   357054     4789   503387   394175   445323   129068   295438   719111   717859
         8388608       2   198324   422714  1083289  1080364   647090     9692   805642   757102   765852   190845   420101  1029593  1029688
         8388608       4   253556   592264  1345680  1386062  1016150    18982  1262508  1335933  1139225   243122   557402  1347707  1390506
         8388608       8   270792   627633  1489808  1529118  1193188    41979  1400815  1500683  1323268   240704   598622  1498857  1532869
         8388608      16   279981   556054  1534950  1686129  1445565    67233  1583567  1684652  1513680   272200   619267  1640794  1687587
         8388608      32   259959   654230  1700089  1757151  1602490   107832  1701146  1765768  1672346   277291   633532  1720117  1771588
         8388608      64   281534   679831  1735676  1792001  1704315   175830  1780551  1806435  1759993   281183   636315  1764409  1813474
         8388608     128   290084   684719  1774599  1823084  1768438   155762  1808967  1848008  1789890   278728   644063  1793366  1842123
         8388608     256   288992   688300  1778317  1826239  1796404   188600  1814202  1807791  1804139   281289   630696  1771175  1818268
         8388608     512   288135   668990  1712132  1759261  1721039   177822  1714917  1514876  1708120   255299   506196  1712367  1757635
         8388608    1024   263293   534100  1031929  1043696  1035828   190252  1052057   864951  1046600   227524   414236  1030812  1038646
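
The size of the drop can be read straight off the random write column of the iozone output. As a rough sketch (Python, with the 1 KB record values copied by hand from that output; the variable names are only for illustration), the slowdown relative to the 512MB file can be worked out like this:

```python
# Random write rate (KB/s) at 1 KB record size, keyed by file size in KB.
# Values are copied from the random write column of the iozone run.
random_write_1k = {
    524288: 226920,
    1048576: 220754,
    2097152: 10522,
    4194304: 7230,
    8388608: 4789,
}

# Use the smallest file (512 MB) as the baseline for comparison.
baseline = random_write_1k[524288]
slowdown = {size: baseline / rate for size, rate in random_write_1k.items()}

for size_kb, factor in slowdown.items():
    print(f"{size_kb // 1024:5d} MB  {factor:5.1f}x slower than the 512 MB run")
```

On these numbers the 1 KB random write rate is roughly 20x lower at 2GB and close to 50x lower at 8GB than at 512MB, while the 1GB run is essentially unchanged.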

Looking at top shows an interesting thing: when the performance drops,
kjournald seems to get about 3-10% of a CPU and the random writes go into what
appears to be lock step.  On a smaller file (say the previous file size [ half ]),
kjournald seems to make use of an entire CPU and the random write rate is good.
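
One way to put numbers on the kjournald observation, rather than eyeballing top, is to sample its CPU time from /proc. The following is a minimal sketch, assuming a Linux /proc layout and journal threads whose names start with "kjournald" (on newer kernels the journal thread may be named differently); the function names are made up for illustration and are not part of any existing tool.

```python
import os
import time


def parse_stat(line):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left, right = line.index("("), line.rindex(")")
    comm = line[left + 1:right]
    rest = line[right + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(rest[11]) + int(rest[12])


def kjournald_ticks():
    """Map pid -> accumulated CPU ticks for every kjournald thread."""
    ticks = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, used = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm.startswith("kjournald"):
            ticks[pid] = used
    return ticks


def sample(interval=5.0):
    """Return pid -> percent of one CPU used during the interval."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = kjournald_ticks()
    time.sleep(interval)
    after = kjournald_ticks()
    return {pid: 100.0 * (after[pid] - before[pid]) / hz / interval
            for pid in after if pid in before}
```

Calling `sample()` repeatedly while the random write phase runs, once on a file size that performs well and once on one that does not, would give the two CPU figures to compare.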

What's going on here ?

Thanks in advance,

Barry

Version-Release number of selected component (if applicable):

Happens on both RHEL4 and RHEL5.

How reproducible:

Every time

Steps to Reproduce:
1. Install and run iozone against the local SCSI disk (make sure it's beefy enough)
   on a system with at least 2-4GB of RAM so you can create files of at least 1-2GB.
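
For a quick check that does not depend on the full iozone suite, the random write pattern can be approximated in a few lines. This is only a sketch under stated assumptions: it is not how iozone implements its random write test, `random_write_kbps` and the file name are hypothetical, and it rewrites each record exactly once in shuffled order with no fsync, so the data is left to the page cache and the journal.

```python
import os
import random
import tempfile
import time


def random_write_kbps(path, file_kb, record_kb, seed=0):
    """Return the rate (KB/s) of record-sized writes at random offsets."""
    record = b"\0" * (record_kb * 1024)
    n_records = file_kb // record_kb

    # Lay the file out sequentially first so the random writes
    # overwrite blocks that already exist.
    with open(path, "wb") as f:
        for _ in range(n_records):
            f.write(record)

    # Visit every record-aligned offset once, in shuffled order.
    offsets = [i * len(record) for i in range(n_records)]
    random.Random(seed).shuffle(offsets)

    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.pwrite(fd, record, off)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return file_kb / elapsed


if __name__ == "__main__":
    # Tiny demonstration sizes; real runs need files of 512 MB and up.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "randwrite.tmp")
        for size_kb in (1024, 4096):
            print(size_kb, round(random_write_kbps(target, size_kb, 1)))
```

To probe for the drop described in this report, point `path` at a file on an ext3 mount and call the function with `record_kb=1` and `file_kb` set to 524288, 1048576, 2097152 and so on, keeping the file under half of RAM.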
  
Actual results:

See above IOZONE output

Expected results:

See above IOZONE output

Additional info:

Comment 1 Barry Marson 2006-08-22 19:49:17 UTC
Created attachment 134666 [details]
iozone output file

Comment 2 Barry Marson 2006-09-07 14:48:05 UTC
This performance regression occurs on RHEL3-U8 as well.  While the test system
was a NUMA box (HP Olympia - ia64 9 CPUs in 3 cells with 128GB RAM), the effect
is there.  It affects all record sizes dramatically by the 4GB file size.

Unfortunately this system is underpowered WRT disk space and I/O, and disk seeks
might be some of the issue for the biggest files.  It's 4 36GB drives striped
into a volume.  But that should only affect the largest file sizes, which don't
exceed 64GB.

Barry

Comment 3 Chip Coldwell 2006-09-11 15:55:21 UTC
(In reply to comment #2)
> This performance regression occurs on RHEL3-U8 as well.

In what sense is this a regression?  Is the performance acceptable on RHEL3-U7?

Chip


Comment 4 Barry Marson 2006-09-11 17:33:39 UTC
I guess it's best to describe this as a regression against itself, not an OS
release.  Smaller (but not insignificant) file sizes perform well.  Either way,
it's unacceptably poor performance across numerous released versions of RHEL as
well as being seen in RHEL5.

I don't know if it performs poorly on RHEL3-U7.  I was considering trying it on
RHEL3 Gold.

I suspect this problem has been around forever.

Barry

Comment 5 Larry Troan 2007-03-06 16:04:25 UTC
Chip, is this a DUP of Bug 227958 
[0.8 kernel exhibits significant performance degradation over 4.4 stock kernel]???

Fujitsu believes they have recreated the customer reported performance problem
as described in bug 227958 between the 0.3 and 0.8 kernels.  

Comment 6 Barry Marson 2007-03-06 20:20:04 UTC
I manually tested this 5 months back on at least one version of RHEL3 and showed
the problem existed there as well.

Barry

Comment 7 Barry Marson 2007-03-07 19:52:21 UTC
I'm in the process of rerunning the incache iozone RHTS test again, this time on
a Dell pe6800, specifically pe6800-01.rhts.boston.redhat.com.  It's a quad-socket
Xeon Extreme box with 16GB of RAM and one large hardware .5 TB RAID0 LUN (backed
by 8 spindles on two ports).  This LUN is the install/system disk and the place
where the test runs.

As early as 512 MB file size we see ~15X degradation at 1K record size.  At each
greater file size, the degradation spreads to ever increasing record sizes as well.

I'll upload the iozone incache run when it completes.

Barry

Comment 8 Barry Marson 2007-03-08 01:44:26 UTC
Created attachment 149542 [details]
iozone incache RHEL3-U8 on pe6800-01.rhts.boston.redhat.com

Note the drop in random write performance at 512MB file size, 1KB record size.

Comment 9 John Shakshober 2007-03-08 11:42:21 UTC
So this has been a day-1 problem/issue with Linux RHEL3, RHEL4 and RHEL5 and is 
thus not a regression ... just an area where the tester (Barry) feels that the 
O.S. "should" do better.  The 100x off is clearly the difference between a file 
that gets cached by the filesystem for 
 - small 1k random writes
 - when file size > 512 MB

To fix this problem, we need EXT3 architects to confirm whether it is a defect or 
not.  Rectifying it would require a design change.  We should also cross-check GFS 
operations.

Please do not assign as duplicates to other regressions reported in RHEL.
 

Comment 10 Chip Coldwell 2007-09-05 21:28:14 UTC
(In reply to comment #0)
>
> a PERC4 presented system disk

Is PERC4 a re-badged LSI HBA?  Fusion?  Megaraid?

Chip



Comment 11 Barry Marson 2007-09-06 00:07:46 UTC
PERC4 is definitely LSI, at least if you disable its higher RAID capability, the
BIOS comes up as LSI.  In Full RAID mode, it uses the megaraid driver.  This
problem occurs on different storage setups, from the local big system disk to FC
based storage

Barry

Comment 12 Chip Coldwell 2007-09-06 14:32:02 UTC
OK, so it's not driver-dependent.  One more question: is this bug a performance
regression wherein RHEL-5 is slower than RHEL-4?  If so, shouldn't the bug be
against RHEL-5?

Chip


Comment 13 Barry Marson 2007-09-06 16:04:52 UTC
The problem has been shown as early as RHEL3.  Thus it's more like broken (for a
long time) as opposed to a regression.

As far as whether the problem regresses more in RHEL5 vs. RHEL4: I believe a
year ago, we may have seen the performance drop occur one file size earlier in
RHEL5 than in RHEL4.  But that was a year ago.  Nevertheless, the problem
eventually occurs everywhere on every machine we have tried it on.  I could run
a comparison workflow in RHTS against RHEL4/5, if you think that's important.

Barry

Comment 14 Chip Coldwell 2007-09-06 17:23:20 UTC
That shouldn't be necessary.  I'm just trying to understand exactly where the
difference is.  So IIUC, you have measured a 100X performance decrease on writes
to in-cache files of 512KB or larger from RHEL-3 to RHEL-4/5.  Is that correct?

Chip


Comment 15 Barry Marson 2007-09-06 18:12:16 UTC
That is correct ... And the point where it starts decreasing seems to vary with
different systems but is near 512MB (not KB)

Barry

Comment 17 Laurent Jean-Rigaud 2009-02-12 14:36:31 UTC
Any news about this problem ?

It seems close to something we have seen on Sun servers + rhel4.5 + megaraid RAID5|RAID0 disks: a file copy of 3.5 GB takes 10 minutes and causes PGsql I/O problems...

Regards

Comment 18 Barry Marson 2009-05-11 19:39:18 UTC
You may have seen something similar, but if it's the same issue, your layout is but a subset of what we are seeing.  For us this performance degradation happens on multiple versions of RHEL, with or without fancy IO storage/controllers

Barry

Comment 19 Scott 2009-08-07 01:00:11 UTC
I believe we have experienced this same sort of performance situation between ext2 on RH6.2 and ext3 on RH4 systems. 

on RH6.2, ext2 and ext3 are very fast
on RH4, ext2 is slower than RH6.2 ext2
on RH4, ext3 is at least 50% slower than ext3 on RH6.2

For our i/o workload, which is something in the neighborhood of 2000 processes writing to 200 log files in 8k blocks, ext3 does not appear to be able to sustain this w/o having significant problems with processes going into i/o wait while kjournald and pdflush clean up.

No amount of tuning seems to help, ext3 on RH4 just seems to be THAT MUCH slower than ext3 on RH6.2.  The problem also seems to be on RH5

Has there been any progress in this thread or similar 'bugs' ?

Thank you

Comment 20 Scott 2009-08-07 01:05:54 UTC
Kernels in use in our environment

2.4.34.5-1-i686-HUGEMEM - RH 6.2
2.6.9-78.0.17.ELsmp - RHEL 4

Hardware

HP 360G5
400i Raid controller with 512MB On-Board Cache on the RHEL 4 System

Comment 22 Jiri Pallich 2012-06-20 16:07:55 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.