Description of problem:

Using the latest RHTS IOZONE test suite to establish baselines for RHEL5 performance regressions, I've noticed random write performance off by two orders of magnitude on large in-cache files. Before IOZONE is run, the test scopes out disk space and memory and defines an in-cache test that never creates files larger than half of RAM. Presently the test creates the files (2 at most) on the local filesystem.

When the file size gets to 512MB or higher, random writes plummet for the smallest of I/Os (1KB, 2KB). As the file gets larger, larger I/O requests tank as well. I know there's concern that the local filesystem can't handle the I/O rate being generated, but the I/O almost seems synchronous in performance, and after all it handled the previous file size (half the size).

To rule out the system disk capability, I kicked off the test on a Dell PE6800 with 16 CPUs (4-socket Xeon Dual Core/HT), 16GB of RAM, and a PERC4-presented system disk made up of 8 72GB disks presented as RAID 0 and SCSI port optimized. In other words, this system should not have storage bottlenecks for the test. Sure enough, the output below shows what happens. Note the effect on this system starts at 2GB (see **) and gets even worse at 8GB.

        Iozone: Performance Test of File I/O
                Version $Revision: 3.263 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
                Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
                Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                Randy Dunlap, Mark Montague, Dan Million, Jean-Marc Zucconi,
                Jeff Blomberg, Erik Habbinga, Kris Strecker, Walter Wong.

        Run began: Tue Aug 15 15:23:12 2006

        Auto Mode
        Cross over of record size disabled.
        Using minimum file size of 524288 kilobytes.
        Using maximum file size of 8388608 kilobytes.
        Using Minimum Record Size 1 KB
        Using Maximum Record Size 1024 KB
        Command line used: /mnt/tests/performance/iozone/iozone3_263/src/current/iozone -az -n 512m -g 8192m -y 1k -q 1m
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                          random   random     bkwd   record   stride
        KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
    524288       1   161816   355924   776404   783623   410197   226920   530322   398333   463528   152415   332390   738608   747426
    524288       2   220937   506627  1105533  1084337   694349   377896   838656   754976   769535   210603   480571  1084647  1078399
    524288       4   289103   672113  1446510  1487772  1104764   591754  1296725  1342383  1170178   272121   625331  1416538  1451524
    524288       8   302203   716605  1587755  1624057  1308949   680782  1454675  1525916  1354235   288994   664993  1563130  1596214
    524288      16   311328   745770  1719400  1761555  1542295   752667  1649560  1671309  1576132   300986   704942  1705695  1753976
    524288      32   316786   761907  1798912  1842225  1721884   797919  1777186  1797889  1744265   306475   710694  1783656  1835369
    524288      64   318846   769750  1833155  1890054  1830129   820339  1865688  1858129  1838717   309977   719981  1847704  1901958
    524288     128   321850   773871  1872397  1925490  1888692   834055  1901619  1878487  1890095   311956   721992  1866963  1923365
    524288     256   321292   771392  1865925  1927422  1910607   834154  1916453  1896529  1909988   306190   697124  1861156  1929176
    524288     512   318705   757256  1793425  1851351  1839103   818448  1845292  1838504  1840144   288915   617132  1804350  1866351
    524288    1024   293529   624967  1062298  1073687  1070904   669577  1075231   855646  1070941   254150   471242  1062813  1072907
   1048576       1   146600   349419   769539   770169   394533   220754   526721   397955   458261   149019   324796   735469   738452
   1048576       2   218778   511789  1101896  1108535   692452   375308   832188   749623   768968   209760   478827  1087661  1082577
   1048576       4   285816   670647  1447370  1484215  1088706   585942  1282196  1344140  1160101   271094   622181  1345722  1382473
   1048576       8   301216   712151  1570244  1610718  1279643   673291  1455060  1512489  1345756   288337   666188  1563219  1604819
   1048576      16   310361   745963  1708223  1764337  1533999   749859  1652156  1681154  1565506   298072   695292  1699673  1751936
   1048576      32   316054   762336  1790551  1848120  1720443   794360  1761214  1787289  1720732   303224   707079  1775995  1815804
   1048576      64   317601   764902  1797165  1859279  1795162   808250  1831490  1815870  1821928   307894   714805  1828268  1879994
   1048576     128   319674   769266  1868806  1929576  1888581   830189  1910813  1878485  1895912   311419   724229  1868091  1928004
   1048576     256   322362   772303  1863979  1929778  1906175   834468  1895322  1888849  1915529   308046   704568  1869622  1930883
   1048576     512   318025   751117  1800059  1859982  1841302   815470  1852971  1604590  1847043   283378   593150  1803595  1863966
   1048576    1024   289695   629830  1054972  1067481  1065390   666425  1068451   882821  1062990   251868   464539  1059666  1071049
** 2097152       1   143498   346291   774483   775223   390962    10522   515667   399434   460586   145935   318604   747498   747155
** 2097152       2   215274   475187  1100778  1091117   675208    24199   809861   755177   760342   198126   456326  1074120  1077363
   2097152       4   281147   638620  1439251  1484486  1069935   578250  1296567  1325326  1169319   268474   597618  1384469  1404337
   2097152       8   277511   684002  1567316  1617415  1264203   669682  1447106  1506214  1350098   283587   634407  1533377  1590222
   2097152      16   304768   713644  1681890  1744013  1489938   736803  1611214  1678906  1531677   292558   668430  1682610  1733565
   2097152      32   311748   731624  1781067  1841671  1698775   788853  1748967  1753519  1709426   298691   674719  1769358  1825707
   2097152      64   315570   732603  1810585  1872296  1792895   806577  1861920  1808667  1831749   304310   688080  1846304  1895567
   2097152     128   300747   739309  1834088  1899207  1851609   815329  1886032  1846075  1869392   306942   698956  1819380  1905995
   2097152     256   316021   736568  1856290  1928898  1904901   828616  1903480  1886603  1892950   306032   671820  1873487  1930833
   2097152     512   311142   734494  1783290  1833262  1811927   805214  1846573  1828855  1844391   282024   587642  1807875  1866441
   2097152    1024   283727   610109  1056033  1073939  1070623   669233  1066840   868495  1059748   241102   457193  1057899  1071695
** 4194304       1   140029   324044   763930   764992   376546     7230   520873   398254   457500   131249   310429   737602   735872
** 4194304       2   197642   485494  1067197  1072562   654258    14285   792375   746114   747317   180617   444653  1021724  1013801
** 4194304       4   263575   607577  1337525  1380703  1008990    36895  1151248  1357750  1106155   255593   571425  1308052  1325162
** 4194304       8   282543   643060  1486679  1535631  1185034    63081  1351363  1516368  1287977   271062   606484  1466228  1492143
** 4194304      16   289634   671150  1601754  1652854  1443944   107928  1500616  1671919  1445376   279035   625404  1571899  1644741
   4194304      32   296259   662934  1662924  1700351  1596037   162926  1635428  1749558  1616877   289229   637020  1658544  1696663
   4194304      64   297443   665205  1670205  1724712  1685243   218818  1707047  1815963  1706467   288097   638324  1667847  1727600
   4194304     128   296723   685144  1711559  1762996  1732092   205742  1736753  1827816  1704242   292286   643655  1672997  1728259
   4194304     256   298172   678041  1705020  1743127  1734568   222374  1713448  1864348  1716796   288771   630674  1680606  1713100
   4194304     512   294511   670336  1593550  1640235  1613165   233211  1658245  1586382  1657034   269473   542614  1561760  1650082
   4194304    1024   271260   497901   992021  1010458  1000938   262476  1002051   845351  1006542   233879   430839   966425   977441
   8388608       1   129318   300792   730181   731593   357054     4789   503387   394175   445323   129068   295438   719111   717859
   8388608       2   198324   422714  1083289  1080364   647090     9692   805642   757102   765852   190845   420101  1029593  1029688
   8388608       4   253556   592264  1345680  1386062  1016150    18982  1262508  1335933  1139225   243122   557402  1347707  1390506
   8388608       8   270792   627633  1489808  1529118  1193188    41979  1400815  1500683  1323268   240704   598622  1498857  1532869
   8388608      16   279981   556054  1534950  1686129  1445565    67233  1583567  1684652  1513680   272200   619267  1640794  1687587
   8388608      32   259959   654230  1700089  1757151  1602490   107832  1701146  1765768  1672346   277291   633532  1720117  1771588
   8388608      64   281534   679831  1735676  1792001  1704315   175830  1780551  1806435  1759993   281183   636315  1764409  1813474
   8388608     128   290084   684719  1774599  1823084  1768438   155762  1808967  1848008  1789890   278728   644063  1793366  1842123
   8388608     256   288992   688300  1778317  1826239  1796404   188600  1814202  1807791  1804139   281289   630696  1771175  1818268
   8388608     512   288135   668990  1712132  1759261  1721039   177822  1714917  1514876  1708120   255299   506196  1712367  1757635
   8388608    1024   263293   534100  1031929  1043696  1035828   190252  1052057   864951  1046600   227524   414236  1030812  1038646

Looking at top shows an interesting thing: when the performance drops, kjournald seems to get about 3-10% of a CPU and the random writes go into what appears to be lock step. On a smaller file (say the previous file size [half]) kjournald seems to make use of an entire CPU and the random write rate is good.

What's going on here?

Thanks in advance,
Barry

Version-Release number of selected component (if applicable):
Happens on both RHEL4 and RHEL5.

How reproducible:
Every time

Steps to Reproduce:
1. Install and run iozone to the local SCSI disk (make sure it's beefy enough) where there's at least 2-4GB of RAM, so you can create files of at least 1-2GB.

Actual results:
See above IOZONE output

Expected results:
See above IOZONE output

Additional info:
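For anyone reproducing this, here is a minimal sketch of the in-cache run based on the command line above; the output file name and the kjournald sampling loop are illustrative additions, not part of the RHTS harness:

    # Run the same in-cache sweep; adjust -g to roughly half of RAM.
    iozone -az -n 512m -g 8192m -y 1k -q 1m > iozone-incache.out &

    # Sample kjournald CPU usage once a second while the test runs,
    # to watch for the 3-10% "lock step" behavior described above.
    while kill -0 $! 2>/dev/null; do
        top -b -n 1 | grep kjournald
        sleep 1
    done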
Created attachment 134666 [details] iozone output file
This performance regression occurs on RHEL3-U8 as well. While the test system was a NUMA box (HP Olympia - ia64, 9 CPUs in 3 cells with 128GB RAM), the effect is there. It affects all record sizes dramatically by 4GB file size. Unfortunately this system is underpowered WRT disk space and I/O, and disk seeks might be some of the issue for the biggest of files. It's 4 36GB drives striped into a volume. But that should only affect the largest file sizes, which don't exceed 64GB.

Barry
(In reply to comment #2)
> This performance regression occurs on RHEL3-U8 as well.

In what sense is this a regression? Is the performance acceptable on RHEL3-U7?

Chip
I guess it's best to describe this as a regression against itself, not against an OS release. Smaller (but not insignificant) file sizes perform well. Either way, it's unacceptably poor performance across numerous released versions of RHEL, as well as being seen in RHEL5. I don't know if it performs poorly on RHEL3-U7. I was considering trying it on RHEL3 Gold. I suspect this problem has been around forever.

Barry
Chip, is this a DUP of Bug 227958 [0.8 kernel exhibits significant performance degradation over 4.4 stock kernel]??? Fujitsu believes they have recreated the customer reported performance problem as described in bug 227958 between the 0.3 and 0.8 kernels.
I manually tested this 5 months back on at least one version of RHEL3 and showed the problem existed there as well. Barry
I'm in the process of rerunning the in-cache iozone RHTS test again, this time on a Dell PE6800, specifically pe6800-01.rhts.boston.redhat.com. It's a quad-socket Xeon Extreme box with 16GB of RAM and one large 0.5 TB hardware RAID0 LUN (backed by 8 spindles on two ports). This LUN is the install/system disk and the place where the test runs. As early as 512 MB file size we see ~15X degradation at 1K rec size. At each greater file size, the degradation spreads to ever-increasing record sizes as well. I'll upload the iozone in-cache run when it completes.

Barry
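Before blaming the filesystem, a quick baseline of the LUN itself can rule out raw-device bottlenecks. A rough sketch (the device name and target path are assumptions for this box):

    # Raw sequential read throughput of the RAID0 LUN.
    hdparm -t /dev/sda

    # Buffered write throughput through the filesystem, sized well
    # under RAM so it stays in cache like the iozone run.
    dd if=/dev/zero of=/test/ddfile bs=1M count=1024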
Created attachment 149542 [details] iozone incache RHEL3-U8 on pe6800-01.rhts.boston.redhat.com

Note the drop in random write performance at 512MB file size, 1KB rec size.
So this has been a day-1 problem/issue with Linux RHEL3, RHEL4 and RHEL5, and is thus not a regression ... just an area where the tester (Barry) feels the O.S. "should" do better.

The 100x drop is clearly tied to how the filesystem caches the file for small 1k random writes once file size > 512 MB.

To fix this problem, we need EXT3 architects to confirm whether it is a defect or not; one cheap way to probe the journal's role is sketched below. To rectify it would require a design change. Should cross-check GFS operations.

Please do not assign as duplicates to other regressions reported in RHEL.
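If the EXT3 journal is suspected, one experiment is to rerun iozone against a scratch ext3 filesystem mounted with a different journaling mode and commit interval. A sketch only; the scratch device /dev/sdb1 and mount point are hypothetical, and the results would still need the architects' interpretation:

    # Writeback journaling and a 30s commit interval instead of the
    # default ordered mode / 5s commits.
    mount -t ext3 -o data=writeback,commit=30 /dev/sdb1 /mnt/scratch

    # Point iozone at the scratch filesystem and compare.
    iozone -az -n 512m -g 8192m -y 1k -q 1m -f /mnt/scratch/iozone.tmp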
(In reply to comment #0)
> a PERC4 presented system disk

Is PERC4 a re-badged LSI HBA? Fusion? Megaraid?

Chip
PERC4 is definitely LSI; at least if you disable its higher RAID capability, the BIOS comes up as LSI. In full RAID mode, it uses the megaraid driver. This problem occurs on different storage setups, from the local big system disk to FC-based storage.

Barry
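For completeness, the controller/driver pairing is easy to confirm on a running system (output varies by firmware and RAID mode):

    # Identify the RAID controller and which driver claimed it.
    lspci | grep -i raid
    lsmod | grep -i mega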
OK, so it's not driver-dependent. One more question: is this bug a performance regression wherein RHEL-5 is slower than RHEL-4? If so, shouldn't the bug be against RHEL-5?

Chip
The problem has been shown as early as RHEL3. Thus it's more like broken (for a long time) as opposed to a regression. As far as whether the problem regresses more in RHEL5 vs. RHEL4: I believe a year ago we may have seen the performance drop occur one file size earlier in RHEL5 than in RHEL4. But that was a year ago. Nevertheless, the problem eventually occurs everywhere, on every machine we have tried it on. I could run a comparison workflow in RHTS against RHEL4/5 if you think that's important.

Barry
That shouldn't be necessary. I'm just trying to understand exactly where the difference is. So IIUC, you have measured a 100X performance decrease on writes to in-cache files of 512KB or larger from RHEL-3 to RHEL-4/5. Is that correct? Chip
That is correct ... and the point where it starts decreasing seems to vary between systems, but it is near 512MB (not KB).

Barry
Any news about this problem? It seems close to something we hit on Sun servers + RHEL 4.5 + megaraid RAID5/RAID0 disks: a 3.5 GB file copy takes 10 minutes and causes PostgreSQL I/O problems...

Regards
You may have seen something similar, but if it's the same issue, your layout is but a subset of what we are seeing. For us this performance degradation happens on multiple versions of RHEL, with or without fancy I/O storage/controllers.

Barry
I believe we have experienced this same sort of performance situation between ext2 on RH6.2 and ext3 on RH4 systems:

- on RH6.2, ext2 and ext3 are very fast
- on RH4, ext2 is slower than RH6.2 ext2
- on RH4, ext3 is at least 50% slower than ext3 on RH6.2

For our I/O workload, which is something in the neighborhood of 2000 processes writing to 200 log files in 8k blocks, ext3 does not appear to be able to sustain this without significant problems with processes going into I/O wait while kjournald and pdflush clean up. A crude stand-in for that workload is sketched below. No amount of tuning seems to help; ext3 on RH4 just seems to be THAT MUCH slower than ext3 on RH6.2. The problem also seems to be on RH5.

Has there been any progress in this thread or similar 'bugs'?

Thank you
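A scaled-down approximation of that workload, useful for A/B testing kernels or mount options; the writer count, file pool, paths, and duration are assumptions, not the production setup:

    # 200 writers appending 8k blocks to a shared pool of 20 log files,
    # a reduced-scale stand-in for ~2000 processes over ~200 logs.
    for i in $(seq 1 200); do
        ( while true; do
              dd if=/dev/zero bs=8k count=1 >> /logs/log.$((i % 20)) 2>/dev/null
          done ) &
    done
    sleep 60
    kill $(jobs -p)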
Kernels in use in our environment:

- 2.4.34.5-1-i686-HUGEMEM - RH 6.2
- 2.6.9-78.0.17.ELsmp - RHEL 4

Hardware: HP 360G5, 400i RAID controller with 512MB on-board cache on the RHEL 4 system
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please See https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.