Bug 203634
Summary: EXT3 random write performance using IOZONE down by 100X on large in-cache files

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 4 | Reporter | Barry Marson <bmarson> |
| Component | kernel | Assignee | Tom Coughlan <coughlan> |
| Status | CLOSED WONTFIX | QA Contact | Red Hat Kernel QE team <kernel-qe> |
| Severity | medium | Priority | medium |
| Version | 4.4 | CC | bmarson, dshaks, duck, k.georgiou, ksorensen, laurent.jean-rigaud, ltroan, lwoodman, sccarlson |
| Hardware | All | OS | Linux |
| Target Milestone | --- | Target Release | --- |
| Doc Type | Bug Fix | Regression | --- |
| Last Closed | 2012-06-20 16:07:55 UTC | | |
Description

Barry Marson 2006-08-22 19:45:44 UTC

Created attachment 134666 [details]
iozone output file

This performance regression occurs on RHEL3-U8 as well. While the test system was a NUMA box (HP Olympia - ia64, 9 CPUs in 3 cells with 128GB RAM), the effect is there. It affects all record sizes dramatically by 4GB file size. Unfortunately this system is underpowered with respect to disk space and I/O, and disk seeks might be some of the issue for the biggest files. It's four 36GB drives striped into a volume, but that should only affect the largest file sizes, which don't exceed 64GB.

Barry

(In reply to comment #2)
> This performance regression occurs on RHEL3-U8 as well.

In what sense is this a regression? Is the performance acceptable on RHEL3-U7?

Chip

I guess it's best to describe this as a regression against itself, not against an OS release. Smaller (but not insignificant) file sizes perform well. Either way, it's unacceptably poor performance across numerous released versions of RHEL, and it is also seen in RHEL5. I don't know if it performs poorly on RHEL3-U7; I was considering trying it on RHEL3 Gold. I suspect this problem has been around forever.

Barry

Chip, is this a DUP of Bug 227958 [0.8 kernel exhibits significant performance degradation over 4.4 stock kernel]? Fujitsu believes they have recreated the customer-reported performance problem described in bug 227958 between the 0.3 and 0.8 kernels.

I manually tested this 5 months back on at least one version of RHEL3 and showed the problem existed there as well.

Barry

I'm in the process of rerunning the in-cache iozone RHTS test again, this time on a Dell PE6800, specifically pe6800-01.rhts.boston.redhat.com. It is a quad-socket Xeon Extreme box with 16GB of RAM and one large hardware 0.5 TB RAID0 LUN (backed by 8 spindles on two ports). This LUN is the install/system disk and the place where the test runs. As early as 512 MB file size we see ~15X degradation at 1K record size. At each greater file size, the degradation spreads to ever larger record sizes as well. I'll upload the iozone in-cache run when it completes.
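The in-cache random-write pattern that these iozone runs exercise can be approximated with a small Python sketch. This is a scaled-down illustration, not iozone itself: the `random_write_bench` helper and the tiny file/record sizes below are assumptions for this example, versus the 512MB-and-up files and 1K records used in the actual runs.

```python
import os
import random
import time


def random_write_bench(path, file_size, rec_size, writes):
    """Pre-create a file, then time random rec_size writes within it,
    roughly what iozone's random-write phase does on an in-cache file."""
    with open(path, "wb") as f:
        f.write(b"\0" * file_size)       # file now fits in the page cache
    buf = b"x" * rec_size
    offsets = [random.randrange(0, file_size - rec_size) for _ in range(writes)]
    start = time.perf_counter()
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off)
            f.write(buf)                 # buffered write, no fsync, so this
        f.flush()                        # mostly measures cache behavior
    elapsed = time.perf_counter() - start
    os.remove(path)
    return writes * rec_size / elapsed   # bytes per second


if __name__ == "__main__":
    # Scaled-down sizes; the bug report used file sizes up to 64GB.
    rate = random_write_bench("bench.tmp", 4 * 1024 * 1024, 1024, 2000)
    print(f"{rate / 1e6:.1f} MB/s")
```

In principle the same sketch, run with a much larger `file_size`, would show the throughput cliff described in the comments above, though real measurements should of course use iozone itself so record-size and file-size sweeps are comparable.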
Barry

Created attachment 149542 [details]
iozone in-cache RHEL3-U8 on pe6800-01.rhts.boston.redhat.com

Note the drop in random write performance at 512MB file size, 1KB record size.
So this has been a day-one problem/issue with RHEL3, RHEL4 and RHEL5, and is thus not a regression ... just an area where the tester (Barry) feels the OS "should" do better. The 100x drop is clearly the difference in how the filesystem caches a file for small 1K random writes once file size exceeds 512 MB. To fix this problem, we need the EXT3 architects to confirm whether or not it is a defect; rectifying it would require a design change. We should also cross-check GFS behavior. Please do not mark other regressions reported in RHEL as duplicates of this bug.

(In reply to comment #0)
> a PERC4 presented system disk

Is PERC4 a re-badged LSI HBA? Fusion? Megaraid?

Chip

PERC4 is definitely LSI; at least if you disable its higher RAID capability, the BIOS comes up as LSI. In full RAID mode it uses the megaraid driver. This problem occurs on different storage setups, from the local big system disk to FC-based storage.

Barry

OK, so it's not driver-dependent. One more question: is this bug a performance regression wherein RHEL-5 is slower than RHEL-4? If so, shouldn't the bug be filed against RHEL-5?

Chip

The problem has been shown as early as RHEL3, so it's more like broken (for a long time) as opposed to a regression. As for whether the problem regresses more in RHEL5 vs. RHEL4: I believe a year ago we may have seen the performance drop occur one file size earlier in RHEL5 than in RHEL4, but that was a year ago. Nevertheless, the problem eventually occurs everywhere, on every machine we have tried. I could run a comparison workflow in RHTS against RHEL4/5 if you think that's important.

Barry

That shouldn't be necessary. I'm just trying to understand exactly where the difference is. So IIUC, you have measured a 100X performance decrease on writes to in-cache files of 512KB or larger from RHEL-3 to RHEL-4/5. Is that correct?

Chip

That is correct ...
And the point where it starts decreasing seems to vary between systems, but is near 512MB (not KB).

Barry

Any news about this problem? It seems close to something we hit on Sun servers + RHEL 4.5 + megaraid RAID5/RAID0 disks: a file copy of 3.5 GB takes 10 minutes and causes PostgreSQL I/O problems...

Regards

You may have seen something similar, but if it's the same issue, your layout is but a subset of what we are seeing. For us this performance degradation happens on multiple versions of RHEL, with or without fancy I/O storage/controllers.

Barry

I believe we have experienced this same sort of performance situation between ext2 on RH6.2 and ext3 on RH4 systems:

- on RH6.2, ext2 and ext3 are very fast
- on RH4, ext2 is slower than RH6.2 ext2
- on RH4, ext3 is at least 50% slower than ext3 on RH6.2

For our I/O workload, which is something in the neighborhood of 2000 processes writing to 200 log files in 8K blocks, ext3 does not appear to be able to sustain this without significant problems: processes go into I/O wait while kjournald and pdflush clean up. No amount of tuning seems to help; ext3 on RH4 just seems to be THAT MUCH slower than ext3 on RH6.2. The problem also seems to be present on RH5. Has there been any progress in this thread or similar bugs? Thank you.

Kernels in use in our environment:

- 2.4.34.5-1-i686-HUGEMEM - RH 6.2
- 2.6.9-78.0.17.ELsmp - RHEL 4

Hardware: HP 360G5 with a 400i RAID controller (512MB on-board cache) on the RHEL 4 system.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
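The many-writers log-append workload described in the comments above can be sketched in miniature. This is an illustrative assumption-laden sketch, not the reporter's actual application: threads stand in for the ~2000 separate processes, the writer counts and file names are made up, and the point is only to show the append-in-8K-blocks pattern that stresses ext3's journal.

```python
import os
import threading

REC_SIZE = 8192                          # 8K records, as in the workload above


def writer(log_path, records):
    """Append fixed-size records, standing in for one logging process."""
    rec = b"x" * REC_SIZE
    with open(log_path, "ab") as f:      # O_APPEND: each write lands at EOF
        for _ in range(records):
            f.write(rec)


def run(n_writers=4, n_logs=2, records=50):
    """Fan several concurrent writers out over a smaller set of log files."""
    logs = [f"app_{i}.log" for i in range(n_logs)]
    threads = [threading.Thread(target=writer,
                                args=(logs[i % n_logs], records))
               for i in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = sum(os.path.getsize(p) for p in logs)
    for p in logs:
        os.remove(p)
    return total                         # total bytes appended across all logs


if __name__ == "__main__":
    print(run())
```

Under ext3's default ordered-data journaling, every extending append also generates journal metadata traffic, which is consistent with the kjournald/pdflush stalls the commenter reports once many such writers run concurrently.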