Description of problem:
The synchronous write() system call of RHEL3.0 is slower than that of
RHEL2.1 when the write size exceeds 4 KB. We measured the performance
using a test program that writes to an ext3 file system on a SCSI disk
(sym53c8xx). Using write sizes of 2 KB, 4 KB, 8 KB, and 16 KB, the
write() system call of RHEL3.0 takes twice as long as that of RHEL2.1
in the 16 KB case. The following is a summary of the results.

[RHEL2.1]
size     Processing time
2048     77.801379 sec
4096     40.136581 sec
8192     20.301038 sec
16384    10.209447 sec

[RHEL3.0]
size     Processing time
2048     77.334858 sec
4096     40.356672 sec
8192     27.636047 sec
16384    21.505170 sec

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-1.1931.2.423.ent

How reproducible:
Every time

Steps to Reproduce:
1. Compile the attached test program as follows:
   $ cc -o write write.c
2. Run the program, specifying a write size. (write.c creates a 10 MB file.)
   $ ./write 2048
   77.334858sec
   $ ./write 4096
   40.356672sec
   $ ./write 8192
   27.636047sec
   $ ./write 16384
   21.505170sec

Actual results:
The synchronous write() system call is much slower.

Expected results:
The synchronous write() system call of RHEL3.0 has (at least) the same
performance as RHEL2.1.

Additional info:
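For readers without access to the attachment in the next comment, here is a
minimal sketch of what a synchronous-write test program like write.c
presumably looks like; the O_SYNC flag, file name, and loop structure are
assumptions of ours, not the contents of the actual attachment:

    /* sketch.c -- NOT the actual attachment; build with: cc -o write sketch.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    int main(int argc, char **argv)
    {
        int bsize, total, fd;
        char *buf;
        struct timeval start, end;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <write size>\n", argv[0]);
            return 1;
        }
        bsize = atoi(argv[1]);
        buf = malloc(bsize);
        memset(buf, 'a', bsize);

        /* O_SYNC makes each write() wait until the data reaches the disk,
         * which is what exposes the per-request queue behavior. */
        fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

        gettimeofday(&start, NULL);
        for (total = 0; total < 10 * 1024 * 1024; total += bsize)
            write(fd, buf, bsize);   /* 10 MB total, bsize bytes per call */
        gettimeofday(&end, NULL);

        printf("%fsec\n", (end.tv_sec - start.tv_sec)
                          + (end.tv_usec - start.tv_usec) / 1000000.0);
        close(fd);
        free(buf);
        return 0;
    }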
Created attachment 94578 [details] write.c from Fujitsu.
*** Bug 104634 has been marked as a duplicate of this bug. ***
*** Bug 104636 has been marked as a duplicate of this bug. ***
The performance issue at hand was introduced with linux-2.4.21-scsi-affine.patch. Essentially, there is an unconditional unplug_device at the end of __make_request() in ll_rw_blk.c. This prevents write aggregation on a non-busy system doing sequential writes through a filesystem. At this time it is unclear whether this "bug" is triggered by real-world system workloads.
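To illustrate the mechanism, here is a simplified sketch (not the actual
patch hunk) of the tail of __make_request(): in the stock 2.4 kernel the
queue is left plugged, so back-to-back small requests can merge before the
driver sees them, whereas the affine patch effectively unplugs on every
request:

    /* Stock 2.4 behavior (simplified from drivers/block/ll_rw_blk.c):
     * leave the queue plugged; the plug timer, or a task waiting on the
     * I/O, unplugs later, after adjacent requests have had a chance to
     * merge into larger ones. */
    out:
            if (freereq)
                    blkdev_release_request(freereq);
            spin_unlock_irq(&io_request_lock);
            return 0;

    /* With the affine patch (sketch): the queue is unplugged immediately
     * after every request, so each small synchronous write is dispatched
     * to the disk alone instead of being aggregated. */
    out:
            if (freereq)
                    blkdev_release_request(freereq);
            spin_unlock_irq(&io_request_lock);
            generic_unplug_device(q);       /* unconditional unplug */
            return 0;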
*** Bug 113171 has been marked as a duplicate of this bug. ***
*** Bug 109618 has been marked as a duplicate of this bug. ***
*** Bug 114052 has been marked as a duplicate of this bug. ***
*** Bug 114135 has been marked as a duplicate of this bug. ***
It is very much triggered by real-world loads. A customer of ours has some systems set up for database operations. An operation on the RHEL2.1 system takes 50% less time than the same operation on RHEL3, and the gap widens as the operation gets larger. This is a pretty critical bug that needs to be addressed.
I've built a new set of test RPMs for this issue. The test RPMs can be found at http://people.redhat.com/dledford/.qohsdf/ If people will test these and let me know whether the performance issues go away, I would greatly appreciate it. (Please note that these files are only available for the next 48 hours; after that they will be removed automatically.)
Initial tests w/ tiobench show significant improvement in the sequential and random read rates.

dledford kernel:
         File   Block   Num  Seq Read     Rand Read    Seq Write    Rand Write
Dir      Size   Size    Thr  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)
-------  ------ ------- ---  -----------  -----------  -----------  -----------
.        1792   4096    1    705.3 99.9%  686.8 87.9%  37.62 32.5%  2.439 0.78%
.        1792   4096    2    847.2 139.%  857.3 109.%  38.03 47.5%  2.382 0.99%
.        1792   4096    4    979.4 199.%  1006. 171.%  38.16 69.3%  2.343 1.74%
.        1792   4096    8    1056. 235.%  1059. 203.%  26.28 54.6%  2.286 2.01%

RHEL3 Q3 update kernel:
         File   Block   Num  Seq Read     Rand Read    Seq Write    Rand Write
Dir      Size   Size    Thr  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)
-------  ------ ------- ---  -----------  -----------  -----------  -----------
.        1792   4096    1    357.5 56.4%  70.83 9.06%  38.68 33.0%  2.144 0.68%
.        1792   4096    2    436.0 75.0%  82.55 13.2%  37.48 45.8%  2.191 1.05%
.        1792   4096    4    561.1 119.%  120.4 23.1%  36.94 64.2%  2.176 1.76%
.        1792   4096    8    658.1 152.%  155.6 32.3%  25.53 51.4%  2.216 2.09%
As a friend kindly pointed out, my file size wasn't nearly big enough. Here are the tests again, with a 10 GB file:

dledford kernel:
         File   Block   Num  Seq Read     Rand Read    Seq Write    Rand Write
Dir      Size   Size    Thr  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)
-------  ------ ------- ---  -----------  -----------  -----------  -----------
.        10240  4096    1    57.94 19.5%  0.926 0.71%  36.45 33.0%  1.995 1.02%
.        10240  4096    2    28.34 13.9%  1.029 0.82%  36.14 44.3%  1.983 1.26%
.        10240  4096    4    22.36 10.7%  1.159 0.96%  35.81 58.4%  1.879 1.84%
.        10240  4096    8    19.54 9.19%  1.269 1.19%  35.59 69.0%  1.837 2.11%

RHEL3 Q3 Update kernel:
         File   Block   Num  Seq Read     Rand Read    Seq Write    Rand Write
Dir      Size   Size    Thr  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)
-------  ------ ------- ---  -----------  -----------  -----------  -----------
.        10240  4096    1    21.04 14.2%  0.653 1.08%  36.86 32.6%  1.740 0.89%
.        10240  4096    2    18.11 12.1%  0.807 1.39%  36.29 43.7%  1.795 1.20%
.        10240  4096    4    16.81 9.98%  0.911 1.53%  35.77 59.6%  1.763 1.76%
.        10240  4096    8    16.03 8.88%  1.004 1.63%  34.30 67.8%  1.772 2.04%

The results aren't quite "shocking", but there is still a decent speed improvement. What are the chances of seeing this pushed out into production updates?
The performance problem was solved, as shown by a test with kernel-smp-2.4.21-9.EL.noaffine2. The test used the program from Bugzilla ID #104633. The following is a summary of the results.

[RHEL v2.1] file size: 10485760
block size   Processing time
2048         77.897604 sec
4096         40.514986 sec
8192         20.426124 sec
16384        10.270638 sec
32768         5.371835 sec
65536         2.794967 sec
131072        1.556833 sec

[RHEL v3 (kernel-smp-2.4.21-9.EL.noaffine2)] file size: 10485760
block size   Processing time
2048         77.532978 sec
4096         40.567836 sec
8192         20.330292 sec
16384        10.42086 sec
32768         5.347417 sec
65536         2.793838 sec
131072        1.543779 sec
Comments from Lans Carstensen of Dreamworks - Great news, this worked just fine for us. Good work! Now we just need this in a formal upcoming release. Will this make U2?
Discussions are going on internally about what to do on this issue. Removing the scsi-affine patch solved the problem, but will create other problems under different workloads. The question is whether to just yank the patch or to try and write a new patch that solves the performance problems under all workloads. Note: since this test kernel has solved people's problems, for now it is *not* being removed from the people.redhat.com web site. It will remain available until an official update kernel with the problem solved is released.
Two ia64 customers would like to try out the patch too. Could you add it?
A complete set of kernel RPMs for all arches has been put in place of the original set of RPMs on my web site.
Will this fix be included in the RHEL3 errata for CAN-2004-0075? Reference: https://rhn.redhat.com/network/errata/details/index.pxt?eid=2015
*** Bug 115273 has been marked as a duplicate of this bug. ***
I am running a MySQL database server on an Intel SRCS14L SATA RAID controller (gdth module; by the way, there is a typo in the certified hardware list), and this update helped totally. Before, when the system was doing backups, all of the system's resources were stuck in IOWAIT. Now I can take any kind of backup, and the service keeps working fine. I am wondering why this update is not published in RHN, because right now other kernel updates cannot be installed by up2date.
It's slated for the next update kernel (which is different from an errata update).
Might I inquire as to when we'll see the next kernel update in the mainstream production product? I've made similar observations to the problems noted in previous comments. Our RH 7.3 server is at 2.4.20-20.7smp and the new RHEL 3 AS system is at 2.4.21-9.0.1.ELsmp. Both have largely identical hardware (2 x 2.8 GHz Xeon, 1 GB memory, Adaptec 39160, U160 disks). The test used the program from Bugzilla ID #104633. File size: 10485760. Test iterations >= 3.

             RHEL 3 AS    RH 7.3
block size   time (sec)   time (sec)
2048         5.32139      2.62641
4096         4.95804      1.00350
8192         2.84885      1.73938
16384        1.75112      0.51394
32768        1.18536      0.32445
65536        0.91370      0.23469
131072       0.77962      0.19020

Any information on the ETA would be appreciated. I would like to solve the throughput issue before the new RHEL system is put into service.
I'll just add a "me too" comment. This bug bites us too on three PowerEdge 2550s with kernel 2.4.21-9.0.1.EL, making different apps that do lots of write()s slow down considerably. Looking forward to the update ;)
The problem also affects writing small files. Testing an SMTP server accepting 10K messages (avg. size 20K; out of a test set of 50 emails, all but 2 are 3-5K in size) showed the following results:

Version      Kernel                        Time to accept 10k messages
=======================================================================
RH 7.3       kernel-2.4.20-28.7            470 sec
RHEL 3.0ES   kernel-2.4.21-4.EL            504 sec
"            kernel-2.4.21-9.EL            504 sec
"            kernel-2.4.21-9.01.EL         500 sec
"            kernel-2.4.21-9.EL.noaffine2  430 sec
Doug: cf. Comment #17, any chance you could build "noaffine" RPMs of the current update kernel (2.4.21-9.0.1.EL)? Or briefly detail how to do the rebuild? (Just removing the patch from the spec file causes the build to fail on a RHEL 3.0 AS box.) This issue appears to be the culprit in absolutely killing our read performance towards a Dell|EMC CX500 (SAN) -- the noaffine2 kernels fixed it -- but unfortunately I can't use the 2.4.21-9.EL.noaffine2 kernel since the EMC software (binary kernel modules *sigh*) won't work with anything but the 2.4.21-9.0.1.EL kernel. With a looming deadline and the threat of having to revert this project to `doze instead (the boss is getting skittish), you might say I'm rather anxious to find a workaround fairly soon. IOW I'd be happy to do any testing (or provide tiobench results or whatever) you would find helpful. :-|
Count me in too as a member of the "I need a fix for this now" list. Management is ready to toss the whole Red Hat project altogether since our NFS performance is so poor.
Count us in as well. Write performance on our external SCSI-to-IDE array with 12 7200 rpm spindles went from 55-65 megabytes/second under Red Hat 8.0 down to 14-24 megabytes/second under RHEL 3 ES. I'm having a hard time convincing the boss that we spent over a thousand dollars on this Red Hat Enterprise license for "increased performance and stability" when in fact we've experienced just the opposite. We'd be happy to test a patched kernel as well.
The kernel that includes this fix is already available in the Beta channel of RHN. So, for instance, if you go into your RHN account, select the machine, select channels, go into the Beta channel, then page to the third page of packages starting with "k", you should see the kernel-2.4.21-12.EL packages. Those include this fix. You should then be able to schedule them for installation via RHN.
Initial -- and I emphasise *initial :-) -- testing with the 2.4.21-12.EL kernel suggests a tenfold improvement in Sequential Read rates in a single-thread test. I'm currently spinning through my test matrix (with tiobench) and will post the results here when done. Thanks Doug!
The 2.4.21-12.EL kernel (similarly to the noaffine kernel) yields about 45-50% (23-25 MB/s) of the read performance I used to have with RHEL 2.1 AS/2.4.9-e.27smp (50 MB/s) on a set of Dell PE1750s (PERC4/Di; RAID1; Maxtor Atlas 36G 10K U320 disks). This is with the default max-readahead=31 and min-readahead=3 settings.

Using the QU1 production kernels (2.4.21-9 or 9.0.1) but setting max-readahead to something like 2048 gives ~70% of the original RHEL 2.1 read rates. Without this, with the default values, the read performance is a mere ~10-12% (5-6 MB/s). The combination of the two (i.e. noaffine/beta kernel plus increased max-readahead) brings me up to ~60% of the RHEL 2.1 read rate.

On another set of PE1750s (same ROMB and RAID setting but Seagate disks), the read performance with the RHEL3 QU1 kernels and the default settings is more or less acceptable (~30 MB/s), but tweaking max-readahead brings it up to 40 MB/s. On RHEL 2.1 systems, increasing the default max-readahead has no effect, or has an adverse effect on the generally good read performance.

Any chance to see the same (at least) read performance as with RHEL 2.1? Thanks, --Gabor
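For anyone wanting to reproduce the readahead tuning described above: on these 2.4-based kernels the readahead window is exposed under /proc/sys/vm, so a quick (non-persistent) experiment as root looks like this:

   # echo 2048 > /proc/sys/vm/max-readahead
   # cat /proc/sys/vm/max-readahead
   2048

To make the setting survive a reboot, an equivalent "vm.max-readahead = 2048" line can go in /etc/sysctl.conf. (The value 2048 is just the figure from the comment above, not a recommendation.)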
I suspect the answer to that is no. There is a huge amount of difference between the AS2.1 and RHEL3 virtual memory subsystems. I suspect that the difference in read rates is that the RHEL3 VM is slower at allocating pages for reading, which slows down the read rate, but is better at swap-related activities.
Just FYI, these are my initial test results on the new kernel. Running the original test code from Bugzilla ID #104633 gives the following comparison:

block size   2.4.21-9.0.1.ELsmp   2.4.21-12.EL.smp   %-diff
-----------+--------------------+------------------+-------
2048         5.2999 sec           2.5625 sec         210 %
4096         4.9348 sec           1.5760 sec         313 %
8192         2.8311 sec           0.8484 sec         333 %
16384        1.7320 sec           0.4826 sec         358 %
32768        1.1782 sec           0.3053 sec         385 %
65536        0.9138 sec           0.2164 sec         422 %
131072       0.7780 sec           0.1728 sec         450 %

We mainly deal with large data files, which are continuously written and rewritten directly over NFS. This means a more accurate indicator for us is test results from programs like bonnie++ v1.03. Here they are; file size is 2G:

               2.4.21-9.0.1.ELsmp   2.4.21-12.EL.smp   %-diff
--------------+--------------------+------------------+-------
Seq write:     57940 K/sec          54174 K/sec         94 %
Seq rewrite:   13321 K/sec          20372 K/sec        153 %
Seq read:      19975 K/sec          47501 K/sec        238 %
Random seeks:  237.0 seeks/sec      370.35 seeks/sec   156 %

Obviously, as is visible in both tests, the 2.4.21-12.EL.smp kernel grants much improved throughput, placing us back into the "acceptable" range across the board for our implementation. Thanks, Doug, for getting this much needed fix to us! It is much appreciated.
*** Bug 102194 has been marked as a duplicate of this bug. ***
Any chance the patch that resolved this issue could be attached to the bug?
It wasn't a simple patch. The fix was to drop the affine queuing patch from the spec file, remove the last patch hunk from the end of the affine queuing patch and tack it onto the selected-acbits patch, then rebuild.
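For those who asked how to do the rebuild themselves, a rough sketch of the steps; the patch number NNNN and the spec file name are placeholders/assumptions, not the actual entries in the RHEL 3 kernel spec:

   1. Install the kernel source RPM, then in /usr/src/redhat/SPECS edit
      the spec file to comment out both the declaration and the
      application of the affine patch:
        #PatchNNNN: linux-2.4.21-scsi-affine.patch
        #%patchNNNN -p1
   2. Move the last hunk of linux-2.4.21-scsi-affine.patch into the
      selected-acbits patch (per the comment above, simply dropping the
      patch makes the build fail without this step).
   3. Rebuild:
        $ rpmbuild -ba kernel-2.4.spec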
Note that a kernel with that patch is already available in the beta channel for RHEL 3.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-188.html
*** Bug 114054 has been marked as a duplicate of this bug. ***