|Summary:||RHEL4U4 cciss array performance much better than RHEL5|
|Product:||Red Hat Enterprise Linux 5||Reporter:||Peter Klotz <peter.klotz>|
|Component:||kernel||Assignee:||Tom Coughlan <coughlan>|
|Status:||CLOSED WONTFIX||QA Contact:||Red Hat Kernel QE team <kernel-qe>|
|Version:||5.2||CC:||aakpinar, coughlan, h.plankl, ito.kazuo, jarod, mike.miller, w.moser|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2012-07-20 22:02:15 UTC||Type:||---|
Description Peter Klotz 2007-03-28 10:19:37 UTC
Description of problem:
We use HP ProLiant DL385 servers (2 dual-core Opteron 280, 2.4GHz, 8GB RAM, 2*72GB RAID1+0 [OS], 3*300GB RAID5 [VMs]) together with VMware Server 1.0.2 to virtualize most of our infrastructure. Recently we switched from RHEL4U4 x86_64 to RHEL5 x86_64 and noticed a severe performance degradation. Tasks that compile software in virtual machines are 50% to 100% slower than they were before. Parallel disk I/O in different virtual machines leads to very high CPU loads (up to 40) on the host machine that did not occur before the upgrade. Virtual RHEL3U8 machines run into SCSI timeouts and bus resets and have to be rebooted.

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.el5

How reproducible:
Always

Steps to Reproduce:
1. The high CPU load of the host can be observed by running commands like "find /" in two different virtual machines in parallel.
2.
3.

Actual results:
High CPU load on host. SCSI timeouts in virtual machines. Poor performance of virtual machines.

Expected results:
Behavior of RHEL4U4.

Additional info:
Comment 1 Peter Klotz 2007-04-02 10:36:17 UTC
The machine uses a Smart Array 6i RAID Controller.

We can reproduce the I/O problems without VMware Server and its virtual machines. Simple tests with dd show that parallel read/write operations on the host (especially when performed on the RAID5 array) result in poor performance.

RAID5 performance (reading 3GB, writing 1GB):

[root@chip icorac_disk]# dd if=3GBtest of=/dev/null
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 188.196 seconds, 17.1 MB/s

[root@chip machines]# dd if=/dev/zero of=1GBtest bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 130.694 seconds, 8.2 MB/s

The write performance is bad even without a finalizing sync operation.

RAID1 performance (reading 3GB, writing 1GB):

[root@chip tmp]# dd if=3GBtest of=/dev/null
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 61.4583 seconds, 52.4 MB/s

[root@chip tmp]# dd if=/dev/zero of=1GBtest bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.47527 seconds, 240 MB/s

The drivers for RHEL4 and RHEL5 differ (according to modinfo):

RHEL4 ... cciss 2.6.10.RH1
RHEL5 ... cciss 3.6.14-RH1

Maybe a change that was made to this driver explains our performance issue.
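The dd runs above were started one at a time. For anyone who wants to exercise the parallel case described in this comment, here is a minimal sketch that runs one reader and one writer concurrently in the same directory. The function name, file names, and sizes are hypothetical and not taken from the original report.

```shell
#!/bin/bash
# Hypothetical helper: run one sequential read and one sequential write
# at the same time and print the wall-clock seconds until both finish.
parallel_dd() {
    local dir=$1       # directory on the array under test
    local size_mb=$2   # size of each test file in MiB
    local start end reader writer

    # Create the file that will be read back. Unless it is larger than RAM
    # (or the page cache is dropped first), the read may be served from
    # cache rather than from the array.
    dd if=/dev/zero of="$dir/readtest" bs=1M count="$size_mb" 2>/dev/null || return 1
    sync

    start=$(date +%s)
    dd if="$dir/readtest" of=/dev/null bs=1M 2>/dev/null &
    reader=$!
    dd if=/dev/zero of="$dir/writetest" bs=1M count="$size_mb" 2>/dev/null &
    writer=$!
    wait "$reader" || return 1
    wait "$writer" || return 1
    sync   # include the time needed to flush the written data
    end=$(date +%s)

    echo "$((end - start))"
}
```

For example, `parallel_dd /path/on/raid5 3072` would approximate the 3GB file size used above; the path is a placeholder.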
Comment 2 Jarod Wilson 2007-09-02 04:33:32 UTC
The cciss driver has received significant updates for rhel5.1. If you would, please give the latest rhel5.1 beta kernel a try and let us know if the performance problems persist. http://people.redhat.com/dzickus/el5/
Comment 3 Peter Klotz 2007-09-05 09:13:56 UTC
We had to reinstall RHEL4U4 since it is a production machine. The only machine I got left under RHEL5 has no RAID5 (only RAID1+0) and therefore is no good test candidate. Nevertheless I will try to perform some comparison between stock RHEL5 and the updated kernel you supplied.

To show the performance difference I repeated the measurements from Comment #1 under RHEL4U4 on the RAID5 array:

[root@chip machines]# time dd if=3GBtest of=/dev/null
6291456+0 records in
6291456+0 records out

real 0m48.374s
user 0m1.299s
sys 0m9.015s

[root@chip machines]# time dd if=/dev/zero of=1GBtest bs=1M count=1024
1024+0 records in
1024+0 records out

real 0m4.725s
user 0m0.002s
sys 0m3.164s

Comparison of RAID5 performance:

              RHEL4U4   RHEL5
Reading 3GB   48s       188s
Writing 1GB   5s        130s

I am aware that especially write performance is influenced by caching but since I used the same hardware this should not have been an issue. It seems that RHEL5 does not use caching at all.

The disks are U320 SCSI 300GB 10K RPM HDDs so writing should be much faster than the measured 8.2 MB/s (see Comment #1).
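The RHEL4U4 output above reports only elapsed times, while the RHEL5 dd output in Comment #1 also prints MB/s. A small sketch for putting both on the same scale, assuming decimal megabytes (10^6 bytes) as in dd's summary line; the helper name is hypothetical.

```shell
# Convert a byte count and elapsed seconds into decimal MB/s (10^6 bytes),
# the unit dd prints in its summary line in Comment #1.
throughput_mb_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

throughput_mb_s 3221225472 48.374    # RHEL4U4 read:  3GB in 48.374 s
throughput_mb_s 1073741824 4.725     # RHEL4U4 write: 1GB in 4.725 s
throughput_mb_s 3221225472 188.196   # RHEL5 read:    dd itself reported 17.1 MB/s
throughput_mb_s 1073741824 130.694   # RHEL5 write:   dd itself reported 8.2 MB/s
```

With the timings quoted in this comment, the RHEL4U4 runs come out at roughly 66.6 MB/s for reading and 227.2 MB/s for writing. The write figure is measured without a final sync, so it largely reflects caching.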
Comment 4 Aldemir Akpinar 2007-09-13 08:06:53 UTC
I have installed the latest kernel (2.6.18-47PAE) on our production machine, on which we are having the same problems, but this did not make any difference at all. We are still seeing load averages around 1000.

The machine is an HP DL380 G3 with 3x74GB disks in RAID5. It has plenty of RAM for a webserver (8GB). If you want, I can provide more information. However, I have to revert the machine back to RHEL4 (or some other distribution, I must say) soon, since this issue makes our website sluggish and unusable.
Comment 5 Peter Klotz 2007-09-13 08:46:22 UTC
Finally I managed to add a RAID5 (using the already mentioned 300GB HDDs) to the remaining RHEL5 machine. The results are odd since even with the RHEL5 stock kernel (2.6.18-8) I obtained very good results.

[root@brain vmtest]# uname -a
Linux brain.tilak.ibk 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

[root@brain vmtest]# dd if=3GBtest of=/dev/null
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 17.692 seconds, 182 MB/s

[root@brain vmtest]# dd if=/dev/zero of=1GBtest bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.16115 seconds, 208 MB/s

There are two differences between the machines I used for testing.

* The slow one uses 8GB RAM, the fast one only 3GB
* Different firmware

The firmware changelog does not mention any fixes for HDD performance issues. Could the difference in RAM cause such a phenomenon?

Since the 8GB machine is a production server (and currently running under RHEL4) it is not very easy to either reduce the amount of RAM or to upgrade the firmware.
Comment 6 Peter Klotz 2007-09-13 09:42:14 UTC
2.6.18-47 performs more or less the same as 2.6.18-8:

[root@brain vmtest]# uname -a
Linux brain.tilak.ibk 2.6.18-47.el5 #1 SMP Tue Sep 11 17:46:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

[root@brain vmtest]# dd if=3GBtest of=/dev/null
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 19.4392 seconds, 166 MB/s

[root@brain vmtest]# dd if=/dev/zero of=1GBtest bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 7.1683 seconds, 150 MB/s

Tomorrow we will shut down our production machine and test it with 3GB RAM under RHEL5. This should confirm or rule out the firmware as the origin of our performance issues.
Comment 7 Herbert L. Plankl 2008-07-07 08:29:23 UTC
Now we've updated our production machine from RHEL4U4 to RHEL5.2. The parallel I/O performance remains really poor (in comparison to RHEL4U4).

machine: see comment #1
OS: RHEL5.2 x86_64
VMware: VMware-server-1.0.6-91891
kernel: 2.6.18-92.el5
Comment 9 Mike Miller (OS Dev) 2008-11-10 16:16:55 UTC
What controller is being used?
Comment 10 Peter Klotz 2009-01-20 12:39:31 UTC
It is an HP Smart Array 6i RAID Controller (see Comment #1).
Comment 11 Kazuo Ito 2010-06-10 11:58:51 UTC
This might be the same as, or somewhat related to Bug 237605, closed because we haven't paid enough attention to it... I would like to suggest re-running of the test after changing the value of /sys/block/<device>/queue/nr_requests to its old default, 8192, or something lower, yet still higher than the current default of 128.
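A minimal sketch of the suggested change, for anyone re-running the test. The function name and the `sysroot` parameter are hypothetical (the latter exists only so the function can be tried against a scratch directory), and the device name in the example is an assumption: sysfs normally replaces the `/` in a name such as cciss/c0d0 with `!`, but check /sys/block on the actual host.

```shell
# Print the current queue depth of a block device, then set a new one.
# sysroot defaults to /sys.
set_nr_requests() {
    local dev=$1 value=$2 sysroot=${3:-/sys}
    local knob="$sysroot/block/$dev/queue/nr_requests"

    [ -w "$knob" ] || { echo "cannot write $knob" >&2; return 1; }
    echo "old: $(cat "$knob")"
    echo "$value" > "$knob"
    echo "new: $(cat "$knob")"
}

# Example (as root); 'cciss!c0d0' is an assumed name for the first logical drive:
#   set_nr_requests 'cciss!c0d0' 512
```

A value written this way does not survive a reboot, so it has to be set again before each test run.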
Comment 12 Tom Coughlan 2012-07-20 22:02:15 UTC
No reply since June '10. Assuming this is resolved. Closing.