From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
I have a Dell PE2650 with 2x 2.4GHz Xeons (Hyper-Threading enabled), 2GB RAM, and 2x 18GB internal system disks in a hardware mirror configuration using the PERC 3Di. The external RAID unit is an EMC FC4500 with 9x 73GB disks in a RAID 5 configuration, connected via a QLogic QLA2310 FCAL card.

When I originally bought the machine I found a read performance problem with RH AS 2.1 (fully up2dated) and had to install Red Hat Linux 7.3, which did not seem to have the same problem. Reads from both my external RAID unit and the internal PERC-mirrored system disks were awfully slow - about three times slower than with Red Hat Linux. There was definitely something wrong somewhere, as reads were slower than writes to the same device (one was RAID 1 and the other RAID 5!).

The time has now come to add another 1TB of disk to the array, and at the same time I would like to upgrade the system to RH AS 3.0. I have built a spare PE2650 with RH AS 3.0b2 and have found that the disk read performance is still a problem with the internal PERC-mirrored system disk. I have not been able to test it against the external FC4500 as it is in continual use on the production system.

I have tried further tests in an effort to isolate the cause:
1) I installed RH 9.0 on the spare PE2650 described above; it does not exhibit the read performance problem.
2) I built a desktop Viglen PC with RH AS 3.0b2; it does not exhibit the read performance problem with its IDE system disk, an externally attached SCSI disk, or an externally attached software RAID 0 set.
3) I built the PE2650 with RH AS 2.1 and see the read performance problem with the internal PERC mirror, but not with an external single SCSI disk or an external software RAID 0 set.
On the surface, it would appear that the RH AS products have trouble reading from hardware RAID devices whereas the community versions of Red Hat Linux do not. This makes no sense to me, though, as the PERC system mirror is presented to the O/S as a SCSI disk via the aacraid module, and the external hardware RAID 5 device is presented to the O/S as a SCSI disk via the qla2300 module.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
1. Obtain a Dell PE2650 with a PERC 3Di.
2. Test read vs. write performance to the internal system disk.

Actual Results: Reads are about 3x slower than with RH community Linux, and reads are slower than writes to the RAID 1 device.

Expected Results: Performance should be better.

Additional info:
I should point out that originally (on 2.4.8-e.3) read performance was poor not only to the external RAID (fibre channel) but also to the internal RAID and even to a ramdisk.
Created attachment 95343 [details] Disk I/O test
I have attached a very simple test case (Disk I/O test) that shows up the behaviour Gary has seen. Ignoring the O_DIRECT results that the program also tests, I get the following results.

Test server: HP/Compaq DL380G2 w/1GB RAM. Each test consists of writing a 2GB file in 16KB buffers, closing it, then reopening it and reading it back. All rates in MB/s.

RHEL AS2.1 (2.4.8-e.27smp)
  Internal RAID 0 (cciss):   write: 23.8  read:  8.2
  External RAID 5 (qla2300): write: 16.6  read:  9.0

RHEL AS3B2 (2.4.21-1.1931.2.399.entsmp)
  Internal RAID 0 (cciss):   write: 13.9  read: 11.4
  External RAID 5 (qla2300): write: 12.4  read:  9.8

RHL 9 (2.4.20-20.9smp)
  Internal RAID 0 (cciss):   write: 16.4  read: 16.8
  External RAID 5 (qla2300): write: 11.8  read: 18.0

The concern is that while the write speed for RHEL 2.1 and RHEL 3B2 is comparable with that of RHL 9, the read speed is very much inferior.
Here is the compile line for the testio program that Nick posted, for those that need it:

cc -Wall -D _GNU_SOURCE=1 -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -O testio.c -o testio
Does echo 127 > /proc/sys/vm/max-readahead help?
Nick, please can you let me know how to get the testio program to work with large files, as I guess this is the problem I am hitting when I use it to create a ~4GB testfile; maybe I need different compile options:

[root@dfg tmp]# ./testio 16384 250000
count: 250000 buf_size: 16384
Wrote -189 MB in 143.2 seconds using O_DIRECT at -0.7 MB/s

If I run testio on my fully up2dated RHAS 2.1 machine's system disk, with a testfile size that does not hit the largefile issue above and is cacheable in RAM, I get the following (note the fast O_DIRECT read speed - from cache? I thought it was supposed to be direct? - and then note the slow async read speed):

[root@dfg tmp]# ./testio 16384 100000
count: 100000 buf_size: 16384
Wrote 1562 MB in 12.2 seconds using O_DIRECT at 68.1 MB/s
Read 1562 MB in 1.9 seconds using O_DIRECT at 434.7 MB/s
Wrote 1562 MB in 11.2 seconds using async at 74.3 MB/s
Read 1562 MB in 63.3 seconds using async at 13.1 MB/s

If I then increase the testfile size to just within the 2GB largefile limit, but with a testfile that is just too large to cache in the 2GB RAM in my server, I get the following (note the reduced O_DIRECT read speed now - no caching - and also the async speed being slower than O_DIRECT):

[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 37.2 seconds using O_DIRECT at 28.0 MB/s
Read 1953 MB in 87.3 seconds using O_DIRECT at 11.9 MB/s
Wrote 1953 MB in 36.2 seconds using async at 28.7 MB/s
Read 1953 MB in 112.1 seconds using async at 9.3 MB/s

I have also tried changing the read-ahead value as suggested, and found the following when I ran the previous test again:

[root@dfg tmp]# cat /proc/sys/vm/max-readahead
31
[root@dfg tmp]# cat /proc/sys/vm/min-readahead
3
[root@dfg tmp]# echo 127 > /proc/sys/vm/max-readahead
[root@dfg tmp]# cat /proc/sys/vm/max-readahead
127
[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 22.8 seconds using O_DIRECT at 45.6 MB/s
Read 1953 MB in 105.0 seconds using O_DIRECT at 9.9 MB/s
Wrote 1953 MB in 37.9 seconds using async at 27.5 MB/s
Read 1953 MB in 86.4 seconds using async at 12.0 MB/s
[root@dfg tmp]#
At the suggestion of Bastien from Red Hat Enterprise support, I replaced the up2dated RHAS 2.1 kernel (2.4.9-e.27smp) with the latest RH 7.3 kernel (2.4.20-20.7smp) to determine whether the problem was in kernel space or user space. To perform this change I also had to update the modutils package from 2.4.13-13 to 2.4.18-3.7x. I then performed the simple dd test and found that read performance was good again - i.e. it was about twice the write speed. I also ran Nick's testio program; results below:

kernel 2.4.9-e.27smp:
[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 12.2 seconds using O_DIRECT at 85.5 MB/s
Read 1953 MB in 90.1 seconds using O_DIRECT at 11.5 MB/s
Wrote 1953 MB in 15.1 seconds using async at 68.8 MB/s
Read 1953 MB in 90.8 seconds using async at 11.4 MB/s

kernel 2.4.20-20.7smp:
[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 57.2 seconds using O_DIRECT at 18.2 MB/s
Read 1953 MB in 24.8 seconds using O_DIRECT at 41.9 MB/s
Wrote 1953 MB in 52.6 seconds using async at 19.8 MB/s
Read 1953 MB in 25.4 seconds using async at 40.9 MB/s

This makes it clear to me that the problem lies in the kernel.
One word of caution: do not look at the O_DIRECT numbers. AS 2.1 kernels ignore O_DIRECT while later kernels do not, and O_DIRECT performs similarly to O_SYNC - but for reads as well as writes.
Created attachment 95574 [details] Redhat supplied Test IO Program

This tarball was given to me by Nick Strugnell of Red Hat to use to test I/O performance on my systems in place of his testio program. I understand that this program is commonly used within Red Hat to test disk performance.
I have now run the tiotest benchmarking program on my test system; here are the results:

Red Hat AS 2.1, fully updated (2.4.9-e.27smp):

[root@dfg tmp]# ./tiotest -f2048 -b16384 -t1 -L -d/tmp
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 2048 MBs        | 94.2 s   | 21.732 MB/s  | 0.1 %    | 26.5 %  |
| Random Write 16 MBs   | 0.7 s    | 21.017 MB/s  | 0.0 %    | 8.1 %   |
| Read 2048 MBs         | 41.9 s   | 48.858 MB/s  | 0.3 %    | 7.9 %   |
| Random Read 16 MBs    | 0.6 s    | 24.389 MB/s  | 0.0 %    | 0.0 %   |
`----------------------------------------------------------------------'

Red Hat 7.3, partially up2dated (2.4.18-27.7.xsmp):

[root@dfgsrv 21] /tmp > ./tiotest -f2048 -b16384 -t1 -L -d/tmp
Error writing to file: Success
Error in randomwrite, off=2082373632, read=-1, seeks=25 : : No space left on device
Error read from file: Success
Error in seek/read, off=2127396864, read=0, seeks=14 : : Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 1981 MBs        | 68.4 s   | 28.955 MB/s  | 0.1 %    | 25.8 %  |
| Random Write 0 MBs    | 0.2 s    | 2.255 MB/s   | 0.0 %    | 78.9 %  |
| Read 1981 MBs         | 48.2 s   | 41.070 MB/s  | 0.3 %    | 9.8 %   |
| Random Read 0 MBs     | 0.0 s    | 5.180 MB/s   | 0.0 %    | 4.6 %   |
`----------------------------------------------------------------------'

This program suggests there is minimal performance difference between the two systems, which is obviously contrary to what both Nick and I have seen to date.
Can it be that the dd, testio and iozone programs work differently on the two different O/S's, and that this is the reason for the differences I am seeing, rather than there being a problem with the I/O subsystem? Is it something to do with the different memory management systems and how they deal with the file cache? I would be keen to hear someone's explanation for this.
Further work with dd has suggested to me that RHAS 2.1 and RHL 7.3 differ in the way they handle file caching. I have now re-run the dd tests with a sync before and after, and have found the following interesting results:

RHL 7.3 write test:

[root@dfgsrv 39] /tmp > sync ; time dd if=/dev/zero of=/tmp/testfile bs=16384 count=125000 ; time sync
125000+0 records in
125000+0 records out
real 60.105 user 0.117 sys 17.730 pcpu 29.69
real 8.411 user 0.000 sys 0.301 pcpu 3.57

You can see that it took 60s to write the file and then a further 8s to flush the cache. This gives a write speed of 28.5MB/s.

RHAS 2.1 write test:

[root@dfg tmp]# sync ; time dd if=/dev/zero of=/tmp/testfile bs=16384 count=125000 ; time sync
125000+0 records in
125000+0 records out
real 0m12.105s
user 0m0.070s
sys 0m11.540s
real 1m1.945s
user 0m0.000s
sys 0m0.040s

This time it took 12s to write the file and 62s to flush the cache - completely different from the way RHL 7.3 handled the task. This gives a write speed of 26.4MB/s.

This highlights the difference I was seeing: the earlier dd tests were not including the complete write of the file to disk, so write performance looked better than it should have, and differed between the two O/S's because of the different ways file caching is handled.
As regards performing the read test again with dd, I renamed the file from testfile to testfile2 in an effort to fool the cache (not sure if this works) and performed the dd test again, reading the new file:

RHL 7.3:

[root@dfgsrv 40] /tmp > mv testfile testfile2
[root@dfgsrv 41] /tmp > time dd if=/tmp/testfile2 of=/dev/null bs=16384
125000+0 records in
125000+0 records out
real 40.676 user 0.072 sys 4.750 pcpu 11.85

This gives a read speed of 47.9MB/s (with the possibility that some of the file was still cached in RAM).

RHAS 2.1:

[root@dfg tmp]# mv testfile testfile2
[root@dfg tmp]# time dd if=/tmp/testfile2 of=/dev/null bs=16384
125000+0 records in
125000+0 records out
real 0m31.219s
user 0m0.110s
sys 0m2.930s

This gives a read speed of 62MB/s (with the possibility that some of the file was still cached in RAM).

These read and write figures agree better with those given by the testio program. Please can someone comment on/confirm my suspicion that this is not a bug at all?

Thanks,
Gary Mansell
Gary, could you please test that you get similar results on AS 2.1 and on AS 3?
Hello. Bad disk read performance still occurs, even in RHEL 3 Update 1. I ran the Red Hat tiotest program on a fully updated Dell PowerEdge 2600 server with three 18GB disks in RAID 5.

With /proc/sys/vm/max-readahead = 31 (the default value!):

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 2048 MBs        | 50.7 s   | 40.419 MB/s  | 0.2 %    | 12.2 %  |
| Random Write 16 MBs   | 2.4 s    | 6.625 MB/s   | 0.0 %    | 1.3 %   |
| Read 2048 MBs         | 222.9 s  | 9.189 MB/s   | 0.1 %    | 4.2 %   |
| Random Read 16 MBs    | 10.8 s   | 1.452 MB/s   | 0.0 %    | 0.6 %   |
`----------------------------------------------------------------------'

Notice the poor read performance! However, if you change the max-readahead parameter to 127, you get:

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 2048 MBs        | 50.5 s   | 40.537 MB/s  | 0.2 %    | 11.4 %  |
| Random Write 16 MBs   | 2.3 s    | 6.920 MB/s   | 0.4 %    | 1.3 %   |
| Read 2048 MBs         | 100.0 s  | 20.487 MB/s  | 0.1 %    | 4.4 %   |
| Random Read 16 MBs    | 11.3 s   | 1.384 MB/s   | 0.0 %    | 0.4 %   |
`----------------------------------------------------------------------'

Read performance is now 20 MB/s (which is not spectacular). Increasing the max-readahead parameter to 256:

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write 2048 MBs        | 50.4 s   | 40.637 MB/s  | 0.3 %    | 12.5 %  |
| Random Write 16 MBs   | 2.5 s    | 6.261 MB/s   | 0.0 %    | 1.2 %   |
| Read 2048 MBs         | 61.8 s   | 33.142 MB/s  | 0.2 %    | 5.7 %   |
| Random Read 16 MBs    | 10.5 s   | 1.487 MB/s   | 0.0 %    | 0.8 %   |
`----------------------------------------------------------------------'

The read performance increases to 33 MB/s.

My question is: what value should you choose? And why is 31 the default if it leads to poor disk performance?
Dell suggests that this disk perf topic is their #1 concern for U4. Adding to mustfix blocker list.
There have been several reports of poor I/O performance in RHEL 3, but it appears that this particular BZ may be resolved. Please confirm the following summary, and update this BZ if you are still having I/O performance problems with RHEL 3.

1. You initially observed that "dd" performance with RHEL 2.1 was less than with RHL 7.3, but when you included the time required to flush the data from cache to disk (with a sync command), you found that the performance was comparable. (Comment 13.)

2. You found that the default max-readahead = 31 on RHEL 3 produced poor performance. When you increased this to 256 the problem was solved. (Comment 15.)

A large readahead value is an advantage for sequential I/O patterns. It is a disadvantage for random I/O. The default was chosen as a compromise. We believe it works well for the majority of RHEL 3 workloads, and the parameter can be adjusted for the others.
This problem appears to be resolved, as stated in comment 17. The BZ has remained open awaiting confirmation. Since no confirmation has been received, we are assuming it is resolved, and the BZ is being closed.