106771 – RHEL3: Disk read performance issues

Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Depends On:	103278 104633 106486 114135 123148 139110 139551

Reported:	2003-10-10 13:54 UTC by Gary Mansell
Modified:	2007-11-30 22:06 UTC
CC List:	8 users
Doc Type:	Bug Fix
Last Closed:	2005-09-19 13:29:43 UTC

Attachments
Disk I/O test (2.97 KB, text/plain) 2003-10-21 12:37 UTC, Nick Strugnell
Redhat supplied Test IO Program (27.52 KB, application/octet-stream) 2003-10-29 14:24 UTC, Gary Mansell

Description Gary Mansell 2003-10-10 13:54:45 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
I have a Dell PE2650 with 2x 2.4GHz Xeons (Hyperthreading enabled), 2GB RAM,
and 2x18GB internal system disks in a HW mirror configuration using the PERC 3Di.
The external RAID unit is an EMC FC4500 with 9x73GB disks in a RAID 5
configuration. The RAID unit is connected using a QLogic QLA2310 FCAL card.

When I originally bought the machine I found a read performance problem with RH
AS 2.1 (fully up2dated) and had to install RH Linux 7.3, which did not seem to
have the same problem. I found that reads from both my external RAID unit and
the internal PERC mirrored system disks were awfully slow under RH AS 2.1 - about
three times slower than with Red Hat Linux. There was definitely something wrong
somewhere, as reads were slower than writes to the same device (one was RAID 1
and the other RAID 5!).

The time has now come to add another 1TB of disk to the array and at the same
time I would like to upgrade the system to RH AS 3.0. I have built a spare
PE2650 with RH AS 3.0b2 and have found that the disk read performance is still a
problem with the internal PERC mirrored system disk. I have not been able to
test it against the external FC4500 as it is used continually on the production
system.

I have tried further tests in an effort to isolate the cause:

1) I have installed RH 9.0 on the spare PE 2650 described above and this does
not exhibit the read performance problem.
2) I have built a desktop Viglen PC with RH AS 3.0b2 and this does not
exhibit the read performance problem with either its IDE system disk, an
externally attached SCSI disk, or an externally attached Software RAID 0 set.
3) I have built the PE 2650 with RH AS 2.1 and see the read performance problem
with the internal PERC mirror but not with an external single SCSI disk or an
external Software RAID 0 set.

On the surface, it would appear that RH AS products have trouble reading from HW
RAID devices whereas the community versions of RH Linux do not.

This makes no sense to me though as the PERC system mirror is just
presented as a SCSI disk to the O/S via the aacraid module and the
external HW RAID 5 device is presented as a SCSI disk to the O/S via the qla2300
module.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Obtain a Dell PE2650 with PERC 3Di
2. Test read vs write performance to the internal system disk

Actual Results:  Reads are about 3x slower than with RH community Linux, and
reads are slower than writes to the RAID 1 device.

Expected Results:  performance should be better

Additional info:

Comment 1 Nick Strugnell 2003-10-14 15:51:58 UTC

I should point out that originally (on 2.4.9-e.3) read performance was poor
not only to the external RAID (Fibre Channel) but also to the internal RAID and
even to ramdisk.

Comment 2 Nick Strugnell 2003-10-21 12:37:02 UTC

Created attachment 95343
Disk I/O test

Comment 3 Nick Strugnell 2003-10-21 12:58:56 UTC

I have attached a very simple testcase (Disk I/O test) that shows the
behaviour that Gary has seen. Ignoring the O_DIRECT results that the program
also reports, I get the following results:

Test server: HP/Compaq DL380G2 w/1GB RAM
Each test consists of writing a 2GB file in 16KB buffers, closing it, reopening
it and reading it back. All rates are in MB/s.
 
RHEL AS2.1 (2.4.9-e.27smp)
Internal RAID 0 (cciss):   write: 23.8 read: 8.2 
External RAID 5 (qla2300): write: 16.6 read: 9.0 
 
RHEL AS3B2 (2.4.21-1.1931.2.399.entsmp) 
Internal RAID 0 (cciss):   write: 13.9 read: 11.4 
External RAID 5 (qla2300): write: 12.4 read: 9.8 
 
RHL 9 (2.4.20-20.9smp) 
Internal RAID 0 (cciss):   write: 16.4 read: 16.8 
External RAID 5 (qla2300): write: 11.8 read: 18.0 
 
The concern is that while the write speed for RHEL 2.1 and RHEL 3B2 is 
comparable with that of RHL 9, the read speed is very much inferior.
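
For readers without the attachment, the test procedure described above can be approximated as follows. This is a minimal Python sketch under stated assumptions, not the attached testio.c: the function name, the default sizes and the temporary file location are all illustrative, and it reports rates in MB/s with 1 MB taken as 2**20 bytes.

```python
import os
import tempfile
import time


def sequential_io_rates(path, buf_size=16 * 1024, count=1024):
    """Write `count` buffers of `buf_size` bytes to `path`, close, reopen, read back.

    Returns (write_mb_per_s, read_mb_per_s, bytes_read).
    """
    buf = b"\0" * buf_size
    total_mb = buf_size * count / (1024 * 1024)

    # Write phase: unbuffered writes of fixed-size buffers, timed until close.
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            f.write(buf)
    write_secs = max(time.perf_counter() - start, 1e-9)

    # Read phase: reopen the file and read it back in the same buffer size.
    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    read_secs = max(time.perf_counter() - start, 1e-9)

    return total_mb / write_secs, total_mb / read_secs, bytes_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        w, r, n = sequential_io_rates(os.path.join(tmp, "testfile"))
        print(f"write: {w:.1f} MB/s read: {r:.1f} MB/s ({n} bytes read back)")
```

With the small default size the file fits in the page cache, so the figures mostly reflect memory speed. To exercise the disk as in the results above, the file would need to be larger than RAM (for example count=131072 for 2GB).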

Comment 5 Gary Mansell 2003-10-21 15:19:26 UTC

Here is the compile line for the testio program that Nick posted, for those that
need it:

cc -Wall -D _GNU_SOURCE=1 -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -O testio.c -o testio

Comment 6 Arjan van de Ven 2003-10-21 15:20:59 UTC

Does echo 127 > /proc/sys/vm/max-readahead help?

Comment 7 Gary Mansell 2003-10-27 17:32:41 UTC

Nick, please can you let me know how to get the testio program to work with
large files? I guess this is the problem I am hitting when I try to use it to
create a ~4GB testfile; maybe I need different compile options:

[root@dfg tmp]# ./testio 16384 250000
count: 250000 buf_size: 16384
Wrote -189 MB in 143.2 seconds using O_DIRECT at -0.7 MB/s

If I run testio on my fully up2dated RHAS 2.1 machine's system disk, with a
testfile size that does not hit the large file issue above and is cacheable
in RAM, I get the following. Note the fast O_DIRECT read speed (from cache? I
thought it was supposed to be direct) and the slow async read speed:

[root@dfg tmp]# ./testio 16384 100000
count: 100000 buf_size: 16384
Wrote 1562 MB in 12.2 seconds using O_DIRECT at 68.1 MB/s
Read 1562 MB in 1.9 seconds using O_DIRECT at 434.7 MB/s
Wrote 1562 MB in 11.2 seconds using async at 74.3 MB/s
Read 1562 MB in 63.3 seconds using async at 13.1 MB/s

If I then increase the testfile size to just within the 2GB large file limit, so
that the testfile is just too large to cache in the 2GB of RAM in my server,
I get the following. Note the reduced O_DIRECT read speed (no caching now) and
that async reads are slower than O_DIRECT reads:

[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 37.2 seconds using O_DIRECT at 28.0 MB/s
Read 1953 MB in 87.3 seconds using O_DIRECT at 11.9 MB/s
Wrote 1953 MB in 36.2 seconds using async at 28.7 MB/s
Read 1953 MB in 112.1 seconds using async at 9.3 MB/s

I have also tried to change the read-ahead value as suggested and found the
following when I ran the previous test again:

[root@dfg tmp]# cat /proc/sys/vm/max-readahead
31
[root@dfg tmp]# cat /proc/sys/vm/min-readahead
3
[root@dfg tmp]# echo 127 > /proc/sys/vm/max-readahead
[root@dfg tmp]# cat /proc/sys/vm/max-readahead
127
[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 22.8 seconds using O_DIRECT at 45.6 MB/s
Read 1953 MB in 105.0 seconds using O_DIRECT at 9.9 MB/s
Wrote 1953 MB in 37.9 seconds using async at 27.5 MB/s
Read 1953 MB in 86.4 seconds using async at 12.0 MB/s
[root@dfg tmp]#
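
The negative size in the first run above ("Wrote -189 MB") is consistent with a 32-bit signed overflow of buf_size * count. The following Python sketch reproduces the arithmetic. It assumes, without having seen testio.c, that the program keeps its byte total in a 32-bit signed int and truncates when converting to MB; the helper names are illustrative.

```python
def as_int32(value):
    """Interpret `value` as a 32-bit two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def reported_mb(buf_size, count):
    """MB figure a 32-bit signed byte counter would give (truncated toward zero)."""
    total = as_int32(buf_size * count)  # assumption: the total is held in an int
    return int(total / (1024 * 1024))


for count in (100000, 125000, 250000):
    print(count, reported_mb(16384, count))
# prints:
# 100000 1562
# 125000 1953
# 250000 -189
```

The three values match the sizes printed by testio in this comment. If the assumption holds, -D _FILE_OFFSET_BITS=64 alone would not help, since that flag widens off_t but not a plain int used for the size calculation.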

Comment 8 Gary Mansell 2003-10-28 10:51:11 UTC

At the suggestion of Bastien from Red Hat Enterprise support, I substituted the
up2dated RHAS 2.1 kernel (2.4.9-e.27smp) with the latest RH7.3 kernel
(2.4.20-20.7smp) to determine whether the problem was in kernel space or user space.

To perform this change I also had to update the modutils package from 2.4.13-13
to 2.4.18-3.7x.

I then performed the simple dd test and found that read performance was good
again, i.e. it was about twice the write speed. I also ran Nick's testio program
and include the results below:

kernel 2.4.9-e.27smp:

[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 12.2 seconds using O_DIRECT at 85.5 MB/s
Read 1953 MB in 90.1 seconds using O_DIRECT at 11.5 MB/s
Wrote 1953 MB in 15.1 seconds using async at 68.8 MB/s
Read 1953 MB in 90.8 seconds using async at 11.4 MB/s

kernel 2.4.20-20.7smp:

[root@dfg tmp]# ./testio 16384 125000
count: 125000 buf_size: 16384
Wrote 1953 MB in 57.2 seconds using O_DIRECT at 18.2 MB/s
Read 1953 MB in 24.8 seconds using O_DIRECT at 41.9 MB/s
Wrote 1953 MB in 52.6 seconds using async at 19.8 MB/s
Read 1953 MB in 25.4 seconds using async at 40.9 MB/s

This makes it clear to me that the problem lies with the kernel.

Comment 10 Arjan van de Ven 2003-10-29 13:01:54 UTC

One word of caution: do not look at the O_DIRECT numbers. AS2.1 kernels ignore
O_DIRECT while later kernels do not, and O_DIRECT is similar in performance to
using O_SYNC, except that it also affects reads.

Comment 11 Gary Mansell 2003-10-29 14:24:02 UTC

Created attachment 95574
Redhat supplied Test IO Program

This tarball was given to me by Nick Strugnell of Red Hat to use to test IO
performance on my systems in place of his testio program. I understand that
this program is commonly used within Red Hat to test disk performance.

Comment 12 Gary Mansell 2003-10-29 14:32:05 UTC

I have now run the tiotest benchmarking program from the tarball on my test
system and here are the results:

Redhat AS 2.1 fully updated (2.4.9-e.27smp):

[root@dfg tmp]# ./tiotest -f2048 -b16384 -t1 -L -d/tmp
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2048 MBs |   94.2 s |  21.732 MB/s |   0.1 %  |  26.5 % |
| Random Write   16 MBs |    0.7 s |  21.017 MB/s |   0.0 %  |   8.1 % |
| Read         2048 MBs |   41.9 s |  48.858 MB/s |   0.3 %  |   7.9 % |
| Random Read    16 MBs |    0.6 s |  24.389 MB/s |   0.0 %  |   0.0 % |
`----------------------------------------------------------------------'

Redhat 7.3 partially up2dated (2.4.18-27.7.xsmp):

[root@dfgsrv 21] /tmp > ./tiotest -f2048 -b16384 -t1 -L -d/tmp
Error writing to file: Success
Error in randomwrite, off=2082373632, read=-1, seeks=25 : : No space left on device
Error read from file: Success
Error in seek/read, off=2127396864, read=0, seeks=14 : : Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        1981 MBs |   68.4 s |  28.955 MB/s |   0.1 %  |  25.8 % |
| Random Write    0 MBs |    0.2 s |   2.255 MB/s |   0.0 %  |  78.9 % |
| Read         1981 MBs |   48.2 s |  41.070 MB/s |   0.3 %  |   9.8 % |
| Random Read     0 MBs |    0.0 s |   5.180 MB/s |   0.0 %  |   4.6 % |
`----------------------------------------------------------------------'

This program suggests there is minimal performance difference between the two
systems.

This is obviously contrary to what both Nick and I have seen to date.

Could it be that the dd, testio and iozone programs behave differently on the
two O/S's, and that this is the reason for the differences I am seeing,
rather than a problem with the IO subsystem? Is it something to do
with the different memory management systems and how they deal with file cache?

I would be keen to hear someone's explanation for this.

Comment 13 Gary Mansell 2003-10-29 15:03:18 UTC

Further work with dd has suggested to me that RHAS 2.1 and RHL7.3 differ
in the way that they handle caching files. I have now re-run the dd tests with
a sync before and after, and have found the following interesting results:

RHL 7.3 write test:

[root@dfgsrv 39] /tmp > sync ; time dd if=/dev/zero of=/tmp/testfile bs=16384 count=125000 ; time sync
125000+0 records in
125000+0 records out

real 60.105     user 0.117      sys 17.730      pcpu 29.69


real 8.411      user 0.000      sys 0.301       pcpu 3.57

You can see that it took 60s to write the file and then a further 8s to flush the
cache. This gives a write speed of 28.5 MB/s.

RHAS 2.1 write test:

[root@dfg tmp]# sync ; time dd if=/dev/zero of=/tmp/testfile bs=16384 count=125000 ; time sync
125000+0 records in
125000+0 records out

real    0m12.105s
user    0m0.070s
sys     0m11.540s

real    1m1.945s
user    0m0.000s
sys     0m0.040s

This time it took 12s to write the file and 62s to flush the cache - this is
completely different to the way that RHL7.3 handled the task. This gives a write
speed of 26.4 MB/s.

This highlights to me the difference that I was seeing - the earlier dd tests
did not include the complete write of the file to disk, so the write
performance looked better than it should have, and differed completely between
the two O/S's because of the different ways that file caching is handled.

As regards performing the read test again with dd, I renamed the file from
testfile to testfile2 in an effort to fool the cache (not sure if this works)
and performed the dd test again reading the new file:

RHL 7.3:

[root@dfgsrv 40] /tmp > mv testfile testfile2
[root@dfgsrv 41] /tmp > time dd if=/tmp/testfile2 of=/dev/null bs=16384
125000+0 records in
125000+0 records out

real 40.676     user 0.072      sys 4.750       pcpu 11.85

This gives a read speed of about 48 MB/s (with the possibility that some of the
file was still cached in RAM).

RHAS 2.1:

[root@dfg tmp]# mv testfile testfile2
[root@dfg tmp]# time dd if=/tmp/testfile2 of=/dev/null bs=16384
125000+0 records in
125000+0 records out

real    0m31.219s
user    0m0.110s
sys     0m2.930s

This gives a read speed of 62 MB/s (with the possibility that some of the file
was still cached in RAM).

These read and write figures agree better with those given by the tiotest program.

Please can someone comment/confirm my suspicions that this is not a bug at all.

Thanks

Gary Mansell
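
The write speeds quoted in this comment can be re-derived from the dd and sync timings. The sketch below is only a check of that arithmetic: the helper name is illustrative, and it assumes 1 MB = 2**20 bytes, which is what makes the quoted figures come out.

```python
MIB = 1024 * 1024


def throughput_mb_per_s(total_bytes, *phase_seconds):
    """Return MB/s (1 MB = 2**20 bytes) over the sum of the given phase times."""
    return total_bytes / MIB / sum(phase_seconds)


total = 16384 * 125000  # bs * count from the dd commands above

# Write tests: "real" time of dd plus "real" time of the trailing sync.
rhl73_write = throughput_mb_per_s(total, 60.105, 8.411)
rhas21_write = throughput_mb_per_s(total, 12.105, 61.945)

# What the earlier tests effectively measured on RHAS 2.1: dd alone, no flush.
rhas21_dd_only = throughput_mb_per_s(total, 12.105)

print(f"RHL 7.3  write incl. sync: {rhl73_write:.1f} MB/s")
print(f"RHAS 2.1 write incl. sync: {rhas21_write:.1f} MB/s")
print(f"RHAS 2.1 write, dd only:   {rhas21_dd_only:.1f} MB/s")
```

The first two lines come out at 28.5 and 26.4 MB/s, as stated above. The dd-only figure is several times higher, which is why leaving the flush out of the timing made the two systems look so different.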

Comment 14 Bastien Nocera 2003-11-03 11:15:25 UTC

Gary, could you please test that you get similar results on AS 2.1 and
on AS 3?

Comment 15 Werner Maes 2004-01-28 13:46:22 UTC

Hello

Bad disk read performance still occurs, even in RHES 3 Update 1.
I ran the Red Hat tiotest program on a fully updated Dell PowerEdge 2600
server with three 18GB disks in RAID 5.

With /proc/sys/vm/max-readahead = 31 (the default value!):

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2048 MBs |   50.7 s |  40.419 MB/s |   0.2 %  |  12.2 % |
| Random Write   16 MBs |    2.4 s |   6.625 MB/s |   0.0 %  |   1.3 % |
| Read         2048 MBs |  222.9 s |   9.189 MB/s |   0.1 %  |   4.2 % |
| Random Read    16 MBs |   10.8 s |   1.452 MB/s |   0.0 %  |   0.6 % |
`----------------------------------------------------------------------'

Note the poor read performance!

However, if you change the max-readahead parameter to 127 you get:

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2048 MBs |   50.5 s |  40.537 MB/s |   0.2 %  |  11.4 % |
| Random Write   16 MBs |    2.3 s |   6.920 MB/s |   0.4 %  |   1.3 % |
| Read         2048 MBs |  100.0 s |  20.487 MB/s |   0.1 %  |   4.4 % |
| Random Read    16 MBs |   11.3 s |   1.384 MB/s |   0.0 %  |   0.4 % |
`----------------------------------------------------------------------'
Read performance is now 20 MB/s (which is not spectacular).

Increasing the max-readahead parameter to 256:

[root@localhost tiobench-0.3.3]# ./tiotest -f2048 -b16384 -t1 -L -d/home
Error writing to file: Success
Error read from file: Success
Tiotest results for 1 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2048 MBs |   50.4 s |  40.637 MB/s |   0.3 %  |  12.5 % |
| Random Write   16 MBs |    2.5 s |   6.261 MB/s |   0.0 %  |   1.2 % |
| Read         2048 MBs |   61.8 s |  33.142 MB/s |   0.2 %  |   5.7 % |
| Random Read    16 MBs |   10.5 s |   1.487 MB/s |   0.0 %  |   0.8 % |
`----------------------------------------------------------------------'

and the read performance increases to 33 MB/s.

My question is: what value should you choose?
Why is 31 the default if it leads to poor disk performance?
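
To put the three runs side by side, the sketch below tabulates the sequential read rates reported above against the max-readahead setting. The rates are copied from the tiotest output. The window size in KB rests on an assumption not stated in this thread: that max-readahead counts 4 KB pages on i686. The function name is illustrative.

```python
PAGE_KB = 4  # assumption: 4 KB pages, and max-readahead is counted in pages

# (max-readahead setting, sequential read rate in MB/s) from the runs above
RUNS = [(31, 9.189), (127, 20.487), (256, 33.142)]


def summarize(runs, page_kb=PAGE_KB):
    """Return (readahead, window_kb, rate, speedup_vs_first_run) per run."""
    baseline = runs[0][1]
    return [(ra, ra * page_kb, rate, rate / baseline) for ra, rate in runs]


for ra, window_kb, rate, speedup in summarize(RUNS):
    print(f"max-readahead={ra:3d} (~{window_kb:4d} KB): "
          f"{rate:5.1f} MB/s, {speedup:.1f}x")
```

On these figures the read rate is roughly 2.2x the default at 127 and 3.6x at 256, while the random read rate in the same runs stays at about 1.4 to 1.5 MB/s.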

Comment 16 Tim Burke 2004-09-01 21:36:54 UTC

Dell suggests that this disk perf topic is their #1 concern for U4.

Adding to mustfix blocker list.

Comment 17 Tom Coughlan 2004-09-20 13:28:55 UTC

There have been several reports of poor I/O performance in RHEL 3, but
it appears that this particular BZ may be resolved.  Please confirm
the following summary, and update this BZ if you are still having I/O
performance problems with RHEL 3.

1. You initially observed that "dd" performance with RHEL 2.1 was less
than RHL 7.3, but when you included the time required to flush the
data from cache to disk (with a sync command), you found that the
performance was comparable. (Comment 13).

2. You found that the default max-readahead = 31 on RHEL 3 produced
poor performance.  When you increased this to 256 the problem was
solved. (Comment 15.)

A large readahead value is an advantage for sequential I/O patterns.
It is a disadvantage for random I/O.  The default was chosen as a
compromise. We believe it works well for the majority of RHEL 3
workloads, and the parameter can be adjusted for the others.
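
The trade-off described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's readahead algorithm: it assumes that every cache miss on page p triggers one request that fetches pages p through p + readahead, and that nothing is ever evicted. The function name and the page counts are illustrative.

```python
import random


def simulate(accesses, readahead):
    """Toy model: a miss on page p fetches pages p..p+readahead in one request.

    Returns (requests, pages_fetched, pages_used).
    """
    accesses = list(accesses)
    cached = set()
    requests = 0
    fetched = 0
    for page in accesses:
        if page not in cached:
            requests += 1
            new = [p for p in range(page, page + readahead + 1) if p not in cached]
            fetched += len(new)
            cached.update(new)
    return requests, fetched, len(set(accesses))


sequential = range(1024)
rng = random.Random(0)  # fixed seed so the run is repeatable
scattered = rng.sample(range(1_000_000), 1024)

for readahead in (31, 255):
    print("sequential", readahead, simulate(sequential, readahead))
    print("random    ", readahead, simulate(scattered, readahead))
```

In this model a sequential scan needs fewer requests as the window grows and every fetched page is used. For the scattered pattern the number of requests barely changes, but the number of pages fetched and never used grows with the window, which is the cost being traded against.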

Comment 20 Tom Coughlan 2005-09-19 13:29:43 UTC

This problem appears to be resolved, as stated in comment 17. The BZ has
remained open awaiting confirmation. Since no confirmation has been received, we
are assuming it is resolved, and the BZ is being closed.
