Red Hat Bugzilla – Bug 183598
high iowaits with long transfers
Last modified: 2007-11-30 17:07:09 EST
Description of problem:
An IBM x345 with dual 2.8 GHz Xeon processors is connected to a disk array (IBM
EXP400) via a ServeRAID 4Lx Ultra SCSI controller. The disk array has a RAID 5
configuration with two partitions, which are used as physical volumes (PVs) in a
volume group (VG). This VG is then used for /home.
If I start a long copy (a big file), iowait goes up to 90-100% on all processors
and stays there for the rest of the copy operation. The system becomes
unresponsive. This behaviour mostly occurs when a Samba client starts a long
copy, but I have seen it with NFS too.
If I make the copy to the internal disks, everything is OK and iowait never goes
higher than 35%.
I have another x345, but with internal disks and RAID 5 with one PV. If I make
a copy on that, iowait never goes higher than 30%.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a long copy (a big file)
2. See iowait rise to 90-100% on all processors
3. System becomes unresponsive
Copying lots of smaller files is no problem.
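As a rough reproduction sketch, the steps above could be driven with something like the following: start one long sequential write and sample the kernel's cumulative iowait counter while it runs. The file path, size, and field layout are assumptions (point TARGET at the ServeRAID-backed /home volume to exercise the failing array; the default here is kept small and in /tmp for safety, and the iowait field in /proc/stat only exists on kernels that account for it, such as the 2.4.21 RHEL kernels discussed here):

```shell
#!/bin/sh
# Hypothetical reproduction sketch: write one large file and watch the
# kernel's cumulative iowait counter climb while the copy is in flight.
TARGET=${TARGET:-/tmp/iowait-test.bin}   # set to a path on the RAID 5 /home volume
SIZE_MB=${SIZE_MB:-64}                   # the bug report used much larger files

# Start the long sequential write in the background.
dd if=/dev/zero of="$TARGET" bs=1M count="$SIZE_MB" 2>/dev/null &
DD_PID=$!

# Field 6 of the aggregate "cpu" line in /proc/stat is cumulative iowait
# time in jiffies on kernels with iowait accounting.
while kill -0 "$DD_PID" 2>/dev/null; do
    awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
    sleep 1
done
wait "$DD_PID"
echo "copy done: $(wc -c < "$TARGET") bytes"
```

On the affected array the sampled counter should grow rapidly and stay high for the whole copy; on the internal disks it should grow far more slowly.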
Please provide a sysreport (or at least /var/log/messages).
Make sure you have the latest firmware for the ServerRaid 4Lx.
I believe this controller has a battery-backed cache, and the cache is not
used if the battery is dead. So please check the ServeRAID BIOS utility for any
errors. Also, let me know the firmware settings you are using for the RAID 5
(e.g. does it let you set the chunk size?).
I suspect you would not have this problem with a RAID 1. Is it feasible for you
to test this theory?
When did this start happening? Was there a version of Linux where you did not
have this problem?
Created attachment 126077 [details]
Sysreport provided above
Going to install the latest firmware. 7.12.07 instead of 7.10.18. This means
weekend work for me :-(
The controller has no battery-backed cache installed. The RAID 5 setting is a
stripe unit size of 8 KB, which is described as optimal for file/print servers.
If the firmware update makes no difference, should I rebuild the array with a 32
or 64 KB stripe size?
It is a production system so I needed extra disks to test the raid 1 theory. I
created a RAID 1 setup with two 146 GB disks and tried making long copies. The
iowait never got over 45% and the system remained responsive. Tested both with
and without LVM but that made no difference.
I am not sure it has ever worked as it should. During the first period we had a
lot of network errors, and some small fraction of them could have been this problem.
Firmware updated and problem still exists.
I also updated to Update 7 and kernel 2.4.21-40.ELsmp but the problem still exists.
Based on your comment #4, the problem seems specific to RAID 5 on the ServeRaid
adapter. RAID 1 performs well.
I am surprised this board does not have a battery-backed cache. I thought
they all did. You might ask IBM whether this would solve the problem. Maybe they
can also advise you about the optimal stripe size for your workload.
Beyond that, you can try elvtune, and you can try adjusting min/max-readahead
(see "Tuning the VM").
I have a report that the following values helped in at least one situation:
echo 8192 > /proc/sys/vm/max-readahead
echo 2048 > /proc/sys/vm/min-readahead
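For reference, a sketch of both knobs together on a RHEL 3 / 2.4-era kernel. The device name and the elvtune values are assumptions, not tested settings; both elvtune and the min/max-readahead /proc entries apply only to 2.4-style kernels and were removed later:

```shell
# Raise the read/write latency bounds of the 2.4 elevator. /dev/sda is a
# placeholder for the ServeRAID logical drive; values are illustrative.
elvtune -r 2048 -w 8192 /dev/sda

# Apply the readahead values reported to have helped in at least one case.
# These /proc entries exist only on 2.4-era kernels.
echo 8192 > /proc/sys/vm/max-readahead
echo 2048 > /proc/sys/vm/min-readahead
```

Note these settings do not survive a reboot; to make them persistent they would have to be re-applied from an init script such as rc.local.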
I am going to close this, since it appears to be a ServeRaid RAID 5 performance
problem. Re-open it if there is more to it than that.