Red Hat Bugzilla – Bug 129545
High iowait and system load while copying files on SATA raid drive
Last modified: 2013-07-02 22:21:30 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3)
Description of problem:
Pretty similar to bug
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121434, but I
didn't want to pollute that bug any further, and this is happening on
a different setup (details below).
While copying files (either big or small) from a computer connected
to a network share (samba, nfs, netatalk), or from one local
partition to another, the iowait figures go through the roof
(high 90s) and the whole system becomes unresponsive; the load climbs
as high as 6 or 7, depending on how long the copy takes.
Dell PowerEdge 700
Dell CERC, RAID 5 configuration (4 x 120 GB Seagate drives; 359.8 GB reported)
Intel P4 3.2 GHz (HyperThreading)
1 GB RAM
Red Hat Enterprise Linux 3 ES
I also installed Fedora Core 2 and 3 to see if the problem occurred
with the 2.6.x kernels as well; it did. It was less visible there, but
the iowait and system load figures were still way too high.
The module in question is aacraid.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Copy a (large) file from one partition to another, or from a
computer connected to the server (samba, nfs, netatalk)
2. Open a console and run "top"
3. Watch the iowait stats and system load go up
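The steps above can be sketched as a small script. This is a hedged reproduction sketch, not part of the original report: it assumes a 2.6-style /proc/stat that exposes an iowait field, and it uses temp directories in place of the two real partitions (e.g. /dev/sda7 and /dev/sda9) so it runs anywhere.

```shell
#!/bin/sh
# Reproduction sketch: copy a large file while sampling the iowait
# counter from /proc/stat (5th value after "cpu", in jiffies).
# SRC and DST would sit on different partitions in the real test;
# temp directories here keep the script runnable anywhere.
SRC=$(mktemp -d)/bigfile
DST=$(mktemp -d)/bigfile

# 64 MB test file; the original reproductions copied multi-GB folders.
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null

cp "$SRC" "$DST" &
CP_PID=$!
while kill -0 "$CP_PID" 2>/dev/null; do
    # /proc/stat line: cpu user nice system idle iowait irq softirq ...
    awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
    sleep 1
done
wait "$CP_PID"
cmp -s "$SRC" "$DST" && echo "copy OK"
```

On an affected box the iowait counter climbs steeply for the duration of the copy; watching the %iowait line of "iostat -k 1" alongside gives the percentage view quoted in this report.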
Actual Results: After a while the system load goes up into the
extremes (6 to 7), the system becomes totally unresponsive, and you
have to wait until the load drops or do a hard reboot.
Expected Results: The system should not become unresponsive and
iowait shouldn't be in the high 90s.
I'm seeing the same figures as presented in the above-mentioned bug,
but if needed I can post some stats here. I also tried booting with a
non-SMP kernel, which didn't help.
I just reconfigured the RAID array to a RAID 1 config (giving me a
storage capacity of 240 GB, ext3). In this case the load remains very
low (< ~0.5) and the system stays responsive; this is with the latest
kernel for ES 3.0 (2.4.21-18.ELsmp).
I copied a 4 GB folder from my workstation to the samba-shared folder
(/dev/sda9) on the server and, as stated above, the load remained low.
I'm going to reconfigure it back to a RAID 5 config and see if I get
the same results as with RAID 1.
[root@heinekenserver root]# iostat -k
Linux 2.4.21-18.ELsmp (heinekenserver.localdomain) 08/19/2004
avg-cpu: %user %nice %sys %iowait %idle
4.67 0.03 2.50 8.67 84.14
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 37.65 139.00 2549.57 251329 4609821
sda1 0.02 0.05 0.01 97 17
sda2 4.12 19.84 23.32 35881 42164
sda3 5.85 45.15 55.66 81637 100644
sda4 0.00 0.00 0.00 2 0
sda5 0.02 0.08 0.02 141 36
sda6 0.01 0.05 0.00 84 0
sda7 3.63 27.82 48.83 50305 88284
sda8 0.11 0.12 1.11 221 2000
sda9 22.18 0.48 2420.62 865 4376676
We've noticed the same problem on a CERC RAID-1 config with RHEL 3
Update 2. We've filed a support ticket with Red Hat; it's ticket
354372. Also, at least one other person on Dell's forums is
experiencing a very similar problem - see
We've also noticed bug 92129 on bugzilla.redhat.com - a different
controller (PERC rather than CERC), but we're wondering if the
excessive spinlock holds mentioned by one poster in that thread could
be related to this problem.
Jeff, any thoughts?
Created attachment 102908 [details]
vmstat during problem occurrence
This 'vmstat 1 600' shows two instances when the bug seems to manifest for us.
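For anyone wanting to capture a comparable trace, a minimal sketch follows. This is an illustration, not the exact commands behind the attachment: the column handling assumes a modern procps-style vmstat header, and a canned sample stands in when vmstat isn't installed so the parsing step still runs.

```shell
#!/bin/sh
# Capture a short vmstat trace like the attached one and pull out the
# 'wa' (iowait) column, which is what spikes when the bug manifests.
# The original attachment used 'vmstat 1 600'; 3 samples here.
OUT=$(mktemp)
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 3 > "$OUT"
else
    # Fallback sample so the parsing below can be demonstrated anywhere.
    printf 'procs ... cpu\n r b ... us sy id wa\n 1 0 ... 5 2 3 90\n' > "$OUT"
fi
# Locate the 'wa' column from the header row, then print it per sample.
awk 'c=="" {for (i=1; i<=NF; i++) if ($i=="wa") c=i; next}
     {print "wa:", $c}' "$OUT"
```

Sustained 'wa' values near 100 for the length of a copy, as in the attached trace, are the signature being discussed here.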
The link you provided seems broken :) About the RAID 1 configuration:
that actually "solved" the problem for me; I was getting these
insanely high iowait and system load figures with a RAID 5 config.
I still haven't gotten around to reverting back to RAID 5, which I
kind of need.
About bug 92129: I did look into that one, but I'm not getting
time-outs.
Sorry about that, Martijn, I'll try again:
I can definitely confirm that a similar problem happens here on our
RAID-1 setup - perhaps it is worse on RAID-5, or perhaps we're seeing
two separate problems with similar symptoms?
We are also seeing high iowait figures on a system using Fedora Core 3
and RAID 5 on a 3ware 9500-S. Was there any resolution to this issue?
Does it still exist in RHEL 4?
We are seeing it with a CLARiiON RAID 10 connected to a DL585 (AMD),
using QLogic. It continues to bring down the site, either via a reboot
or system degradation. We're running the latest Linux kernel. Anybody
have anything?
I am seeing this problem with a Dell CERC 6-channel RAID controller on
RHEL 4. My driver version is 1.1-5.
I have a:
Dell PowerEdge 1800
3.0 GHz dual Pentium Xeon in 64-bit mode
4 GB RAM
3 x 80 GB SATA drives connected to a Dell CERC 6-channel RAID
controller in a RAID 5 configuration
LVM is being used to manage the RAID device.
Up to date, with a non-tainted kernel.
The trigger is exactly the same: copying files to the disk (in my case
from a DVD) results in extremely high iowait. The system becomes
almost completely unresponsive until the disk activity stops.
The unresponsiveness lasts 5 minutes or so, which is long enough to
cause network timeouts, so this is a reliability problem, not just a
performance problem.
I am more than happy to look into this, provide debugging information,
and try out anything new.
This bug is filed against RHEL 3, which is in its maintenance phase.
During the maintenance phase, only security errata and select
mission-critical bug fixes will be released for enterprise products.
Since this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.