Bug 138656

Summary: Sustained heavy write IO causes severe UI freezes w/ IDE storage and write cache disabled
Product: Red Hat Enterprise Linux 3 Reporter: dmichaud <dmichaud>
Component: kernel Assignee: Doug Ledford <dledford>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: andre, k.georgiou, petrides, riel, smithj4
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-10-19 19:14:30 UTC Type: ---

Description dmichaud 2004-11-10 15:25:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.1)
Gecko/20040811

Description of problem:
I have a locally attached parallel-IDE disk with write caching
disabled, formatted with ext3.

When performing sustained sequential disk writes, I see severe UI
freezes after a short initial delay, even when no other applications
are running. The system appears to become completely unresponsive for
10+ seconds at a time, every 3-5 seconds, for the duration of the
write operations.

HARDWARE:

Dell Precision 530 (Dual Xeon 2.0 GHz, 1 GB RAM, 7200 RPM Seagate
Barracuda ATA-IV drive)

OS:
Red Hat Enterprise Linux 3, Update 3

DRIVE CONFIG:
hdparm /dev/hda
Configuration:
/dev/hda:
 multcount    = 16 (on)
 IO_support   =  3 (32-bit w/sync)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0

Write caching is DISABLED with hdparm -W 0 /dev/hda

Version-Release number of selected component (if applicable):
kernel-2.4.21-20smp

How reproducible:
Always

Steps to Reproduce:
1. Boot into X.
2. Launch a gnome-terminal and run "vmstat 1".
3. Disable write caching on the IDE drive (hdparm -W 0 /dev/hda) if it
is not already disabled.
4. Launch another gnome-terminal and run "dd if=/dev/zero
of=/tmp/testfile bs=1024 count=2000000" on an ext3 filesystem.
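
For anyone who wants numbers rather than a visual impression, the dd
step can be approximated by a small script that times every write.
This is only a sketch in modern Python 3 (which is not part of the
original report or of RHEL 3): the function name
timed_sequential_write and the 1-second stall threshold are
assumptions chosen for illustration. The defaults mirror the dd
invocation in step 4 (1024-byte blocks, 2000000 of them).

```python
import time


def timed_sequential_write(path, block_size=1024, count=2_000_000,
                           stall_seconds=1.0):
    """Write `count` zero-filled blocks to `path` and record slow writes.

    Returns (total_seconds, stalls), where stalls is a list of
    (block_index, seconds) for every write() call that took at least
    `stall_seconds` (a hypothetical threshold, not from the report).
    """
    block = bytes(block_size)  # zero-filled, like reading /dev/zero
    stalls = []
    start = time.monotonic()
    # buffering=0 issues one write() system call per block, as dd does.
    with open(path, "wb", buffering=0) as f:
        for i in range(count):
            t0 = time.monotonic()
            f.write(block)
            elapsed = time.monotonic() - t0
            if elapsed >= stall_seconds:
                stalls.append((i, elapsed))
    return time.monotonic() - start, stalls
```

Calling timed_sequential_write("/tmp/testfile") with the defaults
writes the same amount of data as the dd command. A long list of
stalls would suggest that the writer itself is being blocked for
seconds at a time, although it says nothing about UI responsiveness
directly.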

    

Actual Results:  After 10-20 seconds the vmstat display begins
stuttering and shows highly irregular I/O. The UI freezes repeatedly
for 10+ seconds at a time: the clock stops, the mouse pointer does not
move, nothing updates, and the disk lights stay solidly lit.
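
The "irregular I/O" seen in vmstat could be quantified from a saved
capture of the "vmstat 1" output. The following is a rough sketch,
not something used in this report: it locates the "bo" (blocks out)
column from the header row and computes the coefficient of variation
of the samples. Both function names and the choice of that statistic
are assumptions made for illustration.

```python
import statistics


def blocks_out_per_sample(vmstat_text):
    """Extract the "bo" (blocks written out) column from vmstat output."""
    column = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "bo" in fields:
            # Column header row; vmstat repeats it periodically.
            column = fields.index("bo")
        elif (column is not None and len(fields) > column
              and fields[column].isdigit()):
            values.append(int(fields[column]))
    return values


def write_irregularity(values):
    """Return stdev / mean of the samples (0.0 if empty or all zero)."""
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean
```

A steady write rate, as described for the stock 2.4.27 kernel, should
give a value near zero, while samples that alternate between bursts
and zero give a value around 1 or higher. Comparing captures from the
two kernels would need the same dd run on the same machine.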

Expected Results:  This behavior appears at first glance to be
specific to the Red Hat Enterprise kernel.

On a stock kernel.org 2.4.27 kernel, the exact same procedure yields
no noticeable UI impact, and "vmstat 1" shows writes happening at a
steady rate.

Additional info:

I've tried to find these symptoms in bugzilla, but haven't found an
exact match so far. The test kernels that appeared to work for the
iowait issue have not worked for me. I've also tried the 2.4.20-23pre
kernel with agp fix and it did not help.

I've tried tuning pagecache down to "1 15 30" with no difference in
results, and the same goes for the other likely VM tunables. I also
tried setting various smaller values with elvtune. No tuning has made
the UI freezes go away.

Turning on hardware write caching on the IDE drive reduces the
severity of the freezes, but they are still noticeable, and enabling
write caching is unacceptable for my application. The most recent
stock kernel.org kernel I tried does not exhibit the same behavior.

Comment 1 dmichaud 2004-11-11 22:48:32 UTC
I've been experimenting a little more with this, trying to go back to
previous kernel releases. The data I see is not making a great deal of
sense to me and I wish I had many more days to isolate it rigorously.

2.4.21-15.0.4smp still has these UI freezes like 2.4.20-20smp,
although they take longer to manifest and are less severe. On 15.0.4
they happen more severely if I use gnome-terminals in the reproduction
procedure, and much less severely if I use xterms. I don't know why
this would be, except that gnome-terminal uses pthreads and xterm does
not.

In 2.4.20-20 and 2.4.20-23beta both xterms and gnome-terminals exhibit
the UI freezes to about the same degree of severity.

I'm wondering what can really cause such a severe UI freeze? Is there
an interrupt problem, or deadlock-like condition happening somewhere
in the threading with all of the IO? I hope this meager data is
helpful and does not simply cloud the issue further.

Comment 2 dmichaud 2004-11-11 22:50:51 UTC
Also worth mentioning: the results are reproducible on other hardware.
I have access to a large number of Dell Precision 530 and 650
dual-Xeon workstations and they behave similarly.

Comment 3 Jason Smith 2005-01-18 16:22:34 UTC
I am having the exact same problem with my Dell Precision 360 desktop
workstation.  The system completely freezes for minutes at a time
when doing any sort of extensive IO, like compiling or tarring large
files. It is harder to tell on our Dell PowerEdge 1750 servers, since
we don't run X on them, but during times of heavy IO the iowait climbs
to near 100% and console response often freezes for minutes.

I originally thought this was related to bug #121434, or other similar
bugs, but kernel 2.4.21-27.0.1.ELsmp, which is supposed to have many
pagecaching fixes in it, didn't help.  I now believe there is a
hardware issue related to certain chips or controllers found in Dell
computers.  Can anyone from Red Hat respond to this bug?  Have you
tested RHEL3 on newer Dell hardware and been able to reproduce this
problem?  Are you working with Dell on fixing this?


Comment 4 RHEL Program Management 2007-10-19 19:14:30 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.