From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.1) Gecko/20040811

Description of problem:
I have a locally attached parallel IDE disk with write caching disabled, formatted with ext3. When performing sustained serial disk writes, I see severe UI freezes after a short initial delay, even when no other applications are running. The system becomes completely unresponsive for 10+ seconds at a time, every 3-5 seconds, for the duration of the write operations.

HARDWARE: Dell Precision 530 (dual Xeon 2.0 GHz, 1 GB RAM, 7200 RPM Seagate Barracuda ATA-IV drive)
OS: Red Hat Enterprise Linux 3.0, Update 3
DRIVE CONFIG (output of hdparm /dev/hda):

/dev/hda:
 multcount    = 16 (on)
 IO_support   =  3 (32-bit w/sync)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0

Write caching is DISABLED with hdparm -W 0 /dev/hda.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20smp

How reproducible:
Always

Steps to Reproduce:
1. Boot into X.
2. Launch a gnome-terminal and run "vmstat 1".
3. Disable write caching on the IDE drive (hdparm -W 0 /dev/hda) if it is not already disabled.
4. Launch another gnome-terminal and run "dd if=/dev/zero of=/tmp/testfile bs=1024 count=2000000", with /tmp on an ext3 filesystem.

Actual Results:
After 10-20 seconds the vmstat display begins stuttering and shows wildly irregular I/O. The UI then freezes for 10+ seconds at a time, over and over: the clock stops, the mouse refuses to move, nothing updates, and the disk light stays solidly lit.

Expected Results:
At first glance this behavior appears to be specific to the Red Hat Enterprise Linux kernel. On a stock kernel.org 2.4.27 kernel, the exact same procedure yields no noticeable UI impact, and "vmstat 1" shows writes happening at a steady rate.

Additional info:
I've searched Bugzilla for these symptoms but haven't found an exact match so far.
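Step 4 relies on watching "vmstat 1" by eye to spot the stalls. As a rough, hypothetical aid (not part of the original report), the Python 3 sketch below performs a similar serial write of zero-filled blocks and times each write() call, so multi-second stalls show up as outliers. The function name measure_write_stalls and the 1-second stall threshold are assumptions for illustration. Because the writes go through the page cache, as with dd, individual calls are expected to be fast until the kernel has to flush dirty data, which is where a stall would appear.

```python
import os
import time


def measure_write_stalls(path, block_size=1024, count=2_000_000, stall_threshold=1.0):
    """Hypothetical helper: write `count` zero-filled blocks of `block_size` bytes
    to `path` (roughly like dd if=/dev/zero bs=1024 count=2000000) and time each write.

    Returns (latencies, stalls), where `latencies` holds the duration in seconds of
    every block write and `stalls` lists (block_index, seconds) for writes that took
    at least `stall_threshold` seconds.
    """
    block = b"\0" * block_size
    latencies = []
    stalls = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(count):
            start = time.monotonic()
            # os.write() may write fewer bytes than requested; loop until the block is done.
            view = memoryview(block)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            elapsed = time.monotonic() - start
            latencies.append(elapsed)
            if elapsed >= stall_threshold:
                stalls.append((i, elapsed))
    finally:
        os.close(fd)
    return latencies, stalls
```

With the defaults, measure_write_stalls("/tmp/testfile") writes about 2 GB, mirroring the dd command in step 4. Running it alongside "vmstat 1" and printing max(latencies) and len(stalls) afterward might give a more objective measure of the freezes, though this has not been tested against the kernels discussed in this bug.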
The test kernels that appeared to fix the iowait issue have not worked for me. I've also tried the 2.4.20-23pre kernel with the AGP fix, and it did not help. I've tried tuning pagecache down to "1 15 30" with no difference in results, and the same goes for the other likely VM tunables. I also tried setting various smaller values with elvtune. No tuning has made the UI freezes go away. Turning on hardware write caching on the IDE drive reduces the severity of the freezes, but they are still noticeable, and running with write caching enabled is unacceptable for my application. Also, the most recent stock kernel.org kernel I tried does not exhibit the same behavior.
I've been experimenting a little more with this, going back to previous kernel releases. The data I see does not make a great deal of sense to me, and I wish I had many more days to isolate it rigorously. 2.4.21-15.0.4smp still has these UI freezes like 2.4.20-20smp, although they take longer to manifest and are less severe. On 15.0.4 the freezes are more severe if I use gnome-terminals in the reproduction procedure, and much less severe if I use xterms. I don't know why this would be, except that gnome-terminal uses pthreads and xterm does not. In 2.4.20-20 and 2.4.20-23beta, both xterms and gnome-terminals exhibit the UI freezes to about the same degree. What could cause such a severe UI freeze? Is there an interrupt problem, or a deadlock-like condition somewhere in the threading under all of this I/O? I hope this meager data is helpful and does not simply cloud the issue further.
I should also mention that the results are reproducible on other hardware. I have access to a large number of Dell Precision 530 and 650 dual-Xeon workstations, and they behave similarly.
I am having the exact same problem with my Dell Precision 360 desktop workstation. The system completely freezes for minutes at a time when doing any sort of extensive I/O, like compiling or tarring large files. It is harder to tell on our Dell PowerEdge 1750 servers since we don't run X on them, but during heavy I/O the iowait climbs to near 100% and console response often freezes for minutes. I originally thought this was related to bug #121434 or other similar bugs, but kernel 2.4.21-27.0.1.ELsmp, which is supposed to include many page cache fixes, didn't help. I now believe there is a hardware issue related to certain chips or controllers found in Dell computers. Can anyone from Red Hat respond to this bug? Have you tested RHEL3 on newer Dell hardware and been able to reproduce this problem? Are you working with Dell on a fix?
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.