Description of problem:
I have 45 machines, each with two WD 320 GB drives. As soon as a machine comes under moderate load (for example, serving a large number of files as a webserver at about 90 requests per second, where each request is a file download), either the hda or the hdc drive has its DMA turned off and the machine becomes very slow. I have tried various kernels, disabled ACPI in the BIOS, and passed acpi=off and noacpi to the kernel (in grub.conf); none of it makes any difference.

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1824_FC4.i686.rpm

How reproducible:
Very.

Steps to Reproduce:
1. Run a machine at 15 Mbps with 80 to 90 web requests per second.

Actual results:
Jan 25 13:17:21 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 25 13:17:31 localhost kernel: hda: DMA timeout error
Jan 25 13:17:31 localhost kernel: hda: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:17:31 localhost kernel: hda: dma timeout error: error=0x04 { DriveStatusError }
Jan 25 13:17:31 localhost kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:17:31 localhost kernel: hda: multwrite_intr: error=0x04 { DriveStatusError }
Jan 25 13:17:31 localhost kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:17:31 localhost kernel: hda: multwrite_intr: error=0x04 { DriveStatusError }
Jan 25 13:19:02 localhost kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:19:03 localhost kernel: hda: multwrite_intr: error=0x04 { DriveStatusError }
Jan 25 13:19:03 localhost kernel: hda: multwrite_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:19:03 localhost kernel: hda: multwrite_intr: error=0x04 { DriveStatusError }
Jan 25 13:19:04 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 25 13:19:04 localhost kernel: hda: DMA timeout error
Jan 25 13:19:04 localhost kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 25 13:19:04 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 25 13:19:05 localhost kernel: hda: DMA timeout error
Jan 25 13:19:05 localhost kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 25 13:19:06 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 25 13:19:06 localhost kernel: hda: DMA timeout error
Jan 25 13:19:06 localhost kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }

After the failure, hda has dropped out of DMA mode while hdc is still healthy:

img43:root ~ $ hdparm /dev/hda

/dev/hda:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 38913/255/63, sectors = 320072933376, start = 0

img43:root ~ $ hdparm /dev/hdc

/dev/hdc:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 38913/255/63, sectors = 320072933376, start = 0

img43:root ~ $ hdparm -t /dev/hda /dev/hdc

/dev/hda:
 Timing buffered disk reads:   6 MB in 3.18 seconds = 1.89 MB/sec

/dev/hdc:
 Timing buffered disk reads:  20 MB in 3.01 seconds = 6.65 MB/sec

Expected results:
No DMA timeouts.

Additional info:
A reboot returns the machine to normal, but only for a few hours, depending on the load.
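The hdparm output above carries the failure signature: hda has fallen back to PIO (using_dma = 0) while hdc still shows using_dma = 1. Across 45 machines, a small script along these lines could flag affected drives; this is only a sketch that parses a captured hdparm dump (on a live box you would pipe `hdparm /dev/hda /dev/hdc` into the awk instead). Forcing DMA back on with `hdparm -d1` is possible but may simply retrigger the timeouts, so it is not shown as the fix.

```shell
#!/bin/sh
# Flag drives whose hdparm output shows DMA disabled (using_dma = 0).
# Sketch only: reads a captured dump via here-document; the field layout
# assumed here matches the hdparm output shown above.
awk '
    /^\/dev\// { dev = $1; sub(/:$/, "", dev) }        # remember current device
    /using_dma/ && $3 == 0 { print dev ": DMA is OFF" } # $3 is the 0/1 value
' <<'EOF'
/dev/hda:
 multcount    =  0 (off)
 using_dma    =  0 (off)
/dev/hdc:
 multcount    = 16 (on)
 using_dma    =  1 (on)
EOF
# prints: /dev/hda: DMA is OFF
```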
Here is some more data:

img43:root ~ $ hdparm -i /dev/hda

/dev/hda:

 Model=WDC WD3200JB-22KFA0, FwRev=08.05J08, SerialNo=WD-WCAMR1936348
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=65
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 *udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode

img43:root ~ $ lspci
00:00.0 Host bridge: VIA Technologies, Inc. P4M266 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
00:0f.0 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: S3 Inc. VT8375 [ProSavage8 KM266/KL266]

Boot messages:

Jan 13 05:56:45 localhost kernel: ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
Jan 13 05:56:45 localhost kernel: hda: WDC WD3200JB-22KFA0, ATA DISK drive
Jan 13 05:56:45 localhost kernel: hda: max request size: 1024KiB
Jan 13 05:56:45 localhost kernel: hda: 625142448 sectors (320072 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)
Jan 13 05:56:45 localhost kernel: hda: hda1 hda2
Jan 13 05:56:46 localhost kernel: EXT3 FS on hda1, internal journal
Jan 14 01:23:20 localhost kernel: ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
Jan 14 01:23:20 localhost kernel: hda: WDC WD3200JB-22KFA0, ATA DISK drive
Jan 14 01:23:20 localhost kernel: hda: max request size: 1024KiB
Jan 14 01:23:20 localhost kernel: hda: 625142448 sectors (320072 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)
Jan 14 01:23:20 localhost kernel: hda: hda1 hda2
Jan 14 01:23:20 localhost kernel: EXT3-fs: hda1: orphan cleanup on readonly fs
Jan 14 01:23:20 localhost kernel: EXT3-fs: hda1: 40 orphan inodes deleted
Jan 14 01:23:21 localhost kernel: EXT3 FS on hda1, internal journal

Some machines report:

EXT3-fs error (device hda1) in start_transaction: Journal has aborted
EXT3-fs error (device hda1) in start_transaction: Journal has aborted
EXT3-fs error (device hda1) in start_transaction: Journal has aborted

shortly afterwards, lock up, and require a manual fsck to bring them back into operation (we end up losing some data in the process).
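To see how often each drive is timing out across the fleet, the syslog excerpts like the ones above can be tallied per device. This is a sketch that reads a captured excerpt via here-document; on a live machine you would point it at /var/log/messages instead. It assumes the standard syslog layout shown above, where the device name ("hda:") is the sixth whitespace-separated field.

```shell
#!/bin/sh
# Count "dma timeout error" events per IDE device from syslog-format lines.
awk '/dma timeout error: status/ {
         dev = $6                 # e.g. "hda:" in "... kernel: hda: dma timeout ..."
         sub(/:$/, "", dev)       # strip trailing colon
         count[dev]++
     }
     END { for (d in count) print d, count[d] }' <<'EOF'
Jan 25 13:17:31 localhost kernel: hda: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
Jan 25 13:19:04 localhost kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 25 13:19:05 localhost kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
EOF
# prints: hda 3
```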
img43:root ~ $ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1    188   2468 106592 247044    0    0    28     3   13    30  7 14 34 45
 0  1    188   1540 106744 247644    0    0   724  1308 4698   147  2 47  4 47
 0  1    188   3328 105540 246792    0    0  1344     0 4621   210  3 43 13 42
 2  3    188   1928 105600 248088    0    0  1348     0 5122   181  3 53 20 24
 1  3    188  11440 104208 239900    0    0  1800     0 5685    98  1 68  0 31
 1  2    188   7860 104276 242160    0    0  2324     0 5900    87  2 81  0 17
 0  1    188   8824 104452 242780    0    0   752  1148 4649   129  3 56  0 41
 0  1    188   7484 104696 243568    0    0  1020     0 4265   189  3 36  0 61
 0  1    188   6220 104976 244384    0    0  1076     0 3360   226  6 23  0 71
 0  1    188   4812 105248 245788    0    0  1672    44 4165   184  2 39  0 59
 0  0    188   2764 105480 246920    0    0  1344     0 3798   255  5 24  5 66
 0  0    188   2316 105552 247408    0    0   516  1632 5331   205  2 53 27 18
 0  1    188   2064 104996 248196    0    0  1116     0 3697   201  2 25  7 66
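The wa (iowait) column in the vmstat samples above confirms the box is I/O-bound once the drive drops to PIO. Averaging it makes the point concrete; this sketch skips the first data row, since vmstat's first sample reports averages since boot rather than the current 1-second interval, and reads a captured excerpt via here-document.

```shell
#!/bin/sh
# Average the iowait (wa, last) column over vmstat data rows,
# skipping the first row (since-boot averages).
awk 'NR > 1 { sum += $NF; n++ }
     END { printf "avg iowait: %.1f%%\n", sum / n }' <<'EOF'
0 1 188 2468 106592 247044 0 0 28 3 13 30 7 14 34 45
0 1 188 1540 106744 247644 0 0 724 1308 4698 147 2 47 4 47
0 1 188 3328 105540 246792 0 0 1344 0 4621 210 3 43 13 42
2 3 188 1928 105600 248088 0 0 1348 0 5122 181 3 53 20 24
EOF
# prints: avg iowait: 37.7%
```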
This is a mass update to all currently open kernel bugs. A new kernel update has been released (version 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in Bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen it at any time. Any other users on the Cc: list of this bug can request that it be reopened by adding a comment.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.
Closing due to lack of response.