Bug 214566 - DMA timeout errors on VT82C586A/B/VT82C686/A/B/VT823x/A/C
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: All  OS: Linux
Priority: medium  Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2006-11-08 07:25 EST by Tuomas Mursu
Modified: 2008-08-02 19:40 EDT
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2008-01-07 19:32:34 EST


Attachments
Full startup log from /var/log/messages (28.33 KB, text/plain)
2006-11-08 07:25 EST, Tuomas Mursu
rtf of lspci -vvv (8.94 KB, application/rtf)
2006-12-06 07:50 EST, Ted Kaczmarek
lsmod dump (3.12 KB, application/rtf)
2006-12-06 07:51 EST, Ted Kaczmarek
cpuinfo, ioports, iomem (3.92 KB, application/rtf)
2006-12-06 07:55 EST, Ted Kaczmarek

Description Tuomas Mursu 2006-11-08 07:25:18 EST
Description of problem:

I keep getting DMA timeout errors, hard drive "lockups" and slowdowns. It usually
happens when both of my hard drives are reading/writing, but sometimes it seems
quite random. The drives partially recover after a while, but hdparm then shows
all settings as zero (off).

I have two ATA hard drives connected to an MSI motherboard with these chipsets:

# lspci | grep VIA
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)

Version-Release number of selected component (if applicable):
2.6.18-1.2798.fc6

How reproducible:
Any activity that causes hard drive I/O.

Actual results:
This is /var/log/messages after one of the worst cases:
Oct 30 12:23:55 yuki kernel: hda: dma_timer_expiry: dma status == 0x61
Oct 30 12:24:05 yuki kernel: hda: DMA timeout error
Oct 30 12:24:05 yuki kernel: hda: dma timeout error: status=0xd0 { Busy }
Oct 30 12:24:05 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:24:05 yuki kernel: hda: DMA disabled
Oct 30 12:24:05 yuki kernel: hdb: DMA disabled
Oct 30 12:24:05 yuki kernel: ide0: reset: success
Oct 30 12:25:13 yuki kernel: hdb: dma_timer_expiry: dma status == 0x60
Oct 30 12:25:13 yuki kernel: hdb: DMA timeout retry
Oct 30 12:25:13 yuki kernel: hdb: timeout waiting for DMA
Oct 30 12:25:13 yuki kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Oct 30 12:25:13 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:25:13 yuki kernel: hdb: drive not ready for command
Oct 30 12:25:19 yuki kernel: hdb: status timeout: status=0xd0 { Busy }
Oct 30 12:25:19 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:25:19 yuki kernel: hda: DMA disabled
Oct 30 12:25:19 yuki kernel: hdb: drive not ready for command
Oct 30 12:25:19 yuki kernel: ide0: reset: success
Oct 30 12:29:32 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:29:32 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:29:32 yuki kernel: hdb: DMA disabled
Oct 30 12:29:32 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:29:33 yuki kernel: ide0: reset: success
Oct 30 12:30:21 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:30:21 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:30:21 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:30:22 yuki kernel: ide0: reset: success
Oct 30 12:31:08 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:31:08 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:31:08 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:31:08 yuki kernel: ide0: reset: success

Expected results:
No errors

Additional info:
Fedora Core 5 kernels (2.6.17) didn't cause any errors.
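For readers decoding the log above: the byte printed after "dma status ==" in the dma_timer_expiry lines appears to be the controller's bus-master IDE status register. The Python sketch below decodes it assuming the standard SFF-8038i bit layout; the bit labels and the function name are hypothetical, chosen for illustration, and are not strings printed by the kernel.

```python
from typing import List

# Bus-master IDE status register bits (assumed SFF-8038i layout).
# Labels are descriptive only; the kernel does not print these names.
BMIDE_STATUS_BITS = [
    (0x01, "Active"),            # DMA transfer still in progress
    (0x02, "Error"),             # controller reported a DMA error
    (0x04, "Interrupt"),         # the drive has raised its interrupt
    (0x20, "Drive0DmaCapable"),  # master marked as DMA capable
    (0x40, "Drive1DmaCapable"),  # slave marked as DMA capable
    (0x80, "SimplexOnly"),       # both channels cannot do DMA at the same time
]


def decode_bmide_status(value: int) -> List[str]:
    """Return the names of the bits set in a bus-master IDE status byte."""
    return [name for mask, name in BMIDE_STATUS_BITS if value & mask]


if __name__ == "__main__":
    # The two values that appear in the dma_timer_expiry lines above.
    for value in (0x61, 0x60):
        print(f"dma status == 0x{value:02x}: {decode_bmide_status(value)}")
```

Read this way, 0x61 means the DMA engine was still marked active with no interrupt pending when the timer expired (both drives flagged as DMA capable), and 0x60 means the transfer was no longer active but no interrupt had been raised either. This is a reading of the register bits, not a diagnosis of the cause.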
Comment 1 Tuomas Mursu 2006-11-08 07:25:19 EST
Created attachment 140644 [details]
Full startup log from /var/log/messages
Comment 2 Dan Brunetts 2006-11-16 02:51:37 EST
Hi,

I have a similar problem after upgrading from FC5 to FC6, with very annoying
messages like:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

The drive is only a few weeks old and in good condition; I have tested it with
the full diagnostics software provided by the manufacturer (Hitachi).
I have also installed Knoppix 5.0.1 on the same drive, and there are no problems
at all there. So I can completely rule out hardware problems, but I cannot use
DMA and have disabled it. That is not a solution: my system is very slow on
large file transfers.
The only possible solution I found on the net is patching the kernel I/O
scheduler (iosched), see http://lkml.org/lkml/2006/8/27/108.

How reproducible:
/sbin/hdparm -Tt /dev/hda

I confirm: Fedora Core 5 kernels (2.6.17) didn't cause any errors.
Comment 3 Dan Brunetts 2006-11-16 03:10:42 EST
The problem affects the 2.6.18-1.2849.fc6 kernel as well.
Comment 4 Volker 2006-11-18 17:28:16 EST
Same here on 2.6.18-1.2849.fc6:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

lspci | grep VIA
00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)

Comment 5 Dan Brunetts 2006-11-20 05:55:27 EST
In my opinion this bug should be set to URGENT severity.
Comment 6 Tim Waugh 2006-11-28 08:18:57 EST
I see these errors too, but with a different IDE controller:

# lspci | grep IDE
00:08.0 IDE interface: nVidia Corporation nForce3 IDE (rev a5)
# cat /proc/ide/hda/model 
ExcelStor Technology J8160
Comment 7 Ted Kaczmarek 2006-12-06 07:50:52 EST
Created attachment 142946 [details]
rtf of lspci -vvv
Comment 8 Ted Kaczmarek 2006-12-06 07:51:55 EST
Created attachment 142947 [details]
lsmod dump
Comment 9 Ted Kaczmarek 2006-12-06 07:55:05 EST
Created attachment 142948 [details]
cpuinfo, ioports, iomem
Comment 10 Volker 2006-12-19 14:56:46 EST
2.6.18-1.2868.fc6 #1 SMP still has this problem. Is anyone actually working on it?
It is dragging down performance badly and getting annoying after so many weeks.
Comment 11 Dan Brunetts 2006-12-20 05:32:51 EST
I agree; because of this problem I am seriously thinking of switching Linux distributions.

This is my hd configuration on Compaq Presario 2701EA:

/sbin/lspci | grep Intel
00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 (rev 01)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 01)
02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)


/sbin/hdparm -i /dev/hda

/dev/hda:

 Model=HTS541080G9AT00, FwRev=MB4OA60A, SerialNo=MPB4LAXKJMS2WM
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7539kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a:  ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6

 
Comment 12 Tuomas Mursu 2006-12-20 06:06:39 EST
There's a kernel update to 2.6.19 coming sometime soon, I'm hoping it would fix
this.
Comment 13 Volker 2007-01-02 13:45:29 EST
2 months later and with 2.6.18-1.2869.fc6 #1 SMP it's still the same.
Comment 14 Tuomas Mursu 2007-01-16 07:19:51 EST
I updated to 2.6.19-1.2895.fc6 from updates-testing a couple of days ago, and
haven't seen any DMA errors since. But I guess others should try it too and
report whether these errors are really fixed.
Comment 15 Tim Waugh 2007-01-17 06:31:23 EST
I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
Comment 16 Tuomas Mursu 2007-01-17 07:21:14 EST
Confirmed. I rebooted my box yesterday and the shutdown process hung at
unmounting the partitions, followed by very familiar errors. *sigh*
Comment 17 Dan Brunetts 2007-01-26 06:05:35 EST
Me too...
I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

FC5 was better than FC6... I think it is getting worse.
Comment 18 Volker 2007-02-15 13:52:00 EST
Still the same with 2.6.19-1.2911.fc6 #1 SMP...
Comment 19 Tuomas Mursu 2007-02-15 16:24:19 EST
2.6.19-1.2911.fc6 takes this even further. Now it's affecting my CD/DVD drive
(HL-DT-ST DVDRAM GSA-4163B) too, rendering it totally unusable.

I put a disc in the drive and close the tray:
hdc: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
(The same messages as above repeat continuously...)

I have to reboot the box to get the disc out. Not nice.
Comment 20 Chuck Ebbert 2007-02-16 14:53:41 EST
(In reply to comment #17)
> I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:
> 
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> ide: failed opcode was: unknown

Those are real hardware errors. Your drive is probably failing.
BadCRC means there is a real problem...
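The status and error bytes in these messages can be decoded by hand. The Python sketch below reproduces the labels the IDE driver prints in this thread; the bit values follow the ATA status and error register layout, while the rule that bit 7 of the error register reads as BadCRC only when the abort bit is also set is my reading of the 2.6-era driver and should be treated as an assumption. The function names are hypothetical.

```python
from typing import List

BUSY = 0x80  # BSY: while set, the other status bits are not valid

# Remaining ATA status register bits, labelled as in the kernel messages.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ABRT = 0x04  # error register: command aborted
ICRC = 0x80  # error register: interface CRC error on an Ultra DMA transfer

# Other error register bits (order chosen for illustration).
OTHER_ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(stat: int) -> List[str]:
    """Decode an ATA status byte; a busy drive is reported as Busy only."""
    if stat & BUSY:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if stat & mask]


def decode_error(err: int) -> List[str]:
    """Decode an ATA error byte (meaningful only when the Error status bit is set)."""
    names = []
    if err & ABRT:
        names.append("DriveStatusError")
    if err & ICRC:
        # Assumption: with ABRT also set, bit 7 is read as an interface CRC error.
        names.append("BadCRC" if err & ABRT else "BadSector")
    names.extend(name for mask, name in OTHER_ERROR_BITS if err & mask)
    return names


if __name__ == "__main__":
    print("status=0x51:", decode_status(0x51))
    print("error=0x84:", decode_error(0x84))
    print("status=0xd0:", decode_status(0xD0))
```

Decoded this way, status=0x51 is DriveReady + SeekComplete + Error, and error=0x84 is the abort bit plus the interface CRC bit: the drive aborted an Ultra DMA transfer after a CRC mismatch on the link between drive and controller. A status of 0xd0 prints only as { Busy }, because the other bits are not valid while the drive is busy.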
Comment 21 Tuomas Mursu 2007-02-18 14:56:29 EST
I've noticed that my second hard drive's performance is really poor. Reading and
writing are both significantly slower than the first drive. Copying a cd-image
over ethernet directly to hdb takes several minutes, and I can hear the drive
"shriek". It sounds like the heads are reading/writing like hell but still can't
keep up. Same cd-image over ethernet to hda takes only about a minute, and the
drive stays quiet like it's supposed to (benchmarked this today).

So, to sum up: hda works fine, hdb doesn't. I tried to find out what the
difference between the two is, but I don't know how to gather all the info.
So far, here's the hdparm output:

--------------------

# hdparm -i /dev/hda

/dev/hda:

 Model=SAMSUNG SP0822N, FwRev=WA100-32, SerialNo=S06QJ10Y959079
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 1:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

# hdparm -i /dev/hdb

/dev/hdb:

 Model=SAMSUNG SV0802N, FwRev=TP100-24, SerialNo=S019J10X679393
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

--------------------

Apart from the model, firmware and serial number, only the tPIO minimum (120 vs
240) and the last line seem to differ. Well, this didn't help much; is there
anything else I could look at?
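One way to see exactly what differs between the two drives is to compare the dumps line by line. This Python sketch embeds the hdparm -i output quoted in this comment (wrapped lines joined) and prints every line that differs; it assumes both dumps list the same fields in the same order, which holds here. The function name differing_lines is hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple

# "hdparm -i" output for both drives, copied from this comment.
HDA_INFO = """\
Model=SAMSUNG SP0822N, FwRev=WA100-32, SerialNo=S06QJ10Y959079
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 1:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
"""

HDB_INFO = """\
Model=SAMSUNG SV0802N, FwRev=TP100-24, SerialNo=S019J10X679393
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
"""


def differing_lines(a: str, b: str) -> Iterator[Tuple[str, str]]:
    """Yield (line_a, line_b) for each position where the two dumps differ.

    Assumes both dumps list the same fields in the same order; a line
    missing on either side is compared against an empty string.
    """
    lines_a = [line.strip() for line in a.strip().splitlines()]
    lines_b = [line.strip() for line in b.strip().splitlines()]
    for line_a, line_b in zip_longest(lines_a, lines_b, fillvalue=""):
        if line_a != line_b:
            yield line_a, line_b


if __name__ == "__main__":
    for line_a, line_b in differing_lines(HDA_INFO, HDB_INFO):
        print(f"hda: {line_a}")
        print(f"hdb: {line_b}")
        print()
```

On these two dumps it reports three lines: the Model/FwRev/SerialNo line, the IORDY line (tPIO minimum 120 vs 240), and the "Drive conforms to" line.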
Comment 22 Dan Brunetts 2007-02-22 11:59:01 EST
SOLVED!!!

There is an incompatibility with HAL; just stop the HAL daemon:
> /etc/rc.d/rc.hald stop

That's all.

Bye
D.
Comment 23 Tim Waugh 2007-02-22 12:30:30 EST
"Narrowed down" rather than solved I think.  Can we work out why HAL causes this
error message?
Comment 24 Dan Brunetts 2007-02-23 05:46:21 EST
No worries, autofs is much better and more reliable!
Comment 25 Jon Stanley 2007-12-31 17:10:10 EST
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!
Comment 26 Jon Stanley 2008-01-07 19:32:34 EST
Closing per previous comment.  If you can provide the requested information,
please feel free to re-open this bug.
