Bug 214566

Summary: DMA timeout errors on VT82C586A/B/VT82C686/A/B/VT823x/A/C
Product: [Fedora] Fedora
Reporter: Tuomas Mursu <tuomas.mursu>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 6
CC: db64, jonstanley, tedkaz, twaugh, wtogami
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-01-08 00:32:34 UTC
Attachments (description: flags):
Full startup log from /var/log/messages: none
rtf of lspci -vvv: none
lmsod dump: none
cpuninfo, ioports, iomem: none

Description Tuomas Mursu 2006-11-08 12:25:18 UTC
Description of problem:

I keep getting DMA timeout errors, hard drive "lockups" and slowdowns. This
usually happens when both of my hard drives are reading or writing, but
sometimes it seems quite random. The drives recover somewhat after a while, but
hdparm then shows all settings as zero (off).

I have two ATA hard drives connected to an MSI motherboard with these chipsets:

# lspci | grep VIA
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)

Version-Release number of selected component (if applicable):
2.6.18-1.2798.fc6

How reproducible:
Any operation that causes hard drive activity

Actual results:
This is /var/log/messages after one of the worst cases:
Oct 30 12:23:55 yuki kernel: hda: dma_timer_expiry: dma status == 0x61
Oct 30 12:24:05 yuki kernel: hda: DMA timeout error
Oct 30 12:24:05 yuki kernel: hda: dma timeout error: status=0xd0 { Busy }
Oct 30 12:24:05 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:24:05 yuki kernel: hda: DMA disabled
Oct 30 12:24:05 yuki kernel: hdb: DMA disabled
Oct 30 12:24:05 yuki kernel: ide0: reset: success
Oct 30 12:25:13 yuki kernel: hdb: dma_timer_expiry: dma status == 0x60
Oct 30 12:25:13 yuki kernel: hdb: DMA timeout retry
Oct 30 12:25:13 yuki kernel: hdb: timeout waiting for DMA
Oct 30 12:25:13 yuki kernel: hdb: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Oct 30 12:25:13 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:25:13 yuki kernel: hdb: drive not ready for command
Oct 30 12:25:19 yuki kernel: hdb: status timeout: status=0xd0 { Busy }
Oct 30 12:25:19 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:25:19 yuki kernel: hda: DMA disabled
Oct 30 12:25:19 yuki kernel: hdb: drive not ready for command
Oct 30 12:25:19 yuki kernel: ide0: reset: success
Oct 30 12:29:32 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:29:32 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:29:32 yuki kernel: hdb: DMA disabled
Oct 30 12:29:32 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:29:33 yuki kernel: ide0: reset: success
Oct 30 12:30:21 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:30:21 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:30:21 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:30:22 yuki kernel: ide0: reset: success
Oct 30 12:31:08 yuki kernel: hda: status timeout: status=0xd0 { Busy }
Oct 30 12:31:08 yuki kernel: ide: failed opcode was: unknown
Oct 30 12:31:08 yuki kernel: hda: no DRQ after issuing MULTWRITE_EXT
Oct 30 12:31:08 yuki kernel: ide0: reset: success
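
For readers unfamiliar with the hex values in the log above, here is a minimal Python sketch that decodes them. It assumes the standard ATA status register layout and the standard bus-master IDE status register layout. The ATA bit names follow the labels the kernel prints in this log where they appear; the remaining ATA names and all of the bus-master names (`SimplexOnly`, `Drive1DmaCapable` and so on) are illustrative labels chosen for this sketch, not kernel output.

```python
# ATA status register bits. Names match the labels seen in the log where
# available; the others are this sketch's assumption.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bus-master IDE status register bits (the "dma status" value in the log).
# These names are hypothetical labels, not something the kernel prints.
BM_STATUS_BITS = [
    (0x80, "SimplexOnly"),
    (0x40, "Drive1DmaCapable"),
    (0x20, "Drive0DmaCapable"),
    (0x04, "Interrupt"),
    (0x02, "Error"),
    (0x01, "Active"),
]


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_ata_status(value):
    """Decode an ATA status byte; while Busy is set the other bits are not valid."""
    if value & 0x80:
        return ["Busy"]
    return decode(value, ATA_STATUS_BITS)


for raw in (0xD0, 0x58):
    print(f"status=0x{raw:02x}", decode_ata_status(raw))
for raw in (0x61, 0x60):
    print(f"dma status=0x{raw:02x}", decode(raw, BM_STATUS_BITS))
```

Read this way, `dma status == 0x61` would mean the bus-master engine was still marked active when the timer expired, with no interrupt or error bit set, which is consistent with a transfer that never completed.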

Expected results:
No errors

Additional info:
Fedora Core 5 kernels (2.6.17) didn't cause any errors.

Comment 1 Tuomas Mursu 2006-11-08 12:25:19 UTC
Created attachment 140644
Full startup log from /var/log/messages

Comment 2 Dan Brunetts 2006-11-16 07:51:37 UTC
Hi,

I have a similar problem after upgrading from FC5 to FC6, with recurring
messages like:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

The HD is a few weeks old and in good condition; I have tested it with the full
diagnostics software provided by the manufacturer (Hitachi).
On the same HD I have installed Knoppix 5.0.1, and there are no problems at all.
So I can rule out hardware problems completely, but I cannot use DMA and have
disabled it. This is not a solution: my system is very slow on long file
transfers.
On the net I found only one possible solution: patching the kernel iosched, see
http://lkml.org/lkml/2006/8/27/108.

How reproducible:
/sbin/hdparm -Tt /dev/hda

I confirm: Fedora Core 5 kernels (2.6.17) didn't cause any errors.
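
To check whether a run of the reproducer triggered the errors, the kernel log text can be scanned for the IDE messages quoted in this report. The following is a minimal Python sketch. The function name `count_ide_errors` and the list of message prefixes are assumptions made for illustration, taken only from the messages that appear in this bug, and the sample text is the three lines quoted above.

```python
import re
from collections import Counter

# Message prefixes taken from the log excerpts in this report (not exhaustive).
PATTERN = re.compile(
    r"\b(hd[a-z]): "
    r"(dma_intr|dma_timer_expiry|DMA timeout error|DMA disabled|"
    r"status timeout|status error)"
)


def count_ide_errors(log_text):
    """Count matching IDE messages per (device, message kind)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = """\
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
"""

for (device, kind), n in sorted(count_ide_errors(SAMPLE).items()):
    print(device, kind, n)
```

In practice the input would be the saved output of dmesg or the relevant part of /var/log/messages, captured before and after the test so the counts can be compared.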


Comment 3 Dan Brunetts 2006-11-16 08:10:42 UTC
Problems concern 2.6.18-1.2849.fc6 kernel as well.

Comment 4 Volker 2006-11-18 22:28:16 UTC
Also on 2.6.18-1.2849.fc6:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

lspci | grep VIA
00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)



Comment 5 Dan Brunetts 2006-11-20 10:55:27 UTC
In my opinion this bug should be set to URGENT severity.

Comment 6 Tim Waugh 2006-11-28 13:18:57 UTC
I see these errors too, but with a different IDE controller:

# lspci | grep IDE
00:08.0 IDE interface: nVidia Corporation nForce3 IDE (rev a5)
# cat /proc/ide/hda/model 
ExcelStor Technology J8160

Comment 7 Ted Kaczmarek 2006-12-06 12:50:52 UTC
Created attachment 142946
rtf of lspci -vvv

Comment 8 Ted Kaczmarek 2006-12-06 12:51:55 UTC
Created attachment 142947
lmsod dump

Comment 9 Ted Kaczmarek 2006-12-06 12:55:05 UTC
Created attachment 142948
cpuninfo, ioports, iomem

Comment 10 Volker 2006-12-19 19:56:46 UTC
2.6.18-1.2868.fc6 #1 SMP still has this problem. Is anyone really working on it?
It definitely hurts performance and is annoying after so many weeks.

Comment 11 Dan Brunetts 2006-12-20 10:32:51 UTC
I agree. Because of this problem I am seriously thinking of changing Linux
distribution.

This is my HD configuration on a Compaq Presario 2701EA:

/sbin/lspci | grep Intel
00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 (rev 01)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 01)
02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)


/sbin/hdparm -i /dev/hda

/dev/hda:

 Model=HTS541080G9AT00, FwRev=MB4OA60A, SerialNo=MPB4LAXKJMS2WM
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7539kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a:  ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6

 

Comment 12 Tuomas Mursu 2006-12-20 11:06:39 UTC
There's a kernel update to 2.6.19 coming sometime soon, I'm hoping it would fix
this.

Comment 13 Volker 2007-01-02 18:45:29 UTC
2 months later and with 2.6.18-1.2869.fc6 #1 SMP it's still the same.

Comment 14 Tuomas Mursu 2007-01-16 12:19:51 UTC
I updated to 2.6.19-1.2895.fc6 from updates-testing a couple of days ago, and
haven't seen any DMA errors since. But I guess others should try it too and
report whether these errors are really fixed.

Comment 15 Tim Waugh 2007-01-17 11:31:23 UTC
I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown


Comment 16 Tuomas Mursu 2007-01-17 12:21:14 UTC
Confirmed. I rebooted my box yesterday and the shutdown process hung at
unmounting the partitions, followed by very familiar errors. *sigh*

Comment 17 Dan Brunetts 2007-01-26 11:05:35 UTC
Me too...
I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown

FC5 was better than FC6... I think it is getting worse

Comment 18 Volker 2007-02-15 18:52:00 UTC
Still the same with 2.6.19-1.2911.fc6 #1 SMP...


Comment 19 Tuomas Mursu 2007-02-15 21:24:19 UTC
2.6.19-1.2911.fc6 takes this even further. Now it's affecting my CD/DVD drive
(HL-DT-ST DVDRAM GSA-4163B) too, rendering it totally unusable.

I put a disc in the drive and close the tray:
hdc: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdc: drive not ready for command
(The same messages as above repeat continuously...)

I have to reboot the box to get the disc out. Not nice.

Comment 20 Chuck Ebbert 2007-02-16 19:53:41 UTC
(In reply to comment #17)
> I am still seeing these errors with 2.6.19-1.2895.fc6 on x86_64:
> 
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> ide: failed opcode was: unknown

Those are real hardware errors. Your drive is probably failing.
BadCRC means there is a real problem...
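
For reference, the error byte in the quoted messages can be decoded as well. This is a minimal Python sketch assuming the standard ATA error register layout, where bit 2 is the command-aborted flag and bit 7 is the interface CRC flag on Ultra DMA transfers. Only the two bits that appear in this report are named; anything else is reported as raw hex. The function name `decode_ata_error` and the `BadSector` label for the non-aborted case are assumptions of this sketch.

```python
ABRT = 0x04  # command aborted; shown as "DriveStatusError" in the log
ICRC = 0x80  # interface CRC error on Ultra DMA transfers


def decode_ata_error(err):
    """Decode the two ATA error register bits seen in this report."""
    names = []
    aborted = bool(err & ABRT)
    if aborted:
        names.append("DriveStatusError")
    if err & ICRC:
        # Together with ABRT this bit reads as a CRC error on the interface.
        names.append("BadCRC" if aborted else "BadSector")
    remaining = err & ~(ABRT | ICRC) & 0xFF
    if remaining:
        names.append(f"other=0x{remaining:02x}")
    return names


print("error=0x84", decode_ata_error(0x84))
```

Since the CRC flag covers data transferred between the drive and the controller, cabling and the controller are worth checking alongside the drive itself.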


Comment 21 Tuomas Mursu 2007-02-18 19:56:29 UTC
I've noticed that my second harddrive's performance is really poor. Reading and
writing are both significantly slower than the first drive. Copying a cd-image
over ethernet directly to hdb takes several minutes, and I can hear the drive
"shriek". It sounds like the heads are reading/writing like hell but still can't
keep up. Same cd-image over ethernet to hda takes only about a minute, and the
drive stays quiet like it's supposed to (benchmarked this today).

So, to sum it up hda works fine, hdb doesn't. I tried to find out what's the
difference between the two, but I don't know how to gather all the info. So far,
here's hdparm stuff:

--------------------

# hdparm -i /dev/hda

/dev/hda:

 Model=SAMSUNG SP0822N, FwRev=WA100-32, SerialNo=S06QJ10Y959079
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 1:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

# hdparm -i /dev/hdb

/dev/hdb:

 Model=SAMSUNG SV0802N, FwRev=TP100-24, SerialNo=S019J10X679393
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

--------------------

Only the last line seems to differ. Well this didn't help much, is there
anything else I could look at?
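
A line-by-line comparison of the two hdparm outputs above can be sketched in Python as follows. The text is transcribed from this comment, with the trailing list of standards on the "Drive conforms to" line omitted because it is identical for both drives. The helper name `differing_lines` is hypothetical. Run on this data, it reports three differing lines: the model/firmware/serial line, the conformance line, and the IORDY line, where hdb reports a tPIO minimum of 240 against 120 for hda.

```python
HDA = """\
Model=SAMSUNG SP0822N, FwRev=WA100-32, SerialNo=S06QJ10Y959079
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-6 T13 1410D revision 1
"""

HDB = """\
Model=SAMSUNG SV0802N, FwRev=TP100-24, SerialNo=S019J10X679393
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=156368016
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0
"""


def differing_lines(first, second):
    """Pair up lines by position and keep the pairs that are not identical."""
    pairs = zip(first.splitlines(), second.splitlines())
    return [(a, b) for a, b in pairs if a.strip() != b.strip()]


for hda_line, hdb_line in differing_lines(HDA, HDB):
    print("hda:", hda_line)
    print("hdb:", hdb_line)
    print()
```

The positional pairing only works because both outputs have the same fields in the same order; for outputs of different shape, a proper diff such as the one in Python's difflib module would be a better fit.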

Comment 22 Dan Brunetts 2007-02-22 16:59:01 UTC
SOLVED!!!

There is an incompatibility with HAL, just stop the HAL daemon:
> /etc/rc.d/rc.hald stop

That's all.

Bye
D.

Comment 23 Tim Waugh 2007-02-22 17:30:30 UTC
"Narrowed down" rather than solved I think.  Can we work out why HAL causes this
error message?

Comment 24 Dan Brunetts 2007-02-23 10:46:21 UTC
No worries, autofs is much better and more reliable!

Comment 25 Jon Stanley 2007-12-31 22:10:10 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself on this bug; however, this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 26 Jon Stanley 2008-01-08 00:32:34 UTC
Closing per previous comment.  If you can provide the requested information,
please feel free to re-open this bug.