132584 – hdc: dma_timer_expiry: dma status == 0x21

Summary: hdc: dma_timer_expiry: dma status == 0x21

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	132585
Depends On:
Blocks:

Reported:	2004-09-14 20:22 UTC by David Kaplan
Modified:	2015-01-04 22:09 UTC
CC List:	15 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-07-29 05:18:03 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
2.6 DMA timeout logs and info (16.03 KB, text/plain) 2005-06-21 13:23 UTC, rambler8
interrupts, lspci, lspci -n, uname -a, dmesg, lsmod (20.34 KB, text/plain) 2006-01-03 11:28 UTC, Monti

Description David Kaplan 2004-09-14 20:22:06 UTC

Description of problem:
I am using FC2 on a Dell Inspiron 1150 and have all sorts of DMA
timeouts that make the hard drive very slow.  I could try turning off
DMA, but that appears to really slow things down.  I have seen other
DMA type bugs and they all appear related, but no two appear to be
exactly the same.

What I really want to know is what is this doing to my hard drive? 
Should I turn DMA off?  

Version-Release number of selected component (if applicable):
FC2 with all the updates

How reproducible:
Always

Steps to Reproduce:
1. Save a big enough file to notice
  
Actual results:
DMA Timeouts.  dmesg output below

Expected results:
No DMA timeouts

Additional info:

hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
 
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.
hdc: DMA disabled
Losing some ticks... checking if CPU frequency changed.
Losing some ticks... checking if CPU frequency changed.
Losing some ticks... checking if CPU frequency changed.
Losing too many ticks!
TSC cannot be used as a timesource.
Possible reasons for this are:
  You're running with Speedstep,
  You don't have DMA enabled for your hard disk (see hdparm),
  Incorrect TSC synchronization on an SMP system (see dmesg).
Falling back to a sane timesource now.
hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
 
hdc: DMA disabled
ide1: reset: success
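
The hexadecimal values in these messages are bit fields. As a rough reading aid, the sketch below decodes the drive status byte (`status=0xd0`) and the bus-master DMA status byte (`dma status == 0x21`). The function names are made up for illustration and are not kernel APIs. The bit layouts are assumed from the standard ATA status register and the bus-master IDE status register, and reporting only "Busy" when the busy bit is set is an assumption chosen to match the log output.

```python
# Illustrative decoder for the two status bytes seen in the log above.
# Function names are hypothetical; bit layouts are assumed from the ATA
# status register and the bus-master IDE (BM-DMA) status register.

ATA_STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

BMDMA_STATUS_BITS = [
    (0x80, "simplex only"),
    (0x40, "drive 1 DMA capable"),
    (0x20, "drive 0 DMA capable"),
    (0x04, "interrupt raised"),
    (0x02, "DMA error"),
    (0x01, "DMA engine active"),
]


def decode_ata_status(status):
    """Return the flag names set in an ATA status byte."""
    # While the busy bit (0x80) is set the other bits are not meaningful,
    # so only "Busy" is reported, as in the "{ Busy }" log lines.
    if status & 0x80:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


def decode_bmdma_status(status):
    """Return the flag names set in a bus-master DMA status byte."""
    return [name for bit, name in BMDMA_STATUS_BITS if status & bit]


if __name__ == "__main__":
    print(hex(0xD0), decode_ata_status(0xD0))
    print(hex(0x21), decode_bmdma_status(0x21))
```

Read this way, 0x21 would mean the controller still considered the transfer active and had not raised an interrupt when the timer expired, while 0xd0 means the drive itself was reporting busy. That is only an interpretation of the bits, not a diagnosis.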

Comment 1 Bill Nottingham 2004-09-15 02:33:13 UTC

*** Bug 132585 has been marked as a duplicate of this bug. ***

Comment 2 Manuel Morales 2004-11-01 16:22:21 UTC

I have noticed a similar problem on my Dell Latitude D600. The error
message I get is the same as originally posted. I also only have the
problem when dealing with large files.

I have found that if I set the transfer mode to udma2 (-d1 -X66), I
don't get these error messages. The disk is otherwise set for udma5,
which it should support.
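
For anyone wondering where 66 comes from: as far as I understand the hdparm man page, the -X argument is the mode number plus an offset (8 for PIO, 32 for multiword DMA, 64 for UltraDMA). The helper below is a hypothetical illustration of that arithmetic and does not call hdparm itself.

```python
# Sketch of the hdparm -X numbering, assuming the offsets described in
# hdparm(8): PIO = 8 + n, multiword DMA = 32 + n, UltraDMA = 64 + n.
# The helper name is made up for this example.

XFER_BASE = {"pio": 8, "mdma": 32, "udma": 64}


def hdparm_xfer_value(kind, mode):
    """Return the numeric -X argument for a transfer mode such as udma2."""
    if kind not in XFER_BASE:
        raise ValueError(f"unknown transfer kind: {kind!r}")
    if mode < 0:
        raise ValueError("mode number must be non-negative")
    return XFER_BASE[kind] + mode


if __name__ == "__main__":
    print("udma2 ->", hdparm_xfer_value("udma", 2))
    print("udma5 ->", hdparm_xfer_value("udma", 5))
```

So udma2 corresponds to -X66, and the udma5 setting the disk defaults to would be -X69.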

Comment 3 Manuel Morales 2004-11-02 00:44:46 UTC

Here's the relevant section from my dmesg output. I wonder if the
Inspiron is also using an ICH4 chipset...

ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ICH4: chipset revision 1
ICH4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
hda: IC25N030ATMR04-0, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Comment 4 Ron Yorston 2004-11-14 09:46:05 UTC

I've started to see this issue on my Dell Latitude L400 laptop since
installing FC3 yesterday.  Until then I was using FC2 and didn't have
this problem.  Under FC2 I was getting 20MB/s from the disk; FC3 tries
to use DMA but fails and falls back to some terribly slow mode that
gives 0.7MB/s.  By adding 'PIIX4: ide=nodma' to the kernel line in
grub.conf I can get half-way decent performance without DMA, but we
should be able to do better.

IDE controller at PCI slot 0000:00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfcd0-0xfcd7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfcd8-0xfcdf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: FUJITSU MHN2200AT, ATA DISK drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Comment 5 Ron Yorston 2004-11-18 10:36:13 UTC

The 2.6.9-1.678_FC3 kernel fixes this issue for me.  I note that the
log is now reporting hda:pio rather than hda:DMA, as above.

Comment 6 David Kaplan 2004-11-18 21:36:24 UTC

kernel-2.6.9-1.3_FC2 gives me the same problem as before.

Comment 7 Ron Yorston 2004-11-19 20:10:53 UTC

An update:  the 2.6.9-1.678_FC3 kernel fixes this issue in the case
where the system is cold booted.  After a reboot it still reports
hda:DMA and the disk runs very, very slowly.

Comment 8 Alan Cox 2004-11-28 15:37:43 UTC

If there is a difference between cold and warm boot behaviour, that is almost
certainly a BIOS problem. You might try acpi=off and see if it then reboots sanely.

Also, does Manuel's suggestion in comment #2 help? If so, it gives me something to
investigate further, either as a software bug or a cable limit we need to add.

Comment 9 David Kaplan 2004-11-28 18:56:00 UTC

What sort of info do you need?  I still have this problem with
kernel-2.6.9-1.6_FC2 and the above reporter states that he still sees it with FC3.

Comment 10 Ron Yorston 2004-11-29 10:13:03 UTC

The 2.6.9-1.681_FC3 kernel does finally seem to fix the problems I was
seeing.  I've tested this with the following sequence:

   cold boot 681:  OK
   reboot 681:     OK (reports hda:DMA)
   reboot 678:     OK
   reboot 678:     DMA errors on reading partition table, disk slow
   reboot 681:     DMA errors, disk slow
   cold boot 681:  OK
   reboot 681:     OK

Comment 11 Manuel Morales 2004-11-29 17:17:11 UTC

I wonder if the issue reported in comment #4 and noted as resolved in
comment #10 is different than that originally posted. I continue to
get the specific DMA error message below with kernel 2.6.9-1.681_FC3
on my Dell Latitude D600. So, at least two people are reporting these
specific DMA errors for Dell laptops, even with the most recent kernel
installed on FC3... I'm happy to provide whatever information is needed.

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }

ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success

Comment 12 David Kaplan 2004-11-29 19:41:14 UTC

I am inclined to think that Ron's problem is a different one as well.
 Ron - do you get the dma timeout error that is mentioned above?

Comment 13 Ron Yorston 2004-11-29 20:58:30 UTC

My timeout error has a different status:

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }

ide: failed opcode was: unknown

I had to get that out of the log from yesterday.  Every attempt to
reproduce the problem today has failed.  Even the 667 kernel is
working properly today.

Comment 14 David Kaplan 2004-12-01 19:50:36 UTC

I definitely still get this error with FC3 and kernel-2.6.9-1.681_FC3.

dmesg shows:

hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }

ide: failed opcode was: unknown
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.

Comment 15 David Kaplan 2004-12-01 21:09:35 UTC

I am changing the severity of this bug as my system is basically
unusable for heavy disk use with FC3.

Comment 16 Alan Cox 2004-12-01 21:24:13 UTC

dmkaplan: What effect does booting with acpi=off have? Also, what
effect does disabling hal have?

"service haldaemon stop"

Comment 17 David Kaplan 2004-12-01 23:40:32 UTC

Hmmm.  I didn't think FC2 used the hal daemon, so I thought this
wouldn't fix the problem (I am now using FC3).  I turned off the hal
daemon and it took me a while to get the problem to appear (lots of
gimp and openoffice stuff open at the same time), but yes the problem
still appears.  Next I will try rebooting with acpi=off to see if that
helps.  Stay tuned...

Comment 18 David Kaplan 2004-12-01 23:57:07 UTC

Rebooted with acpi=off.  Was able to reproduce the problem.  Then I
turned haldaemon off as well (both acpi and hald stopped).  Was able
to reproduce the problem again.

Back to the drawing board....

Comment 19 David Kaplan 2004-12-03 20:14:53 UTC

Is there anything else I can do to debug this problem?

Comment 20 David Kaplan 2004-12-14 20:44:57 UTC

For some reason I did not see Alan's original comment #8.  I tried
setting hdparm -d1 -X66 (i.e. changing the system from udma5 to
udma2).  This appears to make the DMA timeout problems go away, though
the system still appears quite sluggish.

Comment 21 David Kaplan 2005-01-04 00:24:29 UTC

This bug persists in the 2.6.9-1.724_FC3 kernel if you use udma5,
though the actual error code has changed (below).  udma2 is slow but
without the dma timeouts.

hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }

ide: failed opcode was: unknown
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.
Losing too many ticks!
TSC cannot be used as a timesource.
Possible reasons for this are:
  You're running with Speedstep,
  You don't have DMA enabled for your hard disk (see hdparm),
  Incorrect TSC synchronization on an SMP system (see dmesg).
Falling back to a sane timesource now.

Comment 22 william.hoffmann 2005-01-05 18:35:35 UTC

I've got the same problem and messages when I move big files on FC3
* kernel => 2.6.9-1.724_FC3
* /etc/sysconfig/harddisks:
USE_DMA=1 
MULTIPLE_IO=16
* dmesg | grep hda
    ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
hda: IC25N040ATMR04-0, ATA DISK drive
hda: max request size: 1024KiB
hda: 78140160 sectors (40007 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
* /var/log/messages:
Jan  5 18:51:36 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:51:46 william kernel: hda: DMA timeout error
Jan  5 18:51:46 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan  5 18:51:46 william kernel: 
Jan  5 18:51:46 william kernel: ide: failed opcode was: unknown
Jan  5 18:51:46 william kernel: hda: DMA disabled
Jan  5 18:51:46 william kernel: ide0: reset: success
Jan  5 18:52:24 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:52:34 william kernel: hda: DMA timeout error
Jan  5 18:52:34 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan  5 18:52:34 william kernel: 
Jan  5 18:52:34 william kernel: ide: failed opcode was: unknown
Jan  5 18:52:34 william kernel: hda: DMA disabled
Jan  5 18:52:34 william kernel: ide0: reset: success
Jan  5 18:53:21 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:53:31 william kernel: hda: DMA timeout error
Jan  5 18:53:31 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan  5 18:53:31 william kernel: 
Jan  5 18:53:31 william kernel: ide: failed opcode was: unknown
Jan  5 18:53:31 william kernel: hda: DMA disabled
Jan  5 18:53:32 william kernel: ide0: reset: success
Jan  5 18:54:02 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:54:12 william kernel: hda: DMA timeout error
Jan  5 18:54:12 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan  5 18:54:12 william kernel: 
Jan  5 18:54:12 william kernel: ide: failed opcode was: unknown
Jan  5 18:54:12 william kernel: hda: DMA disabled
Jan  5 18:54:12 william kernel: ide0: reset: success
Jan  5 18:54:15 william kernel: Losing too many ticks!
Jan  5 18:54:15 william kernel: TSC cannot be used as a timesource.  
Jan  5 18:54:15 william kernel: Possible reasons for this are:
Jan  5 18:54:15 william kernel:   You're running with Speedstep,
Jan  5 18:54:15 william kernel:   You don't have DMA enabled for your hard disk (see hdparm),
Jan  5 18:54:15 william kernel:   Incorrect TSC synchronization on an SMP system (see dmesg).
Jan  5 18:54:15 william kernel: Falling back to a sane timesource now.

* lspci
00:00.0 Host bridge: Intel Corp. 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV28 [GeForce4 Ti 4200 Go AGP 8x] (rev a1)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01)
02:01.0 CardBus bridge: Texas Instruments: Unknown device ac47 (rev 01)
02:01.1 CardBus bridge: Texas Instruments: Unknown device ac4a (rev 01)
02:01.2 FireWire (IEEE 1394): Texas Instruments: Unknown device 802b
02:01.3 System peripheral: Texas Instruments: Unknown device 8204
* hdparm /dev/hda : 
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 16383/255/63, sectors = 40007761920, start = 0
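
In the /var/log/messages excerpt above, each "dma_timer_expiry" line is followed by a "DMA timeout error" line some seconds later. A small script along these lines can measure that gap from a saved log. It is only a sketch: the function name is made up, the regular expression assumes the syslog layout shown in this comment, and the year is passed in because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Assumed syslog layout: "Mon DD HH:MM:SS host kernel: hdX: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<dev>hd[a-z]):\s+(?P<msg>.*)$"
)


def timeout_delays(lines, year=2005):
    """Pair each dma_timer_expiry with the next 'DMA timeout error' on the
    same device and return (device, delay_in_seconds) tuples."""
    pending = {}  # device -> time of the last unmatched dma_timer_expiry
    delays = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # The year is an assumption supplied by the caller.
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        dev, msg = m["dev"], m["msg"]
        if msg.startswith("dma_timer_expiry"):
            pending[dev] = when
        elif msg.startswith("DMA timeout error") and dev in pending:
            delays.append((dev, (when - pending.pop(dev)).total_seconds()))
    return delays


SAMPLE = """\
Jan  5 18:51:36 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:51:46 william kernel: hda: DMA timeout error
Jan  5 18:51:46 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan  5 18:52:24 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan  5 18:52:34 william kernel: hda: DMA timeout error
""".splitlines()

if __name__ == "__main__":
    for dev, seconds in timeout_delays(SAMPLE):
        print(f"{dev}: {seconds:.0f} s between timer expiry and timeout error")
```

On the sample lines copied from this log, both gaps come out at 10 seconds.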

Comment 23 Robert Fries 2005-01-05 22:33:19 UTC

I had a similar problem on a system when I upgraded from Redhat-9 to Fedora
Core-3. Disabling DMA on the drive let the disk work, but I was still getting
crashes whenever there was any traffic on the ethernet. An upgrade to 2.6.10
didn't change the ethernet problem.

Then I realized that I had been using both the disk and ethernet OK when I did
the install (the DVD was on another machine) and when I was using rescue mode.
I checked the kernel used during rescue mode and it was the same as the kernel
I was using when I booted from the hard disk. Furthermore, in rescue mode the
disk had DMA turned on.

I then tried hitting the ethernet in single user mode and it worked. So I
started turning on the services that would be used in level 3 a bit at a time.
The problem came back when I turned on cpuspeed. I then disabled cpuspeed,
enabled disk DMA and rebooted to level 5. All works OK. It seems that there is
some interaction between what cpuspeed does and both the disk and ethernet.

One additional note: I tried booting into single user mode, starting up
cpuspeed, and then shutting it down. The first time I touched the ethernet I
got a crash. It seems that the crash is not associated with actions of the
daemon but rather with a residual effect of previous actions of the daemon.

Below is a copy of cpuinfo for the machine I did this on:
processor       : 0 
vendor_id       : CentaurHauls 
cpu family      : 6 
model           : 6 
model name      : VIA Samuel 
stepping        : 3 
cpu MHz         : 668.574 
cache size      : 128 KB 
fdiv_bug        : no 
hlt_bug         : no 
f00f_bug        : no 
coma_bug        : no 
fpu             : yes 
fpu_exception   : yes 
cpuid level     : 1 
wp              : yes 
flags           : fpu de tsc msr mce cx8 mtrr pge mmx pni 3dnow 
bogomips        : 1306.62

Comment 24 David Kaplan 2005-01-05 22:56:55 UTC

Robert - sounds like a different problem.  I have had no ethernet or install
problems.  Do you get the same dma timeout we are talking about?  If you aren't
using a Dell laptop, I suspect it isn't the same problem.

Comment 25 Alan Cox 2005-01-06 14:07:27 UTC

CPUspeed + VIA was a known problem. I believe DaveJ fixed that by disabling VIA
CPU speed in kernel errata?

Comment 26 Robert Fries 2005-01-06 16:34:41 UTC

It was the same DMA timeout. I can't connect to that machine right now so I
can't quote the log, but I will do so later in case I missed something. I also
did not have any trouble during installation. It was that contrast between
things working during installation and then failing after reboot that made me
go looking for the difference. This machine is not a laptop, so perhaps they
are unrelated problems with the same symptom.

Comment 27 David Kaplan 2005-01-12 01:42:26 UTC

The DMA timeout problem continues to appear in udma5 mode for
kernel-2.6.10-1.737_FC3.

Comment 28 Andrei Arion 2005-02-08 16:28:21 UTC

I also have the same problem :

Jul 30 21:07:30 ada kernel: hdc: DMA timeout error
Jul 30 21:07:30 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Jul 30 22:43:45 ada kernel: hdc: DMA timeout error
Jul 30 22:43:45 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Jul 31 00:48:12 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Jul 31 00:48:12 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug  4 21:12:11 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Aug  4 21:12:11 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug  5 15:28:52 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Aug  5 15:28:52 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug 11 18:09:10 ada kernel: hdc: DMA timeout error
Aug 11 18:09:10 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 11 18:09:40 ada kernel: hdc: DMA timeout error
Aug 11 18:09:40 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 18 19:59:08 ada kernel: hdc: DMA timeout error
Aug 18 19:59:08 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep  2 21:49:14 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep  2 21:49:14 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 20 07:12:46 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 20 07:12:46 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 17:03:11 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 17:03:11 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:25:07 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:25:07 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:25:07 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:25:07 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:26:54 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:26:54 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 30 15:32:36 ada kernel: hdc: DMA timeout error
Sep 30 15:32:36 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep 30 17:29:37 ada kernel: hdc: DMA timeout error
Sep 30 17:29:37 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep 30 18:06:06 ada kernel: hdc: DMA timeout error
Sep 30 18:06:06 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Oct  1 20:57:03 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Oct  1 20:57:03 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d


It appears only/mostly when I am doing a lot of I/O operations (writing/reading
several hundred MB with BDB (a database application)). I have an Inspiron
5100 and I always thought that I had a bad hard drive. I am running Fedora
Core 2, kernel 2.6.7, so I guess that it is not an FC3 problem.
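
To see at a glance which device is affected and how, the log above can be tallied per device and per kind of timeout. The following is a rough sketch, with a made-up function name and a regular expression that assumes the exact message wording shown in this comment. It matches only the first line of each two-line event so that every timeout is counted once.

```python
import re
from collections import Counter

# Matches the first line of each event pair only ("DMA timeout error" or
# "irq timeout: status="), so one timeout is counted once.
EVENT_RE = re.compile(
    r"kernel: (?P<dev>hd[a-z]): (?P<kind>DMA timeout error|irq timeout: status=)"
)


def tally_timeouts(lines):
    """Count timeouts per (device, kind), where kind is 'dma' or 'irq'."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            kind = "dma" if m["kind"].startswith("DMA") else "irq"
            counts[(m["dev"], kind)] += 1
    return counts


SAMPLE = [
    "Jul 30 21:07:30 ada kernel: hdc: DMA timeout error",
    "Jul 30 21:07:30 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }",
    "Jul 31 00:48:12 ada kernel: hda: irq timeout: status=0xd0 { Busy }",
    "Jul 31 00:48:12 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d",
    "Aug 11 18:09:10 ada kernel: hdc: DMA timeout error",
    "Aug 11 18:09:10 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }",
]

if __name__ == "__main__":
    for (dev, kind), n in sorted(tally_timeouts(SAMPLE).items()):
        print(dev, kind, n)
```

In this log the DMA timeouts are all on hdc and the irq timeouts are all on hda, which the tally makes easy to confirm over a longer file.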



What hard disc do you have? Mine is:
Oct  1 18:39:05 ada smartd[1469]: Device: /dev/hdc, opened
Oct  1 18:39:05 ada smartd[1469]: Device: /dev/hdc, not found in smartd database.
Oct  1 18:39:05 ada smartd[1469]: Device: /dev/hdc, is SMART capable. Adding to "monitor" list.
Oct  1 18:39:14 ada kernel:     ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
Oct  1 18:39:14 ada kernel: hdc: IC25N040ATMR04-0, ATA DISK drive
Oct  1 18:39:14 ada kernel: hdc: max request size: 1024KiB
Oct  1 18:39:14 ada kernel: hdc: 78140160 sectors (40007 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
Oct  1 18:39:14 ada kernel:  hdc: hdc1 hdc2 hdc3 < hdc5 hdc6 hdc7 >
Oct  1 18:39:23 ada kernel: EXT3 FS on hdc5, internal journal

Andrei

Comment 29 Tomislav Vujec 2005-02-15 20:38:23 UTC

The same problem appears on my Latitude D800. I am able to reproduce it by
copying large files. After the first timeout hdparm -i still shows udma5 as
selected, but the transfer rate given by hdparm -t drops to 2.5 MB/s. If I manually
select udma2 (hdparm -d1 -X66) the rate is back to 25 MB/s.

Comment 30 David Kaplan 2005-02-15 20:56:58 UTC

I have heard reports that this has to do with the smartd daemon.  Has anyone
tried turning that off and then doing the test?

Also, scanning the web I have seen that forms of this problem have been bouncing
around in a number of distributions since the 2.4 kernels.

Comment 31 tony 2005-03-07 05:29:09 UTC

We have had the same problem with a number of dual Xeon boxes (Supermicro
X5DPR-iG2). 14 out of 16 showed dma timeouts over a 6 week period, after
upgrading to the 2.6.x kernel. There were no problems under several 2.4.x
kernels. Currently we run 2.6.10-1.766_FC3smp, but successive kernel upgrades
from the stock FC3 kernel have not helped. 

It therefore seems that this problem is not limited to laptops.

I tried turning smartd off, but the timeouts persisted. So the suggestion that
turning off smartd might help (#30) seems to be incorrect.

I also tried switching to UDMA2 but timeouts persisted.

Most recently, I switched to udma1 and turned off acpi on boot (#8). We have
been running for 5 days with only 3 timeouts on 16 nodes; much less than before
and insufficient to provoke a switch to piix4. I would not call this a solution:
the read speed is down to 15MB/sec (from 50 in udma5), but it is preferable to
piix4.

I plan to wait a week or two and then try udma2.

Comment 32 Len Brown 2005-03-15 00:57:02 UTC

Is it true that the failure in this bug report seems
specific to systems based on the ICH4 (and ICH4-M)?

Comment 33 Jon Campbell 2005-03-23 20:41:18 UTC

I have this problem, and I am also not using a laptop. This problem just 
started yesterday for me though, so I am going to check to see what packages 
were upgraded yesterday, and see if I can see the problem. The other funny 
thing is that my server is unreachable on any port but 25, replies to pings and 
TTL's.

Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x5a { DriveReady SeekComplete DataRequest Index }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success

When I get access to the server I will post the list of applications that were 
automatically updated to hopefully give you guys some ideas.

Comment 34 Tony Ladd 2005-03-25 04:55:06 UTC

To Len Brown (#32): I don't think so. My boxes use the 7501 chipset (S'micro
X5DPR-iG2+) which includes the ICH3-S I/O controller. I don't think it is a
hardware issue. I have similar boxes also with the 7501 (S'micro X5DPA-TGM) but
with SATA and these have no problem (using the SCSI based SATA driver). I think
it is just the IDE driver.

Comment 35 SlowTCP 2005-04-05 11:28:39 UTC

Hi!

Alan! Maybe this helps to locate it.

I own 8 Linux machines. I have had this problem since the 2.6.0 (-test) kernel
series came out. I use FC2 and FC3. If I use 2.4 (RedHat compiled, or original
from kernel.org with only the necessary options), everything is OK. If I use 2.6
(RedHat compiled, or original from kernel.org with only the necessary options),
the problem appears, BUT ONLY IF I USE Pentium 1 class machines! It doesn't
matter whether I use ide=nodma or /sbin/hdparm -d1 -X mdma1 /dev/hda1 (OK, then
it is a bit less frequent).
The problem exists in 2.6.11 too (and I've tried all the versions from 2.6.0).

One of them (lspci):

00:00.0 Host bridge: Intel Corp. 430TX - 82439TX MTXC (rev 01)
00:01.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 01)
00:01.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:01.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:01.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 01)
:-)  00:09.0 VGA compatible controller: S3 Inc. 86c325 [ViRGE] (rev 06)
00:0a.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)

Just a few programs are running:

  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:00 init [3]
    2 ?        SWN    0:00 [ksoftirqd/0]
    3 ?        SW<    0:00 [events/0]
    4 ?        SW<    0:00 [khelper]
    5 ?        SW<    0:00 [kblockd/0]
    6 ?        SW     0:00 [khubd]
   25 ?        SW     0:00 [pdflush]
   26 ?        SW     0:00 [pdflush]
   28 ?        SW<    0:00 [aio/0]
   27 ?        SW     0:00 [kswapd0]
  611 ?        SW     0:00 [kseriod]
  632 ?        SW     0:00 [kjournald]
  977 ?        SW     0:00 [kjournald]
 1505 ?        S      0:00 syslogd -m 0
 1509 ?        S      0:00 klogd -x
 1535 ?        S      0:01 /usr/sbin/sshd
 1718 ?        S      0:00 sshd: root@pts/0
 1720 pts/0    S      0:00 -bash
 1747 ?        S      0:01 sshd: root@pts/1
 1749 pts/1    S      0:00 -bash
 2584 ?        S<     0:00 /sbin/wland
 2809 tty1     S      0:00 /sbin/mingetty tty1
 2815 tty2     S      0:00 /sbin/mingetty tty2
 2816 tty3     S      0:00 /sbin/mingetty tty3
 2817 tty4     S      0:00 /sbin/mingetty tty4
 2818 tty5     S      0:00 /sbin/mingetty tty5
 2819 tty6     S      0:00 /sbin/mingetty tty6
 2998 pts/2    S      0:00 bash -rcfile .bashrc
 3020 pts/0    R      0:00 ps ax

And the log:

Apr  5 04:04:15 XXXXX kernel: hda: dma_timer_expiry: dma status == 0x21
Apr  5 04:04:25 XXXXX kernel: hda: DMA timeout error
Apr  5 04:04:25 XXXXX kernel: hda: dma timeout error: status=0x58 { DriveReady 
SeekComplete
Apr  5 04:04:25 XXXXX kernel:
Apr  5 04:04:25 XXXXX kernel: ide: failed opcode was: unknown
Apr  5 04:04:46 XXXXX kernel: hda: dma_timer_expiry: dma status == 0x21
Apr  5 04:04:56 XXXXX kernel: hda: DMA timeout error
Apr  5 04:04:56 XXXXX kernel: hda: dma timeout error: status=0x58 { DriveReady 
SeekComplete



I have no losing ticks.

Comment 36 David Kaplan 2005-04-05 18:35:06 UTC

Regarding the previous comment, this problem is not limited to Pentium I
machines.  I am using a laptop that is no more than 6 months old (i.e. it should
be a Pentium 4), and it has this problem.  It could be some sort of timing
constraint that applies to older machines and laptops perhaps.

Comment 37 Monti 2005-04-06 08:41:47 UTC

I have the same problem on a Dell Inspiron m500 with an ICH4 controller:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ICH4: chipset revision 1
ICH4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
hda: IC25N030ATMR04-0, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 1024KiB
hda: 58605120 sectors (30005 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4
hdc: SAMSUNG CDRW/DVD SN-324F, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15

The errors I get are these:

Apr  6 09:47:33 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Apr  6 09:47:44 localhost kernel: hda: DMA timeout error
Apr  6 09:47:44 localhost kernel: hda: dma timeout error: status=0xd0 { Busy }
Apr  6 09:47:44 localhost kernel:
Apr  6 09:47:44 localhost kernel: hda: DMA disabled
Apr  6 09:47:44 localhost kernel: ide0: reset: success

I run on a Debian system, so the problem is not isolated to Fedora kernels.  I 
get this error three or four times a week on my current kernel: 2.6.8-2-686-smp.
This is annoying, but I can live with it.

With 2.6.9 and 2.6.10 kernels, I get this problem much more frequently -- right 
past the point where I give up working on the machine.

I reset the harddrive with hdparm parameters -d1 -c1 -Xudma5.

Comment 38 Macy Gasp 2005-04-07 15:35:47 UTC

I have a similar problem with the 2.6.11 kernel, but running on Gentoo.
 
My computer slows down A LOT when copying large files on my Seagate drive,
although DMA is enabled (UDMA5).

LSPCI:

0000:00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub
Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller
(rev 02)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
#3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI
Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface
Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE
Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller
(rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER
(ICH5/ICH5R) AC'97 Audio Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX
5200] (rev a1)
0000:02:09.0 Ethernet controller: Marvell Technology Group Ltd. Gigabit Ethernet
Controller (rev 13)

DMESG:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 177
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: ST380011A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: DV-516D, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: Host Protected Area detected.
        current capacity is 156299375 sectors (80025 MB)
        native  capacity is 156301488 sectors (80026 MB)
hda: Host Protected Area disabled.
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 >


If I'm copying files and change the UDMA settings during the operation, I get
errors like these:

Apr  7 18:19:25 nevermore hda: dma_intr: status=0x58 { DriveReady SeekComplete
DataRequest }
Apr  7 18:19:25 nevermore
Apr  7 18:19:25 nevermore ide: failed opcode was: unknown
Apr  7 18:19:25 nevermore hda: CHECK for good STATUS
Apr  7 18:19:34 nevermore hda: dma_intr: status=0x51 { DriveReady SeekComplete
Error }
Apr  7 18:19:34 nevermore hda: dma_intr: error=0x04 { DriveStatusError }
Apr  7 18:19:34 nevermore ide: failed opcode was: unknown
Apr  7 18:20:05 nevermore hda: dma_timer_expiry: dma status == 0x21
Apr  7 18:20:05 nevermore hda: DMA timeout error
Apr  7 18:20:05 nevermore hda: dma timeout error: status=0xd0 { Busy }
Apr  7 18:20:05 nevermore
Apr  7 18:20:05 nevermore ide: failed opcode was: unknown
Apr  7 18:20:05 nevermore hda: DMA disabled
Apr  7 18:20:05 nevermore ide0: reset: success

I even managed to lock the computer up when, instead of copying, I ran
dd if=/dev/zero of=temp bs=1024 count=20000

Comment 39 Dave Jones 2005-04-16 04:56:34 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.

Comment 40 Manuel Morales 2005-04-16 14:25:33 UTC

This bug should be reopened - the problem persists with FC3, kernel 2.6.11-1.14_FC3.

To clarify a previous comment (#2), setting the transfer rate to udma2 works
most of the time, but not always - I still get a low incidence of these DMA errors.

Comment 41 Monti 2005-04-22 08:45:08 UTC

Since my last comment (#37) I have updated to Debian kernel 2.6.11-1-686 (no SMP
this time, just checking).  The problem persists, although probably not as much.
I get small lags when there is much disk activity (e.g. software updates with
apt-get).  I suspect audio playback together with disk activity might have
something to do with this, but I haven't had time to check this thoroughly yet.

Comment 42 rambler8 2005-06-21 13:23:55 UTC

Created attachment 115746 [details]
2.6 DMA timeout logs and info

I've had this problem with every 2.6 kernel from the Fedora project and all of
the few 2.6 kernels I tried from source going back about a year. Since then
I've been running 2.4.28 from source with no problems. I tried Fedora Core 4
with 2.6.11-1.1369_FC4 this weekend and had the same bad results as before. My
experience is that everything will work fine until there is a moderately heavy
amount of disk activity. Disabling the smart daemon did not help me, nor did
disabling UDMA in the BIOS. I'm attaching some log excerpts showing the problem
and a bunch of other system info I've gathered.

After some searching I ran across the following workaround


******** Possible Work Around ********

pass the ide=nodma parameter to the kernel at boot, i.e.

     kernel /vmlinuz-2.6.11-1.1369_FC4 ro root=LABEL=/ rhgb ide=nodma


I haven't had any problems since, YMMV
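
To confirm that the option actually reached the running kernel, the boot command line can be inspected. This is a minimal sketch, assuming a Linux system that exposes the command line at /proc/cmdline; the helper names are hypothetical.

```python
def has_boot_option(cmdline, option):
    """Return True if `option` appears as a whole word in a kernel command line."""
    return option in cmdline.split()


def running_kernel_has(option, path="/proc/cmdline"):
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open(path) as f:
        return has_boot_option(f.read(), option)
```

After a reboot, running_kernel_has("ide=nodma") should return True if the boot loader entry was edited as shown above.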

Comment 43 Dave Jones 2005-07-15 18:57:32 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 44 Monti 2005-08-09 08:03:46 UTC

This is still a problem on Debian kernel linux-image-2.6.12-1-686-smp.  I know I
shouldn't bring my Debian problems here, but it seems to me this is a general
Linux kernel issue.

Comment 45 Adriaan Peeters 2005-08-25 08:01:08 UTC

I see this problem on my Dell Latitude D505 as well, latest kernel
(2.6.12-1.1372_FC3):

Aug 24 17:00:31 twiadria kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 24 17:00:46 twiadria kernel: hda: DMA timeout error
Aug 24 17:00:46 twiadria kernel: hda: dma timeout error: status=0xd0 { Busy }
Aug 24 17:00:46 twiadria kernel: 
Aug 24 17:00:46 twiadria kernel: ide: failed opcode was: unknown
Aug 24 17:00:46 twiadria kernel: hda: DMA disabled
Aug 24 17:00:46 twiadria kernel: ide0: reset: success

Comment 46 Manuel Morales 2005-08-25 11:56:25 UTC

I haven't had any problems with this for a few kernel releases (including the
initial kernel in FC4, although I'm running FC3 again). On the other hand, I
replaced my hard drive soon after the problems stopped, for unrelated reasons.
It would be worth hearing from the original reporter if he is still having issues.

Comment 47 Joe Harrington 2005-08-27 05:14:18 UTC

Sorry for the length of this.  I tried to be concise while giving enough info to
go on.  The short version is that the 2.6.12-1.1372 kernel gave horrific errors
and crashes on one machine while running fine on another.  The 2.6.11-1.35
kernel gave some problems, one of them fatal, and the 2.6.11-1.27 kernel ran for
months without issues.

I have 3 machines, and I recently started getting dma_timer_expiry and related
errors on 2 of them, with different results.  Glup and Oobleck run FC3 and get
nightly yum updates, including kernel.  Voom ran FC2 but fell off updates when
Legacy's repo started not supporting them consistently.  Oobleck is worst
affected, Glup least.

Glup is an IBM ThinkPad T40 with an IBM/Hitachi 80GB ATA drive.  It
has run all of the kernels in the FC3 updates without trouble:

00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M)
USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M)
USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M)
USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge
(rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus
Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM
(ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97
Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 Lf [FireGL
9000] (rev 02)
02:00.0 CardBus bridge: Texas Instruments PCI1520 PC card Cardbus Controller
(rev 01)
02:00.1 CardBus bridge: Texas Instruments PCI1520 PC card Cardbus Controller
(rev 01)
02:01.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet
Controller (Mobile) (rev 03)
02:02.0 Ethernet controller: Atheros Communications, Inc. AR5211 802.11ab NIC
(rev 01)

Voom was an FC2 x86_64 machine with an Opteron 148 CPU and 2 WD 250GB
SATA disks.  It got ata timeout messages in its logs, but I
never followed them up, since it only happened 3 times and I only
noticed them in the logs the following day.  No performance issues
were obvious.

2.6.9-1.6_FC2
Dec 18 04:02:52 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Dec 18 04:02:52 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Dec 18 04:02:52 voom kernel: ata1: error=0x04 { DriveStatusError }
Dec 18 04:02:52 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Dec 18 04:02:52 voom kernel: Current sda: sense key Aborted Command
Dec 18 04:02:52 voom kernel: end_request: I/O error, dev sda, sector 29257296

2.6.10-1.770_FC2
Mar 16 09:42:49 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Mar 16 09:42:49 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Mar 16 09:42:49 voom kernel: ata1: error=0x04 { DriveStatusError }
Mar 16 09:42:49 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Mar 16 09:42:49 voom kernel: Current sda: sense key Aborted Command
Mar 16 09:42:49 voom kernel: end_request: I/O error, dev sda, sector 40641392

2.6.10-1.771_FC2 
Aug  2 04:09:38 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Aug  2 04:09:38 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Aug  2 04:09:38 voom kernel: ata1: error=0x04 { DriveStatusError }
Aug  2 04:09:38 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug  2 04:09:38 voom kernel: Current sda: sense key Aborted Command
Aug  2 04:09:38 voom kernel: end_request: I/O error, dev sda, sector 27114072

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit
Ethernet (rev 10)
01:0a.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller
01:0b.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
01:0c.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc)
SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
01:0d.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit
Ethernet (rev 03)
01:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit
Ethernet (rev 03)

Voom no longer runs an FC release.  It's the master of a beowulf
cluster and runs cAos's experimental FNN kernels.  Current kernel is
2.6.12-76.caoscustom and there have been no more ata-related error
messages since 12 August.

Oobleck has an Asus A7V600 mobo with VIA chipset and Athlon XP 2800+
CPU.  It has a WD 250GB ATA root disk, 2 Maxtor 250s, and a Maxtor 160.

2.6.12-1.1372_FC3

Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }
Aug 19 17:18:16 oobleck kernel:
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck smartd[3475]: Device: /dev/hdg, enabled SMART Automatic
Offline Testing.
Aug 19 17:18:16 oobleck kernel: ide0: reset: success
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }
Aug 19 17:18:16 oobleck kernel:
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }

That is a sample of error output.  Similar output occurred at each of
these times:

# grep expiry /var/log/messages
2.6.12-1.1372_FC3 
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:40:59 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:40:59 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:41:00 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:41:00 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
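
The same grep can be summarised per timestamp, which makes the bursts easier to see. This is a minimal sketch, assuming syslog lines in the format shown above; the function name and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# "Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21"
EXPIRY_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (hd\w): dma_timer_expiry: "
    r"dma status == (0x[0-9a-fA-F]+)"
)


def count_expiries(lines):
    """Count dma_timer_expiry messages per (timestamp, device, dma status)."""
    counts = Counter()
    for line in lines:
        m = EXPIRY_RE.match(line)
        if m:
            counts[m.groups()] += 1
    return counts
```

Passing an open handle on /var/log/messages to count_expiries would give one tally per second and device, so four messages in the same second show up as a single entry with a count of 4.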

...and the last one had some different error codes:

Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x50 { DriveReady
SeekComplete }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady
SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady
SeekComplete DataRequest }

Then things got really bad.  The system was down much of the time from
Aug 22 to today on the 2.6.12-1.1372 kernel.  The timeout errors
occurred a *lot*, though the machine would hang immediately
thereafter.  None of the messages from this period made it into the
log.  Boots failed, usually after kernel init but still while going
through startup scripts.  Other boots went exceedingly slowly (an
hour) before failing.  Slowness started at random times and without
error messages.  Other boots seemed fine, but the machine crashed
during periods of heavy network and graphics card use (VNC grabbing
the physical display while running over an unreliable wireless
network), or just randomly.  I lost 2 work days swapping cards around
looking for IRQ conflicts and memory problems.  The memory tested good
with memtest86+ running the full test on each card.  I had trouble
even after removing all but the graphics card, and reserving IRQs in
the BIOS.  I ran a SMART long test on the drive, which passed.  I
disconnected the other drives.  Nothing improved things, and all
components had worked previously, for almost 2 years.  All of this
occurred after about 3 months of errorless uptime.

It occurred to me that a kernel update might have occurred during the three
months of uptime, and I discovered that I had been running 2.6.11-1.27 all that
time.  I stepped back one update, to 2.6.11-1.35.  Still problems, but
different ones.  /dev/hda (the boot disk) got remounted read-only,
saying it was full, when it wasn't, and only 15% of the inodes were
used, at 4:02 am when little was going on.  Strange keyboard errors
happened during the boot.

I stepped back to 2.6.11-1.27, and things seem stable.  It has been
running without errors since 11:55 this morning, and 13 hours seems
like a lot of uptime right now.  I am filing this report over VNC
grabbing the console via a wireless network, without a problem.  While
things could get bad again, I expect they won't, with this kernel.

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
(rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell]
(rev 12)
00:0c.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit
Ethernet (rev 10)
00:0e.0 Unknown mass storage controller: Promise Technology, Inc. 20269 (rev 02)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID
Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237
AC97 Audio Controller (rev 60)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon
9000] (rev 01)
01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000]
(Secondary) (rev 01)

Let me know what else to send, if it helps others.  I seem to have my
workaround, at least for now.

--jh--

Comment 48 Alan Cox 2005-08-27 11:42:33 UTC

Thanks for the 3 machine summary. The commands that failed with error 0x04 early
in the log are harmless - something asked the drive to do things it didn't
support. The DMA timeouts indicate repeated problems with data transfer.
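
The error=0x04 lines can be read off the ATA error register. This is a minimal sketch, assuming the classic ATA error-register layout in which bit 2 (0x04) is ABRT, "command aborted", which the kernel log prints as DriveStatusError. Only that bit is decoded here, and the function name is hypothetical.

```python
ABRT = 0x04  # ATA error register bit 2: command aborted


def is_aborted_command(error_reg):
    """True if the drive rejected the command (ABRT set in the error register)."""
    return bool(error_reg & ABRT)
```

An aborted command usually means the drive did not support or accept the request, which fits the "harmless" reading of those lines.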

If you get a chance with that VIA box, can you see if disabling acpi and the cpu
speed daemon helps with it at all?

Comment 49 Joe Harrington 2005-08-30 13:46:40 UTC

Yesterday at 1:30 pm, I booted with 2.6.12-1.1372 acpi=off.  Things were
semi-stable, much more so than before.  However, things were locked up when I
came in this morning.  There was still the normal image on the screen, but
logging had stopped at 3:08 am, similar to some of the prior crashes.

I also noticed that glup, the IBM T40 laptop, had been running with acpi=off
pci=noacpi atkbd.reset all along (though it has been quite stable with the
1.1372 kernel).

--jh--

Comment 50 Joe Harrington 2005-08-30 13:49:40 UTC

Oh, and I had turned off cpuspeed as well, though it didn't seem to do much with
this CPU.

--jh--

Comment 51 Joe Harrington 2005-09-29 16:54:25 UTC

Well, I'm ashamed to say the VIA box (oobleck) had some serious hardware
problems that I blamed on software.  These are now resolved, and the errors are
gone.  The problem was that the power supply was giving 4.5 volts on the 5-volt
line, and the boot disk parked its heads whenever the voltage fluctuated below
4.5 volts.  It revived when the voltages came back up.  Hence, the timeouts were
real.  Something was making this get worse, hence the appearance that things
were bad with later kernels.  A new power supply works fine.  That disk in other
IDE positions was also fine, with the old PS.  Other disks, same model, in that
IDE position were fine with the old PS, since only having that disk on the
primary IDE dropped the voltage below 4.5 volts.  I am now running on the 1378
kernel with no special options, and it's happy.
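
For scale, here is a minimal sketch that compares a measured rail voltage against a tolerance band. The 5% default is an assumption based on the tolerance usually quoted for the +5 V rail of ATX supplies; check the specification for your own supply. The function name is hypothetical.

```python
def rail_within_tolerance(measured, nominal, tolerance=0.05):
    """Return True if `measured` volts is within `tolerance` (a fraction) of `nominal`."""
    return abs(measured - nominal) <= nominal * tolerance
```

By this measure 4.5 V on a 5 V line is 10% low, twice the assumed band, so rail_within_tolerance(4.5, 5.0) is False.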

Since I'm not the original poster, I won't close the bug, but I'd suggest
investigating hardware.  Look at the health monitor in your BIOS and check
voltages.  Listen for head-parking sounds (the same "clunk" that you hear when
you turn off your power).  Do SMART tests, and download the disk vendor's
diagnostics and boot into them.  Try a shorter IDE cable, if you have a long
one.  Most importantly, back up your data.

Similar errors might come from a disk/cable setup that (electrically) can't
consistently do the top IDE speeds, but can do lower speeds.  Note that the
official ATA cable length limit is 18", but 24" and 36" cables are common, and
most disks run fine on them.  This may explain some commenters' success at
disabling certain DMA modes.

--jh--

Comment 52 Miroslav Holubec 2005-11-01 19:09:10 UTC

Exactly the same problem as David. I am using a Dell Inspiron 5160 laptop with
Fedora Core 4, kernel 2.6.12-1.1447_FC4smp. After changing the UDMA mode to 2,
the problems went away.

Comment 53 Monti 2005-11-04 14:20:28 UTC

I upgraded to 2.6.14-1-686-smp (still Debian).  I had no problems for two days,
but now I got an error again.  The problem might be less frequent, but it's
still there nonetheless.

The error message is still the same:

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success

Hardware: Dell Inspiron 500m

Comment 54 Andy Moore 2005-12-12 20:59:09 UTC

I am also getting this error:

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success
hda: DMA disabled

I have a Latitude D600, running vanilla 2.6.14.3 from kernel.org. My distro is
Ubuntu Breezy.

Comment 55 Monti 2006-01-03 11:28:26 UTC

Created attachment 122707 [details]
interrupts, lspci, lspci -n, uname -a, dmesg, lsmod

The attached information might help solve the problem.  This problem is also in
the Debian bugtracker:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=321409

Comment 56 Dave Jones 2006-01-16 22:24:42 UTC

This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.

Comment 57 Sergey Mende 2006-01-23 21:51:24 UTC

Dave and Alan,

It seems I have the same problem with the latest (1656) FC4 kernel.
I have a VIA EPIA-M Mini-ITX board equipped with a VIA C3 Nehemiah C [C5N]
processor. However, I rebuilt the kernel using the original config from the
i686.rpm, with the CPU type set to 'C3-2' and the 'longhaul' module enabled. The
DMA timeout arises only when the cpuspeed daemon is running, most likely at
the time when the CPU load grows after some period of inactivity. When
cpuspeed is not running, i.e. the CPU is running at a constant frequency,
that kernel has no DMA timeouts.

Most often I receive the following messages:

Jan 22 17:04:40 epia kernel: hda: dma_timer_expiry: dma status == 0x20
Jan 22 17:04:40 epia kernel: hda: DMA timeout retry
Jan 22 17:04:40 epia kernel: hda: timeout waiting for DMA
Jan 22 17:04:40 epia kernel: hda: status error: status=0x58 { DriveReady 
SeekComplete DataRequest }
Jan 22 17:04:40 epia kernel: ide: failed opcode was: unknown
Jan 22 17:04:40 epia kernel: hda: drive not ready for command

I am not 100% sure that I have hit exactly this bug since, as I have read above,
the frequency drivers for VIA's chipsets are known to be buggy. But it looks like
the DMA timeout happens as soon as the CPU frequency controlling facilities are
in use.

I can perform more testing and analysis of this issue on my system if you would
be kind enough to point me in the right direction.

Regards,
Sergey

Comment 58 Dave Jones 2006-01-24 06:38:30 UTC

different problem. longhaul is known to have issues, which is why it's not built
into the Fedora kernel.

Comment 59 Dave Jones 2006-02-03 06:08:33 UTC

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.

Comment 60 Monti 2006-03-17 12:03:45 UTC

The first drive in my Inspiron 500m was the included 4200 RPM Hitachi drive.
It had the problems listed here.

I received a new identical drive from Dell, but the problems didn't disappear. 
This made me believe that the chipset was the source of this problem.  

Until now.  A couple of weeks ago I switched to a new 7200 RPM Hitachi drive
bought from a third party.  All DMA problems disappeared.

There can, as I see it, be two reasons for this:

1)  I received two faulty hard drives from Dell.
2)  The chipset had problems with the hard drive models I received from Dell,
but not with other drives.

I don't know which it is.  I had no problems with the "faulty" hard drives when
I tested them with other fairly similar computers (a one year older 500m and a
one year newer 510m).  Maybe an overly sensitive chipset?

Anyway, I'm just happy I have a working computer again.

Henrik

Comment 61 Dave Jones 2006-07-29 05:18:03 UTC

A number of these reports (including the Debian one referenced above, and
comment #60) sound like bad hardware.

As this bug has grown to unmanageable proportions with a number of different
(albeit similar) problems referenced, if this bug still affects you with the
latest errata kernel, please open a new bug.

Thanks.

Comment 62 David Kaplan 2006-07-29 18:31:05 UTC

I am the original reporter of this bug and I can testify that the original
problem, which was well specified and repeatable, still affects the kernel.  The
solution is to set "hdparm -d1 -X66", but this shouldn't be necessary as the
hard drive should be able to do UDMA5.  As a number of people have reported the
same problem, it is not a hardware problem.  Furthermore, I dual boot my machine
and Windows never has this problem.  Finally, I now run Ubuntu and have seen
this same problem with the same solution.
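
The numeric argument to hdparm -X encodes the transfer mode. This is a minimal sketch, assuming the encoding described in the hdparm man page (8 + n for PIO mode n, 32 + n for multiword DMA mode n, 64 + n for UltraDMA mode n). The function name is hypothetical, and other values are deliberately rejected rather than guessed at.

```python
def describe_xfer_mode(value):
    """Translate an `hdparm -X` numeric argument into a mode name."""
    if 64 <= value < 72:
        return f"udma{value - 64}"
    if 32 <= value < 40:
        return f"mdma{value - 32}"
    if 8 <= value < 16:
        return f"pio{value - 8}"
    raise ValueError(f"unhandled transfer mode value: {value}")
```

So -X66 selects udma2, while the UDMA5 mode the drive advertises would be -X69.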

At the very least, this bug should be marked "WONTFIX" or "CANTFIX" as the
resolution is NOT errata.
