Description of problem:
I am using FC2 on a Dell Inspiron 1150 and get all sorts of DMA timeouts that make the hard drive very slow. I could try turning off DMA, but that appears to really slow things down. I have seen other DMA-related bugs and they all appear related, but no two appear to be exactly the same.
What I really want to know is: what is this doing to my hard drive? Should I turn DMA off?
Version-Release number of selected component (if applicable):
FC2 with all the updates
How reproducible:
Always
Steps to Reproduce:
1. Save a file large enough to notice the slowdown.
Actual results:
DMA timeouts. dmesg output below.
Expected results:
No DMA timeouts.
Additional info:
hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.
hdc: DMA disabled
Losing some ticks... checking if CPU frequency changed.
Losing some ticks... checking if CPU frequency changed.
Losing some ticks... checking if CPU frequency changed.
Losing too many ticks!
TSC cannot be used as a timesource.
Possible reasons for this are:
  You're running with Speedstep,
  You don't have DMA enabled for your hard disk (see hdparm),
  Incorrect TSC synchronization on an SMP system (see dmesg).
Falling back to a sane timesource now.
hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
hdc: DMA disabled
ide1: reset: success
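For anyone trying to interpret the `dma status == 0x21` line: that value comes from the bus-master IDE status register. The Python sketch below decodes it, assuming the standard SFF-8038i bus-master register layout; the table and function names are illustrative and are not taken from the kernel. Read this way, 0x21 suggests the transfer was still flagged active, with neither the error bit nor the interrupt bit set, when the timer expired.

```python
# Bit layout of the bus-master IDE status register (SFF-8038i).
BMDMA_STATUS_BITS = {
    0x01: "Active",            # bus-master transfer still in progress
    0x02: "Error",             # controller reported a DMA error
    0x04: "Interrupt",         # the drive has raised its interrupt
    0x20: "Drive0DMACapable",  # master drive on this channel may use DMA
    0x40: "Drive1DMACapable",  # slave drive on this channel may use DMA
    0x80: "SimplexOnly",       # both channels cannot do DMA at the same time
}


def decode_bmdma_status(value: int) -> list[str]:
    """Return the names of the bits set in a BM-DMA status byte, lowest bit first."""
    return [name for bit, name in sorted(BMDMA_STATUS_BITS.items()) if value & bit]


if __name__ == "__main__":
    # The value printed by "dma_timer_expiry: dma status == 0x21"
    print(decode_bmdma_status(0x21))
```

Here `decode_bmdma_status(0x21)` returns `['Active', 'Drive0DMACapable']`. "Drive 0" is the master on its channel, so on ide1 it refers to hdc.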
*** Bug 132585 has been marked as a duplicate of this bug. ***
I have noticed a similar problem on my Dell Latitude D600. The error message I get is the same as originally posted, and I also only have the problem when dealing with large files. I have found that if I set the transfer mode to udma2 (hdparm -d1 -X66), I don't get these error messages. The disk is otherwise set to udma5, which it should support.
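The numeric argument to `hdparm -X` is an offset plus a mode number: per the hdparm man page, PIO mode n is 8+n, multiword DMA is 32+n and UltraDMA is 64+n, so `-X66` selects udma2 and udma5 would be 69. A small Python sketch of that mapping follows. The single-word DMA offset (16+n) comes from the ATA SET FEATURES transfer-mode encoding that these numbers are passed through as, the table of nominal UltraDMA burst rates is general ATA background, and the helper names are hypothetical.

```python
# ATA SET FEATURES transfer-mode encodings, as accepted numerically by hdparm -X.
MODE_OFFSETS = {"pio": 8, "sdma": 16, "mdma": 32, "udma": 64}

# Nominal peak burst rates in MB/s for UltraDMA modes 0-6.
UDMA_MAX_MB_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.3}


def mode_to_x_value(mode: str) -> int:
    """Convert a mode name such as 'udma2' into the numeric -X argument (here 66)."""
    for prefix, offset in MODE_OFFSETS.items():
        if mode.startswith(prefix):
            return offset + int(mode[len(prefix):])
    raise ValueError(f"unknown transfer mode: {mode!r}")


def x_value_to_mode(value: int) -> str:
    """Convert a numeric -X argument such as 66 back into a mode name."""
    # Try the largest offset first so that 66 is read as udma2, not mdma34.
    for prefix, offset in sorted(MODE_OFFSETS.items(), key=lambda kv: kv[1], reverse=True):
        if value >= offset:
            n = value - offset
            if n <= 7:  # the mode number occupies the low three bits
                return f"{prefix}{n}"
            break
    raise ValueError(f"unsupported -X value: {value}")


if __name__ == "__main__":
    for mode in ("udma5", "udma2"):
        n = int(mode[len("udma"):])
        print(mode, "-X", mode_to_x_value(mode), "nominal", UDMA_MAX_MB_S[n], "MB/s")
```

Dropping from udma5 to udma2 lowers the nominal ceiling from 100 to 33.3 MB/s. UltraDMA modes above udma2 also require an 80-conductor cable, which is why cabling is a recurring suspect for this kind of timeout.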
Here's the relevant section from my dmesg output. I wonder if the Inspiron is also using an ICH4 chipset...
ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ICH4: chipset revision 1
ICH4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
hda: IC25N030ATMR04-0, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
I've started to see this issue on my Dell Latitude L400 laptop since installing FC3 yesterday. Until then I was using FC2 and didn't have this problem. Under FC2 I was getting 20 MB/s from the disk; FC3 tries to use DMA but fails and falls back to some terribly slow mode that gives 0.7 MB/s. By adding 'ide=nodma' to the kernel line in grub.conf I can get half-decent performance without DMA, but we should be able to do better.
PIIX4: IDE controller at PCI slot 0000:00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfcd0-0xfcd7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfcd8-0xfcdf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: FUJITSU MHN2200AT, ATA DISK drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
The 2.6.9-1.678_FC3 kernel fixes this issue for me. I note that the log is now reporting hda:pio rather than hda:DMA, as above.
kernel-2.6.9-1.3_FC2 gives me the same problem as before.
An update: the 2.6.9-1.678_FC3 kernel fixes this issue only when the system is cold booted. After a warm reboot it still reports hda:DMA and the disk runs very, very slowly.
If there is a difference between cold and warm boot behaviour, that is almost certainly a BIOS problem. You might try acpi=off and see if it then reboots sanely. Also, does Manuel's suggestion in comment #2 help? If so, it gives me something to investigate further, either as a software bug or as a cable limit we need to add.
What sort of info do you need? I still have this problem with kernel-2.6.9-1.6_FC2 and the above reporter states that he still sees it with FC3.
The 2.6.9-1.681_FC3 kernel does finally seem to fix the problems I was seeing. I've tested this with the following sequence:
cold boot 681: OK
reboot 681: OK (reports hda:DMA)
reboot 678: OK
reboot 678: DMA errors on reading partition table, disk slow
reboot 681: DMA errors, disk slow
cold boot 681: OK
reboot 681: OK
I wonder if the issue reported in comment #4 and noted as resolved in comment #10 is different from the one originally posted. I continue to get the specific DMA error messages below with kernel 2.6.9-1.681_FC3 on my Dell Latitude D600. So at least two people are reporting these specific DMA errors on Dell laptops, even with the most recent FC3 kernel installed. I'm happy to provide whatever information is needed.
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success
I am inclined to think that Ron's problem is a different one as well. Ron - do you get the dma timeout error that is mentioned above?
My timeout error has a different status:
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
I had to get that out of yesterday's log. Every attempt to reproduce the problem today has failed. Even the 667 kernel is working properly today.
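The value in braces is the ATA status register, and several different values show up in this thread (0xd0, 0x58, 0x5a, 0x51). The Python sketch below reproduces the bracketed names seen in these logs, assuming the standard ATA status bit layout. The `DeviceFault` and `CorrectedError` labels are assumed, since those bits never appear here. Note that 0xd0 has BSY (0x80) set, and while BSY is set the remaining bits are not valid, which is why only `{ Busy }` is printed for it.

```python
BUSY = 0x80  # BSY: the drive owns the register block

# Remaining ATA status bits, in the order the names appear in the logs above.
ATA_STATUS_BITS = [
    (0x40, "DriveReady"),      # DRDY
    (0x20, "DeviceFault"),     # DF (label assumed, not seen in these logs)
    (0x10, "SeekComplete"),    # DSC
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR (label assumed, not seen in these logs)
    (0x02, "Index"),           # IDX
    (0x01, "Error"),           # ERR: details are in the error register
]


def describe_ata_status(status: int) -> str:
    """Render an ATA status byte in the '{ ... }' style used by the IDE driver's messages."""
    if status & BUSY:
        # While BSY is set the other bits are not valid, so only Busy is reported.
        return "{ Busy }"
    names = [name for bit, name in ATA_STATUS_BITS if status & bit]
    return "{ " + " ".join(names) + " }"


if __name__ == "__main__":
    for value in (0xD0, 0x58, 0x5A, 0x51):
        print(f"status=0x{value:02x} {describe_ata_status(value)}")
```

A status of 0x58 with DataRequest set means the drive was still expecting to move data when the DMA timer gave up, whereas 0xd0 means it had not finished processing the command at all.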
I definitely still get this error with FC3 and kernel-2.6.9-1.681_FC3. dmesg shows:
hdc: dma_timer_expiry: dma status == 0x21
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.
I am changing the severity of this bug as my system is basically unusable for heavy disk use with FC3.
dmkaplan: What effect does booting with acpi=off have? Also, what effect does disabling hal have ("service haldaemon stop")?
Hmmm. I didn't think FC2 used the hal daemon, so I didn't expect this to fix the problem (I am now using FC3). I turned off the hal daemon and it took me a while to get the problem to appear (lots of GIMP and OpenOffice stuff open at the same time), but yes, the problem still appears. Next I will try rebooting with acpi=off to see if that helps. Stay tuned...
Rebooted with acpi=off. Was able to reproduce the problem. Then I turned haldaemon off as well (both acpi and hald stopped). Was able to reproduce the problem again. Back to the drawing board....
Is there anything else I can do to debug this problem?
For some reason I did not see Alan's original comment #8. I tried setting hdparm -d1 -X66 (i.e. changing the system from udma5 to udma2). This appears to make the DMA timeout problems go away, though the system still appears quite sluggish.
This bug persists in the 2.6.9-1.724_FC3 kernel if you use udma5, though the actual error output has changed (below). udma2 is slow but has no DMA timeouts.
hdc: DMA timeout error
hdc: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdc: DMA disabled
ide1: reset: success
Losing some ticks... checking if CPU frequency changed.
Losing too many ticks!
TSC cannot be used as a timesource.
Possible reasons for this are:
  You're running with Speedstep,
  You don't have DMA enabled for your hard disk (see hdparm),
  Incorrect TSC synchronization on an SMP system (see dmesg).
Falling back to a sane timesource now.
I've got the same problem and messages when I move big files on FC3.
* kernel => 2.6.9-1.724_FC3
* /etc/sysconfig/harddisks:
USE_DMA=1
MULTIPLE_IO=16
* dmesg | grep hda
ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
hda: IC25N040ATMR04-0, ATA DISK drive
hda: max request size: 1024KiB
hda: 78140160 sectors (40007 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
* /var/log/messages:
Jan 5 18:51:36 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 5 18:51:46 william kernel: hda: DMA timeout error
Jan 5 18:51:46 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan 5 18:51:46 william kernel:
Jan 5 18:51:46 william kernel: ide: failed opcode was: unknown
Jan 5 18:51:46 william kernel: hda: DMA disabled
Jan 5 18:51:46 william kernel: ide0: reset: success
Jan 5 18:52:24 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 5 18:52:34 william kernel: hda: DMA timeout error
Jan 5 18:52:34 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan 5 18:52:34 william kernel:
Jan 5 18:52:34 william kernel: ide: failed opcode was: unknown
Jan 5 18:52:34 william kernel: hda: DMA disabled
Jan 5 18:52:34 william kernel: ide0: reset: success
Jan 5 18:53:21 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 5 18:53:31 william kernel: hda: DMA timeout error
Jan 5 18:53:31 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan 5 18:53:31 william kernel:
Jan 5 18:53:31 william kernel: ide: failed opcode was: unknown
Jan 5 18:53:31 william kernel: hda: DMA disabled
Jan 5 18:53:32 william kernel: ide0: reset: success
Jan 5 18:54:02 william kernel: hda: dma_timer_expiry: dma status == 0x21
Jan 5 18:54:12 william kernel: hda: DMA timeout error
Jan 5 18:54:12 william kernel: hda: dma timeout error: status=0xd0 { Busy }
Jan 5 18:54:12 william kernel:
Jan 5 18:54:12 william kernel: ide: failed opcode was: unknown
Jan 5 18:54:12 william kernel: hda: DMA disabled
Jan 5 18:54:12 william kernel: ide0: reset: success
Jan 5 18:54:15 william kernel: Losing too many ticks!
Jan 5 18:54:15 william kernel: TSC cannot be used as a timesource.
Jan 5 18:54:15 william kernel: Possible reasons for this are:
Jan 5 18:54:15 william kernel: You're running with Speedstep,
Jan 5 18:54:15 william kernel: You don't have DMA enabled for your hard disk (see hdparm),
Jan 5 18:54:15 william kernel: Incorrect TSC synchronization on an SMP system (see dmesg).
Jan 5 18:54:15 william kernel: Falling back to a sane timesource now.
* lspci
00:00.0 Host bridge: Intel Corp. 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV28 [GeForce4 Ti 4200 Go AGP 8x] (rev a1)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01)
02:01.0 CardBus bridge: Texas Instruments: Unknown device ac47 (rev 01)
02:01.1 CardBus bridge: Texas Instruments: Unknown device ac4a (rev 01)
02:01.2 FireWire (IEEE 1394): Texas Instruments: Unknown device 802b
02:01.3 System peripheral: Texas Instruments: Unknown device 8204
* hdparm /dev/hda
multcount = 16 (on)
IO_support = 0 (default 16-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 16383/255/63, sectors = 40007761920, start = 0
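In the /var/log/messages excerpt above, each `dma_timer_expiry` line is followed by `DMA timeout error` about 10 seconds later (18:51:36 to 18:51:46, 18:52:24 to 18:52:34, and so on). The Python sketch below measures that gap across a whole log. It assumes the classic syslog prefix `Mon DD HH:MM:SS host kernel: drive: message`; because syslog omits the year, a placeholder year is used, so only differences within the same year are meaningful. The function name is hypothetical.

```python
import re
from datetime import datetime

# Classic syslog line: "Jan  5 18:51:36 host kernel: hda: message".
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d\d:\d\d:\d\d) \S+ kernel: (\w+): (.*)$")

# Syslog omits the year; this placeholder only makes same-year differences meaningful.
PLACEHOLDER_YEAR = 2000


def timeout_delays(lines):
    """Yield (drive, seconds) from each dma_timer_expiry to that drive's next 'DMA timeout error'."""
    pending = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a kernel line with a "drive: message" payload
        stamp, drive, message = match.groups()
        when = datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        if message.startswith("dma_timer_expiry"):
            pending[drive] = when
        elif message == "DMA timeout error" and drive in pending:
            yield drive, (when - pending.pop(drive)).total_seconds()


if __name__ == "__main__":
    sample = [
        "Jan 5 18:51:36 william kernel: hda: dma_timer_expiry: dma status == 0x21",
        "Jan 5 18:51:46 william kernel: hda: DMA timeout error",
    ]
    print(list(timeout_delays(sample)))  # [('hda', 10.0)]
```

To scan a real log, pass an open file, e.g. `with open("/var/log/messages") as f: print(list(timeout_delays(f)))`; reading that file usually requires root.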
I had a similar problem on a system when I upgraded from Red Hat 9 to Fedora Core 3. Disabling DMA on the drive let the disk work, but I was still getting crashes whenever there was any traffic on the ethernet. An upgrade to 2.6.10 didn't change the ethernet problem. Then I realized that I had been using both the disk and the ethernet without trouble during the install (the DVD was on another machine) and in rescue mode. I checked the kernel used in rescue mode and it was the same as the kernel I was using when booting from the hard disk; furthermore, in rescue mode the disk had DMA turned on. I then tried hitting the ethernet in single-user mode and it worked. So I started turning on the services used in runlevel 3 a few at a time. The problem came back when I turned on cpuspeed. I then disabled cpuspeed, enabled disk DMA and rebooted to runlevel 5. All works OK. It seems that there is some interaction between what cpuspeed does and both the disk and the ethernet. One additional note: I tried booting into single-user mode, starting cpuspeed, and then shutting it down. The first time I touched the ethernet I got a crash. So the crash seems to be associated not with the daemon's ongoing actions but with a residual effect of what it did earlier. Below is /proc/cpuinfo for the machine I did this on:
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 6
model name : VIA Samuel
stepping : 3
cpu MHz : 668.574
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr mce cx8 mtrr pge mmx pni 3dnow
bogomips : 1306.62
Robert - sounds like a different problem. I have had no ethernet or install problems. Do you get the same dma timeout we are talking about? If you aren't using a Dell laptop, I suspect it isn't the same problem.
CPUspeed + VIA was a known problem. I believe DaveJ fixed that by disabling VIA CPU frequency scaling in the kernel errata?
It was the same DMA timeout. I can't connect to that machine right now so I can't quote the log, but I will do so later in case I missed something. I also did not have any trouble during installation. It was the contrast between things working during installation and then failing after reboot that made me go looking for the difference. This machine is not a laptop, so perhaps these are unrelated problems with the same symptom.
The DMA timeout problem continues to appear in udma5 mode for kernel-2.6.10-1.737_FC3.
I also have the same problem:
Jul 30 21:07:30 ada kernel: hdc: DMA timeout error
Jul 30 21:07:30 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Jul 30 22:43:45 ada kernel: hdc: DMA timeout error
Jul 30 22:43:45 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Jul 31 00:48:12 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Jul 31 00:48:12 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug 4 21:12:11 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Aug 4 21:12:11 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug 5 15:28:52 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Aug 5 15:28:52 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Aug 11 18:09:10 ada kernel: hdc: DMA timeout error
Aug 11 18:09:10 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 11 18:09:40 ada kernel: hdc: DMA timeout error
Aug 11 18:09:40 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 18 19:59:08 ada kernel: hdc: DMA timeout error
Aug 18 19:59:08 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep 2 21:49:14 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 2 21:49:14 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 20 07:12:46 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 20 07:12:46 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 17:03:11 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 17:03:11 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:25:07 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:25:07 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:25:07 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:25:07 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 22 22:26:54 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Sep 22 22:26:54 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
Sep 30 15:32:36 ada kernel: hdc: DMA timeout error
Sep 30 15:32:36 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep 30 17:29:37 ada kernel: hdc: DMA timeout error
Sep 30 17:29:37 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Sep 30 18:06:06 ada kernel: hdc: DMA timeout error
Sep 30 18:06:06 ada kernel: hdc: dma timeout error: status=0xd0 { Busy }
Oct 1 20:57:03 ada kernel: hda: irq timeout: status=0xd0 { Busy }
Oct 1 20:57:03 ada kernel: hda: irq timeout: error=0xd0LastFailedSense 0x0d
It appears only, or mostly, when I am doing a lot of I/O (writing/reading several hundred MB with BDB, a database application). I have an Inspiron 5100 and I always thought that I had a bad hard drive. I am running Fedora Core 2, kernel 2.6.7, so I guess this is not an FC3-specific problem. What hard disk do you have? Mine is:
Oct 1 18:39:05 ada smartd[1469]: Device: /dev/hdc, opened
Oct 1 18:39:05 ada smartd[1469]: Device: /dev/hdc, not found in smartd database.
Oct 1 18:39:05 ada smartd[1469]: Device: /dev/hdc, is SMART capable. Adding to "monitor" list.
Oct 1 18:39:14 ada kernel: ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
Oct 1 18:39:14 ada kernel: hdc: IC25N040ATMR04-0, ATA DISK drive
Oct 1 18:39:14 ada kernel: hdc: max request size: 1024KiB
Oct 1 18:39:14 ada kernel: hdc: 78140160 sectors (40007 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
Oct 1 18:39:14 ada kernel: hdc: hdc1 hdc2 hdc3 < hdc5 hdc6 hdc7 >
Oct 1 18:39:23 ada kernel: EXT3 FS on hdc5, internal journal
Andrei
The same problem appears on my Latitude D800. I am able to reproduce it by copying large files. After the first timeout, hdparm -i still shows udma5 as selected, but the transfer rate reported by hdparm -t drops to 2.5 MB/s. If I manually select udma2 (hdparm -d1 -X66), the rate is back to 25 MB/s.
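The drop from 25 to 2.5 MB/s is a tenfold slowdown. For anyone who wants to repeat that kind of measurement without hdparm, the Python sketch below times sequential reads. It is only a rough stand-in for `hdparm -t`, not a reimplementation of it: on a regular file the page cache can inflate the figure, reading a raw device such as /dev/hda needs root, and the result is in MiB/s (2**20 bytes), which may not match hdparm's units exactly. The function name and defaults are hypothetical.

```python
import time


def sequential_read_mib_s(path: str, seconds: float = 3.0, block_size: int = 1 << 21) -> float:
    """Read `path` front to back for up to `seconds` and return the throughput in MiB/s."""
    total = 0
    start = time.monotonic()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while time.monotonic() - start < seconds:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / elapsed / (1 << 20) if elapsed > 0 else 0.0
```

Running `print(f"{sequential_read_mib_s('/dev/hda'):.1f} MiB/s")` as root before and after the first timeout should show whether the drive has silently fallen back to a slow mode.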
I have heard reports that this has to do with the smartd daemon. Has anyone tried turning that off and then doing the test? Also, scanning the web I have seen that forms of this problem have been bouncing around in a number of distributions since the 2.4 kernels.
We have had the same problem with a number of dual Xeon boxes (Supermicro X5DPR-iG2). 14 out of 16 showed DMA timeouts over a 6-week period after upgrading to the 2.6.x kernel. There were no problems under several 2.4.x kernels. Currently we run 2.6.10-1.766_FC3smp, but successive kernel upgrades from the stock FC3 kernel have not helped. It therefore seems that this problem is not limited to laptops. I tried turning smartd off, but the timeouts persisted, so the suggestion that turning off smartd might help (#30) seems to be incorrect. I also tried switching to UDMA2, but the timeouts persisted. Most recently, I switched to udma1 and turned off ACPI at boot (#8). We have been running for 5 days with only 3 timeouts across 16 nodes; that is much less than before and not enough to provoke a switch to piix4. I would not call this a solution: the read speed is down to 15 MB/sec (from 50 in udma5), but it is preferable to piix4. I plan to wait a week or two and then try udma2.
Is it true that the failure in this bug report seems specific to systems based on the ICH4 (and ICH4-M)?
I have this problem, and I am also not using a laptop. This problem only started yesterday for me, though, so I am going to check which packages were upgraded yesterday and see if I can find the cause. The other funny thing is that my server is unreachable on any port but 25, but replies to pings and TTLs.
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x5a { DriveReady SeekComplete DataRequest Index }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success
Mar 23 03:41:16 zeus kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 23 03:41:16 zeus kernel: hda: DMA timeout error
Mar 23 03:41:16 zeus kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: status timeout: status=0xd0 { Busy }
Mar 23 03:41:16 zeus kernel:
Mar 23 03:41:16 zeus kernel: hda: drive not ready for command
Mar 23 03:41:16 zeus kernel: ide0: reset: success
When I get access to the server I will post the list of applications that were automatically updated, to hopefully give you guys some ideas.
To Len Brown (#32): I don't think so. My boxes use the 7501 chipset (S'micro X5DPR-iG2+), which includes the ICH3-S I/O controller. I don't think it is a hardware issue. I have similar boxes, also with the 7501 (S'micro X5DPA-TGM), but with SATA, and these have no problem (using the SCSI-based SATA driver). I think it is just the IDE driver.
Hi Alan! Maybe this helps to locate it. I own 8 Linux machines. I have had this problem since the 2.6.0 (-test) kernel series came out! I use FC2 and FC3. If I use 2.4 (Red Hat compiled, or the original from kernel.org with only the necessary options), everything is OK. If I use 2.6 (Red Hat compiled, or the original from kernel.org with only the necessary options), the problem appears, BUT ONLY ON Pentium 1 class machines! It doesn't matter whether I use ide=nodma or /sbin/hdparm -d1 -X mdma1 /dev/hda1 (OK, with the latter it is a bit less frequent). The problem exists in 2.6.11 too (and I've tried all the versions since 2.6.0). One of them (lspci):
00:00.0 Host bridge: Intel Corp. 430TX - 82439TX MTXC (rev 01)
00:01.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 01)
00:01.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:01.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:01.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 01) :-)
00:09.0 VGA compatible controller: S3 Inc. 86c325 [ViRGE] (rev 06)
00:0a.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
Just a few programs are running:
PID TTY STAT TIME COMMAND
1 ? S 0:00 init [3]
2 ? SWN 0:00 [ksoftirqd/0]
3 ? SW< 0:00 [events/0]
4 ? SW< 0:00 [khelper]
5 ? SW< 0:00 [kblockd/0]
6 ? SW 0:00 [khubd]
25 ? SW 0:00 [pdflush]
26 ? SW 0:00 [pdflush]
28 ? SW< 0:00 [aio/0]
27 ? SW 0:00 [kswapd0]
611 ? SW 0:00 [kseriod]
632 ? SW 0:00 [kjournald]
977 ? SW 0:00 [kjournald]
1505 ? S 0:00 syslogd -m 0
1509 ? S 0:00 klogd -x
1535 ? S 0:01 /usr/sbin/sshd
1718 ? S 0:00 sshd: root@pts/0
1720 pts/0 S 0:00 -bash
1747 ? S 0:01 sshd: root@pts/1
1749 pts/1 S 0:00 -bash
2584 ? S< 0:00 /sbin/wland
2809 tty1 S 0:00 /sbin/mingetty tty1
2815 tty2 S 0:00 /sbin/mingetty tty2
2816 tty3 S 0:00 /sbin/mingetty tty3
2817 tty4 S 0:00 /sbin/mingetty tty4
2818 tty5 S 0:00 /sbin/mingetty tty5
2819 tty6 S 0:00 /sbin/mingetty tty6
2998 pts/2 S 0:00 bash -rcfile .bashrc
3020 pts/0 R 0:00 ps ax
And the log:
Apr 5 04:04:15 XXXXX kernel: hda: dma_timer_expiry: dma status == 0x21
Apr 5 04:04:25 XXXXX kernel: hda: DMA timeout error
Apr 5 04:04:25 XXXXX kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete
Apr 5 04:04:25 XXXXX kernel:
Apr 5 04:04:25 XXXXX kernel: ide: failed opcode was: unknown
Apr 5 04:04:46 XXXXX kernel: hda: dma_timer_expiry: dma status == 0x21
Apr 5 04:04:56 XXXXX kernel: hda: DMA timeout error
Apr 5 04:04:56 XXXXX kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete
I don't get any "Losing ticks" messages.
Regarding the previous comment, this problem is not limited to Pentium I machines. I am using a laptop that is no more than 6 months old (i.e. it should be a Pentium 4), and it has this problem. It could be some sort of timing constraint that applies to older machines and laptops perhaps.
I have the same problem on a Dell Inspiron m500 with an ICH4 controller:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH4: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ICH4: chipset revision 1
ICH4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:DMA, hdd:pio
hda: IC25N030ATMR04-0, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: max request size: 1024KiB
hda: 58605120 sectors (30005 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4
hdc: SAMSUNG CDRW/DVD SN-324F, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
The errors I get are these:
Apr 6 09:47:33 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Apr 6 09:47:44 localhost kernel: hda: DMA timeout error
Apr 6 09:47:44 localhost kernel: hda: dma timeout error: status=0xd0 { Busy }
Apr 6 09:47:44 localhost kernel:
Apr 6 09:47:44 localhost kernel: hda: DMA disabled
Apr 6 09:47:44 localhost kernel: ide0: reset: success
I run a Debian system, so the problem is not isolated to Fedora kernels. I get this error three or four times a week on my current kernel, 2.6.8-2-686-smp. This is annoying, but I can live with it. With the 2.6.9 and 2.6.10 kernels I get this problem much more frequently, often enough that I give up working on the machine. I restore the drive settings with hdparm -d1 -c1 -Xudma5.
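The capacity figures in these dmesg lines follow from 512-byte sectors: 58605120 × 512 bytes is about 30005.8 million bytes, printed as 30005 MB. The Python sketch below does the conversion, assuming decimal megabytes truncated toward zero, which matches every capacity quoted in this thread. It also shows that the hdparm line `sectors = 40007761920` in an earlier comment equals 78140160 × 512, i.e. a byte count rather than a sector count. The function name is hypothetical.

```python
SECTOR_BYTES = 512  # logical sector size of these IDE disks


def capacity_mb(sectors: int) -> int:
    """Disk size in decimal megabytes (10**6 bytes), truncated toward zero."""
    return sectors * SECTOR_BYTES // 10**6


if __name__ == "__main__":
    for sectors in (58605120, 78140160, 156299375, 156301488):
        print(f"{sectors} sectors = {sectors * SECTOR_BYTES} bytes = {capacity_mb(sectors)} MB")
```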
I have a similar problem with the 2.6.11 kernel, but running on Gentoo. My computer slows down A LOT when copying large files on my Seagate drive, although DMA is enabled (UDMA5).
LSPCI:
0000:00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
0000:02:09.0 Ethernet controller: Marvell Technology Group Ltd. Gigabit Ethernet Controller (rev 13)
DMESG:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 177
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: ST380011A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: DV-516D, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: Host Protected Area detected.
current capacity is 156299375 sectors (80025 MB)
native capacity is 156301488 sectors (80026 MB)
hda: Host Protected Area disabled.
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 >
If I'm copying files and change the UDMA settings during the operation, I get errors like these:
Apr 7 18:19:25 nevermore hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 7 18:19:25 nevermore
Apr 7 18:19:25 nevermore ide: failed opcode was: unknown
Apr 7 18:19:25 nevermore hda: CHECK for good STATUS
Apr 7 18:19:34 nevermore hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 7 18:19:34 nevermore hda: dma_intr: error=0x04 { DriveStatusError }
Apr 7 18:19:34 nevermore ide: failed opcode was: unknown
Apr 7 18:20:05 nevermore hda: dma_timer_expiry: dma status == 0x21
Apr 7 18:20:05 nevermore hda: DMA timeout error
Apr 7 18:20:05 nevermore hda: dma timeout error: status=0xd0 { Busy }
Apr 7 18:20:05 nevermore
Apr 7 18:20:05 nevermore ide: failed opcode was: unknown
Apr 7 18:20:05 nevermore hda: DMA disabled
Apr 7 18:20:05 nevermore ide0: reset: success
I even managed to lock up the computer when, instead of copying, I was running dd if=/dev/zero of=temp bs=1024 count=20000
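The `error=0x04 { DriveStatusError }` line above comes from the ATA error register, which the driver reads when the status byte has its Error bit set (as in 0x51). The Python sketch below names each bit for ATA disks. The layout is the standard ATA one; `DriveStatusError` is the label these logs use for the abort (ABRT) bit, and the other labels are descriptive names chosen for this sketch. Bit 7 (interface CRC error during Ultra DMA) is the one commonly associated with cabling or signal problems. ATAPI devices such as CD drives use a different layout in which the upper four bits carry a sense key, so the `error=0xd0LastFailedSense 0x0d` lines earlier in the thread appear to be just the upper nibble of 0xd0.

```python
# ATA error register bits for ATA (non-ATAPI) disks.
ATA_ERROR_BITS = {
    0x01: "AddressMarkNotFound",   # AMNF
    0x02: "TrackZeroNotFound",     # TK0NF
    0x04: "DriveStatusError",      # ABRT: command aborted (label used in the log above)
    0x08: "MediaChangeRequested",  # MCR
    0x10: "SectorIdNotFound",      # IDNF
    0x20: "MediaChanged",          # MC
    0x40: "UncorrectableError",    # UNC
    0x80: "InterfaceCRCError",     # ICRC: CRC error during an Ultra DMA transfer
}


def decode_ata_error(value: int) -> list[str]:
    """Return the names of the bits set in an ATA error register value, lowest bit first."""
    return [name for bit, name in sorted(ATA_ERROR_BITS.items()) if value & bit]


def atapi_sense_key(value: int) -> int:
    """For ATAPI devices, the upper four bits of the error register hold the sense key."""
    return (value & 0xF0) >> 4
```

Here `decode_ata_error(0x04)` gives `['DriveStatusError']`, consistent with the drive aborting a command while the transfer mode was being changed underneath it.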
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.
This bug should be reopened - the problem persists with FC3, kernel 2.6.11-1.14_FC3. To clarify a previous comment (#2), setting the transfer rate to udma2 works most of the time, but not always - I still get a low incidence of these DMA errors.
Since my last comment (#37) I have updated to Debian kernel 2.6.11-1-686 (no SMP this time, just checking). The problem persists, although probably less often. I get small lags when there is heavy disk activity (e.g. software updates with apt-get). I suspect that audio playback combined with disk activity might have something to do with this, but I haven't had time to check this thoroughly yet.
Created attachment 115746
2.6 DMA timeout logs and info

I've had this problem with every 2.6 kernel from the Fedora project and with all of the few 2.6 kernels I tried from source, going back about a year. Since then I've been running 2.4.28 from source with no problems. I tried Fedora Core 4 with 2.6.11-1.1369_FC4 this weekend and had the same bad results as before. In my experience everything works fine until there is a moderately heavy amount of disk activity. Disabling the SMART daemon did not help me, nor did disabling UDMA in the BIOS. I'm attaching some log excerpts showing the problem and a bunch of other system info I've gathered.

After some searching I ran across the following workaround:

******** Possible Work Around ********
Pass the ide=nodma parameter to the kernel at boot, i.e.
kernel /vmlinuz-2.6.11-1.1369_FC4 ro root=LABEL=/ rhgb ide=nodma

I haven't had any problems since. YMMV.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
This is still a problem on Debian kernel linux-image-2.6.12-1-686-smp. I know I shouldn't bring my Debian problems here, but it seems to me this is a general Linux kernel issue.
I see this problem on my Dell Latitude D505 as well, latest kernel (2.6.12-1.1372_FC3):
Aug 24 17:00:31 twiadria kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 24 17:00:46 twiadria kernel: hda: DMA timeout error
Aug 24 17:00:46 twiadria kernel: hda: dma timeout error: status=0xd0 { Busy }
Aug 24 17:00:46 twiadria kernel:
Aug 24 17:00:46 twiadria kernel: ide: failed opcode was: unknown
Aug 24 17:00:46 twiadria kernel: hda: DMA disabled
Aug 24 17:00:46 twiadria kernel: ide0: reset: success
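The "dma status == 0x21" value that recurs in these logs appears to be the controller's bus-master IDE status register, read when the DMA timer expired. A minimal Python sketch that decodes it, assuming the standard SFF-8038i bit layout (the flag names are labels chosen for this illustration, not kernel identifiers):

```python
# Bit layout of the bus-master IDE (BMDMA) status register as defined by
# SFF-8038i; bits 3 and 4 are reserved. The names are illustrative labels.
BMDMA_BITS = [
    (0x01, "Active"),            # a bus-master transfer is still in progress
    (0x02, "Error"),             # the controller reported a DMA error
    (0x04, "Interrupt"),         # the drive has raised its interrupt
    (0x20, "Drive0DMACapable"),  # set by BIOS/driver for the master drive
    (0x40, "Drive1DMACapable"),  # same, for the slave drive
    (0x80, "Simplex"),           # only one channel may use DMA at a time
]


def decode_bmdma_status(value: int) -> list[str]:
    """Return the names of the bits set in a BMDMA status byte."""
    return [name for bit, name in BMDMA_BITS if value & bit]


if __name__ == "__main__":
    for status in (0x21, 0x20):
        print(hex(status), decode_bmdma_status(status))
```

Read this way, 0x21 means the transfer was still flagged as active and no interrupt had arrived when the timer fired, which fits a drive that never finished the transfer rather than a controller-reported DMA error.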
I haven't had any problems with this for a few kernel releases (including the initial kernel in FC4, although I'm running FC3 again). On the other hand, I replaced my hard drive soon after the problems stopped, for unrelated reasons. It would be worth hearing from the original reporter whether he is still having issues.
Sorry for the length of this. I tried to be concise while giving enough info to go on. The short version is that the 2.6.12-1.1372 kernel gave horrific errors and crashes on one machine while running fine on another. The 2.6.11-1.35 kernel gave some problems, one of them fatal, and the 2.6.11-1.27 kernel ran for months without issues.

I have 3 machines, and I recently started getting dma_timer_expiry and related errors on 2 of them, with different results. Glup and Oobleck run FC3 and get nightly yum updates, including kernel updates. Voom ran FC2 but fell off updates when the Legacy repository stopped providing them consistently. Oobleck is worst affected, Glup least.

Glup is an IBM ThinkPad T40 with an IBM/Hitachi 80GB ATA drive. It has run all of the kernels in the FC3 updates without trouble:
00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 Lf [FireGL 9000] (rev 02)
02:00.0 CardBus bridge: Texas Instruments PCI1520 PC card Cardbus Controller (rev 01)
02:00.1 CardBus bridge: Texas Instruments PCI1520 PC card Cardbus Controller (rev 01)
02:01.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet Controller (Mobile) (rev 03)
02:02.0 Ethernet controller: Atheros Communications, Inc. AR5211 802.11ab NIC (rev 01)

Voom was an FC2 x86_64 machine with an Opteron 148 CPU and 2 WD 250GB SATA disks. It got ata timeout messages in its logs, but I never followed them up, since it only happened 3 times and I only noticed them in the logs the following day. No performance issues were obvious.

2.6.9-1.6_FC2
Dec 18 04:02:52 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Dec 18 04:02:52 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Dec 18 04:02:52 voom kernel: ata1: error=0x04 { DriveStatusError }
Dec 18 04:02:52 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Dec 18 04:02:52 voom kernel: Current sda: sense key Aborted Command
Dec 18 04:02:52 voom kernel: end_request: I/O error, dev sda, sector 29257296

2.6.10-1.770_FC2
Mar 16 09:42:49 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Mar 16 09:42:49 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Mar 16 09:42:49 voom kernel: ata1: error=0x04 { DriveStatusError }
Mar 16 09:42:49 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Mar 16 09:42:49 voom kernel: Current sda: sense key Aborted Command
Mar 16 09:42:49 voom kernel: end_request: I/O error, dev sda, sector 40641392

2.6.10-1.771_FC2
Aug 2 04:09:38 voom kernel: ata1: command 0x25 timeout, stat 0x51 host_stat 0x60
Aug 2 04:09:38 voom kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Aug 2 04:09:38 voom kernel: ata1: error=0x04 { DriveStatusError }
Aug 2 04:09:38 voom kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug 2 04:09:38 voom kernel: Current sda: sense key Aborted Command
Aug 2 04:09:38 voom kernel: end_request: I/O error, dev sda, sector 27114072

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
01:0a.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller
01:0b.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
01:0c.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
01:0d.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit Ethernet (rev 03)
01:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit Ethernet (rev 03)

Voom no longer runs an FC release. It's the master of a Beowulf cluster and runs cAos's experimental FNN kernels. The current kernel is 2.6.12-76.caoscustom, and there have been no more ata-related error messages since 12 August.

Oobleck has an Asus A7V600 mobo with a VIA chipset and an Athlon XP 2800+ CPU. It has a WD 250GB ATA root disk, 2 Maxtor 250s, and a Maxtor 160.
2.6.12-1.1372_FC3
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 19 17:18:16 oobleck kernel:
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 19 17:18:16 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck smartd[3475]: Device: /dev/hdg, enabled SMART Automatic Offline Testing.
Aug 19 17:18:16 oobleck kernel: ide0: reset: success
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 19 17:18:16 oobleck kernel:
Aug 19 17:18:16 oobleck kernel: ide: failed opcode was: unknown
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: DMA timeout error
Aug 19 17:18:16 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }

That is a sample of the error output. Similar output occurred at each of these times:

# grep expiry /var/log/messages
2.6.12-1.1372_FC3
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 19 17:18:16 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:40:59 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:40:59 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:41:00 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:41:00 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21

...and the last one had some different error codes:

Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x50 { DriveReady SeekComplete }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 22 14:54:21 oobleck kernel: hda: task_in_intr: error=0x04 { DriveStatusError }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 22 14:54:21 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21
Aug 22 14:54:21 oobleck kernel: hda: DMA timeout error
Aug 22 14:54:21 oobleck kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }

Then things got really bad. The system was down much of the time from Aug 22 to today on the 2.6.12-1.1372 kernel. The timeout errors occurred a *lot*, but the machine would hang immediately afterwards, so none of the messages from this period made it into the log. Boots failed, usually after kernel initialization but while still running the startup scripts. Other boots went exceedingly slowly (an hour) before failing. Slowness started at random times and without error messages.
Other boots seemed fine, but the machine crashed during periods of heavy network and graphics card use (VNC grabbing the physical display while running over an unreliable wireless network), or just randomly. I lost 2 work days swapping cards around looking for IRQ conflicts and memory problems. The memory tested good with memtest86+ running the full test on each card. I had trouble even after removing all but the graphics card and reserving IRQs in the BIOS. I ran a SMART long test on the drive, which passed. I disconnected the other drives. Nothing improved things, and all components had worked previously, for almost 2 years.

All of this occurred after about 3 months of errorless uptime. It occurred to me that kernel updates might have been installed during those three months of uptime, and I discovered that I had been running 2.6.11-1.27 all that time. I stepped back one update, to 2.6.11-1.35. There were still problems, but different ones. At 4:02 am, when little was going on, /dev/hda (the boot disk) got remounted read-only, reporting that it was full when it wasn't; only 15% of the inodes were in use. Strange keyboard errors happened during the boot. I stepped back to 2.6.11-1.27, and things seem stable. It has been running without errors since 11:55 this morning, and 13 hours seems like a lot of uptime right now. I am filing this report over VNC grabbing the console via a wireless network, without a problem. While things could get bad again, I expect they won't with this kernel.

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
00:0c.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0e.0 Unknown mass storage controller: Promise Technology, Inc. 20269 (rev 02)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01)
01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01)

Let me know what else to send, if it helps others. I seem to have my workaround, at least for now. --jh--
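The "grep expiry /var/log/messages" check in the comment above shows the timeouts arriving in bursts. A small Python sketch that groups them per device and per minute, assuming the usual "host kernel: hdX: ..." syslog format (the nevermore log earlier in this thread lacks the "kernel:" tag and would not match as written); the script and function names are hypothetical:

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches syslog lines such as
# "Aug 22 14:40:59 oobleck kernel: hda: dma_timer_expiry: dma status == 0x21"
# and captures the timestamp truncated to the minute plus the IDE device name.
EXPIRY_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<dev>hd[a-z]):\s+dma_timer_expiry"
)


def expiries_per_minute(lines: Iterable[str]) -> Counter:
    """Count dma_timer_expiry messages per (device, minute)."""
    counts: Counter = Counter()
    for line in lines:
        m = EXPIRY_RE.match(line)
        if m:
            counts[(m.group("dev"), m.group("stamp"))] += 1
    return counts


def main(argv: list[str]) -> None:
    # Hypothetical usage: python3 expiry_count.py /var/log/messages
    for path in argv[1:]:
        with open(path, errors="replace") as f:
            for (dev, minute), n in sorted(expiries_per_minute(f).items()):
                print(f"{path}: {dev} {minute} {n}")


if __name__ == "__main__":
    main(sys.argv)
```

A per-minute count makes it easier to line bursts up against other events (cron jobs, heavy I/O, cpuspeed changes) than the raw grep output.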
Thanks for the 3-machine summary. The error=0x04 reports early in the log are harmless: something asked the drive to do things it didn't support. The DMA timeouts, on the other hand, indicate real, repeated problems with data transfer. If you get a chance, could you see whether disabling ACPI and the cpuspeed daemon helps at all on that VIA box?
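To make the status and error bytes in these logs easier to read, here is a minimal Python sketch that decodes them using the standard ATA status-register bits, with names matching the strings the 2.6 IDE driver prints here. It assumes, as the "status=0xd0 { Busy }" lines suggest, that only Busy is reported while BSY is set (0xd0 also has DRDY and DSC set). "error=0x04" is the ABRT (command aborted) bit, i.e. the drive rejecting a command, which fits the explanation above; the function names are illustrative.

```python
# ATA status-register bits, named as they appear in these kernel logs.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]
BUSY = 0x80  # BSY: while set, the other status bits are not meaningful
ABRT = 0x04  # error-register bit: the drive aborted the command


def decode_status(stat: int) -> list[str]:
    """Decode an ATA status byte the way the logs above present it."""
    if stat & BUSY:
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if stat & bit]


def decode_error(err: int) -> list[str]:
    """Decode only the ABRT bit of the error register (a partial sketch)."""
    return ["DriveStatusError"] if err & ABRT else []


if __name__ == "__main__":
    for stat in (0xD0, 0x58, 0x51, 0x50):
        print(hex(stat), decode_status(stat))
    print("error 0x4", decode_error(0x04))
```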
Yesterday at 1:30 pm, I booted with 2.6.12-1.1372 acpi=off. Things were semi-stable, much more so than before. However, things were locked up when I came in this morning. There was still the normal image on the screen, but logging had stopped at 3:08 am, similar to some of the prior crashes. I also noticed that glup, the IBM T40 laptop, had been running with acpi=off pci=noacpi atkbd.reset all along (though it has been quite stable with the 1.1372 kernel). --jh--
Oh, and I had turned off cpuspeed as well, though it didn't seem to do much with this CPU. --jh--
Well, I'm ashamed to say that the VIA box (oobleck) had some serious hardware problems that I blamed on software. These are now resolved, and the errors are gone. The problem was that the power supply was giving 4.5 volts on the 5-volt line, and the boot disk parked its heads whenever the voltage fluctuated below 4.5 volts. It revived when the voltage came back up. Hence, the timeouts were real. Something was making this progressively worse, which gave the appearance that things were bad with later kernels. A new power supply works fine. With the old PS, that disk was also fine in other IDE positions, and other disks of the same model were fine in that IDE position; only that particular disk on the primary IDE channel dropped the voltage below 4.5 volts. I am now running the 1378 kernel with no special options, and it's happy.

Since I'm not the original poster, I won't close the bug, but I'd suggest investigating hardware. Look at the health monitor in your BIOS and check voltages. Listen for head-parking sounds (the same "clunk" that you hear when you turn off your power). Run SMART tests, and download the disk vendor's diagnostics and boot into them. Try a shorter IDE cable, if you have a long one. Most importantly, back up your data. Similar errors might come from a disk/cable setup that (electrically) can't consistently do the top IDE speeds but can do lower speeds. Note that the official ATA cable length limit is 18", but 24" and 36" cables are common, and most disks run fine on them. This may explain some commenters' success with disabling certain DMA modes. --jh--
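The voltage check suggested above can be made concrete. ATX supplies are commonly specified to hold the +3.3 V, +5 V and +12 V rails within ±5% of nominal, i.e. 4.75 to 5.25 V on the 5 V line, so the 4.5 V reading described here is 10% low. A minimal Python sketch of that check, assuming the ±5% figure; the readings themselves would come from the BIOS health monitor or a multimeter, and the function names are illustrative:

```python
# Nominal ATX rail voltages; a +/-5% regulation window is assumed for each.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_window(nominal: float, tol: float = TOLERANCE) -> tuple[float, float]:
    """Return the (low, high) voltage limits for a rail."""
    return nominal * (1 - tol), nominal * (1 + tol)


def rail_ok(nominal: float, measured: float, tol: float = TOLERANCE) -> bool:
    """True if a measured voltage lies inside the regulation window."""
    low, high = rail_window(nominal, tol)
    return low <= measured <= high


if __name__ == "__main__":
    measured = 4.5  # the reading reported for the failing supply
    low, high = rail_window(NOMINAL_RAILS["+5V"])
    verdict = "OK" if rail_ok(NOMINAL_RAILS["+5V"], measured) else "OUT OF SPEC"
    print(f"+5V window {low:.2f}-{high:.2f} V, measured {measured} V: {verdict}")
```

A single reading inside the window does not rule out dips under load, which is what this comment describes; watching the value while the disks are busy is more telling.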
Exactly the same problem as David. I'm using a Dell Inspiron 5160 laptop, distro Fedora Core 4, kernel 2.6.12-1.1447_FC4smp. After changing the UDMA mode to 2, the problems went away.
I upgraded to 2.6.14-1-686-smp (still Debian). I had no problems for two days, but now I've had an error again. The problem might be less frequent, but it's still there nonetheless. The error message is still:
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success

Hardware: Dell Inspiron 500m
I am also getting this error:
hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error
hda: dma timeout error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: DMA disabled
ide0: reset: success
hda: DMA disabled

I have a Latitude D600, running vanilla 2.6.14.3 from kernel.org. My distro is Ubuntu Breezy.
Created attachment 122707
interrupts, lspci, lspci -n, uname -a, dmesg, lsmod

The attached information might help solve the problem. This problem is also in the Debian bugtracker: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=321409
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
Dave and Alan,

It seems I have the same problem with the latest (1656) FC4 kernel. I have a VIA EPIA-M Mini-ITX board equipped with a VIA C3 Nehemiah C [C5N] processor. However, I rebuilt the kernel using the original config from the i686 RPM, with the CPU type set to 'C3-2' and the 'longhaul' module enabled. The DMA timeout arises only when the cpuspeed daemon is running, most likely at the moment when the CPU load grows after some period of inactivity. When cpuspeed is not running, i.e. the CPU runs at a constant frequency, that kernel has no DMA timeouts. Most often I receive the following messages:
Jan 22 17:04:40 epia kernel: hda: dma_timer_expiry: dma status == 0x20
Jan 22 17:04:40 epia kernel: hda: DMA timeout retry
Jan 22 17:04:40 epia kernel: hda: timeout waiting for DMA
Jan 22 17:04:40 epia kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 22 17:04:40 epia kernel: ide: failed opcode was: unknown
Jan 22 17:04:40 epia kernel: hda: drive not ready for command

I am not 100% sure that I have hit exactly this bug, as I have read above that the frequency drivers for VIA's chipsets are known to be buggy. But it looks like the DMA timeout happens as soon as the CPU frequency-control facilities are in use. I can do more testing and analysis of this issue on my system if you would kindly point me in the right direction.

Regards, Sergey
Different problem. longhaul is known to have issues, which is why it's not built into the Fedora kernel.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
The first drive in my Inspiron 500m was the included 4200 RPM Hitachi drive. It had the problems listed here. I received a new identical drive from Dell, but the problems didn't disappear. This made me believe that the chipset was the source of this problem, until now. A couple of weeks ago I switched to a new 7200 RPM Hitachi drive bought from a third party. All DMA problems disappeared. As I see it, there can be two reasons for this:
1) I received two faulty hard drives from Dell.
2) The chipset had problems with the hard drive models I received from Dell, but not with other drives.
I don't know which it is. I had no problems with the "faulty" hard drives when I tested them in other, fairly similar computers (a one-year-older 500m and a one-year-newer 510m). Maybe an overly sensitive chipset? Anyway, I'm just happy I have a working computer again. Henrik
A number of these reports (including the Debian one referenced above, and comment #60) sound like bad hardware. As this bug has grown to unmanageable proportions, with a number of different (albeit similar) problems referenced, if this bug still affects you with the latest errata kernel, please open a new bug. Thanks.
I am the original reporter of this bug, and I can testify that the original problem, which was well specified and repeatable, still affects the kernel. The solution is to set "hdparm -d1 -X66", but this shouldn't be necessary, as the hard drive should be able to do UDMA5. As a number of people have reported the same problem, it is not a hardware problem. Furthermore, I dual-boot my machine and Windows never has this problem. Finally, I now run Ubuntu and have seen this same problem with the same solution. At the very least, this bug should be marked "WONTFIX" or "CANTFIX", as "ERRATA" is NOT the right resolution.
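The numeric -X argument in "hdparm -d1 -X66" encodes the transfer mode the way the ATA SET FEATURES command does: 32 + n selects multiword DMA mode n and 64 + n selects UltraDMA mode n, so -X66 is UDMA2 (about 33 MB/s), while UDMA5 (100 MB/s) would be -X69. UDMA modes above 2 also require an 80-conductor cable. A minimal Python sketch that builds such a command line; hdparm_args is a hypothetical helper, not part of hdparm:

```python
# Nominal burst rates (MB/s) for UltraDMA modes 0-6.
UDMA_MB_PER_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}
MODE_BASE = {"mdma": 32, "udma": 64}  # ATA SET FEATURES transfer-mode offsets


def xfer_value(kind: str, mode: int) -> int:
    """Numeric value accepted by hdparm -X for the given DMA mode."""
    return MODE_BASE[kind] + mode


def hdparm_args(device: str, udma_mode: int) -> list[str]:
    """Hypothetical helper: argv that enables DMA and pins a UDMA mode."""
    if udma_mode not in UDMA_MB_PER_S:
        raise ValueError(f"unknown UDMA mode {udma_mode}")
    return ["hdparm", "-d1", f"-X{xfer_value('udma', udma_mode)}", device]


if __name__ == "__main__":
    for mode in (2, 5):
        print(" ".join(hdparm_args("/dev/hda", mode)),
              f"# ~{UDMA_MB_PER_S[mode]} MB/s")
```

Running the resulting command requires root; the choice between the two is the throughput-for-stability trade-off several commenters describe.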