From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041020 Firefox/0.10.1

Description of problem:
When accessing an auxiliary HD, the kernel blocks for about 20 seconds and generates an error message in /var/log/messages:

Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:02 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:02 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:02 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:02 marvin kernel: drivers/ide/ide-io.c:1390: spin_lock(drivers/ide/ide.c:0236e3a8) already locked by drivers/ide/ide-iops.c/1178
Nov 24 17:17:13 marvin kernel: Badness in pdc202xx_reset_host at drivers/ide/pci/pdc202xx_old.c:588
Nov 24 17:17:13 marvin kernel: Stack pointer is garbage, not printing trace
Nov 24 17:17:13 marvin kernel: Badness in pdc202xx_reset_host at drivers/ide/pci/pdc202xx_old.c:590
Nov 24 17:17:13 marvin kernel: Stack pointer is garbage, not printing trace
Nov 24 17:17:13 marvin kernel: PDC202XX: Primary channel reset.
Nov 24 17:17:13 marvin kernel: PDC202XX: Secondary channel reset.
Nov 24 17:17:14 marvin kernel: drivers/ide/ide-iops.c:1246: spin_unlock(drivers/ide/ide.c:0236e3a8) not locked
Nov 24 17:17:14 marvin kernel: ide2: reset: master: error (0x00?)

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667

How reproducible:
Always

Steps to Reproduce:
1. Boot Linux from drive /dev/hdb
2. Call fdisk /dev/hde

Actual Results:
The system will block for ~20 seconds (apparently trying to read the partition table). Any further read access on the disk will work fine.

Expected Results:
I/O on the disk should work smoothly.

Additional info:
I've used the disk on this controller before (FC2). I've just installed another OS on /dev/hde (FreeBSD) without any problem, so I don't think it's a h/w problem.
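For readers decoding the status= and error= values in the log, here is a minimal sketch of how the bit flags map to the names the kernel prints. Only the bit names that actually appear in this report are included; the function name decode and the table names are hypothetical, and any bit not named in this report is shown by its hex value rather than guessed.

```python
# Bit-to-name tables, limited to the flags printed in this bug report.
# STATUS_BITS, ERROR_BITS and decode() are illustrative names, not kernel symbols.
STATUS_BITS = {
    0x40: "DriveReady",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x01: "Error",
}

ERROR_BITS = {
    0x80: "BadCRC",
    0x40: "UncorrectableError",
    0x04: "DriveStatusError",
}


def decode(value, names):
    """Return the set of flag names set in an 8-bit register value."""
    flags = set()
    for bit in range(8):
        mask = 1 << bit
        if value & mask:
            # Bits not seen in this report are reported by mask, not named.
            flags.add(names.get(mask, "bit 0x%02x" % mask))
    return flags


if __name__ == "__main__":
    print(sorted(decode(0x51, STATUS_BITS)))
    print(sorted(decode(0x84, ERROR_BITS)))
```

Read this way, status=0x51 only says that the drive flagged an error; the error register is what separates a transfer problem (BadCRC) from a media problem (UncorrectableError).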
The initial error is a hardware error (CRC error), although that could be caused by software mistuning the drive. The reset and lock errors are real and I will go chase those down. Can you attach a boot dmesg and more of the log, plus an hdparm info dump of that drive so I can look at the modes chosen?
Created attachment 107501 [details] boot dmesg
Created attachment 107502 [details] output of 'hdparm /dev/hde'
Created attachment 107503 [details]
extract from my /var/log/messages file

I originally posted the report when 'fdisk /dev/hde' blocked for some seconds, after noting output like the above. However, I'm now unable to reproduce that. The text I attached is (I believe) part of the boot log. It still looks like an error, and the message seems the same, but the context isn't quite identical. Also, the posted error is from a boot some days ago. The latest boot didn't generate any error, not even a CRC error!

Maybe some context info can help: the device now on /dev/hde was originally my main disk (/dev/hda) and worked quite well. I recently upgraded, adding two other disks, and so the original HD ended up on /dev/hde, where I'm starting to use it for other OSes. I've just set up FreeBSD on /dev/hde1 and it seems to run fine.

By the way, is the CRC error due to the disk or the controller?

Thanks,
Stefan
Interestingly, the BIOS selected PIO for that device although it is DMA capable. That may indicate the firmware knows something we don't. Does FreeBSD use DMA on that device or does it use PIO?

The CRC error comes from the controller and the drive together. Each data transfer in a UDMA mode has a CRC generated by the sender and checked by the receiver. If they don't match, a CRC error is asserted. This is all done at the hardware level.
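The sender/receiver check described above can be sketched in software. This is only an illustration of the principle: real ATA hardware computes the CRC in silicon, and the polynomial (0x1021) and seed (0xFFFF) used here are a common generic CRC-16 choice picked for the example, not values taken from the ATA specification. The names crc16 and receiver_accepts are hypothetical.

```python
def crc16(data, seed=0xFFFF, poly=0x1021):
    """Bitwise CRC-16 over a byte string (generic parameters, see note above)."""
    crc = seed
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def receiver_accepts(payload, sender_crc):
    """The receiver recomputes the CRC and compares it with the sender's."""
    return crc16(payload) == sender_crc


if __name__ == "__main__":
    block = bytes(range(64))
    sent_crc = crc16(block)

    # Clean transfer: both sides agree.
    print(receiver_accepts(block, sent_crc))

    # One bit flipped on the cable: the receiver's CRC no longer matches,
    # which is the condition reported as BadCRC.
    damaged = bytes([block[0] ^ 0x01]) + block[1:]
    print(receiver_accepts(damaged, sent_crc))
```

Because both ends compute the value independently, a mismatch points at the path between them (cable, signalling speed, controller) rather than at the data stored on the platters.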
I get a warning from the BIOS (?) when booting about using a 40 pin cable, which I may (should?) replace with an 80 pin cable. But as that was only a warning, I didn't bother. Also, these lines from the FreeBSD boot log may clarify things a bit:

ata2-master: DMA limited to UDMA33, non-ATA66 cable or device
ad4: 38166MB <WDC WD400BB-75AUAI/18.20D18> [77545/16/63] at ata2-master UDMA33
Mounting root from ufs:/dev/ad4s1a
That would explain the initial problem if Linux got the cable detect wrong (the rest of the stuff is it breaking as a result, although that still needs fixing). Looking at the code I don't see any obvious bugs in the cable detection handling. You are not using any options such as "ide2=ata66", I assume?
nope.
Created attachment 108039 [details] cable detection fix for pdc202xx_old.c
I think I've found it! pdc202xx_old_cable_detect() always returns '0' (which means 80c cable) due to sloppy coding: the result of CIS & mask is truncated to 8 bits, although CIS holds the cable info in bits 10-11. The above patch fixes it.
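The truncation described above can be reproduced in a few lines. This is a sketch of the class of bug, not the actual patch: the 8-bit truncation is emulated with & 0xFF, and the assignment of bit 10 to the primary channel and bit 11 to the secondary channel is an assumption made for the example (the report only says the cable info lives in bits 10-11). All names are hypothetical.

```python
# Assumed layout for illustration: one cable bit per channel in bits 10-11.
CABLE_BIT_PRIMARY = 1 << 10
CABLE_BIT_SECONDARY = 1 << 11


def cable_detect_buggy(cis, channel):
    """Mimics returning (CIS & mask) through an 8-bit type.

    Bits 10-11 lie above the low byte, so the result is always 0,
    which the caller interprets as an 80-conductor cable.
    """
    mask = CABLE_BIT_SECONDARY if channel else CABLE_BIT_PRIMARY
    return (cis & mask) & 0xFF  # emulated truncation to 8 bits


def cable_detect_fixed(cis, channel):
    """Collapses the masked value to 0 or 1 before it is narrowed."""
    mask = CABLE_BIT_SECONDARY if channel else CABLE_BIT_PRIMARY
    return 1 if (cis & mask) else 0


if __name__ == "__main__":
    cis = CABLE_BIT_PRIMARY  # cable bit set for the primary channel
    print(cable_detect_buggy(cis, 0))
    print(cable_detect_fixed(cis, 0))
```

With the buggy variant the driver never sees a non-zero result, so it would keep selecting transfer modes that assume an 80-conductor cable, which matches the BadCRC errors at the top of this report.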
Thanks a lot, Bartlomiej, much appreciated.
Hi, I've detected the same on another chipset.

uname -a:
Linux t2.terem.mindworks.hu 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004 i686 athlon i386 GNU/Linux
(but it does the same with the latest FC3 kernel, v2.6.10)

lspci:
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 746 Host (rev 10)
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS963 [MuTIOL Media IO] (rev 25)
00:02.1 SMBus: Silicon Integrated Systems [SiS] SiS961/2 SMBus Controller
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] Sound Controller (rev a0)
00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 90)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (Secondary) (rev 01)

messages.log:
Mar 19 15:21:48 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:48 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269174
Mar 19 15:21:48 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:48 t2 kernel: end_request: I/O error, dev hda, sector 65269174
Mar 19 15:21:50 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:50 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269182
Mar 19 15:21:50 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:50 t2 kernel: end_request: I/O error, dev hda, sector 65269182
Mar 19 15:21:55 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:55 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269182
Mar 19 15:21:55 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:55 t2 kernel: end_request: I/O error, dev hda, sector 65269182
Mar 19 15:21:58 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:58 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269199, sector=65269198
Mar 19 15:21:58 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:58 t2 kernel: end_request: I/O error, dev hda, sector 65269198
Mar 19 15:24:51 t2 kernel: hda: CHECK for good STATUS
Mar 19 15:24:51 t2 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 19 15:24:51 t2 kernel:

It does the same on 15 equivalent machines.
I'm seeing this kind of behaviour with the card mentioned on bug#144743. Others would indicate it may well be associated.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
Mar 19 15:21:58 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269199, sector=65269198

This is a faulty disk, not a kernel bug, so it is not relevant to this bug report.
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Closing per previous comment.