140788 – kernel blocks temporarily when accessing ide hd

Summary: kernel blocks temporarily when accessing ide hd

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	athlon
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Alan Cox
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-11-24 22:24 UTC by Stefan Seefeld
Modified:	2007-11-30 22:10 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-05-04 12:55:35 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
boot dmesg (12.52 KB, text/plain) 2004-11-27 18:28 UTC, Stefan Seefeld
output of 'hdparm /dev/hde' (257 bytes, text/plain) 2004-11-27 18:29 UTC, Stefan Seefeld
extract from my /var/log/messages file (2.59 KB, text/plain) 2004-11-27 18:49 UTC, Stefan Seefeld
cable detection fix for pdc202xx_old.c (317 bytes, patch) 2004-12-07 14:54 UTC, Bartlomiej Zolnierkiewicz

Description Stefan Seefeld 2004-11-24 22:24:34 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041020 Firefox/0.10.1

Description of problem:
When accessing an auxiliary hard disk, the kernel blocks for about 20 seconds
and generates error messages in /var/log/messages:

Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:01 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:01 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:01 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:02 marvin kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 24 17:17:02 marvin kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Nov 24 17:17:02 marvin kernel: ide: failed opcode was: unknown
Nov 24 17:17:02 marvin kernel: drivers/ide/ide-io.c:1390: spin_lock(drivers/ide/ide.c:0236e3a8) already locked by drivers/ide/ide-iops.c/1178
Nov 24 17:17:13 marvin kernel: Badness in pdc202xx_reset_host at drivers/ide/pci/pdc202xx_old.c:588
Nov 24 17:17:13 marvin kernel: Stack pointer is garbage, not printing trace
Nov 24 17:17:13 marvin kernel: Badness in pdc202xx_reset_host at drivers/ide/pci/pdc202xx_old.c:590
Nov 24 17:17:13 marvin kernel: Stack pointer is garbage, not printing trace
Nov 24 17:17:13 marvin kernel: PDC202XX: Primary channel reset.
Nov 24 17:17:13 marvin kernel: PDC202XX: Secondary channel reset.
Nov 24 17:17:14 marvin kernel: drivers/ide/ide-iops.c:1246: spin_unlock(drivers/ide/ide.c:0236e3a8) not locked
Nov 24 17:17:14 marvin kernel: ide2: reset: master: error (0x00?)


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667

How reproducible:
Always

Steps to Reproduce:
1. Boot Linux from drive /dev/hdb.
2. Run fdisk /dev/hde.

Actual Results:  The system blocks for about 20 seconds (apparently while
trying to read the partition table).
Subsequent reads from the disk work fine.

Expected Results:  I/O on the disk should proceed without stalling.

Additional info:

I've used the disk on this controller before (FC2).
I've just installed another OS (FreeBSD) on /dev/hde
without any problems, so I don't think it's a hardware problem.

Comment 1 Alan Cox 2004-11-25 12:49:19 UTC

The initial error is a hardware error (a CRC error), although it could be caused
by software mistuning the drive. The reset and lock errors are real, and I will
go chase those down.
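The register values in the log above can be decoded by hand. The following is a minimal C sketch of such a decoder; `ata_bit`, `ata_decode` and `ARRAY_LEN` are hypothetical names, and the tables only mirror the labels the kernel printed in this report (0x04 is the ATA "aborted command" bit, which the messages call DriveStatusError). It is not the kernel's own decoding code, and for simplicity it always labels error bit 0x80 as BadCRC.

```c
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* One named bit of an ATA status or error register. */
struct ata_bit {
    unsigned char mask;
    const char *name;
};

/* Status register bits, labelled as in the dma_intr messages. */
static const struct ata_bit ata_status_bits[] = {
    {0x80, "Busy"},         {0x40, "DriveReady"},
    {0x20, "DeviceFault"},  {0x10, "SeekComplete"},
    {0x08, "DataRequest"},  {0x04, "CorrectedError"},
    {0x02, "Index"},        {0x01, "Error"},
};

/* Error register bits, ordered so 0x84 prints in the same order as the log. */
static const struct ata_bit ata_error_bits[] = {
    {0x04, "DriveStatusError"},   /* ABRT: command aborted */
    {0x80, "BadCRC"},             /* interface CRC (ATA-4 and later) */
    {0x40, "UncorrectableError"}, /* UNC: media error */
    {0x10, "SectorIdNotFound"},
    {0x02, "TrackZeroNotFound"},
    {0x01, "AddrMarkNotFound"},
};

/* Write the names of all bits set in 'value' into 'out', space-separated,
 * never writing more than 'outlen' bytes including the terminator. */
static void ata_decode(unsigned char value, const struct ata_bit *table,
                       size_t n, char *out, size_t outlen)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (value & table[i].mask) {
            if (out[0] != '\0')
                strncat(out, " ", outlen - strlen(out) - 1);
            strncat(out, table[i].name, outlen - strlen(out) - 1);
        }
    }
}
```

For example, status 0x51 decodes to "DriveReady SeekComplete Error" and error 0x84 to "DriveStatusError BadCRC", matching the messages in the original report.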

Can you attach a boot dmesg and more of the log, plus an hdparm info dump of
that drive, so I can look at the modes chosen?

Comment 2 Stefan Seefeld 2004-11-27 18:28:49 UTC

Created attachment 107501 [details]
boot dmesg

Comment 3 Stefan Seefeld 2004-11-27 18:29:49 UTC

Created attachment 107502 [details]
output of 'hdparm /dev/hde'

Comment 4 Stefan Seefeld 2004-11-27 18:49:42 UTC

Created attachment 107503 [details]
extract from my /var/log/messages file

I originally filed the report when 'fdisk /dev/hde' blocked for several seconds,
after noticing output like the above. However, I'm now unable to reproduce that.
The text I attached is (I believe) part of the boot log. It still looks like
an error, and the message seems to be the same, but the context isn't quite
identical.

Also, the posted error is from a boot some days ago. The latest boot didn't
generate any errors, not even a CRC error!

Maybe some context will help: the device now on /dev/hde was originally
my main disk (/dev/hda) and worked quite well. I recently upgraded, adding
two other disks, so the original disk ended up on /dev/hde, where I'm
starting to use it for other OSes. I've just set up FreeBSD on /dev/hde1, and
it seems to run fine.

By the way, is the CRC error due to the disk or the controller?

Thanks,
	     Stefan

Comment 5 Alan Cox 2004-11-28 14:56:05 UTC

Interestingly, the BIOS selected PIO for that device although it is DMA capable.
That may indicate the firmware knows something we don't. Does FreeBSD use DMA or
PIO on that device?

The CRC error comes from the controller and drives. Each data transfer at UDMA
or faster has a CRC generated by the sender and checked by the receiver. If they
don't match, a CRC error is asserted. This is all done at the hardware level.
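The sender/receiver check described above can be illustrated with a short C sketch. It assumes the generator polynomial x^16 + x^12 + x^5 + 1 and the seed 0x4ABA that the ATA specification gives for Ultra DMA, but it processes each 16-bit word MSB-first, which is an assumption: it is not a bit-exact model of the CRC logic the specification defines. `udma_crc` and `udma_crc_ok` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define UDMA_CRC_POLY 0x1021u /* x^16 + x^12 + x^5 + 1 */
#define UDMA_CRC_SEED 0x4ABAu /* initial CRC value for an Ultra DMA burst */

/* Sender side: bitwise CRC-16 over the 16-bit words of one burst,
 * MSB first (bit order is an assumption of this sketch). */
static uint16_t udma_crc(const uint16_t *words, size_t n)
{
    uint16_t crc = UDMA_CRC_SEED;
    for (size_t i = 0; i < n; i++) {
        crc ^= words[i];
        for (int bit = 0; bit < 16; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ UDMA_CRC_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Receiver side: recompute over the words that actually arrived and compare
 * with the CRC the sender transmitted. A mismatch is what gets reported as
 * an interface CRC error (BadCRC in the kernel log). */
static int udma_crc_ok(const uint16_t *received, size_t n, uint16_t sent_crc)
{
    return udma_crc(received, n) == sent_crc;
}
```

Because the polynomial has more than one term, any single flipped bit changes the result, so the receiver can tell that a burst was corrupted. It cannot tell whether the cable, the controller or the drive caused the corruption.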

Comment 7 Stefan Seefeld 2004-11-28 19:08:03 UTC

I get a warning from the BIOS (?) when booting about using a 40-pin cable that I
may (should?) replace with an 80-pin cable. But as it was only a warning, I
didn't bother.
Also, these lines from the FreeBSD boot log may clarify things a bit:

ata2-master: DMA limited to UDMA33, non-ATA66 cable or device
ad4: 38166MB <WDC WD400BB-75AUAI/18.20D18> [77545/16/63] at ata2-master UDMA33
Mounting root from ufs:/dev/ad4s1a

Comment 8 Alan Cox 2004-11-28 20:51:02 UTC

That would explain the initial problem if Linux got the cable detection wrong
(the rest of the stuff is it breaking as a result, although that still needs
fixing). Looking at the code, I don't see any obvious bugs in the cable detection
handling. You are not using any options such as "ide2=ata66", I assume?
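The reasoning in this comment rests on one rule: Ultra DMA modes above UDMA2 (33 MB/s) need an 80-conductor cable, which is why FreeBSD capped this drive at UDMA33. A simplified C sketch of that rule follows; `pick_udma_mode` and its parameters are hypothetical, not the ide driver's mode-selection code. If cable detection wrongly reports an 80-wire cable, the cap is skipped and a faster mode is used over a 40-wire cable, which is the scenario suggested above as an explanation for the CRC errors.

```c
/* Hypothetical mode selection: take the fastest UDMA mode both the drive
 * and the controller support, then cap it at UDMA2 (33 MB/s) unless an
 * 80-conductor cable was detected. */
static int pick_udma_mode(int drive_max, int controller_max, int cable_80wire)
{
    int mode = drive_max < controller_max ? drive_max : controller_max;

    if (!cable_80wire && mode > 2)
        mode = 2; /* 40-wire cable: UDMA33 at most */
    return mode;
}
```

With a UDMA5-capable drive and controller, a correct 40-wire detection yields mode 2, while a misdetected "80-wire" cable yields mode 5.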

Comment 9 Stefan Seefeld 2004-11-28 22:10:29 UTC

nope.

Comment 13 Bartlomiej Zolnierkiewicz 2004-12-07 14:54:28 UTC

Created attachment 108039 [details]
cable detection fix for pdc202xx_old.c

Comment 14 Bartlomiej Zolnierkiewicz 2004-12-07 14:56:24 UTC

I think I've found it! pdc202xx_old_cable_detect() always returns '0'
(which means an 80c cable) due to sloppy coding: the result of CIS & mask
is truncated to 8 bits, although CIS holds the cable info in bits 10-11.

The above patch fixes it.
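The truncation described above can be reproduced in standalone C. This is a hypothetical illustration built only from the description in this comment, not the contents of the attached patch: the function names and the `cis` parameter (standing in for the CIS value the driver reads from the controller) are assumptions, as is the reading that a set bit means a 40-wire cable, inferred from '0' meaning 80c.

```c
#include <stdint.h>

/* Bit 10 holds the primary channel's cable status, bit 11 the secondary's. */
static uint16_t cable_mask(int channel)
{
    return channel ? (uint16_t)(1u << 11) : (uint16_t)(1u << 10);
}

/* Buggy form: narrowing the masked 16-bit value to 8 bits discards
 * bits 10-11, so the result is always 0 ("80-wire"). */
static uint8_t cable_detect_buggy(uint16_t cis, int channel)
{
    return (uint8_t)(cis & cable_mask(channel));
}

/* Fixed form: test the bit first, then return 0 or 1. */
static uint8_t cable_detect_fixed(uint16_t cis, int channel)
{
    return (cis & cable_mask(channel)) ? 1 : 0;
}
```

With both cable bits set (CIS = 0x0C00), the buggy version still reports 0 for both channels, so the driver never applies the 40-wire speed limit.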

Comment 15 Alan Cox 2004-12-07 15:25:13 UTC

Thanks a lot, Bartlomiej, much appreciated.

Comment 19 Fehér János 2005-03-21 08:33:53 UTC

Hi,

I've seen the same thing on another chipset.

uname -a:

Linux t2.terem.mindworks.hu 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004 i686 athlon i386 GNU/Linux

(but it does the same with the latest FC3 kernel, 2.6.10)

lspci:

00:00.0 Host bridge: Silicon Integrated Systems [SiS] 746 Host (rev 10)
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS963 [MuTIOL Media IO] (rev 25)
00:02.1 SMBus: Silicon Integrated Systems [SiS] SiS961/2 SMBus Controller
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] Sound Controller (rev a0)
00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 90)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (Secondary) (rev 01)

messages.log:

Mar 19 15:21:48 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:48 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269174
Mar 19 15:21:48 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:48 t2 kernel: end_request: I/O error, dev hda, sector 65269174
Mar 19 15:21:50 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:50 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269182
Mar 19 15:21:50 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:50 t2 kernel: end_request: I/O error, dev hda, sector 65269182
Mar 19 15:21:55 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:55 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269184, sector=65269182
Mar 19 15:21:55 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:55 t2 kernel: end_request: I/O error, dev hda, sector 65269182
Mar 19 15:21:58 t2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Mar 19 15:21:58 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269199, sector=65269198
Mar 19 15:21:58 t2 kernel: ide: failed opcode was: unknown
Mar 19 15:21:58 t2 kernel: end_request: I/O error, dev hda, sector 65269198
Mar 19 15:24:51 t2 kernel: hda: CHECK for good STATUS
Mar 19 15:24:51 t2 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 19 15:24:51 t2 kernel:

Comment 20 Fehér János 2005-03-21 08:36:58 UTC

It does the same on 15 equivalent machines.

Comment 21 Peter Lawler 2005-04-24 05:31:24 UTC

I'm seeing this kind of behaviour with the card mentioned in bug #144743. Other
comments there suggest the two may well be related.

Comment 22 Dave Jones 2005-07-15 19:50:38 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 23 Alan Cox 2005-09-08 09:41:07 UTC

Mar 19 15:21:58 t2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=65269199, sector=65269198

indicates a faulty disk, not a kernel bug, so it is not relevant to this bug report.
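Taken together, this comment and the earlier explanation of CRC errors give a rough triage rule for logs like these: an interface CRC error (BadCRC) points at the transfer path (cable, transfer mode, controller), while UncorrectableError means the drive could not read its own media. A minimal C sketch of that rule follows; `ata_fault` and `ata_triage` are hypothetical names, it assumes the ATA-4 and later meaning of error bit 0x80 (interface CRC), and it ignores the other error bits.

```c
#include <stdint.h>

enum ata_fault {
    FAULT_NONE,
    FAULT_INTERFACE, /* BadCRC: data corrupted in transit */
    FAULT_MEDIA,     /* UncorrectableError: suspect the disk itself */
    FAULT_OTHER      /* anything else: needs closer inspection */
};

/* Classify an ATA error-register value. If both bits are set, the media
 * error is reported first; that precedence is an arbitrary choice here. */
static enum ata_fault ata_triage(uint8_t err)
{
    if (err == 0)
        return FAULT_NONE;
    if (err & 0x40)
        return FAULT_MEDIA;
    if (err & 0x80)
        return FAULT_INTERFACE;
    return FAULT_OTHER;
}
```

Applied to this bug, the original report's error=0x84 classifies as an interface problem, while the error=0x40 entries quoted above classify as a media problem.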

Comment 24 Dave Jones 2006-01-16 22:27:54 UTC

This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.

Comment 25 Dave Jones 2006-02-03 06:45:10 UTC

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.

Comment 26 John Thacker 2006-05-04 12:55:35 UTC

Closing per previous comment.
