Bug 230109

Summary: bug in pdc202xx_old.c hangs system
Product: [Fedora] Fedora
Reporter: Trevor Cordes <trevor>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: triage, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: F8
Doc Type: Bug Fix
Last Closed: 2008-04-04 17:35:04 UTC

Description Trevor Cordes 2007-02-26 18:23:46 UTC
Description of problem:
A DMA error (the drive may be ailing) causes
BUG: warning at drivers/ide/pci/pdc202xx_old.c:469/pdc202xx_reset_host() (Not tainted)
errors and a near-complete system hang.
Yes, I know that ailing disks do nasty things, but this is a RAID 1 config and
it should survive such errors, and it especially should not give a "kernel: BUG"
warning.
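
For anyone triaging similar logs, the warning sites can be pulled out of syslog with a short script. This is only a sketch: it assumes each warning sits on one unwrapped line in the format quoted in this report, and the names BUG_RE and warning_sites are made up for illustration.

```python
import re

# Matches "BUG: warning at <file>:<line>/<function>()" as quoted in this report.
# Assumption: the kernel message is on a single line (not wrapped by a mailer).
BUG_RE = re.compile(
    r"BUG: warning at (?P<file>\S+):(?P<line>\d+)/(?P<func>\w+)\(\)"
)


def warning_sites(log_lines):
    """Return (file, line, function) tuples for every BUG warning found."""
    sites = []
    for line in log_lines:
        match = BUG_RE.search(line)
        if match:
            sites.append(
                (match.group("file"), int(match.group("line")), match.group("func"))
            )
    return sites


if __name__ == "__main__":
    sample = [
        "Feb 26 06:10:28 firewall kernel: BUG: warning at "
        "drivers/ide/pci/pdc202xx_old.c:467/pdc202xx_reset_host() (Not tainted)",
        "Feb 26 06:10:28 firewall kernel: PDC202XX: Primary channel reset.",
    ]
    for site in warning_sites(sample):
        print(site)
```

Collecting the distinct (file, line) pairs makes it easy to see whether one or several warning sites fire in a given incident.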

Version-Release number of selected component (if applicable):
2.6.18-1.2257.fc5

How reproducible:
only when DMA error pops up, rarely

Steps to Reproduce:
1. 2 drives sw md RAID1 off a Promise IDE card
2. wait for dma error
3. watch it crash
  
Actual results:
hang

Expected results:
keeps going (it is RAID 1 after all)
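
On a machine that does survive a member failure, the degraded state would normally show up in /proc/mdstat as a "_" in the member status field (for example [_U] instead of [UU]). A minimal sketch of checking for that follows. The sample text, the device names other than hde, and the function name degraded_arrays are assumptions for illustration, not output captured from the affected machine.

```python
import re


def degraded_arrays(mdstat_text):
    """Map md device name -> member status string for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid1 ..."
            current = header.group(1)
            continue
        # The status field looks like [UU]; "_" marks a missing/failed member.
        state = re.search(r"\[([U_]+)\]", line)
        if current and state and "_" in state.group(1):
            degraded[current] = state.group(1)
    return degraded


# Hypothetical sample in the usual /proc/mdstat layout (not from this system).
SAMPLE = """Personalities : [raid1]
md0 : active raid1 hdg1[1] hde1[0](F)
      78150656 blocks [2/1] [_U]
md1 : active raid1 hdg2[1] hde2[0]
      1052160 blocks [2/2] [UU]
unused devices: <none>
"""

if __name__ == "__main__":
    # On a live system one would pass open("/proc/mdstat").read() instead.
    print(degraded_arrays(SAMPLE))
```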

Additional info:
Feb 26 06:10:24 firewall kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 26 06:10:24 firewall kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Feb 26 06:10:24 firewall kernel: ide: failed opcode was: unknown
Feb 26 06:10:24 firewall kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 26 06:10:24 firewall kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Feb 26 06:10:24 firewall kernel: ide: failed opcode was: unknown
Feb 26 06:10:24 firewall kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 26 06:10:24 firewall kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Feb 26 06:10:24 firewall kernel: ide: failed opcode was: unknown
Feb 26 06:10:28 firewall kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 26 06:10:28 firewall kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }
Feb 26 06:10:28 firewall kernel: ide: failed opcode was: unknown
Feb 26 06:10:28 firewall kernel: BUG: warning at drivers/ide/pci/pdc202xx_old.c:467/pdc202xx_reset_host() (Not tainted)
Feb 26 06:10:28 firewall kernel:  [<c0403f28>] dump_trace+0x69/0x1af
Feb 26 06:10:28 firewall kernel:  [<c0404086>] show_trace_log_lvl+0x18/0x2c
Feb 26 06:10:28 firewall kernel:  [<c0404601>] show_trace+0xf/0x11
Feb 26 06:10:28 firewall kernel:  [<c040468b>] dump_stack+0x15/0x17
Feb 26 06:10:28 firewall kernel:  [<c05497c5>] pdc202xx_reset_host+0x7a/0x12f
Feb 26 06:10:28 firewall kernel:  [<c054988c>] pdc202xx_reset+0x12/0x2a
Feb 26 06:10:28 firewall kernel:  [<c055302d>] do_reset1+0x176/0x191
Feb 26 06:10:28 firewall kernel:  [<c05523bc>] __ide_error+0x197/0x1aa
Feb 26 06:10:28 firewall kernel:  [<c055242b>] ide_error+0x5c/0x72
Feb 26 06:10:28 firewall kernel:  [<c055214b>] ide_intr+0x146/0x1a7
Feb 26 06:10:28 firewall kernel:  [<c043f79e>] handle_IRQ_event+0x23/0x49
Feb 26 06:10:28 firewall kernel:  [<c043f846>] __do_IRQ+0x82/0xde
Feb 26 06:10:28 firewall kernel:  [<c0405385>] do_IRQ+0x9a/0xb8
Feb 26 06:10:28 firewall kernel:  =======================
Feb 26 06:10:28 firewall kernel: BUG: warning at drivers/ide/pci/pdc202xx_old.c:469/pdc202xx_reset_host() (Not tainted)
Feb 26 06:10:28 firewall kernel:  [<c0403f28>] dump_trace+0x69/0x1af
Feb 26 06:10:28 firewall kernel:  [<c0404086>] show_trace_log_lvl+0x18/0x2c
Feb 26 06:10:28 firewall kernel:  [<c0404601>] show_trace+0xf/0x11
Feb 26 06:10:28 firewall kernel:  [<c040468b>] dump_stack+0x15/0x17
Feb 26 06:10:28 firewall kernel:  [<c0549839>] pdc202xx_reset_host+0xee/0x12f
Feb 26 06:10:28 firewall kernel:  [<c054988c>] pdc202xx_reset+0x12/0x2a
Feb 26 06:10:28 firewall kernel:  [<c055302d>] do_reset1+0x176/0x191
Feb 26 06:10:28 firewall kernel:  [<c05523bc>] __ide_error+0x197/0x1aa
Feb 26 06:10:28 firewall kernel:  [<c055242b>] ide_error+0x5c/0x72
Feb 26 06:10:28 firewall kernel:  [<c055214b>] ide_intr+0x146/0x1a7
Feb 26 06:10:28 firewall kernel:  [<c043f79e>] handle_IRQ_event+0x23/0x49
Feb 26 06:10:28 firewall kernel:  [<c043f846>] __do_IRQ+0x82/0xde
Feb 26 06:10:28 firewall kernel:  [<c0405385>] do_IRQ+0x9a/0xb8
Feb 26 06:10:28 firewall kernel:  =======================
Feb 26 06:10:28 firewall kernel: PDC202XX: Primary channel reset.
Feb 26 06:10:28 firewall kernel: PDC202XX: Secondary channel reset.
Feb 26 06:10:28 firewall kernel: ide2: reset: master: error (0x00?)
-- end of log output until rebooted --
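
The status=0x51 and error=0x84 values in the log are bit fields from the drive's status and error registers. The sketch below decodes them into the labels the log prints. The bit-to-label tables are reconstructed from memory of the old IDE driver's messages and should be treated as an assumption, not as kernel source; the function names are made up for illustration.

```python
# Status register bits, using the labels seen in old IDE driver messages.
# Assumption: this table is an illustrative reconstruction, not kernel code.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register bits other than 0x04 (abort) and 0x80, handled separately.
OTHER_ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(value):
    """Return the label list for a status byte such as 0x51."""
    if value & 0x80:
        # While the drive is busy the other bits are not meaningful.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Return the label list for an error byte such as 0x84."""
    names = []
    aborted = bool(value & 0x04)
    if aborted:
        names.append("DriveStatusError")
    if value & 0x80:
        # Bit 7 reads as a CRC error when the command was also aborted.
        names.append("BadCRC" if aborted else "BadSector")
    names.extend(name for mask, name in OTHER_ERROR_BITS if value & mask)
    return names


if __name__ == "__main__":
    print("status=0x51", decode_status(0x51))
    print("error=0x84", decode_error(0x84))
```

Under these tables, 0x51 decodes to DriveReady, SeekComplete, Error and 0x84 to DriveStatusError, BadCRC, matching the log. A CRC error on a DMA transfer often points at the cable or interface rather than the platters, although that cannot be confirmed from this log alone.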

Comment 1 Trevor Cordes 2007-02-26 18:26:57 UTC
potentially related unresolved bugs:
bug #140788
bug #130810


Comment 2 Bug Zapper 2008-04-04 06:22:40 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change the bug to the respective
version. If you are unable to do this, please add a comment to this bug
requesting the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 3 Trevor Cordes 2008-04-04 12:53:16 UTC
I haven't seen this bug since the initial report, but it is an obscure bug
requiring a rare set of circumstances, so I'm not surprised.  Maybe it's fixed
upstream, maybe not.


Comment 4 Dave Jones 2008-04-04 17:35:04 UTC
We switched to a completely different set of ata drivers after that kernel, so
chances are these messages don't exist any more; if there were still a problem,
it would exhibit a different failure mode.

If everything works fine, we're probably ok to just close this.