Bug 203530 - DMA timeout error on hdc followed by unresponsive system
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.3
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Assigned To: Aristeu Rozanski
QA Contact: Brian Brock
Reported: 2006-08-22 06:25 EDT by Charlie Brady
Modified: 2007-11-16 20:14 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2006-09-29 17:49:01 EDT


Attachments: None

Description Charlie Brady 2006-08-22 06:25:41 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.5) Gecko/20060728 CentOS/1.5.0.5-0.el4.1.centos4 Firefox/1.5.0.5 pango-text

Description of problem:
See the syslog entries below. The system became totally unresponsive, with nothing displayed on the console, and was power-cycled just before the reboot logged at Aug 21 22:02:08.

...
Aug 20 22:06:26 e-smith esmith::event[2906]: S55set-gateway-ip=action|Event|ip-up|Action|S55set-gateway-ip|Start|1156125986 815261|End|1156125986 818437|Elapsed|0.003176
Aug 20 22:06:26 e-smith pppd[2686]: Script /etc/ppp/ip-up finished (pid 2896), status = 0x0
Aug 20 22:55:15 e-smith kernel: hdc: dma_timer_expiry: dma status == 0x61
Aug 20 22:55:25 e-smith kernel: hdc: DMA timeout error
Aug 20 22:55:25 e-smith kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 20 22:55:25 e-smith kernel:
Aug 20 22:55:25 e-smith kernel: ide: failed opcode was: unknown
Aug 20 22:55:25 e-smith kernel: hdc: DMA disabled
Aug 20 22:55:25 e-smith kernel: hdd: DMA disabled
Aug 20 22:56:00 e-smith kernel: ide1: reset timed-out, status=0xd0
Aug 20 22:56:00 e-smith kernel: hdd: status timeout: status=0xd0 { Busy }
Aug 20 22:56:00 e-smith kernel: hdd: status timeout: error=0xd0LastFailedSense 0x0d
Aug 20 22:56:00 e-smith kernel: hdd: drive not ready for command
Aug 20 22:56:30 e-smith kernel: hdd: ATAPI reset timed-out, status=0x80
Aug 20 22:57:05 e-smith kernel: ide1: reset timed-out, status=0x80
Aug 21 22:02:08 e-smith syslogd 1.4.1: restart.
Aug 21 22:02:08 e-smith syslog: syslogd startup succeeded
Aug 21 22:02:08 e-smith syslog:
Aug 21 22:02:08 e-smith syslog: Starting kernel logger:
Aug 21 22:02:08 e-smith kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 21 22:02:08 e-smith syslog: klogd startup succeeded
...
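To help interpret the log, the status bytes can be decoded bit by bit. The sketch below assumes the classic ATA status register layout (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR) for values such as 0xd0 and 0x80, and the SFF-8038i bus-master IDE status register layout for the "dma status == 0x61" value. The table names and the decode() helper are hypothetical illustrations, not kernel code.

```python
# Classic ATA status register bits (values like "status=0xd0").
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault / write error
    0x10: "DSC",   # device seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error
}

# SFF-8038i bus-master IDE status register bits ("dma status == 0x61").
BMIDE_STATUS_BITS = {
    0x80: "simplex only",
    0x40: "drive 1 DMA capable",
    0x20: "drive 0 DMA capable",
    0x04: "interrupt",
    0x02: "error",
    0x01: "bus master active",
}


def decode(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


if __name__ == "__main__":
    for label, value, table in [
        ("hdc dma timeout status", 0xd0, ATA_STATUS_BITS),
        ("hdd ATAPI reset status", 0x80, ATA_STATUS_BITS),
        ("hdc bus-master dma status", 0x61, BMIDE_STATUS_BITS),
    ]:
        print(f"{label} 0x{value:02x}: {', '.join(decode(value, table))}")
```

Under these assumed layouts, 0xd0 decodes to BSY, DRDY and DSC. While BSY is set the other ATA status bits are not considered valid, which is consistent with the kernel printing only "{ Busy }". The DMA status 0x61 decodes to "bus master active" plus both drives DMA capable, with neither the interrupt nor the error bit set. That reads as a transfer still in progress when the DMA timer expired.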

/dev/hdc was one half of a RAID1 pair with /dev/hda, so the system could have remained operational by ejecting the /dev/hdc partitions from their RAID sets. As it was, once the system rebooted, /dev/hdc2 had been removed from /dev/md2, but /dev/hdc1 remained in /dev/md1.

smartctl did not show any disk problems after the reboot, and I re-added /dev/hdc2 to /dev/md2. The system is currently running fine AFAICT.

Problem has only happened a single time.

Version-Release number of selected component (if applicable):
2.6.9-34.0.2.ELsmp

How reproducible:
Didn't try


Steps to Reproduce:


Actual Results:


Expected Results:
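I don't know whether the DMA timeout indicates a hardware error or a bug in the kernel, but I think the kernel should be able to respond more gracefully to a hardware problem when there is redundant hardware, for example by failing the affected partitions out of their RAID1 sets rather than leaving the whole system unresponsive.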


Additional info:
Comment 1 Jason Baron 2006-09-01 13:08:26 EDT
So you've only seen this problem once?
Comment 2 Charlie Brady 2006-09-01 18:02:35 EDT
Q. So you've only seen this problem once?

A. (already given) "Problem has only happened a single time."

Comment 3 Aristeu Rozanski 2006-09-29 17:49:01 EDT
Since it isn't possible to reproduce this or gather more data about it, I'm closing this
one. Please feel free to reopen it if it can be reproduced with the latest RHEL-4
kernel.
