Bug 203530

Summary: DMA timeout error on hdc followed by unresponsive system
Product: Red Hat Enterprise Linux 4 Reporter: Charlie Brady <charlieb-fedora-bugzilla>
Component: kernel Assignee: Aristeu Rozanski <arozansk>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.3 CC: alan, jbaron
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-09-29 21:49:01 UTC Type: ---

Description Charlie Brady 2006-08-22 10:25:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.5) Gecko/20060728 CentOS/1.5.0.5-0.el4.1.centos4 Firefox/1.5.0.5 pango-text

Description of problem:
See syslog entries below. The system became totally unresponsive, with nothing displayed on the console, and was power cycled just before the reboot at Aug 21 22:02:08.

...
Aug 20 22:06:26 e-smith esmith::event[2906]: S55set-gateway-ip=action|Event|ip-up|Action|S55set-gateway-ip|Start|1156125986 815261|End|1156125986 818437|Elapsed|0.003176
Aug 20 22:06:26 e-smith pppd[2686]: Script /etc/ppp/ip-up finished (pid 2896), status = 0x0
Aug 20 22:55:15 e-smith kernel: hdc: dma_timer_expiry: dma status == 0x61
Aug 20 22:55:25 e-smith kernel: hdc: DMA timeout error
Aug 20 22:55:25 e-smith kernel: hdc: dma timeout error: status=0xd0 { Busy }
Aug 20 22:55:25 e-smith kernel:
Aug 20 22:55:25 e-smith kernel: ide: failed opcode was: unknown
Aug 20 22:55:25 e-smith kernel: hdc: DMA disabled
Aug 20 22:55:25 e-smith kernel: hdd: DMA disabled
Aug 20 22:56:00 e-smith kernel: ide1: reset timed-out, status=0xd0
Aug 20 22:56:00 e-smith kernel: hdd: status timeout: status=0xd0 { Busy }
Aug 20 22:56:00 e-smith kernel: hdd: status timeout: error=0xd0LastFailedSense 0x0d
Aug 20 22:56:00 e-smith kernel: hdd: drive not ready for command
Aug 20 22:56:30 e-smith kernel: hdd: ATAPI reset timed-out, status=0x80
Aug 20 22:57:05 e-smith kernel: ide1: reset timed-out, status=0x80
Aug 21 22:02:08 e-smith syslogd 1.4.1: restart.
Aug 21 22:02:08 e-smith syslog: syslogd startup succeeded
Aug 21 22:02:08 e-smith syslog:
Aug 21 22:02:08 e-smith syslog: Starting kernel logger:
Aug 21 22:02:08 e-smith kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 21 22:02:08 e-smith syslog: klogd startup succeeded
...
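
To pull the IDE-related kernel messages out of a syslog excerpt like the one above, something along these lines could be used. This is only a sketch: the `ide_events` helper and the `IDE_LINE` pattern are hypothetical names, and the regex assumes the classic syslog line layout shown in this report. The sample input is a subset of the lines quoted above.

```python
import re
from collections import defaultdict

# Hypothetical pattern: syslog timestamp, host name, "kernel:", then an IDE
# device or channel name (hda..hdz, ide0, ide1, ...) followed by its message.
IDE_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s\S+\skernel:\s"
    r"(?P<dev>hd[a-z]|ide\d+):\s(?P<msg>.*)$"
)

def ide_events(lines):
    """Group IDE kernel messages by device as (timestamp, message) pairs."""
    events = defaultdict(list)
    for line in lines:
        m = IDE_LINE.match(line.strip())
        if m:
            events[m.group("dev")].append((m.group("ts"), m.group("msg")))
    return dict(events)

# A few lines copied from the excerpt in this report.
sample = """\
Aug 20 22:55:15 e-smith kernel: hdc: dma_timer_expiry: dma status == 0x61
Aug 20 22:55:25 e-smith kernel: hdc: DMA timeout error
Aug 20 22:55:25 e-smith kernel: hdc: DMA disabled
Aug 20 22:55:25 e-smith kernel: hdd: DMA disabled
Aug 20 22:56:00 e-smith kernel: ide1: reset timed-out, status=0xd0
""".splitlines()

for dev, msgs in sorted(ide_events(sample).items()):
    print(dev, len(msgs), "message(s); first at", msgs[0][0])
```

Grouping by device makes it easier to see that the errors start on hdc and then spread to hdd and the whole ide1 channel.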

/dev/hdc was one of a RAID1 pair with /dev/hda, so the system could have remained operational by ejecting the partitions on /dev/hdc from any RAID sets. As it was, once the system rebooted, /dev/hdc2 was removed from /dev/md2, but /dev/hdc1 remained in /dev/md1.

smartctl did not show any disk problems after the reboot, and I re-added /dev/hdc2 to /dev/md2. The system is currently running fine AFAICT.
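
For checking which md arrays still contain partitions from a suspect disk, a rough sketch follows. It assumes the usual /proc/mdstat layout; the `arrays_using_disk` helper is a hypothetical name, and the `example` text is illustrative only, not captured from the machine in this report. The script prints candidate `mdadm` commands rather than running them.

```python
import re

def arrays_using_disk(mdstat_text, disk):
    """Return {array: [member partitions on `disk`]} from /proc/mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md1 : active raid1 hdc1[1] hda1[0]"
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not m:
            continue
        members = re.findall(r"\b(%s\d+)\[\d+\]" % re.escape(disk), m.group(2))
        if members:
            result[m.group(1)] = members
    return result

# Illustrative mdstat content (an assumption, NOT from the reporter's system).
example = """\
Personalities : [raid1]
md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]
md2 : active raid1 hda2[0]
      78043648 blocks [2/1] [U_]
unused devices: <none>
"""

for array, parts in arrays_using_disk(example, "hdc").items():
    for part in parts:
        # Commands are only printed; review them before running anything as root.
        print("mdadm /dev/%s --fail /dev/%s --remove /dev/%s" % (array, part, part))
```

On a live system the input would come from reading /proc/mdstat instead of the `example` string.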

Problem has only happened a single time.




Version-Release number of selected component (if applicable):
2.6.9-34.0.2.ELsmp

How reproducible:
Didn't try


Steps to Reproduce:


Actual Results:


Expected Results:
I don't know whether the DMA timeout indicates a hardware error or a bug in the kernel, but I think the kernel should be able to respond more gracefully to a hardware problem if there is redundant hardware.


Additional info:

Comment 1 Jason Baron 2006-09-01 17:08:26 UTC
So you've only seen this problem once?

Comment 2 Charlie Brady 2006-09-01 22:02:35 UTC
Q. So you've only seen this problem once?

A. (already given) "Problem has only happened a single time."



Comment 3 Aristeu Rozanski 2006-09-29 21:49:01 UTC
As it's not possible to reproduce this and get more data about it, I'm closing this
one. Please feel free to reopen it if it can be reproduced with the latest RHEL-4
kernel.