Bug 127813 (IT_40900) - [RHEL3 U4] Loss of SATA ICH device hangs RAID1
Summary: [RHEL3 U4] Loss of SATA ICH device hangs RAID1
Keywords:
Status: CLOSED ERRATA
Alias: IT_40900
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 123574
 
Reported: 2004-07-14 06:12 UTC by Alexandre Oliva
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-20 20:55:39 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2004:550
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 05:00:00 UTC

Description Alexandre Oliva 2004-07-14 06:12:25 UTC
On the Mustang system (ICH5R), I installed RHEL3U1 on a software RAID1
(mirror on 2 drives).  While the system is running, if one of the SATA
drives is disconnected, the entire RAID1 becomes unresponsive (which
results in a system hang, since this is my system drive).

Even though libata/ata_piix doesn't support hot plug, it should fail
and time out in a more graceful manner, so that a RAID1 will continue
to be available after the removal or failure of one member.

I also tried this with the latest RHEL3, 2.4.21-15, and got the
following messages when I pulled the drive:

ata1: DMA timeout, stat 0x1
ATA: abnormal status 0xD0 on port 0xCC9F
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 9b a2 07 00 00 b8 00
Current sd08:01: sense key Medium Error
Additional sense indicates Unrecovered read error - auto reallocate failed
 I/O error: dev 08:01, sector 10199496
raid1: Disk failure on sda1, disabling device.
Operation continuing on 1 devices
raid1: mirror resync was not fully finished, restarting next time.
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: md_do_sync() got signal ... exiting
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.

(system will no longer do anything, because the RAID is no longer
responding...)
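
For anyone cross-checking the log above, the CDB bytes and the reported failing sector can be compared with a short script. This is only a sketch: it assumes the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and that the log prints the opcode as the name "Read (10)" followed by the remaining nine bytes. The function name decode_read10 is made up for illustration.

```python
import struct

def decode_read10(cdb: bytes):
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])      # bytes 2-5: logical block address
    (blocks,) = struct.unpack(">H", cdb[7:9])   # bytes 7-8: transfer length
    return lba, blocks

# 0x28 is prepended by hand (assumption: the log shows it as "Read (10)");
# the other nine bytes are copied from the scsi0 ERROR line.
cdb = bytes.fromhex("28" + "00009ba2070000b800")
lba, blocks = decode_read10(cdb)

reported_sector = 10199496  # from "I/O error: dev 08:01, sector 10199496"
print(lba, blocks, lba - reported_sector)
```

The difference between the two numbers is 63, which would be consistent with sda1 starting at sector 63 of the disk. The partition table is not shown in this report, so that reading is a guess.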

Comment 4 Jeff Garzik 2004-07-20 17:26:29 UTC
ICH5 controller hardware does not support SATA hotplug.

That's a "don't do that" situation on ICH5, unfortunately.

Most other SATA controllers do support hotplug, and libata needs some
work in that area.


Comment 5 Alexandre Oliva 2004-07-20 18:45:01 UTC
Can you please confirm that this applies to ICH5R as well?

Also, can you supply evidence that lack of support for hot plugging is
in the hardware, not in our software?

Thanks,

Comment 6 Alexandre Oliva 2004-07-20 18:47:49 UTC
Never mind. The ICH5 and ICH5R data sheet
(http://www.intel.com/design/chipsets/datashts/25251601.pdf), on page 186, says:

5.17.2      Hot Swap Operation

Dynamic hot swap (e.g., surprise removal) is not supported by the SATA
host controller. However, using the SPC register configuration bits,
and power management flows, a device can be powered down by software,
and the port can then be powered off, allowing removal and insertion
of a new device.

Note: This hot swap operation requires BIOS and OS support.


Comment 7 Alexandre Oliva 2004-07-20 18:51:53 UTC
Anyhow, even though hot swapping is not supported, it's very
unfortunate that we can't just start issuing I/O errors for
accesses to the disk, so that the RAID devices using the disk can
survive.  Any chance we could arrange for this to happen?  This is the
actual problem: it is not about support for hot swapping, it's
about getting the RAID to survive the disk removal/failure.  I mean,
what if the disk isn't removed but actually dies?  Is that any
different?
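
To make the request concrete, here is a toy model of the behaviour being asked for. It is not md or libata code, and every name in it (Member, Mirror, the "sdb1" label for the second disk) is hypothetical. The only point it illustrates is that if a missing disk produces an I/O error rather than a hang, the mirror can mark that member faulty and keep serving reads from the survivor.

```python
class Member:
    """One half of the mirror; 'present' stands in for the cable being attached."""
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks
        self.present = True
        self.faulty = False

    def read(self, sector):
        if not self.present:
            # The behaviour requested in this report: fail the request
            # with an error instead of blocking forever.
            raise IOError(f"{self.name}: I/O error at sector {sector}")
        return self.blocks[sector]


class Mirror:
    """Toy RAID1: read from the first member that is not marked faulty."""
    def __init__(self, members):
        self.members = members

    def read(self, sector):
        for m in self.members:
            if m.faulty:
                continue
            try:
                return m.read(sector)
            except IOError:
                m.faulty = True  # disable the member, try the next one
        raise IOError("no working members left")

    def working(self):
        return [m.name for m in self.members if not m.faulty]


sda = Member("sda1", {0: b"data"})
sdb = Member("sdb1", {0: b"data"})  # second member's name is assumed
md0 = Mirror([sda, sdb])

sda.present = False                  # pull the drive
print(md0.read(0), md0.working())
```

In this model the array keeps answering in degraded mode after sda1 disappears, which is the outcome the report asks for whether the disk was unplugged or simply died.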

Comment 12 Jeff Garzik 2004-08-17 11:06:59 UTC
Nothing has changed about ICH5 hotplug in the latest update.


Comment 13 giulioo 2004-08-17 11:19:50 UTC
But the problem is not hotplug!!

I think you should change the subject of this bug to be
"md raid1 useless if an ich5 disk managed by libata fails."
otherwise the bug will be ignored due to "ichX hotplug unsupported".



Comment 14 Alexandre Oliva 2004-08-17 12:05:28 UTC
I *had* changed the summary to make the (revised) problem statement
clearer.  Why was it changed back to something that we've already
agreed to be a hardware issue, not fixable in software?

Comment 20 Jeff Garzik 2004-10-08 21:36:38 UTC
Yes, this fix is present in the current taroon CVS.


Comment 21 Ernie Petrides 2004-10-11 20:02:33 UTC
The fix for this problem was committed to the RHEL3 U4 patch
pool on 17-Sep-2004 (in kernel version 2.4.21-20.7.EL).


Comment 22 John Flanagan 2004-12-20 20:55:39 UTC
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html


