On the Mustang system (ICH5R), I installed RHEL3U1 on a software RAID1 (mirror on 2 drives). While the system is running, if one of the SATA drives is disconnected, the entire RAID1 becomes unresponsive (which results in a system hang, since this is my system drive). Even though libata/ata_piix doesn't support hot plug, it should fail and time out in a more graceful manner, so that a RAID1 will continue to be available after the removal or failure of one member.

I also tried this with the latest RHEL3, 2.4.21-15, and got the following messages when I pulled the drive:

ata1: DMA timeout, stat 0x1
ATA: abnormal status 0xD0 on port 0xCC9F
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 9b a2 07 00 00 b8 00
Current sd08:01: sense key Medium Error
Additional sense indicates Unrecovered read error - auto reallocate failed
 I/O error: dev 08:01, sector 10199496
raid1: Disk failure on sda1, disabling device.
        Operation continuing on 1 devices
raid1: mirror resync was not fully finished, restarting next time.
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: md_do_sync() got signal ... exiting
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.
raid1: mirror resync was not fully finished, restarting next time.

(system will no longer do anything, because the RAID is no longer responding...)
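For anyone retesting this, one way to confirm that the array really dropped to degraded mode and kept running is to check /proc/mdstat after pulling the drive. Below is a minimal sketch in Python, assuming the usual "[total/active] [U_]" status line. The sample text is illustrative and was not captured from this system, and parse_mdstat is a hypothetical helper name.

```python
import re

# Header line of an array, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(\w+)\s+(.*)$")
# Member counts and per-member state, e.g. "[2/1] [_U]"
STATE_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {"level", "members", "total", "active", "degraded"}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "level": header.group(3),
                "members": header.group(4).split(),
                "total": None,
                "active": None,
                "degraded": None,
            }
            continue
        state = STATE_RE.search(line)
        if state and current:
            total, active = int(state.group(1)), int(state.group(2))
            arrays[current].update(
                total=total, active=active, degraded=active < total
            )
    return arrays


# Illustrative sample only (assumed layout, not output from the reporter's box).
SAMPLE = """\
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0](F)
      10482304 blocks [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, "DEGRADED" if info["degraded"] else "ok", info["members"])
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE. If that read itself never returns, the machine is in the hang described above, not in degraded mode.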
ICH5 controller hardware does not support SATA hotplug. That's a "don't do that" situation on ICH5, unfortunately. Most other SATA controllers do support hotplug, and libata needs some work in that area.
Can you please confirm that this applies to ICH5R as well? Also, can you supply evidence that the lack of support for hot plugging is in the hardware, not in our software? Thanks,
Never mind. http://www.intel.com/design/chipsets/datashts/25251601.pdf (ICH5 and ICH5R data sheet), on page 186, says:

5.17.2 Hot Swap Operation

Dynamic hot swap (e.g., surprise removal) is not supported by the SATA host controller. However, using the SPC register configuration bits, and power management flows, a device can be powered down by software, and the port can then be powered off, allowing removal and insertion of a new device.

Note: This hot swap operation requires BIOS and OS support.
Anyhow, even though hot swapping is not supported, it's very unfortunate that we can't just start returning I/O errors for accesses to the missing disk, so that the RAID devices using it can survive. Any chance we could arrange for this to happen? This is the actual problem. This is not about support for hot swapping; it's about getting the RAID to survive the removal or failure of a disk. I mean, what if the disk isn't removed but actually dies? Is that any different?
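To make the request concrete: what the md layer needs is for a request to the dead disk to come back with an error in bounded time instead of never completing. Here is a toy model of that contract, in Python rather than kernel C. Disk, Mirror, and their methods are made up for illustration and do not correspond to md or libata internals.

```python
class Disk:
    """A fake block device that can be 'pulled' to simulate removal or failure."""

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = dict(blocks)
        self.present = True

    def read(self, sector):
        # The behaviour being asked for: a missing disk reports an error
        # promptly rather than leaving the request pending forever.
        if not self.present:
            raise IOError("I/O error on %s, sector %d" % (self.name, sector))
        return self.blocks[sector]


class Mirror:
    """Minimal RAID1-like reader: drop a member on error, retry on the rest."""

    def __init__(self, disks):
        self.active = list(disks)
        self.failed = []

    def read(self, sector):
        while self.active:
            disk = self.active[0]
            try:
                return disk.read(sector)
            except IOError:
                # Mark the member faulty and continue in degraded mode.
                self.active.remove(disk)
                self.failed.append(disk)
        raise IOError("no working mirror members left")


if __name__ == "__main__":
    sda = Disk("sda1", {0: b"boot"})
    sdb = Disk("sdb1", {0: b"boot"})
    md0 = Mirror([sda, sdb])
    sda.present = False                    # pull the first drive
    print(md0.read(0))                     # served from the surviving member
    print([d.name for d in md0.failed])    # members marked faulty
```

If Disk.read blocked forever instead of raising, Mirror.read would never get the chance to fail over. That is the hang reported here, and it would look the same whether the disk was pulled or simply died.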
Nothing has changed about ICH5 hotplug in the latest update.
But the problem is not hotplug!! I think you should change the subject of this bug to "md raid1 useless if an ich5 disk managed by libata fails"; otherwise the bug will be ignored due to "ichX hotplug unsupported".
I *had* changed the summary to make the (revised) problem statement clearer. Why was it changed back to something that we've already agreed to be a hardware issue, not fixable in software?
Yes, this fix is present in the current taroon CVS.
The fix for this problem was committed to the RHEL3 U4 patch pool on 17-Sep-2004 (in kernel version 2.4.21-20.7.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html