Bug 69338
Summary: | RAID-1 bad block remapping | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Mace Moneta <moneta.mace> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Priority: | medium |
Version: | 8.0 | CC: | moneta.mace |
Hardware: | i686 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:39:46 UTC |
Description
Mace Moneta
2002-07-21 16:13:56 UTC
Just a note that this still occurs on Redhat 8.0. I'm using a script to auto-recover from this, copied here in case someone finds it useful:

#!/bin/bash
###########################################################################
#
# raidmon
#
# Author: Mace Moneta
# Created: 06/23/2002
# Modified:
# Version 1.0
#
# This script is invoked periodically by cron to check the status of
# the raid-1 array. In the event of a failure, attempt recovery by
# hot-removing and hot-adding the failed drive partition.
#
# Prerequisites:
#
# None.
#
###########################################################################

#
# Obtain a lock to prevent multiple instances
#
if [ -f /var/lock/raidmon.lock ]
then
    /bin/echo "Lock held by another instance - exiting" | /usr/bin/logger -t raid.status --
    exit
fi
/bin/touch /var/lock/raidmon.lock

#
# Who gets status emails?
#
EMAIL="root"

#
# Randomize start time within the minute
#
/bin/sleep $(($RANDOM % 50))

#
# check the status of the multi-disk devices
#
RAIDFAIL=`/sbin/lsraid -A -f -a /dev/md0 -a /dev/md1 -a /dev/md2 | /bin/grep -v online | /bin/grep -v good | /bin/grep -v "^$"`

#
# If the array is good, just note it in the syslog.
#
# If there is a failure, perform recovery.
#
if [ "$RAIDFAIL" == "" ]
then
    /bin/echo "Good status" | /usr/bin/logger -t raid.status --
else
    #
    # Recovery procedure:
    #
    # 1. Log the failure to syslog
    # 2. Email notification of the failure
    # 3. Log the multi-device that failed and the physical partition
    # 4. Hot-remove the failed partition from the multi-device
    # 5. Hot-add the partition back to the multi-device
    # 6. Log the recovery process completion
    # 7. Email notification of the recovery process completion
    #
    /bin/echo "$RAIDFAIL" | /usr/bin/logger -t raid.status --
    /bin/echo "$RAIDFAIL" | /bin/mail -s "RAID failure" $EMAIL &>/dev/null
    BADDEV=`/bin/echo $RAIDFAIL | /bin/awk '{print $4}'`
    BADDEVSHORT=`/bin/echo $BADDEV | /bin/awk -F/ '{print $3}'`
    MDDEV=`/bin/cat /proc/mdstat | /bin/grep $BADDEVSHORT | /bin/awk '{print $1}'`
    /bin/echo "Initiating automatic recovery of $MDDEV device $BADDEVSHORT" | /usr/bin/logger -t raid.status --
    /bin/sleep 10
    /sbin/raidhotremove /dev/$MDDEV /dev/$BADDEVSHORT
    /bin/sleep 10
    /sbin/raidhotadd /dev/$MDDEV /dev/$BADDEVSHORT
    /bin/sleep 10
    /bin/echo "Automatic recovery of $MDDEV device $BADDEVSHORT completed" | /usr/bin/logger -t raid.status --
    /bin/cat /proc/mdstat | /bin/mail -s "Automatic recovery of $MDDEV device $BADDEVSHORT completed" $EMAIL
fi

#
# Processing completed, clear the lock
#
/bin/rm /var/lock/raidmon.lock

Just thought I'd update this bug. This problem continued into Redhat 9, and I've confirmed that the same thing happens in Fedora Core 1, up through the 2.4.22-1.2179.nptl kernel. This bug has been open for almost 2 years now. I'll check back in a year or two...

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
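For anyone adapting the raidmon script above: the failed member and its md device can probably also be read in one pass from /proc/mdstat, where the kernel marks faulty members with (F). The following is only a minimal sketch under that assumption. The function name failed_members and the sample input are hypothetical and are not part of the original report or script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original raidmon script).
# Reads text in /proc/mdstat format on stdin and prints
# "<md device> <member>" for every member flagged with (F).
failed_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                member = $i
                sub(/\[.*$/, "", member)   # drop the "[n](F)" suffix
                print $1, member
            }
        }
    }'
}

# Illustrative sample input, assumed to resemble /proc/mdstat output.
sample='Personalities : [raid1]
md0 : active raid1 hdc1[1](F) hda1[0]
      104320 blocks [2/1] [U_]
md1 : active raid1 hdc2[1] hda2[0]
      1048576 blocks [2/2] [UU]
unused devices: <none>'

printf '%s\n' "$sample" | failed_members
```

On a live system the same function could be fed with `failed_members < /proc/mdstat`, and each output line would supply the two arguments that the script passes to raidhotremove and raidhotadd (after prefixing /dev/).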