Bug 69338

Summary:          RAID-1 bad block remapping
Product:          [Retired] Red Hat Linux
Component:        kernel
Version:          8.0
Hardware:         i686
OS:               Linux
Status:           CLOSED CURRENTRELEASE
Severity:         medium
Priority:         medium
Reporter:         Mace Moneta <moneta.mace>
Assignee:         Arjan van de Ven <arjanv>
QA Contact:       Brian Brock <bbrock>
CC:               moneta.mace
Target Milestone: ---
Target Release:   ---
Doc Type:         Bug Fix
Last Closed:      2004-09-30 15:39:46 UTC

Description Mace Moneta 2002-07-21 16:13:56 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020606

Description of problem:
I experienced a read error (see below) on one drive of a two-disk RAID-1 array.
The drive remapped the bad sector; however, the kernel made no attempt to
rewrite it using the data from the good drive in the array.  Instead, the
failing partition was removed from the array.  Simply performing a
raidhotremove and raidhotadd restored the array, but with a two-hour exposure
(plus the time it took to detect the failure) to a secondary failure.  In
addition, if you look at the raidtab below, you will see that three partitions
on that "failed" drive are part of RAID-1 arrays.  The remaining two partitions
were not removed from their arrays (a good thing in this case, but an
inconsistent implementation).  Either the drive should be removed from all
arrays, or soft recovery (preferred) should be attempted.

I'd suggest attempting to rewrite the data some number of times (16?); if that
is unsuccessful, the drive (all of its partitions) should be removed from its
arrays.
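
For reference, the manual recovery boils down to something like the following
(a rough sketch using the devices from the log below; raidhotremove and
raidhotadd are from the raidtools package):

# Drop the failed member (the kernel has already marked it faulty), then add
# it back so md rebuilds it from the good mirror.
/sbin/raidhotremove /dev/md0 /dev/hdg3
/sbin/raidhotadd /dev/md0 /dev/hdg3

# Watch the resync progress.
cat /proc/mdstat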

/var/log/messages:

Jun 22 02:20:16 mmouse kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 22 02:20:16 mmouse kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2670177, sector=531200
Jun 22 02:20:16 mmouse kernel: end_request: I/O error, dev 22:03 (hdg), sector 531200
Jun 22 02:20:16 mmouse kernel: raid1: Disk failure on hdg3, disabling device.
Jun 22 02:20:16 mmouse kernel: Operation continuing on 1 devices
Jun 22 02:20:16 mmouse kernel: raid1: hdg3: rescheduling block 531200
Jun 22 02:20:16 mmouse kernel: md: updating md0 RAID superblock on device
Jun 22 02:20:16 mmouse kernel: md: hde3 [events: 00000008]<6>(write) hde3's sb offset: 77107904
Jun 22 02:20:16 mmouse kernel: md: recovery thread got woken up ...
Jun 22 02:20:16 mmouse kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Jun 22 02:20:16 mmouse kernel: md: recovery thread finished ...
Jun 22 02:20:16 mmouse kernel: md: (skipping faulty hdg3 )
Jun 22 02:20:16 mmouse kernel: raid1: hde3: redirecting sector 531200 to another mirror

...<snip>

Jun 22 05:18:06 mmouse kernel: md: trying to hot-add hdg3 to md0 ... 
Jun 22 05:18:06 mmouse kernel: md: bind<hdg3,2>

...<snip>

Jun 22 06:17:48 mmouse kernel: md: md0: sync done.

---

/etc/raidtab:

raiddev             /dev/md0
raid-level                  1
nr-raid-disks               2
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              0
    device          /dev/hde3
    raid-disk     0
    device          /dev/hdg3
    raid-disk     1
raiddev             /dev/md2
raid-level                  1
nr-raid-disks               2
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              0
    device          /dev/hde1
    raid-disk     0
    device          /dev/hdg1
    raid-disk     1
raiddev             /dev/md1
raid-level                  1
nr-raid-disks               2
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              0
    device          /dev/hde2
    raid-disk     0
    device          /dev/hdg2
    raid-disk     1
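
For reference, the state of these arrays can be checked at any time with
something like the following (a sketch; it assumes lsraid from the raidtools
package is installed):

# Kernel's view of every md device, its members, and any resync in progress.
cat /proc/mdstat

# Per-array member status from raidtools.
/sbin/lsraid -A -a /dev/md0 -a /dev/md1 -a /dev/md2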



Version-Release number of selected component (if applicable):

2.4.18-5 kernel

How reproducible:
Always

Steps to Reproduce:
1. Wait for a bad block error to occur (or simulate one; see Additional info below)
2. Observe the error
3. raidhotremove and raidhotadd the affected partition to recover

Actual Results:  The partition is removed from the array

Expected Results:  The error should have been corrected automatically by
attempting to rewrite the data from the good mirror, or all partitions on the
failed drive should have been removed from their arrays.

Additional info:
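
A failure can also be simulated for testing, without waiting for a real media
error, by marking one mirror half faulty by hand (a sketch; it assumes
raidsetfaulty from the raidtools package, and it does not create an actual bad
block; it only exercises the same removal and recovery path):

# Mark hdg3 faulty in md0, then recover it as described above.
/sbin/raidsetfaulty /dev/md0 /dev/hdg3
/sbin/raidhotremove /dev/md0 /dev/hdg3
/sbin/raidhotadd /dev/md0 /dev/hdg3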

Comment 1 Mace Moneta 2003-01-24 00:47:29 UTC
Just a note that this still occurs on Red Hat 8.0.  I'm using a script to
auto-recover from this, copied here in case someone finds it useful:

#!/bin/bash

###########################################################################
#
# raidmon
#
# Author: Mace Moneta
# Created: 06/23/2002
# Modified: 
# Version 1.0
#
# This script is invoked periodically by cron to check the status of
# the raid-1 array.  In the event of a failure, attempt recovery by
# hot-removing and hot-adding the failed drive partition.
#
# Prerequisites:
#
# None.
#
###########################################################################

#
# Obtain a lock to prevent multiple instances
#
if [ -f /var/lock/raidmon.lock ]
then
   /bin/echo "Lock held by another instance - exiting" | /usr/bin/logger -t
raid.status --
   exit
fi
/bin/touch /var/lock/raidmon.lock

#
# Who gets status emails?
#
EMAIL="root"

#
# Randomize start time within the minute
#
/bin/sleep $(($RANDOM % 50))

#
# check the status of the multi-disk devices
#
RAIDFAIL=`/sbin/lsraid -A -f -a /dev/md0 -a /dev/md1 -a /dev/md2 | /bin/grep -v online | /bin/grep -v good | /bin/grep -v "^$"`

#
# If the array is good, just note it in the syslog.
#
# If there is a failure, perform recovery.
#
if [ "$RAIDFAIL" == "" ]
then
   /bin/echo "Good status" | /usr/bin/logger -t raid.status --
else
   #
   # Recovery procedure:
   #
   # 1. Log the failure to syslog
   # 2. Email notification of the failure
   # 3. Log the multi-device that failed and the physical partition
   # 4. Hot-remove the failed partition from the multi-device
   # 5. Hot-add the partition back to the multi-device
   # 6. Log the recovery process completion
   # 7. Email notification of the recovery process completion
   #
   /bin/echo "$RAIDFAIL" | /usr/bin/logger -t raid.status --
   /bin/echo "$RAIDFAIL" | /bin/mail -s "RAID failure" $EMAIL &>/dev/null
   # Pull the failed partition's path from the lsraid output, derive its short
   # name (e.g. hdg3), and look up which md device it belongs to in /proc/mdstat.
   BADDEV=`/bin/echo $RAIDFAIL | /bin/awk '{print $4}'`
   BADDEVSHORT=`/bin/echo $BADDEV | /bin/awk -F/ '{print $3}'`
   MDDEV=`/bin/cat /proc/mdstat | /bin/grep $BADDEVSHORT | /bin/awk '{print $1}'`
   /bin/echo "Initiating automatic recovery of $MDDEV device $BADDEVSHORT" |
/usr/bin/logger -t raid.status --
   /bin/sleep 10
   /sbin/raidhotremove /dev/$MDDEV /dev/$BADDEVSHORT
   /bin/sleep 10
   /sbin/raidhotadd /dev/$MDDEV /dev/$BADDEVSHORT
   /bin/sleep 10
   /bin/echo "Automatic recovery of $MDDEV device $BADDEVSHORT completed" |
/usr/bin/logger -t raid.status --
   /bin/cat /proc/mdstat | /bin/mail -s "Automatic recovery of $MDDEV device
$BADDEVSHORT completed" $EMAIL
fi

#
# Processing completed, clear the lock
#
/bin/rm /var/lock/raidmon.lock
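
For completeness, the script is driven from cron; an /etc/crontab entry along
these lines works (the install path and the five-minute interval here are only
examples):

# Check the RAID arrays every five minutes.
*/5 * * * * root /usr/local/bin/raidmon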

Comment 2 Mace Moneta 2004-04-25 06:07:49 UTC
Just thought I'd update this bug.  This problem continued into Red Hat 9, and
I've confirmed that the same thing happens in Fedora Core 1, up through the
2.4.22-1.2179.nptl kernel.  This bug has been open for almost 2 years now.
I'll check back in a year or two...

Comment 3 Bugzilla owner 2004-09-30 15:39:46 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/