Bug 233972

Summary: Enhancement request to Software RAID to do Data Scrubbing
Product: Red Hat Enterprise Linux 5
Reporter: Colin.Simpson
Component: mdadm
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.0
CC: dkovalsk, k.georgiou, msusta, riek
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The Linux software raid stack supports data scrubbing (reading disks in the raid array and looking for bad sectors, and when bad sectors are found using information from other disks or from parity to rewrite the bad sectors with good data). However, the mdadm package did not make use of this functionality. This package adds a cron job to /etc/cron.weekly to check disks for bad sectors and repair them when found.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 11:52:26 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 513501

Description Colin.Simpson 2007-03-26 13:04:00 UTC
Description of problem:
Disks that are run as software RAID can develop bad blocks on sectors that are
never accessed. When a disk in the array fails and you replace the drive, the
rebuild can fail because of previously hidden bad blocks on the remaining
disks (we've recently been bitten by this). As disks get larger, this problem
becomes more likely. It can be mitigated on suitably up-to-date kernels by
so-called "data scrubbing". This is a very serious issue: without scrubbing,
a RAID 5 can be less reliable than a RAID 0 with 2 drives (this statistic
comes from one of the links below).

Debian has a script, checkarray, that (I'm told) runs weekly from cron and simply calls

echo check > /sys/block/mdX/md/sync_action

for each of the software RAID arrays.
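
The per-array loop described above could be sketched roughly as follows. This is only an illustration, not Debian's checkarray script: the function name scrub_arrays and its optional directory argument are made up here, and the check for an "idle" state is an added assumption so that a running resync or recovery is left alone.

```shell
#!/bin/sh
# Sketch of a weekly scrub job. scrub_arrays and its optional argument
# are hypothetical names, not part of mdadm or Debian's checkarray.

scrub_arrays() {
    # Directory holding the mdX entries; overridable so the loop can be
    # tried against a scratch directory instead of the real sysfs tree.
    sysfs_root="${1:-/sys/block}"

    for action_file in "$sysfs_root"/md*/md/sync_action; do
        # An unmatched glob is left as a literal string; skip it.
        [ -e "$action_file" ] || continue

        # Only start a check when the array is idle (assumption: a
        # resync, recovery, or check already in progress is left alone).
        state=$(cat "$action_file")
        if [ "$state" = "idle" ]; then
            echo check > "$action_file"
        fi
    done
}
```

Calling scrub_arrays with no argument (as root) would walk the real /sys/block tree; passing a directory makes it operate on that directory instead.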


See:
http://www.gentoo-wiki.com/HOWTO_Install_on_Software_RAID#Data_Scrubbing
http://www.ashtech.net/~syntax/blog/archives/53-Data-Scrub-with-Linux-RAID-or-Die.html
http://linux-raid.osdl.org/index.php/RAID_Administration


A similar script should probably be added to RHEL and Fedora.


Comment 1 Colin.Simpson 2007-10-29 15:24:05 UTC
Any thoughts on this ticket? 


Comment 2 Doug Ledford 2008-06-14 16:52:49 UTC
The check capability is already present in RHEL 5, but we don't automatically
initiate check events because they can have negative impacts on both
performance and power consumption. It is left to the user to initiate an event
if they choose. I would highly recommend initiating one prior to any planned
modifications of the array.

However, I can certainly see shipping a cron.weekly script that simply defaults
to off, but can be enabled by the user for exactly this purpose.
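
A default-off guard of the kind suggested here might look like the sketch below. The function name scrub_enabled, the ENABLED variable, and the config file path are all hypothetical, chosen only for illustration; nothing in this report says what the shipped script uses.

```shell
#!/bin/sh
# Sketch of an opt-in guard for a cron.weekly scrub script.
# scrub_enabled, ENABLED, and the config path are hypothetical names.

scrub_enabled() {
    config_file="${1:-/etc/sysconfig/raid-scrub}"   # hypothetical path
    ENABLED=no                                      # default: off

    # A missing or unreadable config file leaves the default in place.
    [ -r "$config_file" ] && . "$config_file"

    # Succeed only if the administrator explicitly set ENABLED=yes.
    [ "$ENABLED" = "yes" ]
}
```

A cron script could call this first and exit quietly unless it succeeds, so that installing the package alone never starts a check.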

Comment 3 RHEL Program Management 2008-07-21 23:11:31 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 4 Colin.Simpson 2008-07-22 09:00:55 UTC
I'm not so bothered about it making it into a RH minor release, but I think it
should be on your radar for a future major release.

Should I (or can you, as I'm not sure exactly how) put this as a suggestion to
the Fedora team so it may make it into a RH release down the line?



Comment 8 Ruediger Landmann 2009-05-21 05:51:49 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The Linux software raid stack supports data scrubbing (reading disks in the raid array and looking for bad sectors, and when bad sectors are found using information from other disks or from parity to rewrite the bad sectors with good data).  However, the mdadm package did not make use of this functionality.  This package adds a cron job to /etc/cron.weekly to check disks for bad sectors and repair them when found.

Comment 9 Matěj Šusta 2009-07-22 15:01:47 UTC
Small note to relnotes:
- change sectors to blocks
- the current version of the script just runs "check", which means the array will be checked for consistency, but nothing will be repaired

Comment 10 Matěj Šusta 2009-07-24 08:34:14 UTC
/me slaps his face, to read better next time, please ignore comment #9

Comment 15 errata-xmlrpc 2009-09-02 11:52:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1382.html