Bug 233972 - Enhancement request to Software RAID to do Data Scrubbing
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mdadm
Version: 5.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Doug Ledford
QA Contact: BaseOS QE
Depends On:
Blocks: 5.4/TechnicalNotes
Reported: 2007-03-26 09:04 EDT by Colin Simpson
Modified: 2010-03-14 17:31 EDT

Doc Type: Bug Fix
Doc Text:
The Linux software RAID stack supports data scrubbing: reading the disks in a RAID array to look for bad sectors and, when bad sectors are found, using information from other disks or from parity to rewrite them with good data. However, the mdadm package did not make use of this functionality. This package adds a cron job to /etc/cron.weekly to check disks for bad sectors and repair them when found.
Last Closed: 2009-09-02 07:52:26 EDT
Description Colin Simpson 2007-03-26 09:04:00 EDT
Description of problem:
Disks in a software RAID array can develop bad blocks on sectors that are not
being accessed. When a disk in the array fails and you replace it, the rebuild
can fail because of previously hidden bad blocks on the remaining disks (we've
recently been bitten by this). As disks get larger, this problem becomes more
likely. On suitably up-to-date kernels it can be mitigated by so-called
"data scrubbing". This is a very serious issue: without scrubbing, a RAID 5
can be less reliable than a RAID 0 with 2 drives (this statistic comes from
one of the links below).

Debian has a script, checkarray, that (I'm told) is run weekly from cron and
simply calls

echo check > /sys/block/mdX/md/sync_action

for each of the software RAID arrays.
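The per-array call above could be looped over every md device, roughly as follows. This is only a minimal sketch, not Debian's checkarray script: the function name scrub_all_arrays and the SYSFS_BLOCK override are made up for illustration, and skipping arrays that are not idle is an assumption about sensible behaviour rather than something stated in this report.

```shell
#!/bin/sh
# Sketch: request a "check" pass on every md array visible in sysfs.
# SYSFS_BLOCK is a hypothetical override so the loop can be tried against
# a scratch directory; on a real system it defaults to /sys/block.
scrub_all_arrays() {
    sysfs_block="${SYSFS_BLOCK:-/sys/block}"
    for action in "$sysfs_block"/md*/md/sync_action; do
        # If the glob matched nothing, the literal pattern remains; skip it.
        [ -e "$action" ] || continue
        # Assumption: only start a check when the array is currently idle,
        # so a running resync or recovery is left alone.
        state=$(cat "$action")
        if [ "$state" = "idle" ]; then
            echo check > "$action"
        fi
    done
}
```

Calling scrub_all_arrays as root (for example from a weekly cron script) would then write "check" to each idle array's sync_action file.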


See:
http://www.gentoo-wiki.com/HOWTO_Install_on_Software_RAID#Data_Scrubbing
http://www.ashtech.net/~syntax/blog/archives/53-Data-Scrub-with-Linux-RAID-or-Die.html
http://linux-raid.osdl.org/index.php/RAID_Administration


A similar script should probably be added to RH EL and Fedora. 

Comment 1 Colin Simpson 2007-10-29 11:24:05 EDT
Any thoughts on this ticket? 
Comment 2 Doug Ledford 2008-06-14 12:52:49 EDT
The check capability is present in rhel5 already, but we don't automatically
initiate check events as those can have negative impacts on both performance and
power consumption.  It is left to the user to initiate an event if they choose.
I would highly recommend initiating an event prior to any planned modifications
of the array.

However, I can certainly see shipping a cron.weekly script that simply defaults
to off, but can be enabled by the user for exactly this purpose.
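A cron.weekly script that defaults to off might look something like the sketch below. It is not the script that was eventually shipped: the config path /etc/sysconfig/raid-scrub-example, the ENABLED variable, the SYSFS_BLOCK override, and the function names are all hypothetical, chosen only to illustrate the opt-in idea.

```shell
#!/bin/sh
# Sketch of an opt-in weekly scrub. All names here are illustrative.
# Hypothetical config file; the user would put ENABLED=yes in it to opt in.
RAID_SCRUB_CONF="${RAID_SCRUB_CONF:-/etc/sysconfig/raid-scrub-example}"

scrub_enabled() {
    ENABLED=no    # default: off unless the config file says otherwise
    [ -r "$RAID_SCRUB_CONF" ] && . "$RAID_SCRUB_CONF"
    [ "$ENABLED" = "yes" ]
}

run_weekly_scrub() {
    # Do nothing at all unless the user has explicitly enabled scrubbing.
    scrub_enabled || return 0
    for action in "${SYSFS_BLOCK:-/sys/block}"/md*/md/sync_action; do
        [ -e "$action" ] || continue
        # Assumption: leave arrays alone if they are already busy.
        [ "$(cat "$action")" = "idle" ] && echo check > "$action"
    done
    return 0
}
```

With no config file present, run_weekly_scrub returns without touching any array, which avoids the performance and power costs mentioned above for users who have not asked for scrubbing.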
Comment 3 RHEL Product and Program Management 2008-07-21 19:11:31 EDT
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 4 Colin Simpson 2008-07-22 05:00:55 EDT
I'm not so bothered about it making it into an RH minor release, but I think it
should be on your radar for a future major release.

Should I (or can you, as I'm not sure exactly how) put this as a suggestion to
the Fedora team so that it may make it into an RH release down the line?

Comment 8 Ruediger Landmann 2009-05-21 01:51:49 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The Linux software RAID stack supports data scrubbing: reading the disks in a RAID array to look for bad sectors and, when bad sectors are found, using information from other disks or from parity to rewrite them with good data. However, the mdadm package did not make use of this functionality. This package adds a cron job to /etc/cron.weekly to check disks for bad sectors and repair them when found.
Comment 9 Matěj Šusta 2009-07-22 11:01:47 EDT
Small note to relnotes:
- change sectors to blocks
- the current version of the script just runs "check", which means that the array will be checked for consistency, but nothing will be repaired
Comment 10 Matěj Šusta 2009-07-24 04:34:14 EDT
/me slaps his face, to read better next time, please ignore comment #9
Comment 15 errata-xmlrpc 2009-09-02 07:52:26 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1382.html
