Bug 474436
| Summary: | FutureFeature Include a script to do Data Scrubbing on Software RAID | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Colin.Simpson |
| Component: | mdadm | Assignee: | Doug Ledford <dledford> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 11 | CC: | dledford, jbs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-06-26 16:15:12 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Colin.Simpson
2008-12-03 20:27:34 UTC
check doesn't actually fix anything, and unless you go looking in /sys/block/md* for the mismatch count, errors still exist and are not repaired. I added a cron job to the cron.weekly directory that will run a repair operation on all active md RAID arrays at the time the cron job is run. This made it into mdadm-3.0-0.devel3.1.fc11.

Is there any danger in doing a "repair" by default rather than a "check", especially on a RAID 1, rather than just letting it report via "mdadm --monitor"? I'm assuming most failures will be a bad block on one disk, so the "repair" will try to rewrite the bad block on the bad disk only (and the disk that failed will remap the block). The case where an inconsistent array forces a random pick of a disk as the one with valid data should be very rare. Or perhaps I haven't explained this well.

From reading comments by the RAID developers, I don't think a repair should be done automatically. The problem is that there is no way for the RAID subsystem to know which of the blocks is the correct one, and it may overwrite the good data with bad. That sort of recovery, if necessary, should be a manual process. In some cases it might be better to restore the affected files from backup. However, even "check" will trigger the normal RAID bad-block handling when a read fails (bad-block handling meaning: recover the data from the other drives and write it back to the drive whose read failed). So even the safer "check" has useful scrubbing behavior. Before adding a script to do an automatic "repair", I would talk to the RAID developers.

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and the reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Now it looks like this is in Fedora 11, i.e. the script /etc/cron.weekly/raid-check, so I'd imagine this bug can be closed. Thanks for putting this in.

It is, and it's a check instead of a repair as you requested. So, yeah, I'll close this out.
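For reference, here is a minimal sketch of what a weekly scrub job in the spirit of /etc/cron.weekly/raid-check might look like. This is not the actual Fedora script: the ACTION variable and the idle/writability guards are illustrative assumptions; the core mechanism, writing "check" (or "repair") into each array's sync_action file under /sys/block/md*/md/, is what the thread above describes.

```shell
#!/bin/bash
# Hypothetical sketch of a weekly md scrub job (not the shipped raid-check).
# "check" verifies mirrors/parity read-only and still triggers the kernel's
# normal bad-block rewrite on read errors; "repair" also rewrites mismatched
# blocks, which the RAID developers advise against doing automatically.
ACTION=check

shopt -s nullglob   # loop body is simply skipped when no md arrays exist
for sync in /sys/block/md*/md/sync_action; do
    # Only start a scrub on arrays that are idle and that we (root) can write to.
    if [ -w "$sync" ] && [ "$(cat "$sync")" = "idle" ]; then
        echo "$ACTION" > "$sync"
    fi
done

# After a "check" finishes, the mismatch count can be inspected with:
#   cat /sys/block/mdX/md/mismatch_cnt
```

Because "check" only reads (rewriting a block only when the drive itself reports a read failure), it is safe to run unattended, whereas a nonzero mismatch_cnt afterwards is the cue for manual investigation.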