Bug 1325654 - raid-6 bit-rot detection & repair
Summary: raid-6 bit-rot detection & repair
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Nigel Croxon
QA Contact: guazhang@redhat.com
Reported: 2016-04-10 12:08 UTC by Frank Ch. Eigler
Modified: 2018-05-24 17:22 UTC

Doc Type: No Doc Update
Last Closed: 2017-05-02 15:51:42 UTC

Description Frank Ch. Eigler 2016-04-10 12:08:48 UTC
md-raid6's "repair" scan mode is documented as possibly repairing raid6 mismatches amongst the component drives.  But the implementation assumes that if a verification scan fails, both parity drives must be the erroneous ones, and so it rewrites the parity drives.  If the actual bit-rot was on one of the data drives, this ***propagates that data corruption*** irreversibly.
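
A toy sketch of the failure mode described above, assuming a single XOR parity byte per stripe (P only) to keep it short. It is not md's code; `naive_repair` and `rebuild` are hypothetical names that only model "trust the data, rewrite the parity".

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR parity (P) over one byte position of the data drives."""
    return reduce(xor, blocks, 0)

def naive_repair(data, p):
    """Illustrative model only: on a mismatch, recompute parity from the data."""
    return parity(data)

def rebuild(data, p, lost):
    """Reconstruct the byte of drive `lost` from the other drives and P."""
    return parity(b for i, b in enumerate(data) if i != lost) ^ p

good = [0x11, 0x22, 0x33]
p = parity(good)

rotted = [0x11, 0x2A, 0x33]      # silent corruption on data drive 1
p = naive_repair(rotted, p)      # the mismatch is "fixed" by rewriting parity

# The stripe now verifies cleanly, and rebuilding drive 1 from parity
# returns the corrupted byte: the original 0x22 is no longer recoverable.
recovered = rebuild(rotted, p, 1)
```

After the parity rewrite, `recovered` equals the rotted byte rather than the original one, which is the irreversible propagation the report complains about.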

md-raid6 has enough redundancy to correct any one drive's worth of bit-rot.  "repair" mode should be changed to exploit that redundancy: it should attempt to rewrite exactly the bad areas, maybe even on a byte-by-byte basis, rather than always the parity drives.
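
A minimal sketch of how such a byte-by-byte repair could work, assuming the usual RAID-6 arithmetic (P is the XOR of the data bytes, Q is a weighted sum in GF(2^8) with polynomial 0x11d and generator 2). With exactly one bad byte, the two syndromes reveal which drive is wrong and by how much. The function names are hypothetical and this is not how md is implemented.

```python
# GF(2^8) log/antilog tables: polynomial 0x11d, generator 2.
GF_EXP, GF_LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = GF_EXP[i + 255] = x
    GF_LOG[x] = i
    x = (x << 1) ^ (0x11d if x & 0x80 else 0)

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def pq(data):
    """Return (P, Q) for one byte position across the data drives."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q

def repair_byte(data, p, q):
    """Hypothetical helper: locate and fix a single bad byte.

    Returns (verdict, data, p, q) where verdict names the repaired
    component: "clean", "P", "Q", "D<n>", or "uncorrectable".
    """
    p_now, q_now = pq(data)
    sp, sq = p ^ p_now, q ^ q_now        # the two syndromes
    if sp == 0 and sq == 0:
        return "clean", data, p, q
    if sq == 0:                          # only P disagrees: P itself is bad
        return "P", data, p_now, q
    if sp == 0:                          # only Q disagrees: Q itself is bad
        return "Q", data, p, q_now
    # Both disagree: for a single bad data byte, sq == g^z * sp.
    z = (GF_LOG[sq] - GF_LOG[sp]) % 255
    if z >= len(data):                   # no such drive: more than one fault
        return "uncorrectable", data, p, q
    fixed = list(data)
    fixed[z] ^= sp                       # sp is exactly the error pattern
    return "D%d" % z, fixed, p, q

data = [0x11, 0x22, 0x33, 0x44]
p, q = pq(data)
bad = list(data)
bad[2] ^= 0x5A                           # bit-rot on data drive 2
verdict, fixed, _, _ = repair_byte(bad, p, q)   # verdict == "D2", fixed == data
```

The sketch repairs the data byte and leaves the parity alone when the syndromes point at a data drive, and only rewrites P or Q when the syndromes say the parity itself is the odd one out.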

md-raid6 probably has enough redundancy to detect two drives' worth of overlapping bit-rot errors, which it could signal while refusing to propagate them or make them worse.  More than two drives' worth of overlapping errors are probably not reliably diagnosable.
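
One way the detection above might be approached, sketched under the same assumptions about RAID-6 arithmetic (GF(2^8), polynomial 0x11d, generator 2): compute the suspect drive independently at every byte offset of a stripe, and treat the stripe as repairable only if all offsets blame the same component. Faults on two drives tend to produce disagreeing or impossible suspects. A single offset can still be misattributed, which is why the check runs over the whole chunk. `classify_stripe` and `make_parity` are hypothetical names, not md functions.

```python
# GF(2^8) log/antilog tables: polynomial 0x11d, generator 2.
GF_EXP, GF_LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = GF_EXP[i + 255] = x
    GF_LOG[x] = i
    x = (x << 1) ^ (0x11d if x & 0x80 else 0)

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def make_parity(chunks):
    """Compute the P and Q chunks for a list of equal-length data chunks."""
    n = len(chunks[0])
    p, q = bytearray(n), bytearray(n)
    for i, chunk in enumerate(chunks):
        for off, b in enumerate(chunk):
            p[off] ^= b
            q[off] ^= gf_mul(GF_EXP[i], b)
    return p, q

def classify_stripe(chunks, p_chunk, q_chunk):
    """Return ("clean", None), ("single", suspect) or ("multiple", None)."""
    p_now, q_now = make_parity(chunks)
    suspects = set()
    for off in range(len(p_chunk)):
        sp = p_chunk[off] ^ p_now[off]
        sq = q_chunk[off] ^ q_now[off]
        if sp == 0 and sq == 0:
            continue
        if sq == 0:
            suspects.add("P")
        elif sp == 0:
            suspects.add("Q")
        else:
            z = (GF_LOG[sq] - GF_LOG[sp]) % 255
            suspects.add(z if z < len(chunks) else "?")
    if not suspects:
        return ("clean", None)
    if len(suspects) == 1 and "?" not in suspects:
        return ("single", next(iter(suspects)))
    return ("multiple", None)            # signal it, and do not rewrite anything
```

A caller would attempt a repair only on a `"single"` verdict and report `"multiple"` to the administrator without touching the stripe.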

This change would make md-raid6 a reasonable defence against bit-rot, even with overlying filesystems that have no data checksumming features, and with normal applications that cannot do error detection/correction on their files.

Comment 2 Jes Sorensen 2016-04-11 12:52:05 UTC
If you want to see something like this happening, you need to report it against
upstream where feature development is actually going on - not against RHEL.

Jes

Comment 6 Nigel Croxon 2017-05-02 15:51:42 UTC
Moving to closed. 

-Nigel

