md-raid6's "repair" scan mode is documented as possibly repairing raid6 mismatches amongst the component drives. But it is implemented by assuming that if a verification scan fails, both parity drives must be in error, and so the parity drives are rewritten. If the actual bit-rot was on one of the data drives, this ***propagates that data corruption*** irreversibly.

md-raid6 has enough redundancy to correct any one drive's worth of bit-rot. "repair" mode should be changed to exploit that redundancy: it should attempt to rewrite exactly the bad areas - maybe even on a byte-by-byte basis - which are not necessarily on the parity drives. md-raid6 probably has enough redundancy to detect two drives' worth of overlapping bit-rot errors, which it could signal and refuse to propagate or make worse. More than two drives' worth of overlapping errors are probably not reliably diagnosable.

This change would make md-raid6 a reasonable defence against bit-rot, even with overlying filesystems that have no data checksumming features, and with normal applications that cannot do error detection/correction on their files.
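To make the request concrete, the single-drive case described above can be sketched as follows. This is a minimal illustration in Python, not the md driver's code: it assumes the usual RAID-6 arithmetic (P is the XOR of the data bytes, Q is the sum of g^i * D_i over GF(2^8) with polynomial 0x11d and generator 2), and the function names `syndromes`, `diagnose` and `repair` are hypothetical. It works on one byte position across the stripe.

```python
# GF(2^8) log/antilog tables (assumed field: polynomial 0x11d, generator 2).
GF_EXP = [0] * 255
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[(GF_LOG[a] + GF_LOG[b]) % 255]


def syndromes(data):
    """Compute (P, Q) for one byte position across the data drives."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q


def diagnose(data, p_stored, q_stored):
    """Classify a mismatch; returns (verdict, data_drive_index_or_None)."""
    p_calc, q_calc = syndromes(data)
    dp = p_calc ^ p_stored  # equals the error byte E if one data drive is bad
    dq = q_calc ^ q_stored  # equals g^z * E if data drive z is bad
    if dp == 0 and dq == 0:
        return ("ok", None)
    if dq == 0:
        return ("fix_p", None)  # only P disagrees: P itself is stale
    if dp == 0:
        return ("fix_q", None)  # only Q disagrees: Q itself is stale
    z = (GF_LOG[dq] - GF_LOG[dp]) % 255  # solve g^z = dq / dp
    if z < len(data):
        return ("fix_data", z)
    return ("uncorrectable", None)  # no single drive explains the mismatch


def repair(data, p_stored, q_stored):
    """Return corrected (data, P, Q), or raise if no single-drive fix fits."""
    verdict, z = diagnose(data, p_stored, q_stored)
    data = list(data)
    p_calc, q_calc = syndromes(data)
    if verdict == "fix_data":
        data[z] ^= p_calc ^ p_stored  # XOR the error byte back out
    elif verdict == "fix_p":
        p_stored = p_calc
    elif verdict == "fix_q":
        q_stored = q_calc
    elif verdict == "uncorrectable":
        raise ValueError("mismatch not explained by a single bad drive")
    return data, p_stored, q_stored
```

One caveat on the byte-by-byte idea: when two drives are corrupted at the same offset, a single byte position can occasionally look like a valid one-drive error. A more cautious repair would probably require the located drive index to agree across every mismatching byte of a block before rewriting anything, and report the stripe as uncorrectable otherwise.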
If you want to see something like this happen, you need to report it upstream, where feature development actually takes place, not against RHEL. Jes
Moving to closed. -Nigel