Bug 223217

Summary: Files on RAID experiencing corruption
Product: [Fedora] Fedora Reporter: Stuart MacDonald <stuartm>
Component: mdadm Assignee: Doug Ledford <dledford>
Status: CLOSED DUPLICATE QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: 6   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-01-18 14:44:21 UTC Type: ---

Description Stuart MacDonald 2007-01-18 14:38:51 UTC
Description of problem:

I have an ASUS A8V-SE with a VIA VT6420 RAID controller. There are two HDs:
hda (new) and hdb (old, possibly dying). There are two other HDs, sda and sdb,
on the RAID controller, and they are configured as a RAID-1 volume, md0. This
is an FC6 install, with kernel 2.6.18-1.2798.fc6 #1 SMP x86_64. All filesystems
are ext3.

I mounted hdb5 and copied my old partition to md0. After a little while I
found a file with a strange one-bit error in it. Since the old drive is
probably dying of bad sectors, I assumed the fault was there. How to correct?
Make a second copy and diff the two; if the corruption is random, the diff
should show all the errors, and I can correct them manually. So I made a second
copy to hda2, and then diffed hda2 and md0. This returned a large number of
differences. I manually corrected some errors in one subdir that I needed,
and noticed that all the errors were in the md0 copy and none in the hda2
copy. I'm now diffing hda2 against hdb5, and so far there are no errors at all.
This implies that the errors on md0 were introduced by the RAID somehow.
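
For anyone wanting to repeat the comparison, here is a minimal sketch of the
check described above: walk two copies of the same tree, compare each file
byte by byte, and record the offset and XOR mask of every differing byte. This
is not the exact command I ran (that was plain diff); the function names
compare_trees, bit_diffs and is_single_bit are made up for illustration.

```python
import os


def bit_diffs(path_a, path_b, chunk_size=1 << 20):
    """Return (byte_offset, xor_mask) for each differing byte in two files.

    If the files differ in length, the last entry is (shorter_length, None).
    """
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        diffs.append((offset + i, x ^ y))
                if len(a) != len(b):
                    # One file ended early; record where and stop.
                    diffs.append((offset + min(len(a), len(b)), None))
                    break
            offset += len(a)
    return diffs


def is_single_bit(mask):
    """True if the XOR mask has exactly one bit set (a one-bit flip)."""
    return mask is not None and mask != 0 and mask & (mask - 1) == 0


def compare_trees(root_a, root_b):
    """Map relative path -> diffs for each regular file under root_a that
    differs from its counterpart under root_b (None if the counterpart is
    missing)."""
    report = {}
    for dirpath, _dirs, files in os.walk(root_a):
        for name in files:
            path_a = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are compared.
            if os.path.islink(path_a) or not os.path.isfile(path_a):
                continue
            rel = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, rel)
            if not os.path.isfile(path_b):
                report[rel] = None
                continue
            diffs = bit_diffs(path_a, path_b)
            if diffs:
                report[rel] = diffs
    return report
```

With the two copies mounted at, say, /mnt/hda2 and /mnt/md0 (hypothetical
mount points), compare_trees("/mnt/hda2", "/mnt/md0") lists every file that
differs, and is_single_bit on each mask shows whether a difference is a
one-bit flip. It only says that the copies differ, not which one is wrong;
that still needs a third copy or the original source.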

Version-Release number of selected component (if applicable):

Stock FC6 install.

How reproducible:

The partition is 30 GB, and I've only got this one machine, so I can't attempt
to reproduce it, but I suspect it's reproducible.

Steps to Reproduce:
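1. Configure sda and sdb on the VT6420 controller as a RAID-1 volume (md0).
2. Copy a large set of files (about 30 GB here) from another disk to md0.
3. Compare the copy on md0 against the source files.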
  
Actual results:

Files on md0 show one-bit corruption: somewhere between 200 and 400 such errors
in 30 GB of files.
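
For scale, a rough calculation of the error rate, assuming the 30 gigabyte
figure means decimal gigabytes (30 * 10**9 bytes) and that each error is a
single flipped bit. The helper names are made up for illustration.

```python
def bits_per_error(total_bytes, errors):
    """Average number of bits copied per single-bit error."""
    return total_bytes * 8 / errors


def mb_per_error(total_bytes, errors):
    """Average number of decimal megabytes copied per single-bit error."""
    return total_bytes / errors / 10**6


TOTAL_BYTES = 30 * 10**9  # assumption: decimal gigabytes

for errors in (200, 400):
    print(
        f"{errors} errors: one flipped bit per "
        f"{bits_per_error(TOTAL_BYTES, errors):.1e} bits, "
        f"or per {mb_per_error(TOTAL_BYTES, errors):.0f} MB"
    )
```

That works out to roughly one flipped bit for every 75 to 150 MB copied.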

Expected results:

Files are corruption-free.

Additional info:

Willing to test. I'd desperately like the RAID to work. I haul very large sets
of files around, and early testing has shown the RAID-1 to cut transfer time
by 66%.

Comment 1 Stuart MacDonald 2007-01-18 14:44:21 UTC

*** This bug has been marked as a duplicate of 223216 ***