Red Hat Bugzilla – Bug 109251
Kickstart raid5 parity not properly initialized
Last modified: 2007-04-18 12:59:11 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a)
Gecko/20030811 Mozilla Firebird/0.6.1
Description of problem:
Using a kickstart installation file, one can set up a simple raid5
across many drives, with no spares. Creating this raid5 array finishes
suspiciously fast. It turns out that the parity information isn't
initialized on the disks. When the first disk fails out, either
manually, or because of a disk error, the array is almost always
corrupt. If you recreate the array using mdadm, the very first thing
the raid does is a parity reconstruct (or resync, if you use
--force). Fail out a disk from that array, and things are good.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a kickstart installation file that sets up a raid5 array
2. install system using kickstart file
3. use raidsetfaulty to fail out one disk from raid5
4. umount and fsck the raid array
Actual Results: Corruption on the filesystem, numerous warnings about
trying to access beyond the end of the raid array, etc.
Expected Results: Raid operates just fine in degraded mode,
filesystem integrity is preserved.
Created attachment 95767 [details]
Patch to force raid5 initializations to resync
*** Bug 108613 has been marked as a duplicate of this bug. ***
Has anyone had a chance to review this? This affects all versions of
Anaconda I've used, from Red Hat 9 up to Fedora Core 2 Test 1. I
supplied a patch, but this is fairly serious. Anyone who creates a
RAID5 filesystem with Anaconda has a ticking time-bomb causing
immediate data loss if they lose one drive. The patch is simple. For
RAID5 filesystems, don't enable the (aptly named)
For RAID5, unless you wipe the disks first (i.e. put them into a known
state by writing all 0's or all 1's to them), you MUST do an initial
parity sync. Otherwise, you have no idea what your initial parity
state is, and when a drive fails you're essentially reconstructing
the bits from possibly/probably/almost-always unknown (i.e. wrong)
parity data. In my experience, it always leads to corruption.
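
The parity argument above can be illustrated with a minimal sketch. It models a single stripe of a 4-disk RAID5 as three data blocks plus one XOR parity block. The helper names (xor_blocks, reconstruct) and the byte values are made up for illustration; they are not taken from the md driver or from the attached patch.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, failed):
    """Rebuild the block at index `failed` by XOR-ing every surviving block."""
    return xor_blocks([blk for i, blk in enumerate(stripe) if i != failed])


# One stripe: three data blocks; the fourth slot holds parity.
data = [b"AAAA", b"BBBB", b"CCCC"]

# Case 1: parity was computed by an initial sync.
synced = data + [xor_blocks(data)]
assert reconstruct(synced, failed=1) == b"BBBB"

# Case 2: parity was never initialized, so the parity slot still holds
# whatever bytes were on the disk before (arbitrary values for this demo).
stale_parity = b"\x13\x37\xbe\xef"
unsynced = data + [stale_parity]
assert reconstruct(unsynced, failed=1) != b"BBBB"  # rebuilt block is garbage

# Case 3: disks wiped to all zeros; zero parity is already consistent.
zero = bytes(4)
wiped = [zero, zero, zero, zero]
assert reconstruct(wiped, failed=2) == zero
```

This is only a toy model of one stripe and ignores parity rotation and on-disk layout, but it shows why a degraded array built on unsynced parity returns wrong data for the missing disk, while a synced or zeroed array does not.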
We're using mdadm now which does this automatically.