From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030811 Mozilla Firebird/0.6.1

Description of problem:
Using a kickstart installation file, one can set up a simple raid5 across many drives, with no spares. Creating this raid5 array goes awfully fast. It turns out that the parity information isn't initialized on the disks. When the first disk fails out, either manually or because of a disk error, the array is almost always corrupt.

If you recreate the array using mdadm, the very first thing the raid does is a parity reconstruct (or resync, if you use --force). Fail out a disk from that array, and things are good.

Version-Release number of selected component (if applicable):
anaconda-9.0-4

How reproducible:
Always

Steps to Reproduce:
1. Create a kickstart installation file that sets up a raid5 array
2. Install the system using the kickstart file
3. Use raidsetfaulty to fail out one disk from the raid5 array
4. umount and fsck the raid array

Actual Results:
Corruption on the filesystem, numerous warnings about trying to access beyond the end of the raid array, etc.

Expected Results:
Raid operates just fine in degraded mode, filesystem integrity is preserved.

Additional info:
Created attachment 95767
Patch to force raid5 initializations to resync
*** Bug 108613 has been marked as a duplicate of this bug. ***
Has anyone had a chance to review this? This affects all versions of Anaconda I've used, from Red Hat Linux 9 up to Fedora Core 2 Test 1. I supplied a patch, but this is fairly serious. Anyone who creates a RAID5 array with Anaconda has a ticking time bomb that causes immediate data loss as soon as they lose one drive.

The patch is simple. For RAID5 arrays, don't enable the (aptly named) '--dangerous-no-resync' option. For RAID5, unless you wipe the disks first (i.e. put them into a known state by writing all 0's or all 1's to them), you MUST do an initial parity sync. Otherwise, you have no idea what your initial parity state is, and when a drive fails you're essentially reconstructing its contents from unknown (i.e. almost always wrong) parity data. In my experience, it always leads to corruption.
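To illustrate the parity argument above, here is a minimal toy model in Python. It is only a sketch and not the md driver's code: the three-block stripe, the leftover byte values, and the helper names (`xor_blocks`, `resync`, `rmw_write`, `reconstruct`, `demo`) are all made up for the example, and it assumes writes update parity with a read-modify-write (new parity = old parity XOR old data XOR new data). Under that assumption, a stripe that starts with inconsistent parity stays inconsistent after a write, so rebuilding a failed block returns garbage.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync(stripe: list) -> None:
    """Initial parity sync: recompute the parity block (last) from the data blocks."""
    stripe[-1] = xor_blocks(*stripe[:-1])


def rmw_write(stripe: list, index: int, new_data: bytes) -> None:
    """Read-modify-write (assumed update strategy for this sketch):
    new parity = old parity XOR old data XOR new data."""
    stripe[-1] = xor_blocks(stripe[-1], stripe[index], new_data)
    stripe[index] = new_data


def reconstruct(stripe: list, failed: int) -> bytes:
    """Rebuild the block of a failed disk by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(*survivors)


def demo(initial_resync: bool) -> bool:
    """Return True if data written to disk 0 survives the loss of disk 0."""
    # Hypothetical leftover contents of three never-wiped disks: two data
    # blocks plus a "parity" block that is NOT the XOR of the data blocks.
    stripe = [b"\x13\x37", b"\xca\xfe", b"\x00\x01"]
    if initial_resync:
        resync(stripe)
    payload = b"ok"
    rmw_write(stripe, 0, payload)
    return reconstruct(stripe, failed=0) == payload


if __name__ == "__main__":
    print("with initial resync:   ", demo(initial_resync=True))
    print("without initial resync:", demo(initial_resync=False))
```

In this model the rebuilt block matches the written data only when the initial resync ran; without it, the original parity error is carried through every read-modify-write and shows up the moment a disk has to be reconstructed.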
We're using mdadm now, which does this automatically.