Bug 109251

Summary: Kickstart raid5 parity not properly initialized
Product: [Retired] Red Hat Linux
Component: anaconda
Version: 9
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Hrunting Johnson <hrunting>
Assignee: Jeremy Katz <katzj>
CC: mingo
Doc Type: Bug Fix
Last Closed: 2004-10-05 03:34:22 UTC

Attachments:
Patch to force raid5 initializations to resync

Description Hrunting Johnson 2003-11-06 03:58:42 UTC

Description of problem:
Using a kickstart installation file, one can set up a simple raid5
across many drives, with no spares.  The raid5 creation during the
install completes suspiciously fast; it turns out that the parity
information is never initialized on the disks.  When the first disk
fails out, either manually or because of a disk error, the array is
almost always corrupt.  If you recreate the array using mdadm, the
very first thing the raid does is a parity reconstruct (or a resync,
if you use --force).  Fail out a disk from that array afterwards, and
things are fine.
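For reference, a minimal kickstart raid5 stanza of the kind described
here might look like the following.  The disk names, sizes, and mount
point are only illustrative; the exact option spellings should be
checked against the kickstart documentation for the release in use:

    # three raid members on separate disks, no spares (illustrative)
    part raid.01 --size=4096 --ondisk=sda
    part raid.02 --size=4096 --ondisk=sdb
    part raid.03 --size=4096 --ondisk=sdc
    # assemble them into a raid5 md device
    raid /data --fstype=ext3 --level=RAID5 --device=md0 raid.01 raid.02 raid.03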

Version-Release number of selected component (if applicable):
anaconda-9.0-4

How reproducible:
Always

Steps to Reproduce:
1. create a kickstart installation file that sets up a raid5 array
2. install system using kickstart file
3. use raidsetfaulty to fail out one disk from raid5
4. umount and fsck the raid array
    

Actual Results:  Corruption on the filesystem, numerous kernel
warnings about attempts to access beyond the end of the raid array, etc.

Expected Results:  The raid operates normally in degraded mode and
filesystem integrity is preserved.

Additional info:

Comment 1 Hrunting Johnson 2003-11-06 18:24:43 UTC
Created attachment 95767 [details]
Patch to force raid5 initializations to resync

Comment 3 Hrunting Johnson 2003-11-06 22:55:25 UTC
*** Bug 108613 has been marked as a duplicate of this bug. ***

Comment 4 Hrunting Johnson 2004-03-24 21:10:39 UTC
Has anyone had a chance to review this?  This affects all versions of
Anaconda I've used, from Red Hat Linux 9 up to Fedora Core 2 Test 1.
I supplied a patch, and this is fairly serious: anyone who creates a
RAID5 filesystem with Anaconda is sitting on a ticking time bomb of
immediate data loss the moment one drive is lost.  The patch is
simple: for RAID5 filesystems, don't enable the (aptly named)
'--dangerous-no-resync' option.

For RAID5, unless you wipe the disks first (i.e. put them into a known
state by writing all zeros or all ones to them), you MUST do an initial
parity sync.  Otherwise you have no idea what your initial parity state
is, and when a drive fails you're essentially reconstructing the bits
from possibly/probably/almost-always unknown (i.e. wrong) parity data.
In my experience, it always leads to corruption.
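To make the failure mode concrete, here is a small, purely
illustrative Python sketch (not taken from the attached patch or from
anaconda) of raid5's XOR parity.  With a properly synced parity chunk
a lost data chunk rebuilds exactly; with whatever stale bits happened
to be on the platter, the "rebuilt" chunk is garbage:

    import os

    def xor(*chunks):
        # XOR equal-length byte strings together, as raid5 does per stripe.
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    CHUNK = 16
    d0, d1, d2 = os.urandom(CHUNK), os.urandom(CHUNK), os.urandom(CHUNK)

    good_parity = xor(d0, d1, d2)     # what an initial parity sync computes
    stale_parity = os.urandom(CHUNK)  # uninitialized: whatever was on the disk

    # Pretend the disk holding d2 failed; rebuild it from the survivors.
    rebuilt_ok = xor(d0, d1, good_parity)
    rebuilt_bad = xor(d0, d1, stale_parity)

    print("synced parity rebuild correct:", rebuilt_ok == d2)    # True
    print("stale parity rebuild correct:", rebuilt_bad == d2)    # almost certainly False

Any stripe whose parity was never written returns garbage like this
once the array runs degraded, which is consistent with the fsck
corruption and access-beyond-end-of-device warnings reported above.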

Comment 5 Jeremy Katz 2004-10-05 03:34:22 UTC
We're using mdadm now, which does the initial resync automatically.