There are two ways to create a "raid1" LV (example commands for both are shown at the end of this comment):
1) use the lvcreate command to create the whole thing at once
2) use the lvconvert command to "upconvert" from a linear LV

The way that initialization happens in RAID can be problematic for #2 if there is a primary device failure during the process. After encountering a failure (read, write, or whole-device) on the primary device, the sync process simply continues, using an alternate device to complete the sync. This is odd but not an issue with #1; however, in the case of #2 it can lead to data loss - even if the primary device is not replaced and is later revived.

Imagine three devices. The first device is filled with 'A's, the second with 'B's and the third with 'C's. Before the sync begins, the three disks would look like:

  A B C   <- 2nd half of the disks
  A B C   <- 1st half of the disks

Let's say the first device dies at the moment the first half of the addressable space is sync'ed. We would then have the following (where [] signifies the dead device):

  [A] B C
  [A] A A

The sync process will simply continue, using the next available device as the source after marking the first device as 'failed'. The final result will be:

  [A] B B
  [A] A A

If the primary device can be revived at this point, it is synchronized according to the rest of the copies, leaving:

  B B B
  A A A

Again, this is fine for a "raid1" that was created from scratch - the contents are perfectly in sync. However, if the array was being upconverted from a linear LV, the last half of the original contents is destroyed by the recovery process when the device is revived.

In the case of #2, we would much rather have the recovery process stop when the primary fails - giving the user a chance to revive it (remember, the failure could have been caused by something as simple as a single read/write error). Once revived, the desire would be to continue syncing from the original LV.
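For reference, the two approaches correspond to commands along these lines (the VG/LV names and size are placeholders, not taken from this report):

  # 1) create the whole raid1 LV at once
  lvcreate --type raid1 -m 1 -L 10G -n lv vg

  # 2) up-convert an existing linear LV to raid1
  lvconvert --type raid1 -m 1 vg/lv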
The MD raid1 personality behaves that way in the case of multi-legged mirrors (i.e. it selects a new primary when the current one dies). The behavioural change you're requesting ("...to continue syncing from the original LV.") implies first updating any dirty regions on the returning initial primary leg before restarting the previously interrupted resynchronization from where it was discontinued; otherwise any updates would be lost. That wouldn't work, though, because regions aren't mapped 1:1 to I/O payload sizes and offsets and in turn will typically not be fully written over, so parts of regions on the returned primary leg would be replaced with uninitialized data, causing data corruption. To compensate for that, we'd need a finer-grained write-intent bitmap (i.e. a tiny region size) to make sure whole regions get updated in this situation, which imposes overhead and scalability issues on large linear LVs being up-converted because of bitmap size limits.
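To illustrate the granularity/size trade-off (the figures below are an arbitrary example, not defaults): the write-intent bitmap needs one bit per region, so

  1 TiB LV / 512 KiB region size = 2,097,152 regions  ->  ~256 KiB of bitmap

and halving the region size doubles the bitmap. Where the LVM version supports it, a region size can be requested at conversion time with -R/--regionsize, e.g. (hypothetical VG/LV names):

  lvconvert --type raid1 -m 1 --regionsize 512K vg/lv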
It is important to note that an up-converted linear -> raid1 LV that is still performing its initial synchronization is no more resilient right after the conversion than the previous linear LV was. The initial sync in this conversion just causes the resilience ratio to grow to 100% over time. We may only be able to work around a transiently failing primary leg (the previous linear LV containing the user data) with a solution along the lines of comment #1.
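Until that initial sync completes, the redundancy actually achieved can be watched via the copy percentage, for example (hypothetical LV name; exact columns depend on the LVM version):

  lvs -a -o name,copy_percent vg/lv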
This bug will be deferred to 7.5, but needs a release note.
Fixed in RHEL7.4.

Fixed by:
ddb14b6 lvconvert: Disallow removal of primary when up-converting (recovering)
4c0e908 RAID (lvconvert/dmeventd): Cleanly handle primary failure during 'recover' op
d34d206 lvconvert: Don't require a 'force' option during RAID repair.
c87907d lvconvert: linear -> raid1 upconvert should cause "recover" not "resync"
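As a rough sketch of the fixed behaviour per the commit summaries above (hypothetical VG/LV names; output varies by LVM version):

  lvconvert --type raid1 -m 1 vg/lv
  lvs -a -o name,copy_percent,raid_sync_action vg/lv

The raid_sync_action field should report "recover" rather than "resync" while the new image is populated from the original linear leg, and a primary failure during that operation is handled cleanly instead of the sync silently continuing from another leg; the failed image can then be addressed with lvconvert --repair vg/lv.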