Red Hat Bugzilla – Bug 748908
RFE: LVM RAID - Support RAID device replacement
Last modified: 2012-08-27 10:54:00 EDT
Support the ability to replace specific devices in a RAID array.
RAID is not like traditional LVM mirroring. LVM mirroring required failed devices to be removed or the logical volume would simply hang. RAID arrays can keep running with failed devices. In fact, for RAID types other than RAID1, removing a device would mean substituting an error target or converting to a lower-level RAID (e.g. RAID6 -> RAID5, or RAID4/5 -> RAID0). Therefore, rather than unconditionally removing a failed device and potentially allocating a replacement, RAID allows the user to "replace" a device with a new one in a single step, versus the current two-step remove-then-allocate approach.
example> lvconvert --replace <dev to remove> vg/lv [possible replacements]
Release criteria (test requirements):
1) Ability to replace a device in an array
[root@bp-01 LVM2]# lvcreate --type raid1 -m2 -L 1G -n lv vg
Logical volume "lv" created
[root@bp-01 LVM2]# devices vg
LV Copy% Devices
lv 100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
[root@bp-01 LVM2]# lvconvert --replace /dev/sdb2 vg/lv
[root@bp-01 LVM2]# devices vg
LV Copy% Devices
lv 37.50 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
2) The device being rebuilt should be sync'ed properly. Note in #1 how, after the
convert, the 'Copy%' value reflects that the device is being rebuilt. You can even
tell which specific device by checking 'dmsetup status' and looking for the lowercase 'a', which means "(a)live" but resyncing. (An uppercase 'A' means "(A)live" and in sync.)
[root@bp-01 LVM2]# dmsetup status vg-lv
0 2097152 raid raid1 3 AaA 524800/2097152
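As a sketch, the per-device health characters can be picked apart with standard tools. The status line below is the sample shown above, and the field positions assume the dm-raid target's status format (start length raid <raid_type> <ndev> <health_chars> <sync_progress>):

```shell
#!/bin/bash
# Hypothetical helper: report each image's state from a raid status line.
# 'a' = alive but resyncing, 'A' = alive and in sync.
status_line="0 2097152 raid raid1 3 AaA 524800/2097152"

# Field 6 holds one health character per image.
health=$(echo "$status_line" | awk '{print $6}')
i=0
while [ "$i" -lt "${#health}" ]; do
  c=${health:$i:1}
  case "$c" in
    A) echo "image $i: alive, in sync" ;;
    a) echo "image $i: alive, resyncing" ;;
    *) echo "image $i: state '$c'" ;;
  esac
  i=$((i + 1))
done
```

On the sample line this flags image 1 (the middle 'a') as the one being rebuilt.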
3) Check the integrity of the replacement by writing a pattern to a two-way RAID1, replacing one device, then replacing the other, and verifying the pattern.
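A minimal version of this integrity check can be sketched as below. The TARGET path, pattern, and size are all assumptions for illustration; on a real run, TARGET would be a file on the mounted RAID1 LV (or the raw LV itself), and the two lvconvert --replace steps would sit between the two checksums:

```shell
#!/bin/bash
# Hypothetical integrity check: write a known pattern, record a checksum,
# and verify it again after both devices have been replaced.
TARGET=${TARGET:-/tmp/raid_pattern_test}

write_pattern() {
  # 1 MiB of a repeating pattern; the size is arbitrary for the sketch.
  yes "deadbeef" | head -c 1048576 > "$1"
}

checksum() {
  md5sum "$1" | awk '{print $1}'
}

write_pattern "$TARGET"
before=$(checksum "$TARGET")
# ... lvconvert --replace <dev1> vg/lv ...
# ... lvconvert --replace <dev2> vg/lv ...
after=$(checksum "$TARGET")
if [ "$before" = "$after" ]; then
  echo "pattern intact"
else
  echo "pattern CORRUPTED"
fi
rm -f "$TARGET"
```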
4) Test the ability to specify a replacement device:
example> lvconvert --replace <old PV> vg/lv <new PV>
5) Try replacing more than one device at a time by specifying multiple '--replace' arguments. This should work for n-1 devices of a RAID1, 2 devices of a RAID6, and 1 device of a RAID4/5.
example> lvconvert --replace <old PV1> --replace <old PV2> vg/lv
6) Replacing devices while the array is not in sync should be forbidden.
7) Replacement drives should never be allocated from extra space on drives already used in the array. In other words, lv_rimage_0 and lv_rimage_1 should not be located on the same PV.
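One way to check criterion 7 is to list where each sub-LV lives and flag any PV that hosts more than one rimage. The lvs output below is a fabricated sample; the column layout assumes something like 'lvs -a -o lv_name,devices --noheadings vg':

```shell
#!/bin/bash
# Hypothetical placement check: each rimage should sit on a distinct PV.
# Sample standing in for: lvs -a -o lv_name,devices --noheadings vg
sample='[lv_rimage_0] /dev/sdb1(1)
[lv_rimage_1] /dev/sdc1(1)
[lv_rimage_2] /dev/sdd1(1)'

dupes=$(echo "$sample" |
  grep rimage |
  awk '{ sub(/\(.*/, "", $2); print $2 }' |  # drop the "(extent)" suffix
  sort | uniq -d)

if [ -z "$dupes" ]; then
  echo "OK: all rimages on distinct PVs"
else
  echo "BAD: PVs hosting multiple rimages: $dupes"
fi
```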
Feature checked in upstream - version 2.02.89
Git commit id: 02941f999ce0f8fa68b923f13cd48219db1fbab6
Adding QA ack for 6.3.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Feature to 6.3. No documentation required.
Bug 732458 is the bug that requires a release note for the RAID features. Other documentation is found in the LVM manual.
Operational bugs need no documentation because they are being fixed before their initial release.
Feature verified in the latest rpms.
lvm2-2.02.95-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
lvm2-libs-2.02.95-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
lvm2-cluster-2.02.95-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
udev-147-2.41.el6 BUILT: Thu Mar 1 13:01:08 CST 2012
device-mapper-1.02.74-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-libs-1.02.74-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-event-1.02.74-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-event-libs-1.02.74-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
cmirror-2.02.95-5.el6 BUILT: Thu Apr 19 10:29:01 CDT 2012
./raid_shuffle -o taft-01 -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3 -i 20
=== Iteration 20 of 20 started on taft-01 at Mon Apr 23 17:46:14 CDT 2012 ===
INUSE PVS IN VG:
/dev/sdf2 /dev/sdg1 /dev/sdg2 /dev/sdh1 /dev/sdh2
NOT INUSE PVS IN VG:
FREE PVS: /dev/sdc2 /dev/sdd1 /dev/sdd2 /dev/sde1
Adding /dev/sde1 to volume group
vgextend shuffle /dev/sde1
Moving data (replacing raid image) from /dev/sdg1 to /dev/sde1 on taft-01
lvconvert --replace /dev/sdg1 shuffle/raid /dev/sde1
Waiting until all mirror|raid volumes become fully syncd...
0/1 mirror(s) are fully synced: ( 78.02% )
1/1 mirror(s) are fully synced: ( 100.00% )
Verify the device moved from /dev/sdg1 is no longer present
Verify the device moved to /dev/sde1 is present
Checking files on /mnt/raid
/usr/tests/sts-rhel6.3/bin/checkit -w /mnt/raid -f /tmp/raid_shuffleA.15500 -v
checkit starting with:
Verify XIOR Stream: /tmp/raid_shuffleA.15500
Working dir: /mnt/raid
Removing /dev/sde2 from volume group
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.