Bug 748908 - RFE: LVM RAID - Support RAID device replacement
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: FutureFeature
Depends On: 748890
Blocks: 732458 756082
Reported: 2011-10-25 10:34 EDT by Jonathan Earl Brassow
Modified: 2012-08-27 10:54 EDT
CC List: 10 users

Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Enhancement
Doc Text:
New Feature to 6.3. No documentation required. Bug 732458 is the bug that requires a release note for the RAID features. Other documentation is found in the LVM manual. Operational bugs need no documentation because they are being fixed before their initial release.
Last Closed: 2012-06-20 11:00:21 EDT
Description Jonathan Earl Brassow 2011-10-25 10:34:24 EDT
Support the ability to replace specific devices in a RAID array.

RAID is not like traditional LVM mirroring.  LVM mirroring required failed devices to be removed, or the logical volume would simply hang; RAID arrays can keep running with failed devices.  In fact, for RAID types other than RAID1, removing a device outright would mean either substituting an error target or converting to a lower-level RAID (e.g. RAID6 -> RAID5, or RAID4/5 -> RAID0).  Therefore, rather than unconditionally removing a failed device and potentially allocating a replacement, RAID allows the user to "replace" a device with a new one in a single step, versus the current two-step remove-then-allocate solution.

example> lvconvert --replace <dev to remove> vg/lv [possible replacements]
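
For instance (device names hypothetical; with no replacement PVs listed, LVM presumably allocates the replacement from free space in the VG, as the transcript in comment 1 shows):

example> lvconvert --replace /dev/sdb2 vg/lv
example> lvconvert --replace /dev/sdb2 vg/lv /dev/sdf1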
Comment 1 Jonathan Earl Brassow 2011-10-25 10:51:20 EDT
Release criteria (test requirements):

1) Ability to replace a device in an array
[root@bp-01 LVM2]# lvcreate --type raid1 -m2 -L 1G -n lv vg
  Logical volume "lv" created
[root@bp-01 LVM2]# devices vg
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sdb1(1)                                
  [lv_rimage_1]        /dev/sdb2(1)                                
  [lv_rimage_2]        /dev/sdc1(1)                                
  [lv_rmeta_0]         /dev/sdb1(0)                                
  [lv_rmeta_1]         /dev/sdb2(0)                                
  [lv_rmeta_2]         /dev/sdc1(0)                                
[root@bp-01 LVM2]# lvconvert --replace /dev/sdb2 vg/lv
[root@bp-01 LVM2]# devices vg
  LV            Copy%  Devices                                     
  lv             37.50 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sdb1(1)                                
  [lv_rimage_1]        /dev/sdc2(1)                                
  [lv_rimage_2]        /dev/sdc1(1)                                
  [lv_rmeta_0]         /dev/sdb1(0)                                
  [lv_rmeta_1]         /dev/sdc2(0)                                
  [lv_rmeta_2]         /dev/sdc1(0)                                
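
(Note: 'devices' above is not a standard binary; it is presumably a local alias along the lines of

example> alias devices='lvs -a -o name,copy_percent,devices'

which matches the LV/Copy%/Devices columns shown.)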

2) The device being rebuilt should be synced properly.  Note in #1 how, after the convert, 'Copy%' reflects that the device is being rebuilt.  You can even tell which specific device by checking 'dmsetup status' and looking for the lowercase 'a', which means "(a)live" but resyncing.  (An uppercase 'A' means "(A)live" and synced.)
[root@bp-01 LVM2]# dmsetup status vg-lv
0 2097152 raid raid1 3 AaA 524800/2097152
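
A test can poll for completion of the rebuild by watching the health string (field 6 of the 'dmsetup status' line above); a minimal sketch, assuming bash:

example> while dmsetup status vg-lv | awk '{exit !($6 ~ /a/)}'; do sleep 5; done

The loop exits once no device reports a lowercase 'a', i.e. all devices are "(A)live" and synced.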

3) Check the integrity of the replacement by writing a pattern to a two-way RAID1, replacing one device, then replacing the other, and verifying the pattern (as sketched below).
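
A minimal sketch (hypothetical PVs; waits for resync between steps, per #2, since replacing devices in an out-of-sync array is forbidden per #6):

example> lvcreate --type raid1 -m1 -L 1G -n lv vg /dev/sdb1 /dev/sdc1
(wait for the initial sync to complete)
example> dd if=/dev/urandom of=/dev/vg/lv bs=1M count=64 oflag=direct
example> before=$(dd if=/dev/vg/lv bs=1M count=64 iflag=direct 2>/dev/null | md5sum)
example> lvconvert --replace /dev/sdb1 vg/lv /dev/sdd1
(wait for resync)
example> lvconvert --replace /dev/sdc1 vg/lv /dev/sde1
(wait for resync)
example> after=$(dd if=/dev/vg/lv bs=1M count=64 iflag=direct 2>/dev/null | md5sum)
example> [ "$before" = "$after" ] && echo "pattern survived both replacements"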

4) Test the ability to specify a replacement device:
example> lvconvert --replace <old PV> vg/lv <new PV>

5) Try replacing more than one device at a time by specifying multiple '--replace' arguments.  This should work for n-1 devices of a RAID1, 2 devices of a RAID6, and 1 device of a RAID4/5.
example> lvconvert --replace <old PV1> --replace <old PV2> vg/lv
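
For instance, on a RAID6 LV two devices could be replaced in one invocation (hypothetical names):

example> lvconvert --replace /dev/sdb1 --replace /dev/sdc1 vg/raid6lv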

6) It should be forbidden to replace devices while the array is not in sync.
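
A test can assert the precondition first; a minimal sketch, assuming the Copy% ('copy_percent') field reports sync progress:

example> sync=$(lvs --noheadings -o copy_percent vg/lv | tr -d ' ')
example> [ "$sync" = "100.00" ] || echo "not in sync; expect --replace to be refused"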

7) Replacement drives should never be allocated from extra space on drives already used in the array.  In other words, lv_rimage_0 and lv_rimage_1 should not be located on the same PV.
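
One way to verify the allocation is to extract the PV backing each rimage sub-LV and look for duplicates; a sketch, assuming the 'lv_name,devices' report columns:

example> lvs -a --noheadings -o lv_name,devices vg | awk '/rimage/ {sub(/\(.*/, "", $2); print $2}' | sort | uniq -d

Any output here means two images were allocated on the same PV.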
Comment 3 Jonathan Earl Brassow 2011-11-29 21:46:30 EST
Feature checked in upstream - version 2.02.89

Git commit id: 02941f999ce0f8fa68b923f13cd48219db1fbab6
Comment 4 Corey Marthaler 2011-12-15 19:05:42 EST
Adding QA ack for 6.3.
Comment 8 Jonathan Earl Brassow 2012-04-23 14:27:29 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
New Feature to 6.3.  No documentation required.

Bug 732458 is the bug that requires a release note for the RAID features.  Other documentation is found in the LVM manual.

Operational bugs need no documentation because they are being fixed before their initial release.
Comment 9 Corey Marthaler 2012-04-23 19:10:08 EDT
Feature verified in the latest rpms.

lvm2-2.02.95-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
lvm2-libs-2.02.95-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
lvm2-cluster-2.02.95-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-libs-1.02.74-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-event-1.02.74-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
device-mapper-event-libs-1.02.74-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012
cmirror-2.02.95-5.el6    BUILT: Thu Apr 19 10:29:01 CDT 2012

./raid_shuffle -o taft-01 -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3 -i 20
[...]
=== Iteration 20 of 20 started on taft-01 at Mon Apr 23 17:46:14 CDT 2012 ===
INUSE PVS IN VG:
        /dev/sdf2 /dev/sdg1 /dev/sdg2 /dev/sdh1 /dev/sdh2
NOT INUSE PVS IN VG:
        /dev/sde2 /dev/sdf1
FREE PVS: /dev/sdc2 /dev/sdd1 /dev/sdd2 /dev/sde1

Adding /dev/sde1 to volume group
vgextend shuffle /dev/sde1

Moving data (replacing raid image) from /dev/sdg1 to /dev/sde1 on taft-01
lvconvert --replace /dev/sdg1 shuffle/raid /dev/sde1
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 78.02% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Verify the device moved from /dev/sdg1 is no longer present
Verify the device moved to /dev/sde1 is present
Checking files on /mnt/raid
/usr/tests/sts-rhel6.3/bin/checkit -w /mnt/raid -f /tmp/raid_shuffleA.15500 -v
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/raid_shuffleA.15500
Working dir:        /mnt/raid


Removing /dev/sde2 from volume group
Comment 11 errata-xmlrpc 2012-06-20 11:00:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html
