If a user creates a RAID LV composed of all of the available PVs in a VG and a device fails, there is no way to repair it. Repairing the LV requires an extra PV, but there are no spares, and 'vgextend' cannot be used to add one because the VG has missing/failed PVs. Using 'vgreduce --removemissing' threatens to remove any "partial" LVs - including the RAID LV we are trying to fix. The simplest and best solution is to allow 'vgextend' to operate on a VG with missing PVs.
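A sketch of the dead end, using hypothetical device names (behavior before the fix):

  # RAID5 LV consuming every PV in the VG
  vgcreate vg /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  lvcreate --type raid5 -i 3 -l 100%FREE -n lv vg

  # After /dev/sdb1 fails:
  lvconvert --repair vg/lv        # fails: no free extents for a replacement image
  vgextend vg /dev/sde1           # fails (pre-fix): the VG has missing PVs
  vgreduce --removemissing vg     # refuses while vg/lv is partial; --force would destroy it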
TEST REQUIREMENTS:

Simple test:
1) Create a VG (no need to create any LVs)
2) Kill a device in the VG
3) Try to add a new PV to the VG using 'vgextend' -- succeeds if the bug is fixed (see the sketch after this comment)

More complete test, for each of RAID4, RAID5, and RAID6:
1) Create a RAID 4/5/6 LV using all the PVs in the VG
2) Wait for sync
3) Kill 1 device (or devices 1 and 2 for RAID6)
4) Attempt to replace the failed devices ('lvconvert --repair vg/lv') -- should fail
5) vgextend
6) Attempt to replace the failed devices again -- should succeed

When testing RAID6, it would also help to add 1 PV, repair, then iterate the test again and add two devices. This would show whether a RAID6 can be partially repaired.
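A minimal sketch of the simple test (hypothetical devices; the sysfs "offline" write matches the method used in the verification below):

  vgcreate vg /dev/sda1 /dev/sdb1
  echo offline > /sys/block/sdb/device/state   # 2) kill a device in the VG
  vgextend vg /dev/sdc1                        # 3) succeeds once the bug is fixed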
Adding QA ack for 6.4. Devel will, however, need to provide unit-testing results before this bug can ultimately be verified by QA.
commit 186a2772e8ac3c2088bdfc833c32d773464d666b
Author: Jonathan Brassow <jbrassow>
Date:   Thu Jul 26 17:06:06 2012 -0500

    vgextend: Allow PVs to be added to VGs that have PVs missing

    Allowing people to add devices to a VG that has PVs missing helps
    people avoid the inability to repair RAID LVs in certain cases.

    For example, if a user creates a RAID 4/5/6 LV using all of the
    available devices in a VG, there will be no spare devices to repair
    the LV with if a device should fail. Further, because the VG is
    missing a device, new devices cannot be added to allow the repair.
    If 'vgreduce --removemissing' were attempted, the "MISSING" PV could
    not be removed without also destroying the RAID LV. Allowing
    vgextend to operate solves the circular dependency.

    When the PV is added by a vgextend operation, the sequence number is
    incremented and the 'MISSING' flag is put on the PVs which are missing.
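One way to observe what the commit describes: 'vg_seqno' is a standard vgs reporting field, and in a 'vgcfgbackup' text dump the failed PV's section should carry the MISSING flag (exact metadata layout assumed from LVM2's text format):

  vgextend vg /dev/sde1                # now allowed despite the missing PV
  vgs -o vg_name,vg_seqno vg           # sequence number has been incremented
  vgcfgbackup -f /tmp/vg-meta.txt vg
  grep MISSING /tmp/vg-meta.txt        # missing PV flagged, e.g. flags = ["MISSING"]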
Tested with raid1, 4, 5 and 6 WITHOUT lvmetad running, acting as expected. Here is the output for just raid4:

(07:58:27) [root@r6-node02:/var/log]$ lvs -a -o +devices
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  LV               VG       Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert Devices
  lv_root          VolGroup -wi-ao---   7.54g                                              /dev/vda2(0)
  lv_swap          VolGroup -wi-ao---   1.97g                                              /dev/vda2(1930)
  raid4            vg       rwi-a-r--   1.01g                            100.00            raid4_rimage_0(0),raid4_rimage_1(0),raid4_rimage_2(0),raid4_rimage_3(0)
  [raid4_rimage_0] vg       iwi-aor-- 344.00m                                              /dev/sda1(1)
  [raid4_rimage_1] vg       iwi-aor-- 344.00m                                              /dev/sdb1(1)
  [raid4_rimage_2] vg       iwi-aor-- 344.00m                                              /dev/sdc1(1)
  [raid4_rimage_3] vg       iwi-aor-- 344.00m                                              /dev/sdd1(1)
  [raid4_rmeta_0]  vg       ewi-aor--   4.00m                                              /dev/sda1(0)
  [raid4_rmeta_1]  vg       ewi-aor--   4.00m                                              /dev/sdb1(0)
  [raid4_rmeta_2]  vg       ewi-aor--   4.00m                                              /dev/sdc1(0)
  [raid4_rmeta_3]  vg       ewi-aor--   4.00m                                              /dev/sdd1(0)

(07:58:43) [root@r6-node02:/var/log]$ echo "offline" > /sys/block/sdb/device/state

(07:59:00) [root@r6-node02:/var/log]$ lvconvert --repair vg/raid4
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Insufficient suitable allocatable extents for logical volume : 87 more required
  Failed to allocate replacement images for vg/raid4
  Failed to replace faulty devices in vg/raid4.

Adding a device and trying again:

(07:59:31) [root@r6-node02:/var/log]$ vgextend vg /dev/sde1
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
  Volume group "vg" successfully extended

(08:00:34) [root@r6-node02:/var/log]$ lvconvert --repair vg/raid4
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in vg/raid4 successfully replaced.

Marking this VERIFIED with:

lvm2-2.02.98-6.el6.x86_64
lvm2-libs-2.02.98-6.el6.x86_64
device-mapper-1.02.77-6.el6.x86_64
device-mapper-libs-1.02.77-6.el6.x86_64

Without the use of lvm2-lvmetad.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0501.html