Bug 843546
| Summary: | LVM RAID: unable to add PV to VG when no spares exist during RAID LV device failure | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jonathan Earl Brassow <jbrassow> |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | unspecified | Priority: | high |
| Version: | 6.3 | CC: | agk, cmarthal, coughlan, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, slevine, thornber, zkabelac |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | lvm2-2.02.97-2.el6 | Doc Type: | Bug Fix |
| Story Points: | --- | Regression: | --- |
| Last Closed: | 2013-02-21 08:11:46 UTC | Type: | Bug |

Doc Text:
Previously, it was impossible to add a physical volume to a volume group if a device failure had occurred in a RAID logical volume and there were no spare devices in the volume group. As a result, the failed devices in the RAID logical volume could not be replaced, and the volume group could not be made consistent without editing the LVM metadata by hand.
It is now possible to add a physical volume to a volume group that has missing or failed devices, and then to replace the failed devices in a RAID logical volume ('lvconvert --repair <vg>/<LV>' can be used to accomplish this).
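The Doc Text above describes the fixed workflow end to end. A minimal sketch of that sequence, using the vg/raid4 names and the /dev/sde1 replacement device that appear in the transcript in the description below (and assuming, as in that transcript, that /dev/sde1 is already initialized as a PV):

    # Add a new PV to the VG even though one of its PVs is missing
    # (this is the operation the fix makes possible):
    vgextend vg /dev/sde1

    # Replace the failed RAID image(s) using the newly added space:
    lvconvert --repair vg/raid4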
Description
Jonathan Earl Brassow
2012-07-26 15:05:25 UTC
TEST REQUIREMENTS:

Simple test:
1) Create a VG (no need to create any LVs)
2) Kill a device in the VG
3) Try to add a new PV to the VG using 'vgextend' -- succeeds if the bug is fixed.

More complete test, for RAID4, RAID5, and RAID6:
1) Create a RAID 4/5/6 LV using all the PVs in the VG
2) Wait for sync
3) Kill 1 device (or 1 and 2 for RAID6)
4) Attempt to replace the failed devices ('lvconvert --repair vg/lv') -- should fail
5) vgextend
6) Attempt to replace the failed devices again -- should succeed

When testing RAID6, it would also help to add 1 PV and repair, then iterate the test again and add two devices. This will show whether a RAID6 can be partially repaired.

Adding QA ack for 6.4. Devel will, however, need to provide unit testing results before this bug can be ultimately verified by QA.

commit 186a2772e8ac3c2088bdfc833c32d773464d666b
Author: Jonathan Brassow <jbrassow>
Date:   Thu Jul 26 17:06:06 2012 -0500

    vgextend: Allow PVs to be added to VGs that have PVs missing

    Allowing people to add devices to a VG that has PVs missing helps people
    avoid the inability to repair RAID LVs in certain cases. For example, if
    a user creates a RAID 4/5/6 LV using all of the available devices in a
    VG, there will be no spare devices to repair the LV with if a device
    should fail. Further, because the VG is missing a device, new devices
    cannot be added to allow the repair. If 'vgreduce --removemissing' were
    attempted, the "MISSING" PV could not be removed without also destroying
    the RAID LV. Allowing vgextend to operate solves the circular dependency.

    When the PV is added by a vgextend operation, the sequence number is
    incremented and the 'MISSING' flag is put on the PVs which are missing.

Tested with RAID 1, 4, 5, and 6 WITHOUT lvmetad running; acting as expected. Here is the output for just RAID4:

(07:58:27) [root@r6-node02:/var/log]$ lvs -a -o +devices
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  LV               VG       Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert Devices
  lv_root          VolGroup -wi-ao---   7.54g                                               /dev/vda2(0)
  lv_swap          VolGroup -wi-ao---   1.97g                                               /dev/vda2(1930)
  raid4            vg       rwi-a-r--   1.01g                             100.00            raid4_rimage_0(0),raid4_rimage_1(0),raid4_rimage_2(0),raid4_rimage_3(0)
  [raid4_rimage_0] vg       iwi-aor-- 344.00m                                               /dev/sda1(1)
  [raid4_rimage_1] vg       iwi-aor-- 344.00m                                               /dev/sdb1(1)
  [raid4_rimage_2] vg       iwi-aor-- 344.00m                                               /dev/sdc1(1)
  [raid4_rimage_3] vg       iwi-aor-- 344.00m                                               /dev/sdd1(1)
  [raid4_rmeta_0]  vg       ewi-aor--   4.00m                                               /dev/sda1(0)
  [raid4_rmeta_1]  vg       ewi-aor--   4.00m                                               /dev/sdb1(0)
  [raid4_rmeta_2]  vg       ewi-aor--   4.00m                                               /dev/sdc1(0)
  [raid4_rmeta_3]  vg       ewi-aor--   4.00m                                               /dev/sdd1(0)

(07:58:43) [root@r6-node02:/var/log]$ echo "offline" > /sys/block/sdb/device/state

(07:59:00) [root@r6-node02:/var/log]$ lvconvert --repair vg/raid4
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Insufficient suitable allocatable extents for logical volume : 87 more required
  Failed to allocate replacement images for vg/raid4
  Failed to replace faulty devices in vg/raid4.
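At this point the failure is purely an allocation problem: every PV in the VG is already consumed by the RAID LV, so there is nowhere to build replacement images. An illustrative check (these commands are not part of the original transcript) would be:

    # Illustrative only: show that the VG has no free extents from which
    # replacement images could be allocated.
    vgs -o vg_name,pv_count,lv_count,vg_size,vg_free vg
    pvs -o pv_name,vg_name,pv_size,pv_free

With no free space and the failed PV still marked MISSING, the only way forward is to extend the VG, which is exactly the next step.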
Adding a device and trying again:

(07:59:31) [root@r6-node02:/var/log]$ vgextend vg /dev/sde1
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
  Volume group "vg" successfully extended

(08:00:34) [root@r6-node02:/var/log]$ lvconvert --repair vg/raid4
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  /dev/sdb1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mZat3l-EcEJ-dnKe-sdoz-3e7t-PSge-NfQbj6.
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in vg/raid4 successfully replaced.

Marking this VERIFIED with:
lvm2-2.02.98-6.el6.x86_64
lvm2-libs-2.02.98-6.el6.x86_64
device-mapper-1.02.77-6.el6.x86_64
device-mapper-libs-1.02.77-6.el6.x86_64
without the use of lvm2-lvmetad.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-0501.html
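For completeness, the "simple test" from the test requirements at the top of the description can be sketched roughly as follows; the device names and the VG name "testvg" are placeholders rather than values taken from this report:

    # Placeholder devices/VG; sketch of the "simple test" only.
    pvcreate /dev/sdX /dev/sdY
    vgcreate testvg /dev/sdX /dev/sdY

    # Kill one device in the VG (offlining it via sysfs, as in the transcript above):
    echo offline > /sys/block/sdY/device/state

    # With the fix, adding a new PV to the degraded VG succeeds:
    pvcreate /dev/sdZ
    vgextend testvg /dev/sdZ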