| Summary: | raid volumes with multiple device failures need a way to do partial allocation |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Component: | lvm2 |
| Version: | 6.3 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | medium |
| Reporter: | Corey Marthaler <cmarthal> |
| Assignee: | Jonathan Earl Brassow <jbrassow> |
| QA Contact: | Cluster QE <mspqa-list> |
| CC: | agk, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac |
| Target Milestone: | rc |
| Fixed In Version: | lvm2-2.02.95-10.el6 |
| Doc Type: | Bug Fix |
| Doc Text: | New feature, no documentation needed. |
| Last Closed: | 2012-06-20 15:01:53 UTC |
This bug hinges on the way suspend is behaving. When an LV is changed, a suspend may need to "preload" additional sub-LV targets. But when the actual suspend of the LV happens, it forgets to suspend the newly preloaded LVs. So, when the resume comes along, it finds that these new sub-LVs are not suspended as they should be and we get errors.

Created attachment 579124 [details]
Partial fix for this bug
This patch solves the first issue with this bug. It detects that it cannot allocate the necessary number of devices and gradually backs off, trying fewer and fewer devices until it either succeeds or there is simply no space left to allocate a new device from.
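To make the back-off concrete, here is a minimal, self-contained sketch of the idea; `struct logical_volume`, `free_pvs`, and `allocate_replacement_images()` are hypothetical stand-ins for the real `lvconvert --repair` allocation path, not actual lvm2 code:

```c
#include <stdio.h>

/* Hypothetical stand-ins for lvm2 internals, for illustration only. */
struct logical_volume { unsigned free_pvs; };

static int allocate_replacement_images(struct logical_volume *lv,
                                       unsigned count)
{
    /* Pretend allocation succeeds only when enough free PVs exist. */
    return count <= lv->free_pvs;
}

/* Back-off loop: ask to replace every failed image first, then settle
 * for progressively fewer until allocation succeeds or nothing fits. */
static unsigned replace_failed_images(struct logical_volume *lv,
                                      unsigned num_failed)
{
    unsigned request = num_failed;

    while (request > 0) {
        if (allocate_replacement_images(lv, request))
            return request;          /* replaced 'request' images */

        fprintf(stderr,
                "Attempting replacement of %u devices instead of %u\n",
                request - 1, request);
        request--;                   /* back off by one device */
    }

    return 0;                        /* no space: array stays degraded */
}

int main(void)
{
    struct logical_volume lv = { .free_pvs = 1 };

    /* Two failed images but only one free PV: expect a partial repair. */
    printf("replaced %u image(s)\n", replace_failed_images(&lv, 2));
    return 0;
}
```

With two failed images but only one free PV, the loop settles for a single replacement, which is the behavior behind the "Attempting replacement of 1 devices instead of 2" message seen in the dmeventd log later in this report.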
The second and most difficult problem with this bug is dealing with how *_missing_* devices are treated.
When device failures happen in a RAID LV, the failed devices do not need to be replaced in order for the array to continue to operate. If there is no activation/deactivation, the array will continue to function, BUT the *_missing_* devices (as added by '_add_error_device') will not be present.
When action is taken on an array that has failed devices, there is no problem as long as all the failed devices are replaced. However, if only some of the devices are replaced, or a simple suspend+resume is issued on the device from within LVM, the suspend will preload the *_missing_* devices and the resume will fail because of their presence.
Here are easy steps to reproduce the problem:
*) compile LVM with the attached patch
1) create vg with 4 devices
2) create raid1 LV with 3 devices
3) wait for sync
4) kill 2 of the 3 devices in the LV
5) run 'lvconvert --repair vg/lv' and select 'y' to replace devices
If you have activated the LV since the device failures (implying you deactivated at some point between steps 4 and 5), then step 5 will work just fine. This is because the *_missing_* devices were loaded during activation. If you are running the repair directly after a failure - like dmeventd would do - then you will hit a failure. The failure can leave your RAID array and sub-LVs in a suspended state. ('dmsetup resume' from the bottom up to avoid frustration.)
Created attachment 579127 [details]
Verbose output from a failed run (includes attached patch)
Search phrases that will bring you quickly to the problem areas:
"missing_"
"Suspending"
"Resuming"
"Device or resource busy"
Might need you to talk this through to save my having to work out what you probably already know. Is the line:

#ioctl/libdm-iface.c:1705 device-mapper: create ioctl on vg-lv_rimage_1-missing_0_0 failed: Device or resource busy

the *first* place where this went wrong? Or should it have done some other operations prior to this? If so, what did it miss out, and where is the *first* place the sequence of ioctls diverges from what would be correct? Do you know yet whether or not all the entries in the deptree are as intended, or are some nodes missing from it or in an incorrect state? Do you know what the tables should look like after correct behavior, and what they actually look like after the failure? (dmsetup info -c; dmsetup table; dmsetup table --inactive - dumped before the cmd is run, after the failed run, and what they would be like if it ran correctly.)

_add_error_device() uses dm_tree_add_new_dev(), which assumes the device does not already exist - but on the resume code path in the trace it does already exist => failure. My first guess is that _add_error_device() needs to be changed to use parts of the _add_dev_to_dtree() code path instead. [Call _info() on it. If it doesn't exist, call dm_tree_add_new_dev() like now. If it does exist, use dm_tree_add_dev() instead.]

There might be something to that, but right now, RAID repair doesn't work for me at all, and neither does partial activation of RAID volumes. With:

aux prepare_vg 5
lvcreate --type raid1 -m 2 -L 1 -n raid $vg "$dev1" "$dev2" "$dev3" # "$dev4"
lvchange -a n --partial $vg/raid
aux disable_dev $dev3 # $dev4
lvchange -a y --partial $vg/raid -vvvv

I see this:

#libdm/libdm-deptree.c:2273 Adding target to (254:20): 0 8192 raid raid1 3 0 region_size 1024 3 254:10 254:13 254:14 254:15 254:17 254:19
#libdm/ioctl/libdm-iface.c:1687 dm table (254:20) OF [16384] (*1)
#libdm/ioctl/libdm-iface.c:1687 dm reload (254:20) NF [16384] (*1)
#libdm/ioctl/libdm-iface.c:1705 device-mapper: reload ioctl on failed: Invalid argument

All the devices mentioned on that table line do exist; two of them point to an underlying error target (missing). I am not exactly familiar with the raid table format, but it seems consistent with the table that's loaded by normal activation, which works. This is, however, on Linux 3.1.10 -- is it possible that this is a kernel bug on my end? (I am upgrading to 3.2.15 in the meantime...)

Petr, perhaps you are hitting 0447568fc51e0268e201f7086d2450cf986e0411.

[j]$ git tag --contains 0447568fc51e0268e201f7086d2450cf986e0411
v3.4-rc1
v3.4-rc2

A simpler reproducer:
Put 2 PVs into a VG.
Create and activate $LV with 2 stripes.
Disable one of the PVs.
lvchange --refresh --partial $LV

Yes, the patch works. I'll review it a bit more closely and run it through the test suites before checking it in. Sorry I didn't put this in sooner...
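For reference, a control-flow sketch of the approach agk outlines above (check whether the error device already exists before creating it). Everything here is an illustrative stand-in: `error_device_is_active()` approximates the `_info()` lookup, and the two branches correspond to where the real code would use `dm_tree_add_dev()` and `dm_tree_add_new_dev()` respectively; none of this is the actual lvm2 source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pretend lookup: in the real code this is an _info() ioctl query. */
static bool error_device_is_active(const char *name,
                                   uint32_t *major, uint32_t *minor)
{
    /* Simulate a -missing_ node that a prior suspend already created. */
    if (!strcmp(name, "vg-lv_rimage_1-missing_0_0")) {
        *major = 254;
        *minor = 21;
        return true;
    }
    return false;
}

static void add_error_device(const char *name)
{
    uint32_t major, minor;

    if (error_device_is_active(name, &major, &minor)) {
        /* Device already exists (created by the corresponding suspend):
         * reference it, as dm_tree_add_dev() would, rather than issuing
         * a second create ioctl that fails with EBUSY. */
        printf("reusing existing %s (%u:%u)\n", name, major, minor);
        return;
    }

    /* Device does not exist yet: create it, as dm_tree_add_new_dev()
     * would have done unconditionally before the fix. */
    printf("creating new error device %s\n", name);
}

int main(void)
{
    add_error_device("vg-lv_rimage_1-missing_0_0");
    add_error_device("vg-lv_rimage_2-missing_0_0");
    return 0;
}
```

The design point is simply that the resume path must reference a -missing_ error device that the corresponding suspend already created, rather than issue a second create ioctl, which is what fails with "Device or resource busy".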
commit 50b2d511ecc5177895961e16b215c9fcb84ad80f
Author: Jonathan Earl Brassow <jbrassow>
Date: Tue Apr 24 20:05:31 2012 +0000
Allow a subset of failed devices to be replaced in RAID LVs.
If two devices in an array failed, it was previously impossible to replace
just one of them. This patch allows for the replacement of some, but perhaps
not all, failed devices.
commit 9ac67656ae54cde61184e0f5bda25022a1c1d3c1
Author: Jonathan Earl Brassow <jbrassow>
Date: Tue Apr 24 20:00:03 2012 +0000
Prevent resume from creating error devices that already exist from suspend.
Thanks to agk for providing the patch that prevents resume from attempting
(and then failing) to create error devices which already exist; having been
created by a corresponding suspend operation.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
New feature, no documentation needed.
This doesn't appear to work with the latest rpms.
2.6.32-270.el6.x86_64
lvm2-2.02.95-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-libs-2.02.95-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-cluster-2.02.95-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
udev-147-2.41.el6 BUILT: Thu Mar 1 13:01:08 CST 2012
device-mapper-1.02.74-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-libs-1.02.74-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-1.02.74-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-libs-1.02.74-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
cmirror-2.02.95-9.el6 BUILT: Wed May 16 10:34:14 CDT 2012
./black_bird -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3 -o taft-01 -i 2 -e kill_multiple_synced_raid6_4legs
Scenario kill_multiple_synced_raid6_4legs: Kill multiple legs of synced 4 leg raid6 volume(s)
********* RAID hash info for this scenario *********
* names: synced_multiple_raid6_4legs_1
* sync: 1
* type: raid6
* -m |-i value: 4
* leg devices: /dev/sdf1 /dev/sdc1 /dev/sdg1 /dev/sdh1 /dev/sde1 /dev/sdb1
* failpv(s): /dev/sdc1 /dev/sde1
* failnode(s): taft-01
* raid fault policy: allocate
******************************************************
Creating raids(s) on taft-01...
taft-01: lvcreate --type raid6 -i 4 -n synced_multiple_raid6_4legs_1 -L 500M black_bird /dev/sdf1:0-1000 /dev/sdc1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-1000 /dev/sdb1:0-1000
RAID Structure(s):
LV Attr LSize Copy% Devices
synced_multiple_raid6_4legs_1 rwi-a-r- 512.00m synced_multiple_raid6_4legs_1_rimage_0(0),synced_multiple_raid6_4legs_1_rimage_1(0),synced_multiple_raid6_4legs_1_rimage_2(0),synced_multiple_raid6_4legs_1_rimage_3(0),synced_multiple_raid6_4legs_1_rimage_4(0),synced_multiple_raid6_4legs_1_rimage_5(0)
[synced_multiple_raid6_4legs_1_rimage_0] Iwi-aor- 128.00m /dev/sdf1(1)
[synced_multiple_raid6_4legs_1_rimage_1] Iwi-aor- 128.00m /dev/sdc1(1)
[synced_multiple_raid6_4legs_1_rimage_2] Iwi-aor- 128.00m /dev/sdg1(1)
[synced_multiple_raid6_4legs_1_rimage_3] Iwi-aor- 128.00m /dev/sdh1(1)
[synced_multiple_raid6_4legs_1_rimage_4] Iwi-aor- 128.00m /dev/sde1(1)
[synced_multiple_raid6_4legs_1_rimage_5] Iwi-aor- 128.00m /dev/sdb1(1)
[synced_multiple_raid6_4legs_1_rmeta_0] ewi-aor- 4.00m /dev/sdf1(0)
[synced_multiple_raid6_4legs_1_rmeta_1] ewi-aor- 4.00m /dev/sdc1(0)
[synced_multiple_raid6_4legs_1_rmeta_2] ewi-aor- 4.00m /dev/sdg1(0)
[synced_multiple_raid6_4legs_1_rmeta_3] ewi-aor- 4.00m /dev/sdh1(0)
[synced_multiple_raid6_4legs_1_rmeta_4] ewi-aor- 4.00m /dev/sde1(0)
[synced_multiple_raid6_4legs_1_rmeta_5] ewi-aor- 4.00m /dev/sdb1(0)
* NOTE: not enough available devices for allocation fault policies to fully work *
(well technically, since we have 1, some allocation should work)
PV=/dev/sde1
synced_multiple_raid6_4legs_1_rimage_4: 1
synced_multiple_raid6_4legs_1_rmeta_4: 1
PV=/dev/sdc1
synced_multiple_raid6_4legs_1_rimage_1: 1
synced_multiple_raid6_4legs_1_rmeta_1: 1
Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...
Writing verification files (checkit) to mirror(s) on...
---- taft-01 ----
Sleeping 10 seconds to get some outstanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- taft-01 ----
Disabling device sdc on taft-01
Disabling device sde on taft-01
[HANG]
lvm[3047]: Insufficient suitable allocatable extents for logical volume : 66 more required
lvm[3047]: Failed to allocate replacement images for black_bird/synced_multiple_raid6_4legs_1
lvm[3047]: Attempting replacement of 1 devices instead of 2
kernel: device-mapper: raid: Failed to read superblock of device at position 1
kernel: device-mapper: raid: Device 4 specified for rebuild: Clearing superblock
kernel: md/raid:mdX: device dm-14 operational as raid disk 5
kernel: md/raid:mdX: device dm-10 operational as raid disk 3
kernel: md/raid:mdX: device dm-8 operational as raid disk 2
kernel: md/raid:mdX: device dm-4 operational as raid disk 0
kernel: md/raid:mdX: allocated 6384kB
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: md/raid:mdX: raid level 6 active with 4 out of 6 devices, algorithm 8
kernel: created bitmap (1 pages) for device mdX
lvm[3047]: device-mapper: create ioctl on black_bird-synced_multiple_raid6_4legs_1_rimage_1-missing_0_0 failed: Device or resource busy
lvm[3047]: Failed to resume black_bird/synced_multiple_raid6_4legs_1 after committing changes
lvm[3047]: Failed to replace faulty devices in black_bird/synced_multiple_raid6_4legs_1.
lvm[3047]: Repair of RAID device black_bird-synced_multiple_raid6_4legs_1 failed.
lvm[3047]: Failed to process event for black_bird-synced_multiple_raid6_4legs_1
lvm[3047]: No longer monitoring RAID device black_bird-synced_multiple_raid6_4legs_1 for events.
kernel: INFO: task kjournald:6648 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kjournald D 0000000000000001 0 6648 2 0x00000080
kernel: ffff880216bbdc50 0000000000000046 ffff880216bbdc10 ffffffffa000422e
kernel: ffff880216bbdbc0 ffffffff81012bd9 ffff880216bbdc00 ffffffff8109cd39
kernel: ffff880216a845f8 ffff880216bbdfd8 000000000000fb88 ffff880216a845f8
kernel: Call Trace:
kernel: [<ffffffffa000422e>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
kernel: [<ffffffff81012bd9>] ? read_tsc+0x9/0x20
kernel: [<ffffffff8109cd39>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff8109cd39>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff811ae860>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff814fdd93>] io_schedule+0x73/0xc0
kernel: [<ffffffff811ae8a0>] sync_buffer+0x40/0x50
kernel: [<ffffffff814fe74f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffff811ae860>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff814fe7f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff81092110>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffff811ae856>] __wait_on_buffer+0x26/0x30
kernel: [<ffffffffa049afde>] journal_commit_transaction+0x9ee/0x1310 [jbd]
kernel: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70
kernel: [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
kernel: [<ffffffffa04a0bb8>] kjournald+0xe8/0x250 [jbd]
kernel: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa04a0ad0>] ? kjournald+0x0/0x250 [jbd]
kernel: [<ffffffff81091d66>] kthread+0x96/0xa0
kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
I've reproduced this on corey's machine by doing the following:
1> vgcreate vg /dev/sd[bcdef]1
2> lvcreate --type raid1 -m 3 -L 100M -n lv vg
#> Wait for sync
3> off.sh <2 devices>
4> echo y | lvconvert --repair vg/lv
The DM devices are left in a suspended state - which explains the hang.

These issues don't happen on my machine. There must be something different between our userspace versions of LVM.

Looking at the source for the RPM, it looks like agk's patch from comment 12 has not been pulled in. That would cause precisely this problem. Moving back to POST.

Partial allocation now works when multiple device failures occur. Marking this verified in the latest rpms. That said, a few issues popped up that required test changes to make these scenarios pass:
1. With partial allocation, we'll now see "Repair of RAID device VG-LV failed" messages; an RFE to fix this should be created. The test will now need to ignore this error.
2. With partial allocation, the test has to manually restore the failed VG; this is not required with other raid failure scenarios.
3. With partial allocation, the test has to manually recreate one of the failed PVs (whichever one didn't get restored).
4. Another issue with partial allocation is that -missing devices still remain. (Filed RFE 825023 for this.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-0962.html
Description of problem:
If a raid volume experiences multiple device failures (in this case three), but the VG only has two free devices, there's no mechanism to allocate just one or two devices; it's either all three or nothing right now.

Scenario kill_multiple_synced_raid1_4legs: Kill multiple legs of synced 4 leg raid1 volume(s)
********* RAID hash info for this scenario *********
* names: synced_multiple_raid1_4legs_1
* sync: 1
* type: raid1
* -m |-i value: 4
* leg devices: /dev/sdf1 /dev/sdb1 /dev/sdh1 /dev/sdd1 /dev/sde1
* failpv(s): /dev/sdh1 /dev/sdb1 /dev/sde1
* failnode(s): taft-01
* raid fault policy: allocate
******************************************************
Creating raids(s) on taft-01...
taft-01: lvcreate --type raid1 -m 4 -n synced_multiple_raid1_4legs_1 -L 500M black_bird /dev/sdf1:0-1000 /dev/sdb1:0-1000 /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sde1:0-1000

RAID Structure(s):
LV Attr LSize Copy% Devices
synced_multiple_raid1_4legs_1 rwi-a-m- 500.00m 0.00 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
[synced_multiple_raid1_4legs_1_rimage_0] Iwi-aor- 500.00m /dev/sdf1(1)
[synced_multiple_raid1_4legs_1_rimage_1] Iwi-aor- 500.00m /dev/sdb1(1)
[synced_multiple_raid1_4legs_1_rimage_2] Iwi-aor- 500.00m /dev/sdh1(1)
[synced_multiple_raid1_4legs_1_rimage_3] Iwi-aor- 500.00m /dev/sdd1(1)
[synced_multiple_raid1_4legs_1_rimage_4] Iwi-aor- 500.00m /dev/sde1(1)
[synced_multiple_raid1_4legs_1_rmeta_0] ewi-aor- 4.00m /dev/sdf1(0)
[synced_multiple_raid1_4legs_1_rmeta_1] ewi-aor- 4.00m /dev/sdb1(0)
[synced_multiple_raid1_4legs_1_rmeta_2] ewi-aor- 4.00m /dev/sdh1(0)
[synced_multiple_raid1_4legs_1_rmeta_3] ewi-aor- 4.00m /dev/sdd1(0)
[synced_multiple_raid1_4legs_1_rmeta_4] ewi-aor- 4.00m /dev/sde1(0)

* NOTE: not enough available devices for allocation fault policies to fully work *
(well technically, since we have 2, some allocation should work)
PV=/dev/sde1
synced_multiple_raid1_4legs_1_rimage_4: 1
synced_multiple_raid1_4legs_1_rmeta_4: 1
PV=/dev/sdh1
synced_multiple_raid1_4legs_1_rimage_2: 1
synced_multiple_raid1_4legs_1_rmeta_2: 1
PV=/dev/sdb1
synced_multiple_raid1_4legs_1_rimage_1: 1
synced_multiple_raid1_4legs_1_rmeta_1: 1

Waiting until all mirror|raid volumes become fully syncd...
0/1 mirror(s) are fully synced: ( 51.54% )
1/1 mirror(s) are fully synced: ( 100.00% )
Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...
Writing verification files (checkit) to mirror(s) on...
---- taft-01 ----
Sleeping 10 seconds to get some outstanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
---- taft-01 ----
Disabling device sdh on taft-01
Disabling device sdb on taft-01
Disabling device sde on taft-01

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.504002 s, 83.2 MB/s

Verifying current sanity of lvm after the failure

RAID Structure(s):
/dev/sdb1: read failed after 0 of 512 at 145669554176: Input/output error
/dev/sde1: read failed after 0 of 512 at 145669554176: Input/output error
/dev/sdh1: read failed after 0 of 512 at 145669554176: Input/output error
LV Attr LSize Copy% Devices
synced_multiple_raid1_4legs_1 rwi-aom- 500.00m 100.00 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
[synced_multiple_raid1_4legs_1_rimage_0] iwi-aor- 500.00m /dev/sdf1(1)
[synced_multiple_raid1_4legs_1_rimage_1] iwi-aor- 500.00m unknown device(1)
[synced_multiple_raid1_4legs_1_rimage_2] iwi-aor- 500.00m unknown device(1)
[synced_multiple_raid1_4legs_1_rimage_3] iwi-aor- 500.00m /dev/sdd1(1)
[synced_multiple_raid1_4legs_1_rimage_4] iwi-aor- 500.00m unknown device(1)
[synced_multiple_raid1_4legs_1_rmeta_0] ewi-aor- 4.00m /dev/sdf1(0)
[synced_multiple_raid1_4legs_1_rmeta_1] ewi-aor- 4.00m unknown device(0)
[synced_multiple_raid1_4legs_1_rmeta_2] ewi-aor- 4.00m unknown device(0)
[synced_multiple_raid1_4legs_1_rmeta_3] ewi-aor- 4.00m /dev/sdd1(0)
[synced_multiple_raid1_4legs_1_rmeta_4] ewi-aor- 4.00m unknown device(0)

Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdd1 *IS* in the volume(s)
verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rimage_4 on: taft-01
there should not be an 'unknown' device associated with synced_multiple_raid1_4legs_1_rimage_4 on taft-01

# There are two free devices in this VG (sdg1 and sdc1)
Feb 23 14:02:53 taft-01 lvm[1256]: Insufficient suitable allocatable extents for logical volume : 378 more required
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.
Feb 23 14:02:53 taft-01 lvm[1256]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1 failed.
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to process event for black_bird-synced_multiple_raid1_4legs_1
Feb 23 14:02:53 taft-01 lvm[1256]: Device #1 of raid1 array, black_bird-synced_multiple_raid1_4legs_1, has failed.
Feb 23 14:02:54 taft-01 lvm[1256]: Insufficient suitable allocatable extents for logical volume : 378 more required
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.
Feb 23 14:02:54 taft-01 lvm[1256]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1 failed.
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to process event for black_bird-synced_multiple_raid1_4legs_1

# A manual attempt fails as well
[root@taft-01 ~]# lvconvert --repair black_bird/synced_multiple_raid1_4legs_1
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
Insufficient suitable allocatable extents for logical volume : 378 more required
Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.

Version-Release number of selected component (if applicable):
2.6.32-236.el6.x86_64
lvm2-2.02.93-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
lvm2-libs-2.02.93-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
lvm2-cluster-2.02.93-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
udev-147-2.40.el6 BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.72-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-libs-1.02.72-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-event-1.02.72-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-event-libs-1.02.72-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012
cmirror-2.02.93-0.48.el6 BUILT: Thu Feb 23 07:04:40 CST 2012

How reproducible:
Every time