Description of problem:
When running revolution tests, sometimes LVM refuses to do a lvconvert to compensate for a missing device. Sometimes the resulting failure locks up lvm so no other lvm commands can be executed.

Here's the situation:

Current LV layout (before disabling the device: /dev/sdh):

  LV                     VG            Attr       LSize Pool Origin Data% Move Log           Cpy%Sync Convert              Devices
  mirror_1               revolution_9  cwi-aom--- 2.00g                                        100.00 mirror_1_mimagetmp_2 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sdd1(0)
  [mirror_1_mimage_1]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sde1(0)
  [mirror_1_mimage_2]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sdg1(0)
  [mirror_1_mimagetmp_2] revolution_9  mwi-aom--- 2.00g                        mirror_1_mlog   100.00                      mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mlog]        revolution_9  lwi-aom--- 4.00m                                                                    /dev/sdh1(0)
  root                   rhel_virt-129 -wi-ao---- 5.48g                                                                    /dev/vda2(520)
  swap                   rhel_virt-129 -wi-ao---- 2.03g                                                                    /dev/vda2(0)

Disabling the device, snip from /var/log/messages:

Sep 2 14:34:24 virt-129 qarshd[20848]: Running cmdline: echo offline > /sys/block/sdh/device/state &
Sep 2 14:34:26 virt-129 kernel: [ 4928.773183] sd 7:0:0:1: rejecting I/O to offline device
Sep 2 14:34:26 virt-129 lvm[2070]: Log device 253:2 has failed (D).
Sep 2 14:34:26 virt-129 lvm[2070]: Device failure in revolution_9-mirror_1_mimagetmp_2.
Sep 2 14:34:26 virt-129 lvm[2070]: Names including "_mimage" are reserved. Please choose a different LV name.
Sep 2 14:34:26 virt-129 kernel: [ 4928.782085] sd 7:0:0:1: rejecting I/O to offline device
Sep 2 14:34:26 virt-129 lvm[2070]: Run `lvconvert --help' for more information.
Sep 2 14:34:26 virt-129 lvm[2070]: Repair of mirrored device revolution_9-mirror_1_mimagetmp_2 failed.
Sep 2 14:34:26 virt-129 lvm[2070]: Failed to remove faulty devices in revolution_9-mirror_1_mimagetmp_2.
Sep 2 14:34:28 virt-129 kernel: [ 4930.005594] sd 7:0:0:1: rejecting I/O to offline device
Sep 2 14:34:28 virt-129 kernel: [ 4930.011186] sd 7:0:0:1: rejecting I/O to offline device

[root@virt-129 ~]# lvs -a -o +devices
  Couldn't find device with uuid dpbIKv-vhLK-Q6wX-f3MH-LvFi-pJrU-iFoRbX.
  LV                     VG            Attr       LSize Pool Origin Data% Move Log           Cpy%Sync Convert              Devices
  mirror_1               revolution_9  cwi-aom-p- 2.00g                                        100.00 mirror_1_mimagetmp_2 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sdd1(0)
  [mirror_1_mimage_1]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sde1(0)
  [mirror_1_mimage_2]    revolution_9  iwi-aom--- 2.00g                                                                    /dev/sdg1(0)
  [mirror_1_mimagetmp_2] revolution_9  mwi-aom-p- 2.00g                        mirror_1_mlog   100.00                      mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mlog]        revolution_9  lwi-aom-p- 4.00m                                                                    unknown device(0)
  root                   rhel_virt-129 -wi-ao---- 5.48g                                                                    /dev/vda2(520)
  swap                   rhel_virt-129 -wi-ao---- 2.03g                                                                    /dev/vda2(0)

[root@virt-129 ~]# dmsetup ls
revolution_9-mirror_1              (253:6)
rhel_virt--129-swap                (253:0)
rhel_virt--129-root                (253:1)
revolution_9-mirror_1_mimagetmp_2  (253:5)
revolution_9-mirror_1_mlog         (253:2)
revolution_9-mirror_1_mimage_2     (253:7)
revolution_9-mirror_1_mimage_1     (253:4)
revolution_9-mirror_1_mimage_0     (253:3)

[root@virt-129 ~]# dmsetup status
revolution_9-mirror_1: 0 4194304 mirror 2 253:5 253:7 4096/4096 1 AA 1 core
rhel_virt--129-swap: 0 4259840 linear
rhel_virt--129-root: 0 11485184 linear
revolution_9-mirror_1_mimagetmp_2: 0 4194304 mirror 2 253:3 253:4 4096/4096 1 AA 3 disk 253:2 D
revolution_9-mirror_1_mlog: 0 8192 linear
revolution_9-mirror_1_mimage_2: 0 4194304 linear
revolution_9-mirror_1_mimage_1: 0 4194304 linear
revolution_9-mirror_1_mimage_0: 0 4194304 linear

The lvconvert is issued by LVM itself, the test did not try to convert anything at this point, it just turned off a device (/dev/sdh).

Version-Release number of selected component (if applicable):
lvm2-2.02.101-0.140.el7

How reproducible:
Sometimes.
Expected results:
down-convert should happen without lvm denying itself.

Additional info:
use_lvmetad is 0
mirror_segtype_default = "mirror"
raid10_segtype_default = "mirror"
mirror_log_fault_policy = "remove"
mirror_image_fault_policy = "remove"
simple way to reproduce:

# Create mirror
[root@bp-02 ~]# lvcreate --type mirror -m1 -L 500M -n lv vg
  Logical volume "lv" created

# Upconvert, but kill polling process before it gets to 100%
# This prevents the mirror from removing the temporary layer
[root@bp-02 ~]# lvconvert -m +1 vg/lv
  vg/lv: Converted: 2.4%
^C

# Wait for 100% sync (makes it easier to avoid dmeventd triggering)
[root@bp-02 ~]# devices vg
  LV               Attr       Cpy%Sync Devices
  lv               cwi-a-m---   100.00 lv_mimagetmp_2(0),lv_mimage_2(0)
  [lv_mimage_0]    iwi-aom---          /dev/sdb1(0)
  [lv_mimage_1]    iwi-aom---          /dev/sdc1(0)
  [lv_mimage_2]    iwi-aom---          /dev/sdd1(0)
  [lv_mimagetmp_2] mwi-aom---   100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mlog]        lwi-aom---          /dev/sdi1(0)

# Kill device
[root@bp-02 ~]# off.sh sdi
Turning off sdi

# command fails.
[root@bp-02 ~]# lvconvert --repair vg/lv_mimagetmp_2
  Names including "_mimage" are reserved. Please choose a different LV name.
  Run `lvconvert --help' for more information.
Final command in comment 2 will succeed if the top-most mirror is used as the LV to be repaired. Thus, any device that fails in 'lv_mimagetmp_2' will cause this kind of a failure. The solution is to make dmeventd realize that it must repair 'lv' and not its sub-LV, 'lv_mimagetmp_2'. The code already avoids calling repair on *_mlog, so this shouldn't be too difficult.
Fix checked-in upstream:

commit 7de533ad12972f5a9c5bf2d2b477d8320f7e4a8e
Author: Jonathan Brassow <jbrassow>
Date:   Fri Nov 8 09:52:00 2013 -0600

    mirror: Handle failures in tmp mirror used when up-converting.

    Failures in the temporary mirror used when up-converting cause dmeventd
    to issue 'lvconvert --repair' on the sub-LV, <lv_name>_mimagetmp_?.
    The 'lvconvert' command refuses to deal with this sub-LV outright - it
    expects to be given the name of the top-level LV. So, just like we do
    with mirrored logs, we strip-off the portion of the name that is not the
    top-level LV and issue the command on the top-level LV instead.
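The name handling described in the commit message can be sketched in shell. This is only an illustration under stated assumptions, not the dmeventd code itself (which is written in C): the function name top_level_lv is hypothetical, and the two suffix patterns (_mimagetmp_N and _mlog) are inferred from the sub-LV names shown in this report. It takes a bare LV name, not a device-mapper name.

```shell
# Hypothetical helper: map a mirror sub-LV name to its top-level LV name
# by stripping a trailing "_mimagetmp_<N>" or "_mlog" suffix.
# Names without such a suffix are printed unchanged.
top_level_lv() {
    printf '%s\n' "$1" | sed -e 's/_mimagetmp_[0-9][0-9]*$//' -e 's/_mlog$//'
}

top_level_lv lv_mimagetmp_2    # prints: lv
```

With the reproducer from comment 2, the result is the name that lvconvert accepts, i.e. the repair would be issued as 'lvconvert --repair vg/lv' rather than on vg/lv_mimagetmp_2.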
Sorry, comment 2 was meant to show what dmeventd was doing - which was wrong. The fix was to make dmeventd call 'lvconvert' with the name of the top-level mirror device, not fix the command to accept mirror legs. So, this is the expected behavior as long as you can verify dmeventd (i.e. the original bug) is fixed.
Tested with multiple iterations of revolution tests, especially with the 'remove' policy, which causes a down-convert, and did not run into the issues mentioned in the opening comment. Checking /var/log/messages, I could not find LVM trying to repair a mimagetmp sub-LV anymore.

Marking this bug VERIFIED with:

lvm2-2.02.105-13.el7                         BUILT: Wed Mar 19 11:38:19 CET 2014
lvm2-libs-2.02.105-13.el7                    BUILT: Wed Mar 19 11:38:19 CET 2014
lvm2-cluster-2.02.105-13.el7                 BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-1.02.84-13.el7                 BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-libs-1.02.84-13.el7            BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-event-1.02.84-13.el7           BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-event-libs-1.02.84-13.el7      BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-persistent-data-0.2.8-5.el7    BUILT: Sat Mar 1 02:15:56 CET 2014
cmirror-2.02.105-13.el7                      BUILT: Wed Mar 19 11:38:19 CET 2014
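The log check used for verification could be scripted roughly as follows. This is a sketch, not part of the revolution test suite: the function name count_tmp_mirror_repair_failures is hypothetical, and the pattern is copied from the "Repair of mirrored device ... failed" lines quoted in the description, so it assumes the message wording is unchanged in the tested build.

```shell
# Hypothetical helper: count dmeventd repair failures that targeted a
# temporary up-convert mirror (<lv>_mimagetmp_<N>) in a syslog file.
# Defaults to /var/log/messages when no file is given.
count_tmp_mirror_repair_failures() {
    logfile=${1:-/var/log/messages}
    # grep -c prints the number of matching lines. Exit status 1 only
    # means "no matches", so treat it as success and let real errors
    # (status 2, e.g. unreadable file) propagate.
    grep -c -E 'Repair of mirrored device .*_mimagetmp_[0-9]+ failed' "$logfile" \
        || [ $? -eq 1 ]
}
```

A count of 0 after a test run with the 'remove' policy is consistent with dmeventd no longer calling lvconvert on the temporary mirror; it does not by itself prove that a device failure was triggered during the run.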
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.