Bug 1003604 - LVM prevents itself from doing a down-convert (name including _mimage reserved error)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2013-09-02 09:24 EDT by Nenad Peric
Modified: 2014-06-17 21:19 EDT
CC List: 9 users

Fixed In Version: lvm2-2.02.103-8.el7
Doc Type: Bug Fix
Last Closed: 2014-06-13 06:18:13 EDT
Type: Bug


Attachments: None
Description Nenad Peric 2013-09-02 09:24:12 EDT
Description of problem:

When running the revolution tests, LVM sometimes refuses to run an lvconvert to compensate for a missing device.
In some cases the resulting failure locks up LVM so that no other LVM commands can be executed.


Here's the situation:

Current LV layout (before disabling the device: /dev/sdh):

  LV                     VG            Attr       LSize Pool Origin Data%  Move Log           Cpy%Sync Convert              Devices
  mirror_1               revolution_9  cwi-aom--- 2.00g                                         100.00 mirror_1_mimagetmp_2 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0]    revolution_9  iwi-aom--- 2.00g                                                                     /dev/sdd1(0)
  [mirror_1_mimage_1]    revolution_9  iwi-aom--- 2.00g                                                                     /dev/sde1(0)
  [mirror_1_mimage_2]    revolution_9  iwi-aom--- 2.00g                                                                     /dev/sdg1(0)
  [mirror_1_mimagetmp_2] revolution_9  mwi-aom--- 2.00g                         mirror_1_mlog   100.00                      mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mlog]        revolution_9  lwi-aom--- 4.00m                                                                     /dev/sdh1(0)
  root                   rhel_virt-129 -wi-ao---- 5.48g                                                                     /dev/vda2(520)
  swap                   rhel_virt-129 -wi-ao---- 2.03g                                                                     /dev/vda2(0)


Disabling the device (excerpt from /var/log/messages):


Sep  2 14:34:24 virt-129 qarshd[20848]: Running cmdline: echo offline > /sys/block/sdh/device/state &
Sep  2 14:34:26 virt-129 kernel: [ 4928.773183] sd 7:0:0:1: rejecting I/O to offline device
Sep  2 14:34:26 virt-129 lvm[2070]: Log device 253:2 has failed (D).
Sep  2 14:34:26 virt-129 lvm[2070]: Device failure in revolution_9-mirror_1_mimagetmp_2.
Sep  2 14:34:26 virt-129 lvm[2070]: Names including "_mimage" are reserved. Please choose a different LV name.
Sep  2 14:34:26 virt-129 kernel: [ 4928.782085] sd 7:0:0:1: rejecting I/O to offline device
Sep  2 14:34:26 virt-129 lvm[2070]: Run `lvconvert --help' for more information.
Sep  2 14:34:26 virt-129 lvm[2070]: Repair of mirrored device revolution_9-mirror_1_mimagetmp_2 failed.
Sep  2 14:34:26 virt-129 lvm[2070]: Failed to remove faulty devices in revolution_9-mirror_1_mimagetmp_2.
Sep  2 14:34:28 virt-129 kernel: [ 4930.005594] sd 7:0:0:1: rejecting I/O to offline device
Sep  2 14:34:28 virt-129 kernel: [ 4930.011186] sd 7:0:0:1: rejecting I/O to offline device


[root@virt-129 ~]# lvs -a -o +devices

  Couldn't find device with uuid dpbIKv-vhLK-Q6wX-f3MH-LvFi-pJrU-iFoRbX.
  LV                     VG            Attr       LSize Pool Origin Data%  Move Log           Cpy%Sync Convert              Devices                                     
  mirror_1               revolution_9  cwi-aom-p- 2.00g                                         100.00 mirror_1_mimagetmp_2 mirror_1_mimagetmp_2(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0]    revolution_9  iwi-aom--- 2.00g                                                        /dev/sdd1(0)                                
  [mirror_1_mimage_1]    revolution_9  iwi-aom--- 2.00g                                                        /dev/sde1(0)                                
  [mirror_1_mimage_2]    revolution_9  iwi-aom--- 2.00g                                                        /dev/sdg1(0)                                
  [mirror_1_mimagetmp_2] revolution_9  mwi-aom-p- 2.00g                         mirror_1_mlog   100.00                      mirror_1_mimage_0(0),mirror_1_mimage_1(0)   
  [mirror_1_mlog]        revolution_9  lwi-aom-p- 4.00m                                                        unknown device(0)                           
  root                   rhel_virt-129 -wi-ao---- 5.48g                                                        /dev/vda2(520)                              
  swap                   rhel_virt-129 -wi-ao---- 2.03g                                                        /dev/vda2(0)                                


[root@virt-129 ~]# dmsetup ls
revolution_9-mirror_1	(253:6)
rhel_virt--129-swap	(253:0)
rhel_virt--129-root	(253:1)
revolution_9-mirror_1_mimagetmp_2	(253:5)
revolution_9-mirror_1_mlog	(253:2)
revolution_9-mirror_1_mimage_2	(253:7)
revolution_9-mirror_1_mimage_1	(253:4)
revolution_9-mirror_1_mimage_0	(253:3)

[root@virt-129 ~]# dmsetup status
revolution_9-mirror_1: 0 4194304 mirror 2 253:5 253:7 4096/4096 1 AA 1 core
rhel_virt--129-swap: 0 4259840 linear 
rhel_virt--129-root: 0 11485184 linear 
revolution_9-mirror_1_mimagetmp_2: 0 4194304 mirror 2 253:3 253:4 4096/4096 1 AA 3 disk 253:2 D
revolution_9-mirror_1_mlog: 0 8192 linear 
revolution_9-mirror_1_mimage_2: 0 4194304 linear 
revolution_9-mirror_1_mimage_1: 0 4194304 linear 
revolution_9-mirror_1_mimage_0: 0 4194304 linear 

The lvconvert is issued by LVM itself (dmeventd); the test did not try to convert anything at this point, it only took a device (/dev/sdh) offline.

Version-Release number of selected component (if applicable):

lvm2-2.02.101-0.140.el7

How reproducible:
Sometimes.

Expected results:
The down-convert should succeed instead of LVM rejecting its own command.

Additional info:

use_lvmetad is 0
mirror_segtype_default = "mirror"
raid10_segtype_default = "mirror"
mirror_log_fault_policy = "remove"
mirror_image_fault_policy = "remove"
Comment 2 Jonathan Earl Brassow 2013-11-07 17:17:14 EST
Simple way to reproduce:

# Create mirror
[root@bp-02 ~]# lvcreate --type mirror -m1 -L 500M -n lv vg
  Logical volume "lv" created

# Upconvert, but kill polling process before it gets to 100%
# This prevents the mirror from removing the temporary layer
[root@bp-02 ~]# lvconvert -m +1 vg/lv
  vg/lv: Converted: 2.4%
^C

# Wait for 100% sync (makes it easier to avoid dmeventd triggering)
[root@bp-02 ~]# devices vg
  LV               Attr       Cpy%Sync Devices                         
  lv               cwi-a-m---   100.00 lv_mimagetmp_2(0),lv_mimage_2(0)
  [lv_mimage_0]    iwi-aom---          /dev/sdb1(0)                    
  [lv_mimage_1]    iwi-aom---          /dev/sdc1(0)                    
  [lv_mimage_2]    iwi-aom---          /dev/sdd1(0)                    
  [lv_mimagetmp_2] mwi-aom---   100.00 lv_mimage_0(0),lv_mimage_1(0)   
  [lv_mlog]        lwi-aom---          /dev/sdi1(0)                    

# Kill device
[root@bp-02 ~]# off.sh sdi
Turning off sdi

# The repair command fails.
[root@bp-02 ~]# lvconvert --repair vg/lv_mimagetmp_2
  Names including "_mimage" are reserved. Please choose a different LV name.
  Run `lvconvert --help' for more information.
Comment 3 Jonathan Earl Brassow 2013-11-07 17:20:54 EST
The final command in comment 2 succeeds if the top-level mirror (vg/lv) is given as the LV to repair.

Thus, any device failure in 'lv_mimagetmp_2' causes this kind of failure.

The solution is to make dmeventd realize that it must repair 'lv' and not its sub-LV, 'lv_mimagetmp_2'.  The code already avoids calling repair on *_mlog, so this shouldn't be too difficult.
Comment 4 Jonathan Earl Brassow 2013-11-08 10:57:36 EST
Fix checked in upstream:

commit 7de533ad12972f5a9c5bf2d2b477d8320f7e4a8e
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Fri Nov 8 09:52:00 2013 -0600

    mirror: Handle failures in tmp mirror used when up-converting.
    
    Failures in the temporary mirror used when up-converting cause dmeventd
    to issue 'lvconvert --repair' on the sub-LV, <lv_name>_mimagetmp_?.  The
    'lvconvert' command refuses to deal with this sub-LV outright - it
    expects to be given the name of the top-level LV.  So, just like we do
    with mirrored logs, we strip-off the portion of the name that is not
    the top-level LV and issue the command on the top-level LV instead.
Comment 8 Jonathan Earl Brassow 2014-03-27 11:12:18 EDT
Sorry, comment 2 was meant to show what dmeventd was doing, which was wrong. The fix was to make dmeventd call 'lvconvert' with the name of the top-level mirror device, not to change the command to accept mirror legs.

So the failure of the final command in comment 2 is expected behavior, as long as you can verify that the dmeventd behavior (i.e. the original bug) is fixed.
Comment 9 Nenad Peric 2014-03-28 12:59:55 EDT
Tested with multiple iterations of the revolution tests, especially with the 'remove' policy, which causes a down-convert, and did not run into the issues described in the opening comment.
Checking /var/log/messages, I found no further attempts by LVM to repair the _mimagetmp_ sub-LV.


Marking this bug VERIFIED with:

lvm2-2.02.105-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
lvm2-libs-2.02.105-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
lvm2-cluster-2.02.105-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-1.02.84-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-libs-1.02.84-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-event-1.02.84-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-event-libs-1.02.84-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
device-mapper-persistent-data-0.2.8-5.el7    BUILT: Sat Mar  1 02:15:56 CET 2014
cmirror-2.02.105-13.el7    BUILT: Wed Mar 19 11:38:19 CET 2014
Comment 10 Ludek Smid 2014-06-13 06:18:13 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
