Bug 1039218

Summary: mirror conversion _mimagetmp_ device sticking around after sync completes
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
lvm2 sub component: Mirroring and RAID QA Contact: Cluster QE <mspqa-list>
Status: CLOSED NOTABUG Docs Contact:
Severity: medium    
Priority: unspecified CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-20 22:25:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Corey Marthaler 2013-12-07 00:26:28 UTC
Description of problem:
Having this tmp image device hang around may lead to issues like bug 1038870.

./revolution_9 -e kill_primary_leg

host-050.virt.lab.msp.redhat.com: lvcreate --type mirror --mirrorlog disk -m 2 -n mirror_1 -L 2G revolution_9  /dev/sdb1:0-2557 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-150
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 6.91% )
   [...]
   0/1 mirror(s) are fully synced: ( 96.26% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Creating ext on top of mirror(s) on host-050.virt.lab.msp.redhat.com...
mke2fs 1.42.8 (20-Jun-2013)
Mounting mirrored ext filesystems on host-050.virt.lab.msp.redhat.com...

================================================================================
Iteration 0.1 started at Fri Dec  6 17:29:26 CST 2013
================================================================================
ACTUAL LEG ORDER: /dev/sdb1 /dev/sdc1 /dev/sdd1
Scenario kill_primary_leg: Kill primary leg

********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdb1 /dev/sdc1 /dev/sdd1
* log devices:        /dev/sde1
* failpv(s):          /dev/sdb1
* failnode(s):        host-050.virt.lab.msp.redhat.com
* lvmetad:            1
* leg fault policy:   remove
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices
  mirror_1            mwi-aom---   2.00g   100.00 mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)
  [mirror_1_mimage_0] iwi-aom---   2.00g          /dev/sdb1(0)
  [mirror_1_mimage_1] iwi-aom---   2.00g          /dev/sdc1(0)
  [mirror_1_mimage_2] iwi-aom---   2.00g          /dev/sdd1(0)
  [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sde1(0)

AVAIL:2 - NEEDED:1
will_alloc_work=1

PV=/dev/sdb1
        mirror_1_mimage_0: 6
PV=/dev/sdb1
        mirror_1_mimage_0: 6

Writing verification files (checkit) to mirror(s) on...
        ---- host-050.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-050.virt.lab.msp.redhat.com ----

Disabling device sdb on host-050.virt.lab.msp.redhat.com (lvmetad is on)

Getting recovery check start time from /var/log/messages: Dec  6 17:29
Attempting I/O to cause mirror down conversion(s) on host-050.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.118981 s, 353 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  PV TjAkvc-TFK2-4HVV-FsFR-7S4O-Sbpg-LRsVam not recognised. Is the device missing?
  PV TjAkvc-TFK2-4HVV-FsFR-7S4O-Sbpg-LRsVam not recognised. Is the device missing?
  LV                  Attr       LSize   Cpy%Sync Devices
   mirror_1            mwi-aom---   2.00g   100.00 mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mimage_1] iwi-aom---   2.00g          /dev/sdc1(0)
   [mirror_1_mimage_2] iwi-aom---   2.00g          /dev/sdd1(0)
   [mirror_1_mlog]     lwi-aom---   4.00m          /dev/sde1(0)

Verify that each of the mirror repairs finished successfully

Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
olog: 2.2
Verifying LOG device(s) /dev/sde1 *ARE* in the mirror(s)
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
verify the dm devices associated with /dev/sdb1 have been removed as expected
Checking REMOVAL of mirror_1_mimage_0 on:  host-050.virt.lab.msp.redhat.com

Verify that the mirror image order remains the same after the down conversion
EXPECTED LEG ORDER: /dev/sdc1 /dev/sdd1
  PV TjAkvc-TFK2-4HVV-FsFR-7S4O-Sbpg-LRsVam not recognised. Is the device missing?
  [...]
  PV TjAkvc-TFK2-4HVV-FsFR-7S4O-Sbpg-LRsVam not recognised. Is the device missing?
ACTUAL LEG ORDER: /dev/sdc1 /dev/sdd1
Verifying files (checkit) on mirror(s) on...
        ---- host-050.virt.lab.msp.redhat.com ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_mirror_1
Working dir:        /mnt/mirror_1/checkit

Enabling device sdb on host-050.virt.lab.msp.redhat.com (lvmetad is on)

-------------------------------------------------------------------------------
Force a vgreduce to either clean up the corrupt additional LV or lack of PV MDA
-------------------------------------------------------------------------------
Recreating PVs /dev/sdb1
Extending the recreated PVs back into VG revolution_9

HACK FOR BUG 799048: vgextend --restoremissing revolution_9 /dev/sdb1
  WARNING: PV /dev/sdb1 was not missing in VG revolution_9
  No PV has been restored.
HACK FOR BUG 799048: vgextend revolution_9 /dev/sdb1
  Physical volume '/dev/sdb1' is already in volume group 'revolution_9'
  Unable to add physical volume '/dev/sdb1' to volume group 'revolution_9'.
Getting PE for leg: /dev/sdb1
Getting PE for leg: /dev/sdc1
Getting PE for leg: /dev/sdd1
Up converting linear(s) back to mirror(s) on host-050.virt.lab.msp.redhat.com...
host-050.virt.lab.msp.redhat.com: lvconvert --mirrorlog disk -m 2 -b revolution_9/mirror_1 /dev/sdb1:0-2557 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-150
Waiting for convert to finish
mirror_1_mimagetmp_ still exists...
mirror_1_mimagetmp_ still exists...
mirror_1_mimagetmp_ still exists...

Dec  6 17:30:12 host-050 qarshd[4216]: Running cmdline: lvconvert --mirrorlog disk -m 2 -b revolution_9/mirror_1 /dev/sdb1:0-2557 /dev/sdc1:0-2557 /dev/sdd1:0-2557 /dev/sde1:0-150
Dec  6 17:30:13 host-050 lvm[1982]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Dec  6 17:30:13 host-050 lvm[1982]: Monitoring mirror device revolution_9-mirror_1_mimagetmp_3 for events.
Dec  6 17:30:13 host-050 lvm[1982]: Monitoring mirror device revolution_9-mirror_1 for events.
Dec  6 17:30:13 host-050 lvm[1982]: revolution_9-mirror_1_mimagetmp_3 is now in-sync.


[root@host-050 ~]# lvs -a -o +devices
LV                     Attr       LSize   Log           Cpy%Sync Convert              Devices
mirror_1               cwi-aom---   2.00g                 100.00 mirror_1_mimagetmp_3 mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
[mirror_1_mimage_1]    iwi-aom---   2.00g                                             /dev/sdc1(0)
[mirror_1_mimage_2]    iwi-aom---   2.00g                                             /dev/sdd1(0)
[mirror_1_mimage_3]    iwi-aom---   2.00g                                             /dev/sdb1(0)
[mirror_1_mimagetmp_3] mwi-aom---   2.00g mirror_1_mlog   100.00                      mirror_1_mimage_1(0),mirror_1_mimage_2(0)   
[mirror_1_mlog]        lwi-aom---   4.00m                                             /dev/sde1(0)


Version-Release number of selected component (if applicable):
3.10.0-55.el7.x86_64
lvm2-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-libs-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
lvm2-cluster-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-libs-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-event-libs-1.02.82-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013
device-mapper-persistent-data-0.2.8-2.el7    BUILT: Wed Oct 30 10:20:48 CDT 2013
cmirror-2.02.103-6.el7    BUILT: Wed Nov 27 02:28:25 CST 2013


How reproducible:
Often

Comment 1 Corey Marthaler 2014-01-24 21:57:05 UTC
This still exists in the latest rpms:

[root@host-050 ~]# lvs -a -o +devices
 LV                     Attr       LSize  Log           Cpy%Sync Convert            Devices
 mirror_1               cwi-aom--- 2.00g                100.00 mirror_1_mimagetmp_3 mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
 [mirror_1_mimage_1]    iwi-aom--- 2.00g                                            /dev/sdb1(0)
 [mirror_1_mimage_2]    iwi-aom--- 2.00g                                            /dev/sdd1(0)
 [mirror_1_mimage_3]    iwi-aom--- 2.00g                                            /dev/sda1(0)
 [mirror_1_mimagetmp_3] mwi-aom--- 2.00g mirror_1_mlog   100.00                     mirror_1_mimage_1(0),mirror_1_mimage_2(0)   
 [mirror_1_mlog]        lwi-aom--- 4.00m                                            /dev/sdf1(0)

[root@host-050 ~]# /usr/tests/sts-rhel7.0/lvm2/bin/lvm_rpms 
lvm2-2.02.105-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
lvm2-libs-2.02.105-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
lvm2-cluster-2.02.105-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
device-mapper-1.02.84-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
device-mapper-libs-1.02.84-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
device-mapper-event-1.02.84-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
device-mapper-event-libs-1.02.84-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014
device-mapper-persistent-data-0.2.8-3.el7    BUILT: Fri Dec 27 13:40:56 CST 2013
cmirror-2.02.105-1.el7    BUILT: Wed Jan 22 07:34:15 CST 2014

Comment 2 Jonathan Earl Brassow 2014-02-20 22:16:40 UTC
Test suite problem... the lvconvert background process is being killed after the calling process logs out.

[root@host-074 ~]# lvcreate --type mirror -m 1 -L 100M -n lv vg
  Logical volume "lv" created
[root@host-074 ~]# lvs -a -o name,attr,copy_percent
  LV            Attr       Cpy%Sync
  root          -wi-ao----         
  swap          -wi-ao----         
  lv            mwi-a-m---   100.00
  [lv_mimage_0] iwi-aom---         
  [lv_mimage_1] iwi-aom---         
  [lv_mlog]     lwi-aom---         
[root@host-074 ~]# lvconvert -m 2 vg/lv -b
[root@host-074 ~]# lvs -a -o name,attr,copy_percent vg
  LV            Attr       Cpy%Sync
  lv            mwi-a-m---   100.00
  [lv_mimage_0] iwi-aom---         
  [lv_mimage_1] iwi-aom---         
  [lv_mimage_2] iwi-aom---         
  [lv_mlog]     lwi-aom---         
[root@host-074 ~]# lvconvert -m 1 vg/lv -b
  Logical volume lv converted.
[root@host-074 ~]# lvs -a -o name,attr,copy_percent vg
  LV            Attr       Cpy%Sync
  lv            mwi-a-m---   100.00
  [lv_mimage_0] iwi-aom---         
  [lv_mimage_1] iwi-aom---         
  [lv_mlog]     lwi-aom---         
[root@host-074 ~]# ssh localhost lvconvert -m 2 vg/lv -b
root@localhost's password: 
[root@host-074 ~]# lvs -a -o name,attr,copy_percent vg
  LV            Attr       Cpy%Sync
  lv            mwi-a-m---   100.00
  [lv_mimage_0] iwi-aom---         
  [lv_mimage_1] iwi-aom---         
  [lv_mimage_2] iwi-aom---         
  [lv_mlog]     lwi-aom---         
[root@host-074 ~]# lvconvert -m 1 vg/lv -b
  Logical volume lv converted.
[root@host-074 ~]# lvs -a -o name,attr,copy_percent vg
  LV            Attr       Cpy%Sync
  lv            mwi-a-m---   100.00
  [lv_mimage_0] iwi-aom---         
  [lv_mimage_1] iwi-aom---         
  [lv_mlog]     lwi-aom---         
[root@host-074 ~]# qarsh host-074 lvconvert -m 2 vg/lv -b
[root@host-074 ~]# lvs -a -o name,attr,copy_percent vg
  LV               Attr       Cpy%Sync
  lv               cwi-a-m---   100.00
  [lv_mimage_0]    iwi-aom---         
  [lv_mimage_1]    iwi-aom---         
  [lv_mimage_2]    iwi-aom---         
  [lv_mimagetmp_2] mwi-aom---   100.00
  [lv_mlog]        lwi-aom---

Comment 3 Jonathan Earl Brassow 2014-02-20 22:25:29 UTC
You can take the '-b' out of the 'lvconvert' command, or you'll have to find a way to allow the process to keep running like ssh does.