Bug 683270

Summary: HA LVM service relocation after leg device failure causes resume transient error
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA
QA Contact: Corey Marthaler <cmarthal>
Severity: high
Priority: high
Version: 6.1
CC: agk, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.95-8.el6
Doc Type: Bug Fix
Doc Text:
This bug made it impossible to add another image (copy) to a mirrored logical volume whose activation is regulated by tags present in the 'volume_list' parameter of lvm.conf. The issue has been resolved by temporarily copying the mirror's tags to the in-coming image so that it can be properly activated.
Last Closed: 2012-06-20 14:51:46 UTC
Bug Blocks: 756082    

Description Corey Marthaler 2011-03-08 23:17:07 UTC
Description of problem:
Scenario: Kill random legs

********* Mirror info for this scenario *********
* mirrors:            ha1
* leg devices:        /dev/sdg1 /dev/sdf1 /dev/sdh1
* log devices:        /dev/sdb1
* failpv(s):          /dev/sdg1
* failnode(s):        taft-01
* leg fault policy:   remove
* log fault policy:   allocate

* HA mirror:          YES
* HA service name:    halvm
* HA current owner:   taft-01
*************************************************

PV=/dev/sdg1
        ha1_mimage_0: 6
PV=/dev/sdg1
        ha1_mimage_0: 6


Disabling device sdg on taft-01
leaving disable_device

Attempting I/O to cause mirror down conversion(s) on taft-01
1+0 records in
1+0 records out
512 bytes (512 B) copied, 4.3195e-05 s, 11.9 MB/s

CURRENT OWNER (and PREVIOUS): taft-01
NEW CURRENT OWNER: taft-02
Relocating halvm from taft-02 to taft-02


Verifying current sanity of lvm after the failure
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
olog: 2
Verifying LOG device(s) /dev/sdb1 *ARE* in the mirror(s)
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
verify the dm devices associated with /dev/sdg1 have been removed as expected
Checking REMOVAL of ha1_mimage_0 on:  taft-02

Verify that the mirror image order remains the same after the down conversion
Verify that each of the mirror repairs finished successfully

Enabling device sdg on taft-01

Recreating PVs /dev/sdg1
Extending the recreated PVs back into VG TAFT
Up converting linear(s) back to mirror(s) on taft-02...
taft-02: lvconvert -m 2 -b TAFT/ha1 /dev/sdg1:0-20000 /dev/sdf1:0-20000 /dev/sdh1:0-20000 /dev/sdb1:0-150
  Not activating TAFT/ha1_mimagetmp_3 since it does not pass activation filter.
  Failed to resume transient error LV ha1_mimagetmp_3 for mirror conversion in VG TAFT.
  Failed to insert resync layer
  Failed to insert resync layer
couldn't up convert mirror ha1 on taft-02
FI_engine: recover() method failed


[root@taft-02 ~]# clustat
Cluster Status for TAFT @ Tue Mar  8 17:10:07 2011
Member Status: Quorate

 Member Name   ID   Status
 ------ ----   ---- ------
 taft-01         1  Online, rgmanager
 taft-02         2  Online, Local, rgmanager
 taft-03         3  Online, rgmanager
 taft-04         4  Online, rgmanager

 Service Name    Owner (Last)    State
 ------- ----    ----- ------    -----
 service:halvm   taft-02         started

[root@taft-02 ~]# lvs -a -o +devices
  LV                VG        Attr   LSize  Log      Copy%  Convert Devices
  ha1               TAFT      mwi-ao  3.00g ha1_mlog 100.00         ha1_mimage_1(0),ha1_mimage_2(0)
  [ha1_mimage_1]    TAFT      iwi-ao  3.00g                         /dev/sdf1(0)
  [ha1_mimage_2]    TAFT      iwi-ao  3.00g                         /dev/sdh1(0)
  [ha1_mimagetmp_3] TAFT      vwi---  3.00g
  [ha1_mlog]        TAFT      lwi-ao  4.00m                         /dev/sdb1(0)


Version-Release number of selected component (if applicable):
2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011

Comment 1 Corey Marthaler 2011-03-08 23:26:36 UTC
Odd. After deactivating and reactivating this VG, LVM no longer treats this
mirror as a mirror; instead, it now treats all of these LVs as individual
linear devices.

[root@taft-02 ~]# vgchange -ay TAFT
  4 logical volume(s) in volume group "TAFT" now active

[root@taft-02 ~]# lvs -a -o +devices
  LV                VG        Attr   LSize  Convert Devices
  ha1               TAFT      -wi-a-  3.00g /dev/sdf1(0)
  ha1_mimage_1      TAFT      vwi-a-  3.00g
  ha1_mimage_2      TAFT      -wi-a-  3.00g /dev/sdh1(0)
  [ha1_mimagetmp_3] TAFT      vwi---  3.00g
  ha1_mlog          TAFT      -wi-a-  4.00m /dev/sdb1(0)

[root@taft-02 ~]# lvs
  LV           VG        Attr   LSize  
  ha1          TAFT      -wi-a-  3.00g
  ha1_mimage_1 TAFT      vwi-a-  3.00g
  ha1_mimage_2 TAFT      -wi-a-  3.00g
  ha1_mlog     TAFT      -wi-a-  4.00m

Comment 2 RHEL Program Management 2011-04-04 02:03:46 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Jonathan Earl Brassow 2012-04-27 18:50:39 UTC
This could be caused by the new LV, which is to become a sub-LV of the mirror, not carrying the tag it needs in order to be activated. Since the activation fails, the mirror fails to up-convert. This leaves in place whatever LVs were created (but not activated), and all LVs will appear as linear devices.

Need to find a way to carry the tag over to the new sub-LV while it is being activated, so that it passes the activation filter, and then remove the tag once it becomes part of the mirror.

Comment 6 Jonathan Earl Brassow 2012-05-04 19:28:35 UTC
Here are the two commands necessary to reproduce this (no HA LVM needed):

[root@bp-01 ~]# lvcreate -L 200M -n lv vg
  Logical volume "lv" created

[root@bp-01 ~]# lvconvert -m +1 vg/lv --config 'activation { volume_list = [ "@foo" ] }'
  Not activating vg/lv_mlog since it does not pass activation filter.
  Aborting. Failed to activate mirror log.
  Failed to initialise mirror log.
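To make the failure in this reproducer concrete: with `volume_list = [ "@foo" ]`, an LV is only activated if it matches an entry in the list, and `@foo` matches only when the tag `foo` is set on the LV or on its VG. The newly created `lv_mlog` carries no tags, so it is rejected. The Python below is a minimal, hypothetical model of that check, assuming the lvm.conf matching rules for `vgname`, `vgname/lvname` and `@tag` entries (the `@*` host-tag form is deliberately left out). `LogicalVolume` and `passes_volume_list` are illustrative names, not LVM functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogicalVolume:
    vg: str
    name: str
    tags: set[str] = field(default_factory=set)


def passes_volume_list(lv: LogicalVolume, volume_list: list[str] | None,
                       vg_tags: frozenset[str] = frozenset()) -> bool:
    """Hypothetical model of the lvm.conf activation/volume_list check.

    Handles 'vgname', 'vgname/lvname' and '@tag' entries only;
    the '@*' (host tag) form is not modelled.
    """
    if volume_list is None:
        # No volume_list configured: no filtering is applied.
        return True
    for entry in volume_list:
        if entry.startswith("@"):
            # '@tag' matches a tag set on the LV or on its VG.
            tag = entry[1:]
            if tag in lv.tags or tag in vg_tags:
                return True
        elif "/" in entry:
            # 'vgname/lvname' must match exactly.
            if entry == f"{lv.vg}/{lv.name}":
                return True
        elif entry == lv.vg:
            # Bare 'vgname' matches every LV in that VG.
            return True
    return False


if __name__ == "__main__":
    volume_list = ["@foo"]
    mlog = LogicalVolume("vg", "lv_mlog")  # new log LV, created without tags
    print(passes_volume_list(mlog, volume_list))  # False
    tagged = LogicalVolume("vg", "lv", {"foo"})
    print(passes_volume_list(tagged, volume_list))  # True
```

The same filter rule is what rejected the untagged `TAFT/ha1_mimagetmp_3` in the original HA LVM run.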

Comment 7 Jonathan Earl Brassow 2012-05-05 13:09:43 UTC
Committed upstream:

commit 679f946dd991d4116d63e60e68f212312447e5aa
Author: Jonathan Earl Brassow <jbrassow>
Date:   Sat May 5 02:08:46 2012 +0000

    Fix up-convert when mirror activation is controlled by volume_list and tags.
    
    When mirrors are up-converted, a transient mirror layer is put in so that
    only the new devices are sync'ed.  That transient layer must carry the tags
    of the original mirror LV, otherwise it will fail to activate when activation
    is regulated by lvm.conf:activation/volume_list.  The conversion would then
    fail.
    
    The fix is to do exactly the same thing that is being done for linear ->
    mirror converting (lib/metadata/mirror.c:_init_mirror_log()).  We copy the
    tags temporarily for the new LV and remove them after the activation.
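The approach in this commit message can be sketched the same way: before activating the new sub-LV, give it the mirror's tags, activate it, then take back only the tags that were added. This is a simplified in-memory model under stated assumptions: the filter is reduced to `@tag` entries checked against the LV's own tags, a rejected activation is modelled as a `RuntimeError`, and `borrowed_tags` and `activate_new_sub_lv` are hypothetical names. LVM itself operates on volume group metadata rather than Python objects; persisting the temporary tags is not modelled here.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class LogicalVolume:
    vg: str
    name: str
    tags: set[str] = field(default_factory=set)
    active: bool = False


def passes_volume_list(lv: LogicalVolume, volume_list: list[str] | None) -> bool:
    # Reduced model: only '@tag' entries, matched against the LV's own tags.
    if volume_list is None:
        return True
    return any(e.startswith("@") and e[1:] in lv.tags for e in volume_list)


def activate(lv: LogicalVolume, volume_list: list[str] | None) -> None:
    if not passes_volume_list(lv, volume_list):
        # Modelled as an exception; the message mirrors the log line in this bug.
        raise RuntimeError(
            f"Not activating {lv.vg}/{lv.name} since it does not pass activation filter."
        )
    lv.active = True


@contextmanager
def borrowed_tags(target: LogicalVolume, source: LogicalVolume) -> Iterator[LogicalVolume]:
    """Temporarily copy the tags of `source` that `target` does not already have."""
    added = source.tags - target.tags
    target.tags |= added
    try:
        yield target
    finally:
        # Remove only what was borrowed, so pre-existing tags on `target` survive.
        target.tags -= added


def activate_new_sub_lv(mirror: LogicalVolume, new_lv: LogicalVolume,
                        volume_list: list[str] | None) -> None:
    """Activate a new mirror sub-LV that must pass the mirror's tag filter."""
    with borrowed_tags(new_lv, mirror):
        activate(new_lv, volume_list)


if __name__ == "__main__":
    volume_list = ["@foo"]
    mirror = LogicalVolume("TAFT", "ha1", {"foo"})
    tmp = LogicalVolume("TAFT", "ha1_mimagetmp_3")
    activate_new_sub_lv(mirror, tmp, volume_list)
    print(tmp.active, tmp.tags)  # True set()
```

Removing only the `added` tags, rather than clearing the set, keeps any tags the new LV already had, and the `try/finally` drops the borrowed tags even if activation fails.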

Comment 9 Jonathan Earl Brassow 2012-05-07 13:53:57 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This bug made it impossible to add another image (copy) to a mirrored logical volume whose activation is regulated by tags present in the 'volume_list' parameter of lvm.conf.  The issue has been resolved by temporarily copying the mirror's tags to the in-coming image so that it can be properly activated.

Comment 14 Corey Marthaler 2012-05-10 19:11:16 UTC
Fix verified in the latest rpms.


2.6.32-269.el6.x86_64
lvm2-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-libs-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-cluster-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
cmirror-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012



[root@hayes-01 ~]# lvcreate -L 200M -n lv hayes --addtag foo
  Logical volume "lv" created
[root@hayes-01 ~]# lvconvert -m +1 hayes/lv --config 'activation { volume_list = [ "@foo" ] }'
  hayes/lv: Converted: 0.0%
  hayes/lv: Converted: 2.0%
  hayes/lv: Converted: 4.0%
  [...]
  hayes/lv: Converted: 96.0%
  hayes/lv: Converted: 98.0%
  hayes/lv: Converted: 100.0%

Comment 16 errata-xmlrpc 2012-06-20 14:51:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html