Bug 1266932

Summary: "Volume group not found" issues when doing meta intensive operations on shared VGs
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: David Teigland <teigland>
lvm2 sub component: LVM lock daemon / lvmlockd QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: unspecified CC: agk, heinzm, jbrassow, prajnoha, teigland, zkabelac
Version: 7.2   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.165-3.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-04 04:10:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
-vvvv of failing lvcreate cmd
none
verbose output of failing create cmd none

Description Corey Marthaler 2015-09-28 14:53:30 UTC
Description of problem:
I've seen this quite a few times now when running a test that creates many different types of LVs from different nodes on shared storage.

An lvm command will eventually fail with "VG metadata comparison failed" and "Volume group "activator1" not found" errors. If you wait a bit, the command will eventually succeed. So unlike bug 1261648, where the "metadata comparison" error stuck around once it showed up, in this bug it goes away on its own.
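As an aside, a bounded retry wrapper like the sketch below (hypothetical helper, not part of the test suite; timings are illustrative) is one way to tell this transient failure apart from the persistent one in bug 1261648:

    import subprocess
    import time

    def run_with_retry(cmd, retries=5, delay=10):
        # Retry an lvm command that fails transiently with
        # "VG metadata comparison failed" / "Volume group ... not found".
        # In this bug the error clears after a while; in bug 1261648 it
        # persists, so a bounded retry loop distinguishes the two.
        for attempt in range(1, retries + 1):
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode == 0:
                return result
            print("attempt %d failed: %s" % (attempt, result.stderr.strip()))
            time.sleep(delay)
        raise RuntimeError("still failing after %d attempts: %s" % (retries, cmd))

    # The lvcreate that fails intermittently in this report:
    run_with_retry(["lvcreate", "--activate", "ey", "--thinpool", "pool2",
                    "--zero", "n", "-L", "150M", "activator1"])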



setting up first "global lock" dummy vg for lvmlockd...
vgcreate --shared global /dev/mapper/mpathe4
mckinley-01: vgchange --lock-start global
mckinley-02: vgchange --lock-start global
  Skipping global lock: lockspace not found or started
mckinley-03: vgchange --lock-start global
  Skipping global lock: lockspace not found or started
mckinley-04: vgchange --lock-start global
  Skipping global lock: lockspace not found or started

creating lvm devices...
mckinley-01: pvcreate /dev/mapper/mpathe3 /dev/mapper/mpathe2 /dev/mapper/mpathe1 /dev/mapper/mpathd4 /dev/mapper/mpathd3 /dev/mapper/mpathd2 /dev/mapper/mpathd1
mckinley-01: vgcreate --shared activator1 /dev/mapper/mpathe3 /dev/mapper/mpathe2 /dev/mapper/mpathe1 /dev/mapper/mpathd4 /dev/mapper/mpathd3 /dev/mapper/mpathd2 /dev/mapper/mpathd1
mckinley-01: vgchange --lock-start activator1
mckinley-02: vgchange --lock-start activator1
mckinley-03: vgchange --lock-start activator1
mckinley-04: vgchange --lock-start activator1

mckinley-01: pvcreate /dev/mapper/mpatha4 /dev/mapper/mpatha3 /dev/mapper/mpatha2 /dev/mapper/mpatha1 /dev/mapper/mpathf4 /dev/mapper/mpathf3 /dev/mapper/mpathf2
mckinley-01: vgcreate --shared activator3 /dev/mapper/mpatha4 /dev/mapper/mpatha3 /dev/mapper/mpatha2 /dev/mapper/mpatha1 /dev/mapper/mpathf4 /dev/mapper/mpathf3 /dev/mapper/mpathf2
mckinley-01: vgchange --lock-start activator3
mckinley-02: vgchange --lock-start activator3
mckinley-03: vgchange --lock-start activator3
mckinley-04: vgchange --lock-start activator3

mckinley-01: pvcreate /dev/mapper/mpathf1 /dev/mapper/mpathb4 /dev/mapper/mpathb3 /dev/mapper/mpathb2 /dev/mapper/mpathb1 /dev/mapper/mpathg4 /dev/mapper/mpathg3
mckinley-01: vgcreate --shared activator2 /dev/mapper/mpathf1 /dev/mapper/mpathb4 /dev/mapper/mpathb3 /dev/mapper/mpathb2 /dev/mapper/mpathb1 /dev/mapper/mpathg4 /dev/mapper/mpathg3
mckinley-01: vgchange --lock-start activator2
mckinley-02: vgchange --lock-start activator2
mckinley-03: vgchange --lock-start activator2
mckinley-04: vgchange --lock-start activator2

mckinley-01: pvcreate /dev/mapper/mpathg2 /dev/mapper/mpathg1 /dev/mapper/mpathh4 /dev/mapper/mpathh3 /dev/mapper/mpathh2 /dev/mapper/mpathh1 /dev/mapper/mpathc4
mckinley-01: vgcreate --shared activator4 /dev/mapper/mpathg2 /dev/mapper/mpathg1 /dev/mapper/mpathh4 /dev/mapper/mpathh3 /dev/mapper/mpathh2 /dev/mapper/mpathh1 /dev/mapper/mpathc4
mckinley-01: vgchange --lock-start activator4
mckinley-02: vgchange --lock-start activator4
mckinley-03: vgchange --lock-start activator4
mckinley-04: vgchange --lock-start activator4


Creating LVs on VG activator1
        create origin (slow) volume
                mckinley-01: lvcreate --activate ey -L 150M -n cache1 activator1 /dev/mapper/mpathd1
        create cache (fast) volume
                mckinley-01: lvcreate --activate ey -L 50M -n cache1_fast activator1 /dev/mapper/mpathe1
        create cache metadata (fast) volume
                mckinley-01: lvcreate --activate ey -L 8M -n cache1_fast_meta activator1 /dev/mapper/mpathe1
        create cache pool volume by combining the cache data and cache metadata (fast) volumes
                mckinley-01: lvconvert --type cache-pool --poolmetadata activator1/cache1_fast_meta activator1/cache1_fast --yes
  WARNING: Converting logical volume activator1/cache1_fast and activator1/cache1_fast_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
        create cached volume by combining the cache pool (fast) and origin (slow) volumes
                mckinley-01: lvconvert --type cache --cachepool activator1/cache1_fast activator1/cache1 --yes
        create a 2 redundant legged raid1
                mckinley-01 lvcreate --activate ey --type raid1 -m 2 -n raid1 -L 150M activator1
        create a thin pool volume
                mckinley-01 lvcreate --activate ey --thinpool pool1 --zero n -L 150M activator1
        create a thin virtual volume
                mckinley-01 lvcreate --activate ey --virtualsize 75M -T activator1/pool1 -n virt1
        create a linear on mckinley-02
                mckinley-02: lvcreate --activate sy -n linear1 -L 150M activator1
        create a stripe on mckinley-03
                mckinley-03 lvcreate --activate sy -i 1 -n stripe1 -L 150M activator1
        create a 2 redundant legged mirror on mckinley-03
                mckinley-03 lvcreate --type mirror -m 2 -n mirror1 -L 150M activator1
        create origin (slow) volume
                mckinley-01: lvcreate --activate ey -L 150M -n cache2 activator1 /dev/mapper/mpathe3
        create cache (fast) volume
                mckinley-01: lvcreate --activate ey -L 50M -n cache2_fast activator1 /dev/mapper/mpathd3
        create cache metadata (fast) volume
                mckinley-01: lvcreate --activate ey -L 8M -n cache2_fast_meta activator1 /dev/mapper/mpathd3
        create cache pool volume by combining the cache data and cache metadata (fast) volumes
                mckinley-01: lvconvert --type cache-pool --poolmetadata activator1/cache2_fast_meta activator1/cache2_fast --yes
  WARNING: Converting logical volume activator1/cache2_fast and activator1/cache2_fast_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
        create cached volume by combining the cache pool (fast) and origin (slow) volumes
                mckinley-01: lvconvert --type cache --cachepool activator1/cache2_fast activator1/cache2 --yes
        create a 1 redundant legged raid1
                mckinley-03 lvcreate --activate ey --type raid1 -m 1 -n raid2 -L 150M activator1
        create a thin pool volume
  VG metadata comparison failed
  Volume group "activator1" not found
                mckinley-02 lvcreate --activate ey --thinpool pool2 --zero n -L 150M activator1
couldn't create pool2 on mckinley-02

Sep 28 09:18:36 mckinley-02 qarshd[69644]: Running cmdline: lvcreate --activate ey --thinpool pool2 --zero n -L 150M activator1
Sep 28 09:24:57 mckinley-02 sanlock[46232]: 2015-09-28 09:24:57-0500 1031910 [46238]: r1039 ballot 21 choose bk_max[1522] lver 21 mbal 1523 bal 1523 inp 1523 1 1031135
Sep 28 09:24:57 mckinley-02 sanlock[46232]: 2015-09-28 09:24:57-0500 1031910 [46238]: r1039 ballot 21 commit other owner 1523 1 1031135
Sep 28 09:24:57 mckinley-02 sanlock[46232]: 2015-09-28 09:24:57-0500 1031910 [46238]: r1039 acquire_token held error -245
Sep 28 09:24:57 mckinley-02 sanlock[46232]: 2015-09-28 09:24:57-0500 1031910 [46238]: r1039 cmd_acquire 3,9,46244 acquire_token -245 lease owned by other host



[root@mckinley-02 ~]# lvs -a -o +devices
  VG metadata comparison failed
  Volume group "activator4" not found
  Cannot process volume group activator4
  LV                  VG          Attr       LSize   Pool          Origin         Data%  Meta%  Log           Devices
  cache1              activator1  Cwi---C--- 152.00m [cache1_fast] [cache1_corig]                             cache1_corig(0)
  [cache1_corig]      activator1  owi---C--- 152.00m                                                          /dev/mapper/mpathb1(0)
  [cache1_fast]       activator1  Cwi---C---  52.00m                                                          cache1_fast_cdata(0)
  [cache1_fast_cdata] activator1  Cwi-------  52.00m                                                          /dev/mapper/mpatha1(0)
  [cache1_fast_cmeta] activator1  ewi-------   8.00m                                                          /dev/mapper/mpatha1(13)
  cache2              activator1  Cwi---C--- 152.00m [cache2_fast] [cache2_corig]                             cache2_corig(0)
  [cache2_corig]      activator1  owi---C--- 152.00m                                                          /dev/mapper/mpatha3(257)
  [cache2_fast]       activator1  Cwi---C---  52.00m                                                          cache2_fast_cdata(0)
  [cache2_fast_cdata] activator1  Cwi-------  52.00m                                                          /dev/mapper/mpathb3(0)
  [cache2_fast_cmeta] activator1  ewi-------   8.00m                                                          /dev/mapper/mpathb3(13)
  linear1             activator1  -wi-a----- 152.00m                                                          /dev/mapper/mpatha3(143)
  [lvmlock]           activator1  -wi-ao---- 256.00m                                                          /dev/mapper/mpatha3(0)
  [lvol0_pmspare]     activator1  ewi-------   8.00m                                                          /dev/mapper/mpatha3(64)
  mirror1             activator1  mwi---m--- 152.00m                                            mirror1_mlog  mirror1_mimage_0(0),mirror1_mimage_1(0),mirror1_mimage_2(0)
  [mirror1_mimage_0]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha3(219)
  [mirror1_mimage_1]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha2(39)
  [mirror1_mimage_2]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha1(54)
  [mirror1_mlog]      activator1  lwi---m---   4.00m                                                          /dev/mapper/mpathb1(39)
  pool1               activator1  twi---t--- 152.00m                                                          pool1_tdata(0)
  [pool1_tdata]       activator1  Twi------- 152.00m                                                          /dev/mapper/mpatha3(105)
  [pool1_tmeta]       activator1  ewi-------   4.00m                                                          /dev/mapper/mpathb1(38)
  raid1               activator1  rwi---r--- 152.00m                                                          raid1_rimage_0(0),raid1_rimage_1(0),raid1_rimage_2(0)
  [raid1_rimage_0]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha3(67)
  [raid1_rimage_1]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha2(1)
  [raid1_rimage_2]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha1(16)
  [raid1_rmeta_0]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha3(66)
  [raid1_rmeta_1]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha2(0)
  [raid1_rmeta_2]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha1(15)
  raid2               activator1  rwi---r--- 152.00m                                                          raid2_rimage_0(0),raid2_rimage_1(0)
  [raid2_rimage_0]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha3(296)
  [raid2_rimage_1]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha2(78)
  [raid2_rmeta_0]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha3(295)
  [raid2_rmeta_1]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha2(77)
  stripe1             activator1  -wi------- 152.00m                                                          /dev/mapper/mpatha3(181)
  virt1               activator1  Vwi---t---  76.00m pool1
  [lvmlock]           activator2  -wi-ao---- 256.00m                                                          /dev/mapper/mpathh1(0)
  [lvmlock]           activator3  -wi-ao---- 256.00m                                                          /dev/mapper/mpathc4(0)
  [lvmlock]           global      -wi-ao---- 256.00m                                                          /dev/mapper/mpatha4(0)

[root@mckinley-02 ~]# lvcreate --activate ey --thinpool pool2 --zero n -L 150M activator1
  Rounding up size to full physical extent 152.00 MiB
  Logical volume "pool2" created.

[root@mckinley-02 ~]# lvs -a -o +devices
  LV                  VG          Attr       LSize   Pool          Origin         Data%  Meta%  Log           Devices
  cache1              activator1  Cwi---C--- 152.00m [cache1_fast] [cache1_corig]                             cache1_corig(0)
  [cache1_corig]      activator1  owi---C--- 152.00m                                                          /dev/mapper/mpathb1(0)
  [cache1_fast]       activator1  Cwi---C---  52.00m                                                          cache1_fast_cdata(0)
  [cache1_fast_cdata] activator1  Cwi-------  52.00m                                                          /dev/mapper/mpatha1(0)
  [cache1_fast_cmeta] activator1  ewi-------   8.00m                                                          /dev/mapper/mpatha1(13)
  cache2              activator1  Cwi---C--- 152.00m [cache2_fast] [cache2_corig]                             cache2_corig(0)
  [cache2_corig]      activator1  owi---C--- 152.00m                                                          /dev/mapper/mpatha3(257)
  [cache2_fast]       activator1  Cwi---C---  52.00m                                                          cache2_fast_cdata(0)
  [cache2_fast_cdata] activator1  Cwi-------  52.00m                                                          /dev/mapper/mpathb3(0)
  [cache2_fast_cmeta] activator1  ewi-------   8.00m                                                          /dev/mapper/mpathb3(13)
  linear1             activator1  -wi-a----- 152.00m                                                          /dev/mapper/mpatha3(143)
  [lvmlock]           activator1  -wi-ao---- 256.00m                                                          /dev/mapper/mpatha3(0)
  [lvol0_pmspare]     activator1  ewi-------   8.00m                                                          /dev/mapper/mpatha3(64)
  mirror1             activator1  mwi---m--- 152.00m                                            mirror1_mlog  mirror1_mimage_0(0),mirror1_mimage_1(0),mirror1_mimage_2(0)
  [mirror1_mimage_0]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha3(219)
  [mirror1_mimage_1]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha2(39)
  [mirror1_mimage_2]  activator1  Iwi---m--- 152.00m                                                          /dev/mapper/mpatha1(54)
  [mirror1_mlog]      activator1  lwi---m---   4.00m                                                          /dev/mapper/mpathb1(39)
  pool1               activator1  twi---t--- 152.00m                                                          pool1_tdata(0)
  [pool1_tdata]       activator1  Twi------- 152.00m                                                          /dev/mapper/mpatha3(105)
  [pool1_tmeta]       activator1  ewi-------   4.00m                                                          /dev/mapper/mpathb1(38)
  pool2               activator1  twi-a-t--- 152.00m                              0.00   0.88                 pool2_tdata(0)
  [pool2_tdata]       activator1  Twi-ao---- 152.00m                                                          /dev/mapper/mpatha3(334)
  [pool2_tmeta]       activator1  ewi-ao----   4.00m                                                          /dev/mapper/mpathb1(40)
  raid1               activator1  rwi---r--- 152.00m                                                          raid1_rimage_0(0),raid1_rimage_1(0),raid1_rimage_2(0)
  [raid1_rimage_0]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha3(67)
  [raid1_rimage_1]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha2(1)
  [raid1_rimage_2]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha1(16)
  [raid1_rmeta_0]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha3(66)
  [raid1_rmeta_1]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha2(0)
  [raid1_rmeta_2]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha1(15)
  raid2               activator1  rwi---r--- 152.00m                                                          raid2_rimage_0(0),raid2_rimage_1(0)
  [raid2_rimage_0]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha3(296)
  [raid2_rimage_1]    activator1  Iwi---r--- 152.00m                                                          /dev/mapper/mpatha2(78)
  [raid2_rmeta_0]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha3(295)
  [raid2_rmeta_1]     activator1  ewi---r---   4.00m                                                          /dev/mapper/mpatha2(77)
  stripe1             activator1  -wi------- 152.00m                                                          /dev/mapper/mpatha3(181)
  virt1               activator1  Vwi---t---  76.00m pool1
  [lvmlock]           activator2  -wi-ao---- 256.00m                                                          /dev/mapper/mpathh1(0)
  [lvmlock]           activator3  -wi-ao---- 256.00m                                                          /dev/mapper/mpathc4(0)
  [lvmlock]           activator4  -wi-ao---- 256.00m                                                          /dev/mapper/mpathe2(0)
  [lvmlock]           global      -wi-ao---- 256.00m                                                          /dev/mapper/mpatha4(0)



Version-Release number of selected component (if applicable):
3.10.0-306.el7.x86_64
lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015


How reproducible:
Every time

Comment 2 Corey Marthaler 2015-09-28 16:27:55 UTC
This appears to have to do with mixing shared and exclusively locked LVs in the same VG.

Comment 3 David Teigland 2015-09-28 19:46:31 UTC
Bits of the logs from one reproduction show that lvmetad holds VG version 54; the VG lock is acquired, which finds that the latest version on disk is 56, so lvmlockd invalidates the VG in lvmetad and tells lvmetad the latest version is 56.  This all looks to be correctly recorded in lvmetad.  What's not clear is what the command does when it reads the VG.  It should read lvmetad version 54, see the invalid flag, then read version 56 from disk.  I'm guessing that it's complaining about version 56 on disk.  We probably need to enable command debugging to print the actual metadata that it's complaining about.
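To make the expected flow concrete, here is a toy Python model of the invalidate/rescan protocol described above (a sketch only; the real logic lives in lvmlockd and lvm2's lvmetad cache layer, and the class and method names here are invented for illustration):

    class VGCache:
        # Toy model: lvmetad caches VG metadata at some seqno; when
        # lvmlockd acquires the VG lock and finds a newer version on
        # disk, it marks the cached copy invalid; the next command that
        # reads the VG sees the invalid flag and rescans the disks.

        def __init__(self, seqno, metadata):
            self.seqno = seqno        # e.g. 54 in this reproduction
            self.metadata = metadata
            self.invalid = False

        def lock_acquired(self, disk_version):
            # lvmlockd path: the sanlock lock returned r_version (56 here).
            if disk_version > self.seqno:
                self.invalid = True
                self.seqno = disk_version

        def read_vg(self, rescan_from_disk):
            # Command path: a cached-but-invalid VG must be reread from disk.
            if self.invalid:
                self.metadata = rescan_from_disk()
                self.invalid = False
            return self.metadata

    cache = VGCache(54, "metadata@54")
    cache.lock_acquired(56)                      # lvmlockd invalidates
    vg = cache.read_vg(lambda: "metadata@56")    # command rescans disk

The dumps below show the corresponding lvmetad state: version 56 recorded for activator1's vgid, with its invalid flag set to 1.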

# VGID to INFO version mapping

vgid_to_info {
    n2PS1S-ZQrj-ueSe-SyDJ-1RmD-3xig-bqKRJb = 0
    BKOtn7-N6vL-r0wN-AgRP-iA1p-FUEm-8JgKrc = 0
    3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2 = 56
    UYZH9P-4742-NB5N-d4Im-RF6C-6q2W-8WkUyO = 0
    f02nyp-JAja-TCbt-DYFZ-dG67-25Kx-VbjT9C = 0
}

# VGID to INFO flags mapping

vgid_to_info {
    n2PS1S-ZQrj-ueSe-SyDJ-1RmD-3xig-bqKRJb = 0
    BKOtn7-N6vL-r0wN-AgRP-iA1p-FUEm-8JgKrc = 0
    3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2 = 1
    UYZH9P-4742-NB5N-d4Im-RF6C-6q2W-8WkUyO = 0
    f02nyp-JAja-TCbt-DYFZ-dG67-25Kx-VbjT9C = 0
}

# VGID to VGNAME mapping

vgid_to_vgname {
    n2PS1S-ZQrj-ueSe-SyDJ-1RmD-3xig-bqKRJb = "activator3"
    BKOtn7-N6vL-r0wN-AgRP-iA1p-FUEm-8JgKrc = "activator2"
    3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2 = "activator1"
    2AAFX5-QPAe-oTxp-O1I0-0VwH-gwBA-cVKNnF = "rhel_mckinley-01"
    UYZH9P-4742-NB5N-d4Im-RF6C-6q2W-8WkUyO = "activator4"
    f02nyp-JAja-TCbt-DYFZ-dG67-25Kx-VbjT9C = "global"
}

# VGNAME to VGID mapping

vgname_to_vgid {
    activator1 = "3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2"
    activator2 = "BKOtn7-N6vL-r0wN-AgRP-iA1p-FUEm-8JgKrc"
    activator4 = "UYZH9P-4742-NB5N-d4Im-RF6C-6q2W-8WkUyO"
    activator3 = "n2PS1S-ZQrj-ueSe-SyDJ-1RmD-3xig-bqKRJb"
    rhel_mckinley-01 = "2AAFX5-QPAe-oTxp-O1I0-0VwH-gwBA-cVKNnF"
    global = "f02nyp-JAja-TCbt-DYFZ-dG67-25Kx-VbjT9C"
}


3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2 {
        id="3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2"
        seqno=54
        format="lvm2"
        status=["RESIZEABLE","READ"]
        flags="WRITE_LOCKED"
        lock_type="sanlock"
        lock_args="1.0.0:lvmlock"
        extent_size=8192
        max_lv=0
        max_pv=0
        metadata_copies=0


1443466540 recv lvcreate[77724.1549] lock vg "activator1" mode ex flags 0
1443466540 S lvm_activator1 R VGLK action lock ex
1443466540 S lvm_activator1 R VGLK res_lock mode ex
1443466540 S lvm_activator1 R VGLK lock_san acquire /dev/mapper/activator1-lvmlock:69206016
1443466540 S lvm_activator1 R VGLK lock_san get r_version 56
1443466540 S lvm_activator1 R VGLK res_lock lm done r_version 56
1443466540 S lvm_activator1 R VGLK res_lock set lvmetad vg version 56
1443466540 send lvcreate[77724.1549] lock vg rv 0  
1443466541 close lvcreate[77724.1549] fd 9
1443466541 S lvm_activator1 R VGLK res_unlock from close
1443466541 S lvm_activator1 R VGLK unlock_san r_version 0 flags 0
1443466541 S lvm_activator1 R VGLK res_unlock lm done


Sep 28 13:55:40 mckinley-01 qarshd[77723]: Running cmdline: lvcreate --activate ey --thinpool pool4 --zero n -L 150M activator1


VG metadata comparison failed
Volume group "activator1" not found

Comment 5 Corey Marthaler 2015-09-28 22:51:53 UTC
Created attachment 1078080 [details]
-vvvv of failing lvcreate cmd

Comment 6 David Teigland 2015-09-29 18:57:51 UTC
I've added a commit upstream that will show exactly what is causing the comparison error.  It will still be necessary to run with -vvvv to see those messages.

Comment 7 David Teigland 2016-06-13 22:02:29 UTC
I reproduced this and have a fix that I'm doing more testing with.

Comment 8 David Teigland 2016-06-15 18:09:10 UTC
I fixed a metadata comparison bug here:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=ae0a8740c5899520ec4c94fbf080623167d8ecba

We never had enough information about the problem above to say if it would be fixed by this.

Comment 10 Corey Marthaler 2016-09-09 19:48:54 UTC
Created attachment 1199597 [details]
verbose output of failing create cmd

This is still reproducible with the latest rpms.


#format_text/format-text.c:548         Read activator3 metadata (64) from /dev/mapper/mpathe1 at 1079296 size 33784
#cache/lvmetad.c:1923   VG activator3 metadata comparison failed for device /dev/mapper/mpathe1 vs /dev/mapper/mpathd1
#cache/lvmetad.c:70         VG activator3 metadata inequality at creation_time / creation_time: 1473449603 / 1473449604
#metadata/vg.c:89         Freeing VG activator3 at 0x7f6bf47a1de0.
#cache/lvmetad.c:1093         VG activator3 from lvmetad not found during rescan.
#metadata/vg.c:89         Freeing VG activator3 at 0x7f6bf474dca0.
#metadata/metadata.c:4793         <backtrace>
#metadata/metadata.c:5821   Volume group "activator3" not found



3.10.0-498.el7.x86_64
lvm2-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
lvm2-libs-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
lvm2-cluster-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-libs-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-event-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-event-libs-1.02.134-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.165-1.el7    BUILT: Wed Sep  7 11:04:22 CDT 2016

Comment 11 Corey Marthaler 2016-09-12 22:02:23 UTC
This actually doesn't require many LVs or VGs. I'm able to reproduce this with a test using just one VG and doing metadata-intensive operations on a few LVs (create, up/down mirror convert, read/write perm changes, up/down activation, deletion).

Comment 12 David Teigland 2016-09-13 15:09:28 UTC
Do you have a record of the last few commands that are run prior to hitting this, and possibly an lvm log of those commands?  There's a good chance that the problem is created by a command that has already completed, not by the command that reports the problem.  Also, if this happens again, it might be useful to just stop everything and let me take a look at the system.  I'm curious to inspect the metadata on each of the PVs and in the cache at that point.

Comment 13 Alasdair Kergon 2016-09-13 16:21:21 UTC
Indeed, while we can fix the symptom (mismatched creation_time) pretty easily and it's harmless in the case you've reported, we want to understand what's causing it in case it indicates a more serious problem that needs fixing.

Comment 14 Alasdair Kergon 2016-09-13 16:28:58 UTC
The internal VG metadata is translated into a buffer which is attached to the 'fid', and vg_write shares this, so all disks should indeed have the same content and show identical creation_times, provided the locking and sequence-number incrementing are behaving correctly and the calls to vg_write() and fid initialisation/sharing are done correctly.  So we need to see a system in this state and examine the on-disk metadata on each PV (including previous versions), the backup/archive files on each node, and the -vvvv traces of the previous commands that were run, and reconstruct how the state arose to find the error in the logic.
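In other words, the write path should look roughly like this sketch (Python pseudocode; the function names are invented, and in lvm2 the shared buffer actually hangs off the format instance 'fid'):

    import time

    def serialize_vg(vg):
        # Serialized once per commit; desc and creation_time are
        # stamped here, outside the VG metadata proper.
        header = {"desc": "", "creation_time": int(time.time())}
        return repr((header, vg))  # stand-in for the lvm2 text format

    def vg_write(vg, pvs):
        # One shared buffer is written to every PV, so the on-disk
        # copies are byte-identical, creation_time included.
        buf = serialize_vg(vg)
        for pv in pvs:
            pv["metadata_area"] = buf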

Comment 15 David Teigland 2016-09-13 18:39:15 UTC
This appears to be an issue where the lvmcache state is not quite correct for the one command.  Perhaps the creation_time is not being invalidated or refreshed in lvmcache when the rest of the metadata is.  Analyzing the lvmcache-related messages when reading the two devices with mismatching creation_time may give us a more exact answer (that's the log in comment 10).

Comment 16 David Teigland 2016-09-13 21:28:28 UTC
The problem has been found.  To compare the metadata between two devs, the code generates a new config tree for each instance of the metadata from the two devs, and then compares those config trees.  The generation of the config tree inserts the extraneous data (desc and creation_time), which causes a mismatch if it just happens that a one-second boundary is crossed in the process.  The actual VG metadata itself matches.  The extraneous data should be excluded from the comparison; working on a fix for that now.
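A minimal illustration of the failure mode (a Python sketch, not lvm2 code; the real comparison and fix live in the lvmetad cache code, and the field handling here is simplified):

    import time

    def to_config_tree(vg_metadata):
        # Buggy path: each regeneration stamps "now" into the tree,
        # even though creation_time is not part of the VG metadata proper.
        return dict(vg_metadata, desc="", creation_time=int(time.time()))

    def trees_match(tree_a, tree_b, ignore=()):
        strip = lambda t: {k: v for k, v in t.items() if k not in ignore}
        return strip(tree_a) == strip(tree_b)

    vg = {"id": "3Kz9wm-iT3M-UCvn-zkep-RsCb-kVjS-r8gSH2", "seqno": 54}
    tree_dev1 = to_config_tree(vg)
    time.sleep(1)                    # cross a one-second boundary
    tree_dev2 = to_config_tree(vg)   # identical on-disk metadata, reread

    assert not trees_match(tree_dev1, tree_dev2)   # spurious mismatch (the bug)
    assert trees_match(tree_dev1, tree_dev2,
                       ignore=("desc", "creation_time"))  # the fix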

Comment 19 Corey Marthaler 2016-09-21 23:18:37 UTC
Marking verified in the latest rpms. Many iterations of the same test that used to reproduce this right away now pass.



3.10.0-498.el7.x86_64
lvm2-2.02.165-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
lvm2-libs-2.02.165-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
lvm2-cluster-2.02.165-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
device-mapper-1.02.134-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
device-mapper-libs-1.02.134-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
device-mapper-event-1.02.134-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
device-mapper-event-libs-1.02.134-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.165-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016
sanlock-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
sanlock-lib-3.4.0-1.el7    BUILT: Fri Jun 10 11:41:03 CDT 2016
lvm2-lockd-2.02.165-3.el7    BUILT: Wed Sep 21 08:26:18 CDT 2016

Comment 21 errata-xmlrpc 2016-11-04 04:10:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html