Bug 213754

Summary: volume group can be reported as inconsistent while displaying from one node and deleting from another
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: cmirror
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: low
Priority: medium
Version: 4
CC: agk, ccaulfie, cfeist, dwysocha, mbroz
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-08-05 21:37:58 UTC
Attachments:
  Output of a failed 'lvs' (flags: none)
  Metadata at seqno 3315 (flags: none)
  Metadata at seqno 3316 (flags: none)

Description Corey Marthaler 2006-11-02 20:54:11 UTC
Description of problem:
I'm seeing the following message while running looping mirror creations/deletions
on one node in the cluster and simultaneously running lvs on another node in the
cluster.

  Volume group "vg" inconsistent

Version-Release number of selected component (if applicable):
[root@link-07 ~]# rpm -q lvm2
lvm2-2.02.13-1
[root@link-07 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.13-1
[root@link-07 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-13.0
[root@link-07 ~]# rpm -q device-mapper
device-mapper-1.02.12-3

Comment 1 Jonathan Earl Brassow 2006-11-13 21:39:15 UTC
This can happen in a cluster when using clvmd, but operating on a NON-CLUSTERED
vg.  IOW, single machine mirroring is employed, but clvmd is the method of
activation.

Will try just single machine next.


Comment 2 Jonathan Earl Brassow 2006-11-13 21:50:41 UTC
Haven't seen it when bypassing clvmd.


Comment 3 Alasdair Kergon 2006-11-15 16:59:09 UTC
Insufficient info here about what was done, but if VG 'vg' is non-clustered, it
should not be accessible across a cluster.  If it's visible on more than one
machine and not marked CLUSTERED then expect problems.

Comment 4 Jonathan Earl Brassow 2006-11-29 03:34:42 UTC
I've isolated the code to here:
lib/metadata/metadata.c:_vg_read
		/* FIXME Also ensure contents same - checksum compare? */
		if (correct_vg->seqno != vg->seqno) {
			inconsistent = 1;
			if (vg->seqno > correct_vg->seqno)
				correct_vg = vg;
		}

I don't know why the sequence numbers would be different if we are holding a
lock... we are, aren't we?
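
A minimal sketch of one plausible interleaving behind those differing seqnos,
assuming the lvs read lock does not actually exclude the writer (see comment
13). The structures and functions below are hypothetical stand-ins, not lvm2
code; only the final comparison mirrors the _vg_read() check quoted above:

  /* Hypothetical illustration, not lvm2 source.  Two metadata areas (one
   * per PV) each carry a copy of the VG metadata plus its seqno.  The
   * reader scans them one at a time; if the other node commits seqno+1
   * in between, the reader ends up with mismatched copies and _vg_read()
   * flags the VG inconsistent. */
  #include <stdio.h>

  struct mda { unsigned seqno; };              /* stand-in for a metadata area */

  static struct mda mda0 = { 805 }, mda1 = { 805 };

  static void other_node_commits(unsigned new_seqno)
  {
          /* lvcreate/lvchange/lvremove on the other node rewrites both copies */
          mda0.seqno = new_seqno;
          mda1.seqno = new_seqno;
  }

  int main(void)
  {
          unsigned first = mda0.seqno;         /* reader scans PV0: sees 805 */
          other_node_commits(806);             /* concurrent metadata commit */
          unsigned second = mda1.seqno;        /* reader scans PV1: sees 806 */

          if (first != second)
                  printf("  correct_vg->seqno(%u) != vg->seqno(%u)\n"
                         "  Volume group \"vg\" inconsistent\n", first, second);
          return 0;
  }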


Comment 5 Jonathan Earl Brassow 2006-11-29 03:47:45 UTC
Adding a simple print, we get:

  correct_vg->seqno(805) != vg->seqno(806)
  Volume group "vg" inconsistent


Comment 6 Jonathan Earl Brassow 2006-11-29 18:52:40 UTC
Created attachment 142406 [details]
Output of a failed 'lvs'

This 'lvs -vvvv' gives the inconsistent error

Comment 7 Jonathan Earl Brassow 2006-11-29 20:06:23 UTC
Created attachment 142418 [details]
Metadata at seqno 3315

Comment 8 Jonathan Earl Brassow 2006-11-29 20:10:36 UTC
Created attachment 142419 [details]
Metadata at seqno 3316

These are the two metadata sets that were committed; nothing seems wrong with
them.

Comment 9 Jonathan Earl Brassow 2006-12-01 16:53:40 UTC
I took out the partitions and used whole devices for the PVs.  I still get
the "inconsistent" message, but it prints things out properly.  Now I can't
remember if this is equivalent to what I was seeing before...

  Volume group "vg" inconsistent
  LV       VG         Attr   LSize   Origin Snap%  Move Log     Copy%
  LogVol00 VolGroup00 -wi-ao  36.62G
  LogVol01 VolGroup00 -wi-ao 512.00M
  lv       vg         mwi-a-   5.00G                    lv_mlog   1.80

In any case, it is still present.


Comment 10 Jonathan Earl Brassow 2006-12-01 18:04:13 UTC
Here are the locking operations made during 'create/change/delete':

CREATING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3288]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3288]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3288]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
  Logical volume "lv" created
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3288]
CHANGING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3359]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3359]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3359]
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3359]
REMOVING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3405]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_EXCL/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9d) [3405]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: <UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3405]
  Performing lock operation on vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx: LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3405]
  Logical volume "lv" successfully removed
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3405]

And here are the lock ops for an 'lvs':
#cluster_locking.c:435   Performing lock operation on V_VolGroup00: LCK_READ/VG (0x1) [31631]
#cluster_locking.c:435   Performing lock operation on V_VolGroup00: LCK_UNLOCK/VG (0x6) [31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_READ/VG (0x1) [31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [31631]
*** #toollib.c:348   Volume group "vg" inconsistent ***
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [31631]


Comment 11 Jonathan Earl Brassow 2006-12-01 20:02:37 UTC
Doesn't matter if O_DIRECT_SUPPORT is not defined (unused).



Comment 12 Jonathan Earl Brassow 2006-12-01 20:20:45 UTC
Interesting...  I run lvs in ddd (aka gdb) and simply stop right after I acquire
the VG read lock.  This should have the effect of stopping the
create/change/delete loop on the other machine, but it doesn't... it just keeps
going.



Comment 13 Jonathan Earl Brassow 2006-12-01 22:36:19 UTC
locking problem in lvm

vg locks should have been PR (protected read), not CR (concurrent read)
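
For context on PR vs. CR, here is a minimal sketch of the generic DLM
lock-mode compatibility matrix (illustration only, not clvmd source).
Assuming the node committing metadata holds the VG lock in PW (protected
write) mode, a reader holding only CR does not block it, which is consistent
with comment 12; a reader holding PR would force the writer to wait:

  /* Generic VMS/DLM lock-mode compatibility matrix, for illustration only. */
  #include <stdio.h>

  enum mode { NL, CR, CW, PR, PW, EX };
  static const char *name[] = { "NL", "CR", "CW", "PR", "PW", "EX" };

  /* compat[held][requested]: 1 if 'requested' can be granted while another
   * node already holds 'held' on the same resource. */
  static const int compat[6][6] = {
          /*          NL CR CW PR PW EX */
          /* NL */  {  1, 1, 1, 1, 1, 1 },
          /* CR */  {  1, 1, 1, 1, 1, 0 },
          /* CW */  {  1, 1, 1, 0, 0, 0 },
          /* PR */  {  1, 1, 0, 1, 0, 0 },
          /* PW */  {  1, 1, 0, 0, 0, 0 },
          /* EX */  {  1, 0, 0, 0, 0, 0 },
  };

  int main(void)
  {
          enum mode reader[] = { CR, PR };  /* lvs holding the VG read lock */
          enum mode writer = PW;            /* assumed mode of the metadata commit */

          for (int i = 0; i < 2; i++)
                  printf("reader holds %s, writer requests %s: %s\n",
                         name[reader[i]], name[writer],
                         compat[reader[i]][writer] ? "granted (reader not protected)"
                                                   : "blocked (reader protected)");
          return 0;
  }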



Comment 14 Corey Marthaler 2007-04-11 19:39:07 UTC
Have not seen inconsistent errors while executing the test case in comment #0.
Marking verified.

Comment 15 Chris Feist 2008-08-05 21:37:58 UTC
Fixed in current release (4.7).