Bug 213754 - volume group can be reported as inconsistent while displaying from one node and deleting from another
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror
Version: 4
Hardware/OS: All Linux
Priority: medium
Severity: low
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2006-11-02 15:54 EST by Corey Marthaler
Modified: 2010-01-11 21:01 EST
Doc Type: Bug Fix
Last Closed: 2008-08-05 17:37:58 EDT

Attachments:
  Output of a failed 'lvs' (59.99 KB, text/plain), 2006-11-29 13:52 EST, Jonathan Earl Brassow
  Metadata at seqno 3315 (3.31 KB, text/plain), 2006-11-29 15:06 EST, Jonathan Earl Brassow
  Metadata at seqno 3316 (2.11 KB, text/plain), 2006-11-29 15:10 EST, Jonathan Earl Brassow

Description Corey Marthaler 2006-11-02 15:54:11 EST
Description of problem:
I'm seeing the following message while running looping mirror creations/deletions
on one node in the cluster and simultaneously running 'lvs' on another node in
the cluster.

  Volume group "vg" inconsistent

Version-Release number of selected component (if applicable):
[root@link-07 ~]# rpm -q lvm2
lvm2-2.02.13-1
[root@link-07 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.13-1
[root@link-07 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-13.0
[root@link-07 ~]# rpm -q device-mapper
device-mapper-1.02.12-3
Comment 1 Jonathan Earl Brassow 2006-11-13 16:39:15 EST
This can happen in a cluster when using clvmd, but operating on a NON-CLUSTERED
vg.  IOW, single machine mirroring is employed, but clvmd is the method of
activation.

Will try just single machine next.
Comment 2 Jonathan Earl Brassow 2006-11-13 16:50:41 EST
Haven't seen it when bypassing clvmd.
Comment 3 Alasdair Kergon 2006-11-15 11:59:09 EST
Insufficient info here about what was done, but if VG 'vg' is non-clustered, it
should not be accessible across a cluster.  If it's visible on more than one
machine and not marked CLUSTERED then expect problems.
Comment 4 Jonathan Earl Brassow 2006-11-28 22:34:42 EST
I've isolated the code to here:
lib/metadata/metadata.c:_vg_read
		/* FIXME Also ensure contents same - checksum compare? */
		if (correct_vg->seqno != vg->seqno) {
			inconsistent = 1;
			if (vg->seqno > correct_vg->seqno)
				correct_vg = vg;
		}

I don't know why the sequence numbers would be different if we are holding a
lock... we are, aren't we?
Comment 5 Jonathan Earl Brassow 2006-11-28 22:47:45 EST
Adding a simple print, we get:

  correct_vg->seqno(805) != vg->seqno(806)
  Volume group "vg" inconsistent
Comment 6 Jonathan Earl Brassow 2006-11-29 13:52:40 EST
Created attachment 142406 [details]
Output of a failed 'lvs'

This 'lvs -vvvv' gives the inconsistent error
Comment 7 Jonathan Earl Brassow 2006-11-29 15:06:23 EST
Created attachment 142418 [details]
Metadata at seqno 3315
Comment 8 Jonathan Earl Brassow 2006-11-29 15:10:36 EST
Created attachment 142419 [details]
Metadata at seqno 3316

These are the two metadata sets that were committed, nothing seems wrong with
them.
Comment 9 Jonathan Earl Brassow 2006-12-01 11:53:40 EST
I took out the partitions and used the whole devices for the PVs.  I still get
the "inconsistent" message, but it prints things out properly.  Now I can't
remember if this is equivalent to what I was seeing before...

  Volume group "vg" inconsistent
  LV       VG         Attr   LSize   Origin Snap%  Move Log     Copy%
  LogVol00 VolGroup00 -wi-ao  36.62G
  LogVol01 VolGroup00 -wi-ao 512.00M
  lv       vg         mwi-a-   5.00G                    lv_mlog   1.80

In any case, it is still present.
Comment 10 Jonathan Earl Brassow 2006-12-01 13:04:13 EST
Here are the locking operations made during 'create/change/delete':

CREATING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3288]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU:
LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU:
<UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3288]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYNpgq8hAq2sB2Y0ycqavEyuEFPy8KRAeU:
LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3288]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
LCK_READ/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x99) [3288]
  Logical volume "lv" created
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3288]
CHANGING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3359]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
<UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3359]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3359]
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3359]
REMOVING:
  Performing lock operation on V_vg: LCK_WRITE/VG (0x4) [3405]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
LCK_EXCL/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9d) [3405]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
<UNKNOWN>/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x98) [3405]
  Performing lock operation on
vvi2oi0LlyYIrYI0JdO4CR8otxQBN6oYMxodct0b1xUOEUKIzvaP4jCUiskjt8gx:
LCK_UNLOCK/LV/LCK_NONBLOCK/LCK_CLUSTER_VG (0x9e) [3405]
  Logical volume "lv" successfully removed
  Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6) [3405]

And here are the lock ops for an 'lvs':
#cluster_locking.c:435   Performing lock operation on V_VolGroup00: LCK_READ/VG
(0x1) [31631]
#cluster_locking.c:435   Performing lock operation on V_VolGroup00:
LCK_UNLOCK/VG (0x6) [31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_READ/VG (0x1)
[31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6)
[31631]
*** #toollib.c:348   Volume group "vg" inconsistent ***
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_WRITE/VG (0x4)
[31631]
#cluster_locking.c:435   Performing lock operation on V_vg: LCK_UNLOCK/VG (0x6)
[31631]
Comment 11 Jonathan Earl Brassow 2006-12-01 15:02:37 EST
The problem persists even when O_DIRECT_SUPPORT is not defined, so O_DIRECT is
not a factor.

Comment 12 Jonathan Earl Brassow 2006-12-01 15:20:45 EST
Interesting...  I run lvs in ddd (aka gdb) and simply stop right after I acquire
the VG read lock.  This should have the effect of stopping the
create/change/delete loop on the other machine, but it doesn't... it just keeps
going.

Comment 13 Jonathan Earl Brassow 2006-12-01 17:36:19 EST
This is a locking problem in LVM:

The VG read locks should have been PR (protected read), not CR (concurrent read).

Comment 14 Corey Marthaler 2007-04-11 15:39:07 EDT
Have not seen inconsistent errors while executing the test case in comment #0.
Marking verified.
Comment 15 Chris Feist 2008-08-05 17:37:58 EDT
Fixed in current release (4.7).
