From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050719 Red Hat/1.0.6-1.4.1 Firefox/1.0.6

Description of problem:
CLVM was set up on one of the nodes, and when another node came up it was getting a connect() failed error. The only way to resolve it was to set the locking type to 1. See the output from a run of lvcreate on twedldum:

Processing: lvcreate -vvv -s --permission r -n var_snapshot -L 200M /dev/vg0/var
O_DIRECT will be used
Setting global/locking_type to 2
Setting global/locking_library to /lib/liblvm2clusterlock.so
Opening shared locking library /lib/liblvm2clusterlock.so
Loaded external locking library /lib/liblvm2clusterlock.so
External locking enabled.
Setting chunksize to 16 sectors.
Getting target version for snapshot
dm version O
dm versions O
Getting target version for snapshot-origin
dm versions O
Locking V_vg0 at 0x4
Finding volume group "vg0"
Opened /dev/md0 RW
/dev/md0: block size is 1024 bytes
/dev/md0: No label detected
Closed /dev/md0
Opened /dev/etherd/e0.0 RW
/dev/etherd/e0.0: block size is 4096 bytes
/dev/etherd/e0.0: lvm2 label detected
Closed /dev/etherd/e0.0
lvmcache: /dev/etherd/e0.0 now orphaned
Opened /dev/etherd/e0.0 RW
/dev/etherd/e0.0: block size is 4096 bytes
Closed /dev/etherd/e0.0
lvmcache: /dev/etherd/e0.0 now in VG alice
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
/dev/md1: lvm2 label detected
Closed /dev/md1
lvmcache: /dev/md1 now orphaned
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
Closed /dev/md1
lvmcache: /dev/md1 now in VG vg0
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
/dev/md1: lvm2 label detected
Closed /dev/md1
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
Closed /dev/md1
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
/dev/md1: lvm2 label detected
Read vg0 metadata (41) from /dev/md1 at 108544 size 2712
Closed /dev/md1
Rounding up size to full physical extent 224.00 MB
Creating logical volume var_snapshot
Allowing allocation on /dev/md1 start PE 1200 length 21
Archiving volume group "vg0" metadata.
Opened /dev/md1 RW
/dev/md1: block size is 4096 bytes
Writing vg0 metadata to /dev/md1 at 111616 len 2951
Creating volume group backup "/etc/lvm/backup/vg0"
Writing vg0 metadata to /etc/lvm/backup/.lvm_twedldum.yewess.us_24881_499525191
Committing vg0 metadata (42)
Renaming /etc/lvm/backup/vg0.tmp to /etc/lvm/backup/vg0
Committing vg0 metadata (42) to /dev/md1 header at 2048
Closed /dev/md1
Locking zxDALJyxHmoZQ6qxuho4QMfvZuqU9GbuqAh32luObinFHZg1Cm2LbPjx0rtGep2X at 0x19
Error locking on node twedldee.yewess.us: Internal lvm error, check syslog
Aborting. Failed to activate snapshot exception store. Remove new LV and retry.
Locking V_vg0 at 0x6

However, on twedldum, vgdisplay -vvv shows:

vg0 UUID:              zxDALJ-yxHm-oZQ6-qxuh-o4QM-fvZu-qU9Gbu
vg0/var_snapshot UUID: qAh32l-uObi-nFHZ-g1Cm-2LbP-jx0r-tGep2X

LVM2 is trying to lock /vg0/var_snapshot on twedldee (the other node)! This is impossible because it is a local disk. Twedldee gets the UUID, says "WTF is that", and returns a locking failure. We don't see the error because it would be generated on the other node.

What's happening on clvmd startup is that locking is failing for local filesystems, but clvmd only "sees" the "lock failure" state. I think this is generating the connect failure messages somehow. It is probably assuming that the other node holds a lock on the non-clustered VG, freaks out, and reports it as a connect failure.
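As a sanity check, the resource name in the failing "Locking ... at 0x19" request above is just the vg0 UUID and the var_snapshot UUID run together with the hyphens stripped; a quick shell one-liner using the two UUID strings from the vgdisplay output confirms it:

# Concatenate the VG and LV UUIDs and drop hyphens/spaces; the result
# matches the lock resource clvmd asked twedldee to take.
echo -n "zxDALJ-yxHm-oZQ6-qxuh-o4QM-fvZu-qU9Gbu" "qAh32l-uObi-nFHZ-g1Cm-2LbP-jx0r-tGep2X" | tr -d '- '
# -> zxDALJyxHmoZQ6qxuho4QMfvZuqU9GbuqAh32luObinFHZg1Cm2LbPjx0rtGep2X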
vgchange -ay then goes into a deadlock, waiting for the other node to release a lock that the first node thinks it is holding. This will never happen because clvmd on the other node has no knowledge of the first node's local filesystem UUIDs. Therefore, clvmd startup ends up deadlocked on a local VG that doesn't need locking in the first place!

LVM2 logic should be added/corrected so that it ignores cluster locking when addressing non-clustered volumes. Clustered volume groups carry a special "clustered" status flag whereas local volume groups do not. Perhaps this state can be used in some way?

Version-Release number of selected component (if applicable):
lvm2-cluster-2.0.1.09-3.0

How reproducible:
Always

Steps to Reproduce:
1. Set up CLVM on one node
2. Bring up another node

Actual Results:
The other node can't see the CLVM stuff and cannot start CLVM.

Expected Results:
The other node shouldn't get these errors.

Additional info:
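For reference, the "set the locking type to 1" workaround from the description amounts to switching the affected node back to local, file-based locking in /etc/lvm/lvm.conf. This is only a sketch of the relevant global-section setting (and it disables cluster locking on that node entirely, so it is a workaround, not a fix):

# /etc/lvm/lvm.conf, global section (workaround, not a fix)
global {
    # 2 = external locking library (clvmd via /lib/liblvm2clusterlock.so,
    #     as seen in the trace above); 1 = local file-based locking
    locking_type = 1
}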
Created attachment 117684 [details]
straces of clvmd and vgchange in the init script while the box is starting up
Did you mark the VG non-clustered?

vgchange -cn vg0
Not AFAIK; the only clustered VG is alice on e0.0, vg0 is the local VG. The output of vgdisplay doesn't indicate vg0 is clustered, only alice. Though I just noticed that alice does not show as "Shared" - could that be relevant to this issue?

[root@twedldum ~]# vgdisplay
  --- Volume group ---
  VG Name               vg0
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  55
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                8
  Open LV               5
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               38.16 GB
  PE Size               32.00 MB
  Total PE              1221
  Alloc PE / Size       1200 / 37.50 GB
  Free  PE / Size       21 / 672.00 MB
  VG UUID               zxDALJ-yxHm-oZQ6-qxuh-o4QM-fvZu-qU9Gbu

  --- Volume group ---
  VG Name               alice
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  46
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                5
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               33.73 GB
  PE Size               8.00 MB
  Total PE              4318
  Alloc PE / Size       921 / 7.20 GB
  Free  PE / Size       3397 / 26.54 GB
  VG UUID               YmkiNE-ATUE-PHRV-vj5l-h4Ci-bnok-X1Ln7d
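The clustered flag can also be cross-checked with vgs (assuming the vgs reporting commands from this LVM2 release expose a vg_attr column); a 'c' in the last position of the attribute string marks a clustered VG, so alice should show it and vg0 should not:

# List each VG with its attribute string; the final character is the
# clustered bit ('c' = clustered, '-' = local). Expected output, roughly:
vgs -o vg_name,vg_attr
#   VG    Attr
#   alice wz--nc
#   vg0   wz--n-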
[root@twedldum ~]# vgchange -cn /dev/vg0
  /dev/cdrom: open failed: Read-only file system
  Volume group "vg0" is already not clustered
[root@twedldum ~]# vgchange -cy /dev/alice
  /dev/cdrom: open failed: Read-only file system
  Volume group "alice" is already clustered
A fix for this is in CVS, but I'm not sure when it will arrive in a package.