Bug 444608 - vgsplit failure due to locking errors
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.0
Hardware: All  OS: Linux
Priority: high  Severity: high
Target Milestone: rc
Assigned To: Milan Broz
QA Contact: Corey Marthaler
Keywords: Regression
Depends On: 450474
Reported: 2008-04-29 10:58 EDT by Corey Marthaler
Modified: 2013-02-28 23:06 EST
CC List: 9 users

Fixed In Version: RHBA-2008-0776
Doc Type: Bug Fix
Last Closed: 2008-07-24 16:08:10 EDT

Attachments: None
Description Corey Marthaler 2008-04-29 10:58:45 EDT
Description of problem:
There seems to be some kind of timing regression with vgsplit: sometimes the
command will work and sometimes the exact same command will not.

[root@grant-03 lvm]# vgchange -an linear_8_1953
  0 logical volume(s) in volume group "linear_8_1953" now active

[root@grant-03 lvm]# lvs -a -o +devices
  LV             VG            Attr   LSize  Origin Snap%  Move Log Copy% Convert Devices
  LogVol00       VolGroup00    -wi-ao 72.34G                                      /dev/sda2(0)
  LogVol01       VolGroup00    -wi-ao  1.94G                                      /dev/sda2(2315)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdd1(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdd2(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdd3(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdd4(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdb4(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdb1(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdb2(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                      /dev/sdb3(0)

[root@grant-03 lvm]# vgsplit linear_8_1953 split_777 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  Error locking on node grant-03: Volume group for uuid not found: BWegvBBwJjJTw8hoTqsqPHMaZafd6Ua3Mt0yjPbKHYrOGmlAgTAZzx4ycepvowTd
  Logical volume "linear_8_19530" must be inactive

[ *after about 5 minutes* ]

[root@grant-03 lvm]# vgsplit linear_8_1953 split_777 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  New volume group "split_777" successfully split from "linear_8_1953"

# it fails right away when attempting it again.

[root@grant-03 lvm]# vgsplit split_777 linear_8_1953 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  Error locking on node grant-03: Volume group for uuid not found: 2l0KXkdTQMeosoEBYSc65rW1qpctLRTAMt0yjPbKHYrOGmlAgTAZzx4ycepvowTd
  Logical volume "linear_8_19530" must be inactive

# after a vgscan, it works:

[root@grant-03 lvm]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "split_777" using metadata type lvm2
  Found volume group "VolGroup00" using metadata type lvm2
  Device '/dev/sda2' has been left open.
  Device '/dev/sda2' has been left open.
[root@grant-03 lvm]# vgsplit split_777 linear_8_1953 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  New volume group "linear_8_1953" successfully split from "split_777"


Version-Release number of selected component (if applicable):
2.6.9-68.26.ELsmp
lvm2-2.02.35-1.el4
lvm2-cluster-2.02.35-1.el4
Comment 1 Corey Marthaler 2008-04-30 18:19:28 EDT
This bug exists in the new 4.7 rpms as well.

[lvm_cluster_config] VOLUME SPLIT split_708 back into linear_6_3960 on grant-01
[lvm_cluster_config]   Error locking on node grant-01: Volume group for uuid not found: jWXAJg36Jvf4EbQuhgWMzxbzRFxjiUmUH05ERTuAC0fG3I9CsTWhAeGRlXctcDUm
[lvm_cluster_config]   Logical volume "linear_6_39600" must be inactive
[lvm_cluster_config] vgsplit failed:
[lvm_cluster_config] qarsh root@grant-01 vgsplit split_708 linear_6_3960 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5 /dev/sdb6 /dev/sdb7

lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4

Comment 2 Dave Wysochanski 2008-05-01 09:43:13 EDT
[root@grant-03 ~]# rpm -q cman cman-kernel dlm dlm-kernel lvm2 lvm2-cluster
device-mapper ccs rgmanager
cman-1.0.23-1
cman-kernel-2.6.9-55.4
dlm-1.0.7-1
dlm-kernel-2.6.9-53.3
lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
device-mapper-1.02.25-1.el4
ccs-1.0.12-1
rgmanager-1.9.76-1
Comment 3 Dave Wysochanski 2008-05-01 12:51:23 EDT
I'm working on reproducing this but so far no luck.  I have a 3-node xen cluster
with the above RPMs installed.  I have a volume group comprised of 5 multipath
devices (iscsi).

The failure message "Volume group for uuid not found" comes from the following
snippet in lv_from_lvid():
        if (!(vg = _vg_read_by_vgid(cmd, (char *)lvid->id[0].uuid, precommitted))) {
                log_error("Volume group for uuid not found: %s", lvid_s);
                return NULL;
        }

Comment 4 Dave Wysochanski 2008-05-01 13:00:35 EDT
Interestingly, I just saw the failure message, but with a different sequence.
1) start the cluster services on all nodes.  At this point, the volume group was
already created, so I did not do a "vgcreate"
2) lvcreate of a linear LV
3) vgsplit
4) lvremove

#4 is where the message showed up, and repeated lvremove attempts keep
failing. Note that no node has the LV active.

[root@rhel4u5-node1 ~]# lvremove vg1/lv0linear
  Error locking on node rhel4u5-node1: Volume group for uuid not found:
gt6JWrWklmJMDNSovETTSjABgCaiXGGXbQyTXsetoNQow0gH3dLkwYQDQCpc54uc
Logical volume "lv0linear" is active on other cluster nodes.  Really remove? [y/n]: 

Just as in Corey's case, a vgscan solves the problem.
Comment 5 Dave Wysochanski 2008-05-01 15:03:15 EDT
I can reproduce Corey's failure now. "clvmd -R" fixes the problem as well.
Might be a caching problem.
Comment 6 Dave Wysochanski 2008-05-01 18:18:20 EDT
This looks to be a more generic caching problem.  We are debugging it now. 
Another sequence that will trigger the locking error is a vgcreate followed by
an lvcreate.  The lvcreate fails with a similar locking error.
Comment 15 Milan Broz 2008-06-09 16:08:51 EDT
Fixed in lvm2-2.02.37-1.el4
Comment 17 Corey Marthaler 2008-06-10 16:39:51 EDT
There still appear to be issues when vgsplitting on cluster mirrors; I'll mark
this bz verified and open another one for that issue.
Comment 19 errata-xmlrpc 2008-07-24 16:08:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0776.html
