Description of problem:
There seems to be some kind of timing issue regression with vgsplit.
Sometimes the command will work, and sometimes the exact same command will not.

[root@grant-03 lvm]# vgchange -an linear_8_1953
  0 logical volume(s) in volume group "linear_8_1953" now active

[root@grant-03 lvm]# lvs -a -o +devices
  LV             VG            Attr   LSize  Origin Snap%  Move Log Copy%  Convert Devices
  LogVol00       VolGroup00    -wi-ao 72.34G                                       /dev/sda2(0)
  LogVol01       VolGroup00    -wi-ao  1.94G                                       /dev/sda2(2315)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdd1(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdd2(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdd3(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdd4(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdb4(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdb1(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdb2(0)
  linear_8_19530 linear_8_1953 -wim--  1.44T                                       /dev/sdb3(0)

[root@grant-03 lvm]# vgsplit linear_8_1953 split_777 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  Error locking on node grant-03: Volume group for uuid not found: BWegvBBwJjJTw8hoTqsqPHMaZafd6Ua3Mt0yjPbKHYrOGmlAgTAZzx4ycepvowTd
  Logical volume "linear_8_19530" must be inactive

[ *after about 5 minutes* ]

[root@grant-03 lvm]# vgsplit linear_8_1953 split_777 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  New volume group "split_777" successfully split from "linear_8_1953"

# It fails right away when attempting to split the VG back again:
[root@grant-03 lvm]# vgsplit split_777 linear_8_1953 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  Error locking on node grant-03: Volume group for uuid not found: 2l0KXkdTQMeosoEBYSc65rW1qpctLRTAMt0yjPbKHYrOGmlAgTAZzx4ycepvowTd
  Logical volume "linear_8_19530" must be inactive

# After a vgscan, it works:
[root@grant-03 lvm]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "split_777" using metadata type lvm2
  Found volume group "VolGroup00" using metadata type lvm2
  Device '/dev/sda2' has been left open.
  Device '/dev/sda2' has been left open.

[root@grant-03 lvm]# vgsplit split_777 linear_8_1953 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4
  New volume group "linear_8_1953" successfully split from "split_777"

Version-Release number of selected component (if applicable):
2.6.9-68.26.ELsmp
lvm2-2.02.35-1.el4
lvm2-cluster-2.02.35-1.el4
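Until the root cause is understood, something along these lines (a sketch only, not a supported tool; the VG and PV names are the ones from the transcript above) works around the failure by forcing a rescan before retrying:

#!/bin/bash
# Workaround sketch: if vgsplit hits the stale "uuid not found" error,
# force a metadata rescan and retry once. VG/PV names are the ones from
# this report; adjust for other configurations.
PVS="/dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdd1 /dev/sdd2 /dev/sdd3 /dev/sdd4"
if ! vgsplit linear_8_1953 split_777 $PVS; then
        vgscan          # re-read VG metadata from all PVs
        vgsplit linear_8_1953 split_777 $PVS
fi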
This bug exists in the new 4.7 rpms as well.

[lvm_cluster_config] VOLUME SPLIT split_708 back into linear_6_3960 on grant-01
[lvm_cluster_config] Error locking on node grant-01: Volume group for uuid not found: jWXAJg36Jvf4EbQuhgWMzxbzRFxjiUmUH05ERTuAC0fG3I9CsTWhAeGRlXctcDUm
[lvm_cluster_config] Logical volume "linear_6_39600" must be inactive
[lvm_cluster_config] vgsplit failed:
[lvm_cluster_config] qarsh root@grant-01 vgsplit split_708 linear_6_3960 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5 /dev/sdb6 /dev/sdb7

lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
[root@grant-03 ~]# rpm -q cman cman-kernel dlm dlm-kernel lvm2 lvm2-cluster device-mapper ccs rgmanager
cman-1.0.23-1
cman-kernel-2.6.9-55.4
dlm-1.0.7-1
dlm-kernel-2.6.9-53.3
lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
device-mapper-1.02.25-1.el4
ccs-1.0.12-1
rgmanager-1.9.76-1
I'm working on reproducing this, but so far no luck. I have a 3-node xen cluster with the above RPMs installed and a volume group comprised of 5 multipath devices (iscsi).

The failure message "Volume group for uuid not found" comes from the following snippet in lv_from_lvid():

	if (!(vg = _vg_read_by_vgid(cmd, (char *)lvid->id[0].uuid,
				    precommitted))) {
		log_error("Volume group for uuid not found: %s", lvid_s);
		return NULL;
	}
Interestingly, I just saw the failure message, but with a different sequence:

1) Start the cluster services on all nodes. At this point, the volume group
   was already created, so I did not do a "vgcreate".
2) lvcreate of a linear LV
3) vgsplit
4) lvremove

Step 4 is where the message showed up. Running the lvremove repeatedly fails. Note that no node has the LV active.

[root@rhel4u5-node1 ~]# lvremove vg1/lv0linear
  Error locking on node rhel4u5-node1: Volume group for uuid not found: gt6JWrWklmJMDNSovETTSjABgCaiXGGXbQyTXsetoNQow0gH3dLkwYQDQCpc54uc
  Logical volume "lv0linear" is active on other cluster nodes. Really remove? [y/n]:

Just as in Corey's case, a vgscan solves the problem.
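For reference, the failing sequence as a script (a sketch only; the vg1/lv0linear names match the transcript above, but the LV size and the PVs used are assumptions, and step 1's cluster services are assumed to be up already):

#!/bin/bash
# Reproduction sketch. Assumes cluster services are already running on
# all nodes and clustered VG "vg1" already exists (no vgcreate, per the
# sequence above). LV size and PV layout are illustrative.
lvcreate -L 100M -n lv0linear vg1 /dev/sdb2   # 2) linear LV, allocated off /dev/sdb1
vgsplit vg1 vg2 /dev/sdb1                     # 3) split one PV into a new VG
lvremove vg1/lv0linear                        # 4) fails: "Volume group for uuid not found"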
I can reproduce Corey's failure now. "clvmd -R" fixes the problem as well, so this might be a caching problem.
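That is, refreshing clvmd's cached metadata on every node before retrying is enough (a sketch; the lvremove target is the one from the previous comment):

clvmd -R                  # ask clvmd on all cluster nodes to refresh its cache
lvremove vg1/lv0linear    # the previously failing command now succeeds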
This looks to be a more generic caching problem; we are debugging it now. Another sequence that will trigger the locking error is a vgcreate followed by an lvcreate, as sketched below. The lvcreate fails with a similar locking error.
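A minimal version of that trigger (a sketch; the VG/LV names, size, and PV are illustrative, not from a logged session):

vgcreate vg_new /dev/sdc1          # fresh clustered VG
lvcreate -L 100M -n lv0 vg_new     # fails with the same "uuid not found" locking error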
Fixed in lvm2-2.02.37-1.el4
There still look to be issues when running vgsplit on cluster mirrors, so I'll mark this bz verified and open another one for that issue.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0776.html