Bug 444608
| Summary: | vgsplit failure due to locking errors | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Milan Broz <mbroz> |
| Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | high | Priority: | high |
| Version: | 4.0 | CC: | agk, ccaulfie, dwysocha, edamato, heinzm, jbrassow, mbroz, prockai, pvrabec |
| Target Milestone: | rc | Keywords: | Regression |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | RHBA-2008-0776 | Doc Type: | Bug Fix |
| Last Closed: | 2008-07-24 20:08:10 UTC | | |
| Bug Depends On: | 450474 | Bug Blocks: | |
Description
Corey Marthaler 2008-04-29 14:58:45 UTC
This bug exists in the new 4.7 rpms as well.

```
[lvm_cluster_config] VOLUME SPLIT split_708 back into linear_6_3960 on grant-01
[lvm_cluster_config] Error locking on node grant-01: Volume group for uuid not found: jWXAJg36Jvf4EbQuhgWMzxbzRFxjiUmUH05ERTuAC0fG3I9CsTWhAeGRlXctcDUm
[lvm_cluster_config] Logical volume "linear_6_39600" must be inactive
[lvm_cluster_config] vgsplit failed:
[lvm_cluster_config] qarsh root@grant-01 vgsplit split_708 linear_6_3960 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5 /dev/sdb6 /dev/sdb7
```

lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4

```
[root@grant-03 ~]# rpm -q cman cman-kernel dlm dlm-kernel lvm2 lvm2-cluster device-mapper ccs rgmanager
cman-1.0.23-1
cman-kernel-2.6.9-55.4
dlm-1.0.7-1
dlm-kernel-2.6.9-53.3
lvm2-2.02.36-1.el4
lvm2-cluster-2.02.36-1.el4
device-mapper-1.02.25-1.el4
ccs-1.0.12-1
rgmanager-1.9.76-1
```

I'm working on reproducing this but so far no luck. I have a 3-node xen cluster with the above RPMs installed. I have a volume group comprised of 5 multipath devices (iscsi).

The failure message "Volume group for uuid not found" comes from the following snippet in lv_from_lvid():

```c
if (!(vg = _vg_read_by_vgid(cmd, (char *)lvid->id[0].uuid, precommitted))) {
    log_error("Volume group for uuid not found: %s", lvid_s);
    return NULL;
}
```

Interestingly, I just saw the failure message, but with a different sequence:

1) Start the cluster services on all nodes. At this point, the volume group was already created, so I did not do a "vgcreate".
2) lvcreate of a linear LV
3) vgsplit
4) lvremove

Step 4 is where the message showed up. Running the lvremove repeatedly fails. Note that no node has the LV active.

```
[root@rhel4u5-node1 ~]# lvremove vg1/lv0linear
Error locking on node rhel4u5-node1: Volume group for uuid not found: gt6JWrWklmJMDNSovETTSjABgCaiXGGXbQyTXsetoNQow0gH3dLkwYQDQCpc54uc
Logical volume "lv0linear" is active on other cluster nodes.
Really remove? [y/n]:
```

Just as in Corey's case, a vgscan solves the problem. I can reproduce Corey's failure now.
"clvmd -R" fixes the problem as well. Might be a caching problem. This looks to be a more generic caching problem. We are debugging it now. Another sequence that will trigger the locking error is a vgcreate followed by an lvcreate. The lvcreate fails with a similar locking error. Fixed in lvm2-2.02.37-1.el4 There looks to still be issues when vgspliting on cluster mirrors, I'll mark this bz verified and open another one for that issue. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0776.html |