I have 4 nodes in my cluster, bp-xen-0[1234]. I run 'lvcreate -m1 -L 25G -n $(hostname -s) vg' on all machines. 2 out of 4 fail. Here is one of the failures:

[root@bp-xen-02 ~]# lvcreate -m1 -L 25G -n $(hostname -s) vg
  /dev/vg/bp-xen-02: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
[root@bp-xen-02 ~]# lvcreate -m1 -L 25G -n bp-xXx-02 vg
  Logical volume "bp-xXx-02" created

You will note that with a little substitution, the second command succeeds. Also, I have tried putting in 'bp-xen-02' instead of $(hostname -s) and it still fails. If I run 'lvcreate -m1 -L 25G -n bp-xen-02 vg' from bp-xen-01, it succeeds; but it always fails on bp-xen-02.

I have removed /etc/lvm/cache/.cache and restarted clvmd on all machines, but the results are always the same: bp-xen-0[23] always fail. I will attach -vvvv outputs.
Created attachment 305501 [details] Failed creation of 'bp-xen-02' from bp-xen-02
Created attachment 305502 [details] Successful creation of 'bpxen02'

Note that the '-'s don't matter: I tried 'bp-xXx-02', and that succeeded too.
All machines are verified identical:

[root@bp-xen-01 ~]# rpm -q --queryformat "%{NAME}-%{VERSION}-%{RELEASE} %{BUILDTIME}\n" kernel-xen cman openais lvm2 lvm2-cluster device-mapper; uname -a
kernel-xen-2.6.18-91.el5 1208903991
cman-2.0.84-2.el5 1208290778
openais-0.80.3-15.el5 1207125749
lvm2-2.02.32-1.4 1202233817
lvm2-cluster-2.02.32-1.4 1202233880
device-mapper-1.02.24-1.el5 1200609965
Linux bp-xen-01 2.6.18-91.el5xen #1 SMP Tue Apr 22 17:59:53 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Easier dates to read:

[root@bp-xen-01 ~]# ./rpmdate kernel-xen cman openais lvm2 lvm2-cluster device-mapper; uname -a
kernel-xen-2.6.18-91.el5 BUILT: Tue Apr 22 17:39:51 CDT 2008
cman-2.0.84-2.el5 BUILT: Tue Apr 15 15:19:38 CDT 2008
openais-0.80.3-15.el5 BUILT: Wed Apr 2 03:42:29 CDT 2008
lvm2-2.02.32-1.4 BUILT: Tue Feb 5 11:50:17 CST 2008
lvm2-cluster-2.02.32-1.4 BUILT: Tue Feb 5 11:51:20 CST 2008
device-mapper-1.02.24-1.el5 BUILT: Thu Jan 17 16:46:05 CST 2008
Linux bp-xen-01 2.6.18-91.el5xen #1 SMP Tue Apr 22 17:59:53 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
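The ./rpmdate helper used above is not included in this report. A minimal sketch of what such a wrapper might look like, assuming it simply feeds the %{BUILDTIME} epoch values through GNU date (the function names epoch_to_date and rpmdate are hypothetical, and the real script may differ):

```shell
#!/bin/sh
# Hypothetical stand-in for the ./rpmdate helper; the original is not shown.

# Convert an epoch timestamp (as printed by %{BUILDTIME}) to a readable date
# in the local timezone. Assumes GNU date, which accepts "-d @SECONDS".
epoch_to_date() {
    date -d "@$1"
}

# Print "name-version-release BUILT: <date>" for each package name given.
# Packages that are not installed are not handled specially in this sketch.
rpmdate() {
    rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE} %{BUILDTIME}\n' "$@" |
    while read -r nvr epoch; do
        echo "$nvr BUILT: $(epoch_to_date "$epoch")"
    done
}
```

For example, with TZ=UTC and the C locale, epoch_to_date 1208903991 prints "Tue Apr 22 22:39:51 UTC 2008", which is the same instant as the 17:39:51 CDT build time listed for kernel-xen.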
Ideas: 1) kill udev; 2) change the hostname on machine 2 and try again...
...and test on a single machine.
This seems to be the same problem as bug 449344 or bug 449350: clvmd holds some old lock...

# lvcreate -L 100M -n lv1 vg_local
  /dev/vg_local/lv: open failed: No such device or address
  /dev/vg_local/lv1: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
# vgchange -a n vg_local
  /dev/vg_local/lv: open failed: No such device or address
  0 logical volume(s) in volume group "vg_local" now active
# vgchange -a y vg_local
  /dev/vg_local/lv: open failed: No such device or address
  0 logical volume(s) in volume group "vg_local" now active

After restarting clvmd on the local node, it works.
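The workaround described in the comment above (restart clvmd on the local node, then retry the failed lvcreate) could be scripted roughly as follows. This is only a sketch: DRY_RUN, run and retry_after_clvmd_restart are illustrative names, and it assumes clvmd is managed through the usual "service clvmd" init script. By default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the workaround: restart clvmd locally, then retry lvcreate.
# All helper names here are hypothetical and not part of LVM.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: retry_after_clvmd_restart VG LV SIZE
retry_after_clvmd_restart() {
    vg=$1
    lv=$2
    size=$3
    # Restarting clvmd is what cleared the failure in the comment above.
    run service clvmd restart || return 1
    run lvcreate -L "$size" -n "$lv" "$vg"
}
```

Calling retry_after_clvmd_restart vg_local lv1 100M in dry-run mode prints the two commands it would run. Set DRY_RUN=0 (as root, on the affected node) to execute them.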
As mentioned in comment #7, an old lock in the clvmd hash table probably caused this. Several fixes in RHEL5.4/5.5 address the source of this problem, so I hope it is fixed. If not, please reopen and paste a reproducer script here. (I was not able to reproduce it anyway, even with the old code, without cheating with the lock table :-)