Description of problem:
I've seen this quite a few times when bringing up my cluster. After everyone is in the cman cluster and the clvmd service is in the run state:

[root@morph-01 root]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    6   M   morph-01
   2    1    6   M   morph-06
   3    1    6   M   morph-04
   4    1    6   M   morph-03
   5    1    6   M   morph-02
   6    1    6   M   morph-05

[root@morph-01 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 3 6 2 5 4]

DLM Lock Space:  "clvmd"                             2   3 run       -
[1 3 2 4 5 6]

I attempt on one of the nodes to vgremove a volume. This causes these cdrom drive errors along with a failure of the vgremove cmd.

[root@morph-01 root]# vgremove corey
  cluster send request failed: Invalid argument
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x54

I then try the remove again and it hangs, and the cluster needs to be rebooted.

I did turn on the cdrom filter in /etc/lvm/lvm.conf and added hdc, but that doesn't seem to help or do anything.

    # Exclude the cdrom drive
    filter = [ "r|/dev/cdrom|hdc" ]

How reproducible:
Sometimes
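For reference, a sketch of how the filter above might be written so that each pattern is fully delimited. This assumes the usual lvm.conf filter syntax, where every entry is an "a" (accept) or "r" (reject) followed by a regex enclosed in a matching pair of delimiter characters; in the line quoted above, hdc sits outside the closing | and may not be parsed as intended. The device paths are assumptions based on the hdc messages in the report, and the cdrom errors may well be unrelated to the vgremove failure in any case.

```
# Exclude the cdrom drive (assumed device paths: /dev/cdrom and /dev/hdc)
filter = [ "r|/dev/cdrom|", "r|/dev/hdc|" ]
```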
I'm not convinced that the cdrom messages have anything to do with this bug. Even though I always see them right before hitting this bug, I also see them at other times without any issue.
Yes, the cdrom messages are a red herring. This is a bug I introduced while fixing a different bug yesterday.
It's more complicated than even that. The following checkin fixes clvmd to cope with more than one VG lock, but there seems to be an LVM command-line bug in there too. I need to check with agk about that.

Checking in clvmd-cman.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd-cman.c,v  <--  clvmd-cman.c
new revision: 1.2; previous revision: 1.1
done
Checking in clvmd-command.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd-command.c,v  <--  clvmd-command.c
new revision: 1.3; previous revision: 1.2
done
Checking in clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.3; previous revision: 1.2
done
Checking in clvmd.h;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.h,v  <--  clvmd.h
new revision: 1.2; previous revision: 1.1
done
Checking in cnxman-socket.h;
/cvs/lvm2/LVM2/daemons/clvmd/cnxman-socket.h,v  <--  cnxman-socket.h
new revision: 1.3; previous revision: 1.2
done
This works for me now. Alasdair has given provisional blessing to the change, but it's in CVS anyhow.
It looks to be the only line that got missed when the locking lines were converted to use new definitions, LCK_VG_WRITE.
Will be in LVM2 2.00.25.
Fix verified.
Updating version to the right level in the defects. Sorry for the storm.