Our HA cluster is configured as follows:
Two-node cluster: one node in the 'B' datacentre and one node in the 'C' datacentre.
Two disk arrays: one in each datacentre.
Two services, each using LVM volumes that are mirrored across the two disk arrays.
IPMI is the only automatic fencing mechanism.
To simulate a failure of the 'C' datacentre, we simultaneously cut the power to the 'C' node and
disabled the SAN ports to the 'C' disk array. To prevent the 'B' node from fencing the 'C' node, we also
disabled the 'B' node's network interface that connects to the 'C' node's IPMI device.
After the 'C' node had missed too many heartbeats, the 'B' node attempted to fence the 'C' node using
fence_ipmilan. This failed because the 'B' node couldn't connect to the 'C' node's IPMI device.
I then manually acknowledged the fence with the fence_ack_manual command. The 'B' node successfully
took over the services from the 'C' node. It handled the volume group inconsistencies and successfully
activated the previously mirrored volumes as linear volumes.
Up to this point I'm very happy with how it's operating!
The problems begin when I power the 'C' node back on. At that point, the 'B' node is running all the
services and the SAN ports to the 'C' disk array are still unavailable.
When rgmanager starts on the 'C' node, it attempts to stop all the resources that are running on the 'B'
node. It then appears to attempt to start the services locally, even though they are running on the 'B'
node. When I run clustat on the 'C' node, it now reports that all the services are in the 'failed' state
and that the last node they ran on was the 'C' node.
I wanted to check whether the logical volumes were still active on the 'B' node; however, when I ran
'lvs -a -o +devices,tags' on the 'B' node, it hung and never returned. No LVM command would return
on the 'B' node. The only way I could recover the 'B' node was to power it off and on again; a normal
reboot was impossible because the Cluster Suite services were hung.
When I run 'lvs -a -o +devices,tags' on the 'C' node, the Cluster Suite-managed volumes are NOT
active, but they are tagged with BOTH node names!
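The double tagging can be spotted mechanically. The Python sketch below is illustrative only and is not part of rgmanager: it assumes the tag column is captured from something like `lvs --noheadings -o vg_name,lv_name,lv_tags`, and that each managed LV is tagged with the owning node's name, so an LV carrying two node names has ambiguous ownership. All node, VG and LV names are hypothetical.

```python
from __future__ import annotations

# Assumption: the text below is captured from a command such as
#   lvs --noheadings -o vg_name,lv_name,lv_tags
# where the third column lists the LV's tags separated by commas.


def parse_lvs_tags(output: str) -> dict[str, set[str]]:
    """Map 'vg/lv' to the set of tags on that LV (empty set if untagged)."""
    result: dict[str, set[str]] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        vg, lv = fields[0], fields[1]
        tags = set(fields[2].split(",")) if len(fields) > 2 else set()
        result[f"{vg}/{lv}"] = tags
    return result


def conflicting_owners(output: str, cluster_nodes: set[str]) -> dict[str, set[str]]:
    """Return the LVs that carry the names of two or more cluster nodes as tags."""
    conflicts: dict[str, set[str]] = {}
    for name, tags in parse_lvs_tags(output).items():
        owners = tags & cluster_nodes
        if len(owners) > 1:
            conflicts[name] = owners
    return conflicts


if __name__ == "__main__":
    # Hypothetical node, VG and LV names, for illustration only.
    sample = """\
  vg_svc1 lv_data node-b,node-c
  vg_svc2 lv_data node-b
  vg_root lv_root
"""
    for lv, owners in conflicting_owners(sample, {"node-b", "node-c"}).items():
        # Prints: vg_svc1/lv_data is tagged by node-b, node-c
        print(lv, "is tagged by", ", ".join(sorted(owners)))
```

Only tags that match a known cluster node name are counted, so unrelated tags on an LV do not trigger a false conflict.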
I'm seeing something very different from what you describe, but it may be worth testing with the changes I've made.
Here's what I see:
Site failover works fine. If I reactivate the failed site (including its storage device) and the service
then tries to move back, it fails to activate because of a conflict it sees in the available devices. [The
failed device has come back, leaving an LVM metadata conflict.] This leaves the service in the 'failed' state.
Here's what I've done.
I've added some code to determine which devices are valid, and to use only those devices when
activating. This solved the problem for me. You will need to confirm that this works with your
multipath setup. I don't think there should be any issues there, but I don't want to guess.
This may not be the issue you are seeing, but the bug I found could certainly cause similar problems.
Be sure that you have the latest updates. I've attached the lvm.sh file to be placed in /usr/share/cluster
on all the machines. When we've gone through a few successful iterations of testing we will be sure to
commit the changes.
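The actual change lives in the attached lvm.sh and is not reproduced here. As a hedged illustration of one way "valid devices" could be determined: when the failed array returns, its PVs still hold the VG metadata written before the outage, whose sequence number (`seqno` in LVM2's on-disk metadata) is lower than that on the surviving PVs. The Python sketch below keeps only PVs that are present and carry the newest metadata. The `PhysicalVolume` record, its fields and the device paths are hypothetical; how the real script gathers this information is not described in this report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalVolume:
    """Hypothetical record describing one PV of a volume group."""
    device: str          # device path (illustrative)
    present: bool        # False if the device cannot currently be read
    metadata_seqno: int  # sequence number of the VG metadata copy on this PV


def valid_devices(pvs: list[PhysicalVolume]) -> list[str]:
    """Return the devices that are present and carry the newest VG metadata.

    A PV that reappears after an outage still holds the metadata it had when
    it disappeared, so its seqno is lower than that of the surviving PVs.
    Such stale PVs are excluded, as are PVs that are not present at all.
    """
    present = [pv for pv in pvs if pv.present]
    if not present:
        return []
    newest = max(pv.metadata_seqno for pv in present)
    return [pv.device for pv in present if pv.metadata_seqno == newest]


if __name__ == "__main__":
    pvs = [
        PhysicalVolume("/dev/mapper/array_b", True, 12),
        PhysicalVolume("/dev/mapper/array_c", True, 9),  # returned with stale metadata
    ]
    print(valid_devices(pvs))  # ['/dev/mapper/array_b']
```

Choosing the highest sequence number mirrors the idea that the surviving site's metadata reflects the failover (for example, mirrors converted to linear volumes), while the returning array's copy predates it.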
Created attachment 152701
lvm.sh script with bad device exclusion
Bad device exclusion script, with minor changes, checked in.
assigned -> post
Another concern I have with the user's implementation is the initrd. The initrd must contain the
correctly modified lvm.conf.
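A common tag-based HA LVM setup restricts activation through `volume_list` in the `activation` section of lvm.conf, listing the root volume group and `@` followed by the local node name, for example `volume_list = [ "VolGroup00", "@node-b" ]`. Assuming that setup, the Python sketch below checks the host copy of lvm.conf for those entries and verifies that a copy extracted from the initrd has the same `volume_list`. The VG name, node name and file contents are placeholders, and extracting lvm.conf from the initrd is left to the reader.

```python
from __future__ import annotations

import re

# Matches an uncommented 'volume_list = [ ... ]' setting, possibly spanning lines.
_VOLUME_LIST = re.compile(r"^\s*volume_list\s*=\s*\[(.*?)\]", re.MULTILINE | re.DOTALL)


def volume_list(conf_text: str) -> list[str] | None:
    """Return the quoted entries of volume_list, or None if it is not set."""
    match = _VOLUME_LIST.search(conf_text)
    if match is None:
        return None
    return re.findall(r'"([^"]*)"', match.group(1))


def check_ha_lvm_conf(host_conf: str, initrd_conf: str,
                      root_vg: str, node_name: str) -> list[str]:
    """List problems found in the host and initrd copies of lvm.conf."""
    problems: list[str] = []
    host_list = volume_list(host_conf)
    if host_list is None:
        problems.append("host lvm.conf does not set volume_list")
    else:
        # The root VG must stay activatable, plus LVs tagged with this node.
        for required in (root_vg, f"@{node_name}"):
            if required not in host_list:
                problems.append(f"host volume_list is missing {required!r}")
    if volume_list(initrd_conf) != host_list:
        problems.append("initrd lvm.conf volume_list differs from the host copy")
    return problems


if __name__ == "__main__":
    # Placeholder contents; in practice read /etc/lvm/lvm.conf and the
    # copy extracted from the initrd.
    host = 'activation {\n    volume_list = [ "VolGroup00", "@node-b" ]\n}\n'
    initrd = 'activation {\n    # volume_list = [ "vg1", "@tag1" ]\n}\n'
    for problem in check_ha_lvm_conf(host, initrd, "VolGroup00", "node-b"):
        print(problem)
```

With these placeholders the host copy passes, but the initrd copy is reported because its `volume_list` line is still commented out.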
This has been built and is in the current RHEL4 release of rgmanager.