Description of problem:
Not sure if this is possible, but could a service be retried on the next
server in line in the failover domain if the service fails due to a loss of SAN
connectivity on the currently running node? Or, for that matter, due to a loss
of SAN connectivity on the node being relocated to?
Currently you will just see a failure and the following message:
<err> stop: Could not match /dev/foo/bar with a real device
I was thinking that a solution to this would be to have a self_fence mechanism
similar to the one we have for the filesystem agent. Using this flag we can
trigger a reboot of the node that is not able to clean up its tags. Once the
reboot is performed, the node is no longer part of the cluster, so the other
node can proceed to take ownership of the LVs.
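
For reference, a minimal sketch of where such a check could hook into the stop
path of lvm.sh is shown below. This is not the attached patch; the helper name
strip_tag_and_deactivate and the exact reboot invocation are illustrative
assumptions, with OCF_RESKEY_self_fence standing for a self_fence attribute on
the lvm resource.

    # Illustrative sketch only -- not the attached patch.
    # strip_tag_and_deactivate is a hypothetical helper that removes this
    # node's ownership tag from the VG/LV and deactivates the LV.
    if ! strip_tag_and_deactivate; then
        case "$OCF_RESKEY_self_fence" in
        yes|YES|true|TRUE|1)
            # The LV cannot be released cleanly (e.g. SAN connectivity lost).
            # Reboot so this node drops out of the cluster and the next node
            # in the failover domain can safely take ownership of the LVs.
            ocf_log err "Unable to clean LVM tags; self-fencing (rebooting) this node"
            sync
            reboot -fn
            ;;
        *)
            return $OCF_ERR_GENERIC   # behave as today: the stop just fails
            ;;
        esac
    fi
    return $OCF_SUCCESS

This mirrors the idea behind fs.sh's self_fence option: if the resource cannot
be released during stop, the node removes itself from the cluster rather than
leaving the service stuck in a failed state.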
I have modified lvm.sh, adding the self_fence attribute and the logic to perform
the reboot when the node is not able to clean up its tags. I am attaching the
patch, created against the RHEL 5.1 lvm.sh script.
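For what it's worth, enabling it in cluster.conf would presumably look much like
the fs agent's self_fence option; the resource definition below is only an
illustration (the names are made up), with self_fence being the attribute the
patch proposes:

    <lvm name="lvm_foo" vg_name="foo" lv_name="bar" self_fence="1"/>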
What is your opinion of this approach to resolving the problem? Could you
review the script to see if there might be any problem with it?
I have tested it and it worked for me, but obviously I would like to hear your
opinion on this matter before saying anything to the customer.
Created attachment 280961 [details]
proposed patch to add self_fence to lvm resource agent
Has this patch made it into a build for QA to test?
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.