Description of problem: Not sure if this is possible, but could a service be attempted on the next server in line in a failure domain if the service has failed due to a loss of SAN connectivity on the currently running node? Or, for that matter, due to a loss of SAN connectivity on the node being relocated to? Currently you will just see a failure and the following message: <err> stop: Could not match /dev/foo/bar with a real device Version-Release number of selected component (if applicable): rgmanager-1.9.67-0
I was thinking that a solution to this would be to have a self_fence mechanism similar to the one we have for the filesystem agent. Using this flag we can trigger a reboot of a node that is not able to clear its tags. Once the node reboots it is no longer part of the cluster, so the other node can proceed with taking ownership of the LVs. I have modified lvm.sh, adding the self_fence attribute and the logic to perform the reboot in case the node is not able to clean up its tags. I am attaching here the patch created against the RHEL5.1 lvm.sh script. What is your opinion about this approach to resolving this problem? Could you review the script to see if there might be any problem with it? I have tested it and it worked for me, but obviously I would like to hear your opinion on this matter before saying anything to the customer.
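To illustrate the idea, here is a minimal sketch of the stop-path logic described above. This is not the attached patch: the function names (strip_tags, stop_lvm) and the use of OCF_RESKEY_self_fence as the attribute are illustrative assumptions, and the actual reboot command is left commented out so the sketch is safe to run.

```shell
#!/bin/sh
# Hypothetical sketch of the self_fence behavior discussed above.
# In the real agent, strip_tags would call lvchange/vgchange --deltag
# to release ownership tags on the HA LVM volume.
strip_tags() {
    # Simulate the failure mode from this bug: SAN connectivity is
    # lost, so the tags cannot be cleared.
    return 1
}

stop_lvm() {
    if strip_tags; then
        echo "tags cleared; stop succeeded"
        return 0
    fi

    # Tags could not be cleared. If the administrator enabled
    # self_fence on the lvm resource, reboot so this node drops out
    # of the cluster and another node can safely take over the LVs.
    if [ "$OCF_RESKEY_self_fence" = "yes" ]; then
        echo "self_fence enabled: rebooting node"
        # reboot -fn   # the real agent would force a reboot here
        return 0
    fi

    echo "stop failed and self_fence is disabled"
    return 1
}

OCF_RESKEY_self_fence="yes"
stop_lvm
```

With self_fence unset (or "no"), the stop operation simply fails as before, preserving the current behavior for configurations that rely on normal fencing.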
Created attachment 280961 [details] proposed patch to add self_fence to lvm resource agent
Has this patch made it into a build for QA to test?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0791.html