Description of problem:
Currently, there is no easy way to enable one host to "delay" fencing. Users often simply craft a "fence_sleep", which works just fine, but requires a whole lot more work than should be necessary. (Obviously, the agent "fence_sleep" itself is not suitable for inclusion in linux-cluster because it doesn't actually take any fencing actions; it simply sleeps and returns 0...)
Some fencing devices, such as HP iLO, take a long time to process requests. On a cluster which partitions but still has access to the iLO devices, this can be problematic. Because it takes a long time for iLO to process the requests, there is a window where two in-flight 'power-off' requests can cause the cluster to turn itself off - but not back on.
While this is great from a power-saving perspective, it is not very good from an availability perspective.
Ordinarily, this can be resolved using a quorum disk. However, a quorum disk adds a fair bit of complexity and is wholly unnecessary (or even undesirable) in many instances. For example, in clusters which serve data via NFS instead of a SAN, a quorum disk may not even be an option.
The proposal here is to add a method to make one node delay fencing for a period of time in order to allow the other node to "win" in the case of a network partition of the cluster interconnect. In the event that the "primary" node goes down, the "backup" node will, indeed, take longer to fence it, but with the benefit of reduced complexity and highly deterministic behavior (which can't currently be achieved using qdiskd).
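To make the race concrete, here is a rough Python model of a two-node partition. It is only a sketch under simplifying assumptions that are mine, not taken from fenced: each node issues one power-off request for its peer after its own delay, the fencing device takes a fixed `device_latency` seconds to act, and a request that is already in flight still completes after its sender has been powered off. The function name and parameters are hypothetical.

```python
def partition_outcome(delay_a, delay_b, device_latency):
    """Return the set of nodes still powered on after a two-node partition.

    delay_a / delay_b: seconds each node waits before fencing its peer.
    device_latency: seconds the fencing device needs to complete a power-off.
    """
    # The node with the shorter delay always gets its request out first.
    if delay_a <= delay_b:
        first, d_first, d_second = "A", delay_a, delay_b
    else:
        first, d_first, d_second = "B", delay_b, delay_a

    # The slower node is powered off once the first request completes.
    second_off_at = d_first + device_latency

    if d_second < second_off_at:
        # The slower node was still up when its delay expired, so its
        # request is also in flight: both nodes end up powered off.
        return set()

    # The slower node was fenced before it could issue its own request.
    return {first}


if __name__ == "__main__":
    print(partition_outcome(0, 0, 5))   # no delay on either node
    print(partition_outcome(0, 20, 5))  # node B delays by 20 seconds
```

In this model the delay only helps if it is longer than the time the device needs to complete a request; a delay shorter than that still leaves both requests in flight.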
Fortunately, all of the core code required exists in the cman package today. All we have to do is enable it on a per-host basis.
The specific proposal here, after talking with others, is to simply expose post_fail_delay via /etc/sysconfig/cman, and add it to the list of options when we start fenced.
For example, adding the following to /etc/sysconfig/cman on one host:
... and then, in the cman initscript, calling fenced with the corresponding -f option:
fenced -f $POST_FAIL_DELAY
... should have the desired effect.
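The initscript itself is shell, so the following Python is only meant to illustrate the intended logic: read POST_FAIL_DELAY from the sysconfig file and, if it is set, append -f with that value when starting fenced. The helper names and the value 30 are hypothetical and not part of the cman package.

```python
def read_sysconfig(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in KEY="value".
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def fenced_command(settings):
    """Build the fenced argument list, adding -f only when a delay is set."""
    cmd = ["fenced"]
    delay = settings.get("POST_FAIL_DELAY")
    if delay:
        cmd += ["-f", delay]
    return cmd


if __name__ == "__main__":
    # Hypothetical sysconfig content for the node that should delay.
    sample = "# /etc/sysconfig/cman\nPOST_FAIL_DELAY=30\n"
    print(fenced_command(read_sysconfig(sample)))
```

A host without the variable set would start fenced with no extra option, so only the node that carries the setting delays.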
The only problem I see with this suggestion is that the delay is not immediately visible in cluster.conf.
My suggestion would be to have a generic/reserved keyword that fenced would process and consider as a sleep($time) directly.
We only need to make sure the keyword is not currently used by any fence agents.
fenced already does some parsing of fence agents options, so adding one keyword should be fairly simple and non-intrusive.
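Roughly, the idea would look like the sketch below. fenced is written in C, so this Python is only an illustration of the logic, and the function and constant names are made up for the example: the reserved keyword is removed from the device options, fenced sleeps for that long, and the agent is then called with the remaining options only.

```python
import time

RESERVED_DELAY_KEY = "delay"  # assumed reserved keyword, consumed before the agent runs


def run_with_delay(device_args, run_agent, sleep=time.sleep):
    """Sleep for the device's delay (if any), then call the agent without it.

    device_args: dict of attributes from the device line in cluster.conf.
    run_agent:   callable that receives the remaining options.
    sleep:       injectable for testing; defaults to time.sleep.
    """
    args = dict(device_args)  # copy so the caller's dict is not modified
    delay = int(args.pop(RESERVED_DELAY_KEY, 0))
    if delay > 0:
        sleep(delay)
    return run_agent(args)


if __name__ == "__main__":
    options = {"name": "virsh_fence", "port": "rhel6-node2", "delay": "0"}
    print(run_with_delay(options, run_agent=lambda a: sorted(a)))
```

Because the keyword is popped before the agent is invoked, no agent ever sees it, which is why it must not collide with an existing agent option.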
Making post_fail_delay configurable in /etc/sysconfig doesn't preclude also adding delay args to agents where they are useful like ilo. Both seem fine to me.
The /etc/sysconfig settings are obvious when you run ps, so they are not hidden.
Created attachment 431421
proposed patch in attachment.
<clusternode name="rhel6-node1" votes="1" nodeid="1">
<device name="virsh_fence" port="rhel6-node1"/>
<clusternode name="rhel6-node2" votes="1" nodeid="2">
<device name="virsh_fence" port="rhel6-node2" delay="20"/>
[root@rhel6-node2 libfence]# fence_node rhel6-node1
fence rhel6-node1 success
[root@rhel6-node2 libfence]# fence_node rhel6-node2
Delay execution by 20 seconds
fence rhel6-node2 success
The keyword "delay" is currently unused, and I briefly spoke on IRC to Marek, who agrees it can be used as a reserved word (since it won't hit any agent).
David, I commented out the test code I used. I don't plan to commit it in the final patch (assuming the patch is ok with you). This is mostly to prove that it works as we expect.
Oh, dear, sorry, I completely misunderstood. I thought you were talking about adding "delay" as a fence agent arg. That would be ok with me. I don't like at all hijacking one of the node args like comment 7 does.
So the two options which are both ok with me are
1. using post_fail_delay, with local config in /etc/sysconfig
2. adding delay args to fence agents where it's useful, like ilo
I agree with "delay" as reserved word
Ok reassigning to Marek since we'll do this as a delay option to the core python fencing library, and then on an as needed basis extend to other fences that are outside of the core fencing library.
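For illustration only, an agent-side delay option could behave like the sketch below. This is not the actual fencing library code: the function names are hypothetical, and it assumes the agent receives its options as name=value lines on stdin and has some callable that performs the power-off.

```python
import time


def parse_agent_options(lines):
    """Turn name=value lines (as read from stdin) into a dict of options."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that are not name=value pairs
            opts[name.strip()] = value.strip()
    return opts


def fence_action(opts, power_off, sleep=time.sleep):
    """Wait for the optional delay, then power off the configured port."""
    delay = int(opts.get("delay", 0))  # no delay unless explicitly configured
    if delay > 0:
        sleep(delay)
    return power_off(opts["port"])


if __name__ == "__main__":
    stdin_lines = ["port=rhel6-node2\n", "delay=0\n"]
    opts = parse_agent_options(stdin_lines)
    print(fence_action(opts, power_off=lambda port: f"off {port}"))
```

Keeping the default at zero means existing configurations behave exactly as before; only a device line that sets delay changes timing.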
Fixed in upstream:
fencing library based agents:
fence agents drac + egenera:
fence agents ipmilan:
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.