Bug 613064
Summary: | Method to cause one node to delay fencing in a two node cluster | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Lon Hohberger <lhh> | ||||
Component: | cman | Assignee: | Marek Grac <mgrac> | ||||
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 5.5 | CC: | clasohm, cluster-maint, djansa, djuran, edamato, fdinitto, jha, mgrac, slevine, teigland | ||||
Target Milestone: | rc | Keywords: | FutureFeature | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | cman-2.0.115-47.el5 | Doc Type: | Enhancement | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 614046 (view as bug list) | Environment: | |||||
Last Closed: | 2011-01-13 22:35:25 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 614046 | ||||||
Description
Lon Hohberger
2010-07-09 15:54:03 UTC
The only problem I see with this suggestion is that the delay is not immediately visible in cluster.conf. My suggestion would be to have a generic/reserved keyword that fenced would process and treat as a sleep($time) directly. We only need to make sure the keyword is not currently used by any fence agents. fenced already does some parsing of fence agent options, so adding one keyword should be fairly simple and non-intrusive.

Making post_fail_delay configurable in /etc/sysconfig doesn't preclude also adding delay args to agents where they are useful, like ilo. Both seem fine to me. The /etc/sysconfig settings are obvious when you run ps, so they are not hidden.

Created attachment 431421: proposed patch

Proposed patch in attachment.
<clusternode name="rhel6-node1" votes="1" nodeid="1">
<fence>
<method name="single">
<device name="virsh_fence" port="rhel6-node1"/>
</method>
</fence>
</clusternode>
<clusternode name="rhel6-node2" votes="1" nodeid="2">
<fence>
<method name="single">
<device name="virsh_fence" port="rhel6-node2" delay="20"/>
</method>
</fence>
</clusternode>
[root@rhel6-node2 libfence]# fence_node rhel6-node1
fence rhel6-node1 success
[root@rhel6-node2 libfence]# fence_node rhel6-node2
Delay execution by 20 seconds
fence rhel6-node2 success
[root@rhel6-node2 libfence]#
The keyword "delay" is currently unused, and I briefly spoke to Marek on IRC, who agrees it can be used as a reserved word (since it won't hit any agent).
David, I commented out the test code I used; I don't plan to commit it in the final patch (assuming the patch is OK with you). This is mostly to prove that it works as we expect.
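For readers following the thread, the reserved-keyword idea can be sketched as follows. This is only an illustration in Python, not the attached patch: the names `RESERVED_DELAY_KEY`, `split_delay`, `fence_with_delay` and the `run_agent` callback are hypothetical. It assumes the device attributes from cluster.conf arrive as a plain dict of strings.

```python
import time

# Hypothetical name for the reserved keyword discussed in this bug.
RESERVED_DELAY_KEY = "delay"


def split_delay(device_args):
    """Separate the reserved delay keyword from the agent arguments.

    Returns (delay_seconds, remaining_args). A missing, malformed, or
    negative delay is treated as 0 (an assumption of this sketch).
    """
    remaining = dict(device_args)  # copy so the caller's dict is untouched
    raw = remaining.pop(RESERVED_DELAY_KEY, "0")
    try:
        delay = int(raw)
    except ValueError:
        delay = 0
    return max(delay, 0), remaining


def fence_with_delay(device_args, run_agent, sleep=time.sleep):
    """Sleep for the configured delay, then call the agent without it."""
    delay, agent_args = split_delay(device_args)
    if delay > 0:
        print("Delay execution by %d seconds" % delay)
        sleep(delay)
    # The agent never sees the reserved keyword.
    return run_agent(agent_args)
```

Passing `sleep` as a parameter is just a convenience so the behaviour can be checked without actually waiting.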
Oh, dear, sorry, I completely misunderstood. I thought you were talking about adding "delay" as a fence agent arg. That would be ok with me. I don't like at all hijacking one of the node args like comment 7 does. So the two options which are both ok with me are:
1. using post_fail_delay, with local config in /etc/sysconfig
2. adding delay args to fence agents where it's useful, like ilo

I agree with "delay" as a reserved word.

Ok, reassigning to Marek since we'll do this as a delay option to the core Python fencing library, and then on an as-needed basis extend it to other fence agents that are outside of the core fencing library.

Fixed in upstream:
fencing library based agents:
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=52e9397d969966542367e832c6a3eff91204c117
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=59222e02a1f69cbd8956f7ac39b515a8c038a15d
fence agents drac + egenera:
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=91450b013aefe5374c330b83b515da11f5daf338
fence agents ipmilan:
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=274e85bc92fa3ffc5405dc2598b190d8470d0b32

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0036.html
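The agent-side variant that the thread settled on, where the agent itself honours a delay option, might look roughly like the sketch below. It is not the code from the commits listed above. It assumes only that the agent receives its options as name=value lines (for example on standard input); the function names `parse_agent_options` and `delay_before_action` are hypothetical.

```python
import time


def parse_agent_options(lines):
    """Parse name=value lines into a dict.

    Blank lines and lines starting with '#' are skipped; lines without
    an '=' are ignored (an assumption of this sketch).
    """
    options = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def delay_before_action(options, sleep=time.sleep):
    """Sleep for options['delay'] seconds if set; return seconds slept."""
    try:
        delay = int(options.get("delay", "0"))
    except ValueError:
        delay = 0
    if delay > 0:
        sleep(delay)
        return delay
    return 0
```

An agent built this way would call `parse_agent_options(sys.stdin)` (after `import sys`) and then `delay_before_action(options)` before carrying out the fencing action, so a node configured with `delay="20"` waits before it fences its peer.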