Bug 1784601

Summary: RFE: A way to implement a more dynamic fence delay so that in the event of a network split, the cluster will fence the node running the fewest resources

Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: 8.3
Hardware: Unspecified
OS: Linux
Keywords: FutureFeature
Reporter: Chad Newsom <cnewsom>
Assignee: Ken Gaillot <kgaillot>
QA Contact: cluster-qe <cluster-qe>
Docs Contact: Steven J. Levine <slevine>
CC: cluster-maint, lmanasko, mjuricek, nwahl, pasik, phagara, sbradley, slevine
Fixed In Version: pacemaker-2.0.4-2.el8
Doc Type: Enhancement

Doc Text:

.New `priority-fencing-delay` cluster property
Pacemaker now supports the new `priority-fencing-delay` cluster property, which allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest resources running is the node that gets fenced.

The `priority-fencing-delay` property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the `priority` meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to survive. For example, if you set `pcs resource defaults priority=1` and `pcs property set priority-fencing-delay=15s` and no other priorities are set, then the node running the most resources will be more likely to survive because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority. The node running the master role of a promotable clone will get an extra 1 point if a priority has been configured for that clone.

Any delay set with `priority-fencing-delay` will be added to any delay from the `pcmk_delay_base` and `pcmk_delay_max` fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss (for example, `on-fail=fencing` is set for a resource monitor operation). If used in combination, it is recommended that you set the `priority-fencing-delay` property to a value that is significantly greater than the maximum delay from `pcmk_delay_base` and `pcmk_delay_max`, to be sure the prioritized node is preferred (twice the value would be completely safe).
Last Closed: 2020-11-04 04:00:53 UTC
Type: Feature Request
Bug Blocks: 1840407

Description Chad Newsom 2019-12-17 20:13:00 UTC
1. Proposed title of this feature request

A way to implement a more dynamic fence delay so that in the event of a network split, the cluster will fence the node running the fewest resources.

2. Who is the customer behind the request?

     Account: EBS Financial Technologies Ltd (5695203)

     TAM customer: no

     SRM customer: no

     Strategic: no

3. What is the nature and description of the request?

The customer has a fence delay configured, as described in https://access.redhat.com/solutions/54829, to avoid fence races.

The customer described a scenario in which a network split would result in fencing the node hosting the majority of cluster resources, if those resources happen to be running on the node whose stonith resource is not configured with a delay. We discussed that, currently, the delay attribute must be configured ahead of time to decide the winner of a fence race. The customer has requested that we file an RFE to look into ways for the delay to apply to the node that is hosting the majority of resources, to minimize the impact of a fence event that results from a network split.
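
For reference, the static approach from that article amounts to putting a fixed delay on the fence device that targets the preferred node, for example (the device name fence-node1 is hypothetical):

  # Any fencing aimed at node1 is delayed 15 seconds, so node1 wins a
  # mutual fence race.
  pcs stonith update fence-node1 delay=15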


4. Why does the customer need this? (List the business requirements here)

The goal would be to minimize the disruption caused by fence events.


5. How would the customer like to achieve this? (List the functional requirements here)

We discussed this on the phone. For now, their team is interested in setting up a cron job that checks where resources are running, checks whether the node running them is protected by a stonith delay, and sets one if it is not. For the RFE, it would be good to have this as an option somewhere in the cluster configuration so that we are not relying on a custom cron job. A rough sketch of the interim workaround follows.
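
A minimal sketch of that cron job, assuming a two-node cluster with one fence device per node (all node and device names here are hypothetical, and the status parsing is deliberately simplistic):

  #!/bin/sh
  # Count "Started <node>" lines in the one-shot cluster status to find
  # the node currently running the most resources.
  busiest=$(crm_mon -1 | grep -oE 'Started +[[:alnum:]._-]+' \
            | awk '{print $2}' | sort | uniq -c | sort -rn \
            | head -n1 | awk '{print $2}')

  # Delay fencing aimed at the busier node so it wins a fence race,
  # and clear the delay on the device targeting the other node.
  if [ "$busiest" = "node1" ]; then
      pcs stonith update fence-node1 delay=15
      pcs stonith update fence-node2 delay=0
  elif [ "$busiest" = "node2" ]; then
      pcs stonith update fence-node2 delay=15
      pcs stonith update fence-node1 delay=0
  fi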


6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.

Once a solution is devised, a series of network failure tests with varying cluster resource placement should confirm whether the solution works.


7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

I was unable to locate one.


8. Does the customer have any specific time-line dependencies and which release would they like to target?

The customer is running RHEL 7. I'm filing this for RHEL 8 because RHEL 7 is in Maintenance Support 1.


9. Is the sales team involved in this request and do they have any additional input?

No

10. List any affected packages or components.

pcs, pacemaker, fence-agents

11. Would the customer be able to assist in testing this functionality if implemented?

Yes

Comment 1 Ken Gaillot 2019-12-17 20:29:51 UTC
Coincidentally, there was a discussion upstream about this topic recently.[1] The proposed solution design is:

* Users would use the existing "priority" resource meta-attribute to weight each resource.

* A new cluster property "priority-fencing-delay" would set the specified delay on all cluster-initiated fencing targeting the node with the highest cumulative priority of all resources active on it. (It would not apply to manual fencing via stonith_admin, or externally initiated fencing by something like DLM.)

For a simple resource count as described here, it would be sufficient to give every resource a priority of 1. Of course, if some resources are more important, they could be given a higher priority.
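
As an illustration, the weighting could be expressed with the existing meta-attribute like this (the resource name important-db is hypothetical):

  # Weight every resource equally, then raise one critical resource.
  pcs resource defaults priority=1
  pcs resource meta important-db priority=10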

[1] https://github.com/ClusterLabs/fence-agents/pull/308

Comment 2 Ken Gaillot 2020-03-21 17:24:50 UTC
This has been fixed upstream as of commit 3dec930.

A new cluster property, priority-fencing-delay, will default to -1 (meaning disabled) and can be set to a time duration.

This is really only useful for 2-node clusters. If the new property is set, and the "priority" meta-attribute is configured for at least one resource, then in a split-brain situation, the node with the highest combined priority of all resources running on it will be more likely to survive.

As an example, with:

  pcs resource defaults priority=1
  pcs property set priority-fencing-delay=15s

and no other priorities, then the node running the most resources will be more likely to survive, because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.

The node running the master role of a promotable clone will get an extra 1 point if a priority has been configured for that clone.

If both nodes have equal priority, or fencing is needed for some reason other than node loss (e.g. on-fail=fencing for some monitor), then the usual delay properties apply (pcmk_delay_base, etc.). Otherwise priority-fencing-delay takes precedence over other delay properties.
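
For example, a configuration that combines a small random delay (to break ties when priorities are equal) with the new property might look like this (the device names are hypothetical):

  pcs stonith update fence-node1 pcmk_delay_max=5s
  pcs stonith update fence-node2 pcmk_delay_max=5s
  pcs property set priority-fencing-delay=15s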

Comment 3 Patrik Hagara 2020-03-23 11:59:58 UTC
qa_ack+, feature described in comment#2

Comment 6 Ken Gaillot 2020-05-15 16:50:33 UTC
The Doc Text has been updated with the final behavior, which is slightly different from the description in Comment 2.

Comment 7 Ken Gaillot 2020-05-26 21:42:36 UTC
We might be able to get this into RHEL 7.9 as well, which will be tracked as Bug 1840407.

Comment 17 errata-xmlrpc 2020-11-04 04:00:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4804