Bug 1784601
Summary: | RFE: A way to implement a more dynamic fence delay so that in the event of a network split, the cluster will fence the node running the fewest resources | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Chad Newsom <cnewsom> | |
Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> | |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
Severity: | medium | Docs Contact: | Steven J. Levine <slevine> | |
Priority: | medium | |||
Version: | 8.0 | CC: | cluster-maint, lmanasko, mjuricek, nwahl, pasik, phagara, sbradley, slevine | |
Target Milestone: | rc | Keywords: | FutureFeature | |
Target Release: | 8.3 | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | pacemaker-2.0.4-2.el8 | Doc Type: | Enhancement | |
Doc Text: |
.New `priority-fencing-delay` cluster property
Pacemaker now supports the new `priority-fencing-delay` cluster property, which allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest resources running is the node that gets fenced.
The `priority-fencing-delay` property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the `priority` meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to survive.
For example, if you set `pcs resource defaults priority=1` and `pcs property set priority-fencing-delay=15s` and no other priorities are set, then the node running the most resources will be more likely to survive because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.
The node running the master role of a promotable clone will get an extra 1 point if a priority has been configured for that clone.
Any delay set with `priority-fencing-delay` will be added to any delay from the `pcmk_delay_base` and `pcmk_delay_max` fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss (for example, `on-fail=fencing` is set for a resource monitor operation). If used in combination, it is recommended that you set the `priority-fencing-delay` property to a value that is significantly greater than the maximum delay from `pcmk_delay_base` and `pcmk_delay_max`, to be sure the prioritized node is preferred (twice the value would be completely safe).
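The sizing advice above can be checked with a small worst-case model. This is only a sketch under stated assumptions: delays are in seconds, the delay contributed by `pcmk_delay_base` and `pcmk_delay_max` is treated as an arbitrary value between 0 and a maximum, and fencing execution time is not modeled. The function names are hypothetical and are not part of Pacemaker or pcs.

```python
def total_delay(priority_fencing_delay, device_delay):
    """Delay before fencing starts: the two delays are added together."""
    return priority_fencing_delay + device_delay


def preferred_node_always_wins(priority_fencing_delay, max_device_delay):
    """Worst case for the prioritized node in a two-node split.

    Assumption: fencing that targets the prioritized node draws the smallest
    device delay (0), while fencing that targets the other node draws the
    largest one (max_device_delay).
    """
    against_preferred = total_delay(priority_fencing_delay, 0)
    against_other = total_delay(0, max_device_delay)
    # The prioritized node survives only if the other node is fenced first.
    return against_preferred > against_other


def recommended_priority_fencing_delay(max_device_delay):
    """Rule of thumb from the text: twice the maximum device delay."""
    return 2 * max_device_delay


print(preferred_node_always_wins(15, 10))      # True
print(preferred_node_always_wins(5, 10))       # False
print(recommended_priority_fencing_delay(10))  # 20
```

In this model any `priority-fencing-delay` larger than the maximum device delay is enough; the factor of two leaves margin for timing effects the sketch ignores.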
|
Story Points: | --- | |
Clone Of: | ||||
: | 1840407 (view as bug list) | Environment: | ||
Last Closed: | 2020-11-04 04:00:53 UTC | Type: | Feature Request | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1840407 |
Description
Chad Newsom
2019-12-17 20:13:00 UTC
Coincidentally, there was a discussion upstream about this topic recently.[1] The proposed solution design is:

* Users would use the existing "priority" resource meta-attribute to weight each resource.
* A new cluster property "priority-fencing-delay" would set the specified delay on all cluster-initiated fencing targeting the node with the highest cumulative priority of all resources active on it. (It would not apply to manual fencing via stonith_admin, or externally initiated fencing by something like DLM.)

For a simple resource count as described here, it would be sufficient to give every resource a priority of 1. Of course if some resources are more important, they could be given higher priority.

[1] https://github.com/ClusterLabs/fence-agents/pull/308

This has been fixed upstream as of commit 3dec930.

A new cluster property, priority-fencing-delay, will default to -1 (meaning disabled) and can be set to a time duration. This is really only useful for 2-node clusters. If the new property is set, and the "priority" meta-attribute is configured for at least one resource, then in a split-brain situation, the node with the highest combined priority of all resources running on it will be more likely to survive.

As an example, with:

    pcs resource defaults priority=1
    pcs property set priority-fencing-delay=15s

and no other priorities, then the node running the most resources will be more likely to survive, because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.

The node running the master role of a promotable clone will get an extra 1 point, if a priority has been configured for that clone.

If both nodes have equal priority, or fencing is needed for some reason other than node loss (e.g. on-fail=fencing for some monitor), then the usual delay properties apply (pcmk_delay_base, etc.). Otherwise priority-fencing-delay takes precedence over other delay properties.
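The scoring described in this comment can be illustrated with a short model. It is a sketch, not Pacemaker's implementation: `ActiveResource`, `node_priority` and `delayed_fencing_target` are hypothetical names, and "a priority has been configured" is approximated as a non-zero `priority` value.

```python
from dataclasses import dataclass


@dataclass
class ActiveResource:
    """Hypothetical record for one resource instance active on a node."""
    name: str
    priority: int = 0        # value of the "priority" meta-attribute
    promoted: bool = False   # True for the master instance of a promotable clone


def node_priority(resources):
    """Combined priority of all resources active on one node."""
    total = 0
    for res in resources:
        total += res.priority
        # Assumption: "priority configured" is modeled as a non-zero value.
        if res.promoted and res.priority != 0:
            total += 1  # extra point for the master role
    return total


def delayed_fencing_target(nodes):
    """Node whose fencing would be delayed, or None when priorities tie."""
    scores = {name: node_priority(active) for name, active in nodes.items()}
    best = max(scores.values())
    leaders = [name for name, score in scores.items() if score == best]
    return leaders[0] if len(leaders) == 1 else None


example = {
    "node1": [ActiveResource("db", priority=1, promoted=True),
              ActiveResource("vip", priority=1)],
    "node2": [ActiveResource("db", priority=1),
              ActiveResource("web", priority=1)],
}
print(delayed_fencing_target(example))  # node1 scores 3, node2 scores 2
```

With every resource at priority 1 the score is a plain resource count, and the promoted instance on node1 breaks what would otherwise be a 2-2 tie.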
The Doc Text has been updated with the final behavior, which is slightly different from the description in Comment 2.

We might be able to get this into RHEL 7.9 as well, which will be tracked as Bug 1840407.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4804