Bug 1443666
| Summary: | [RFE] Add ability to use sbd on cluster nodes but not remote nodes | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ken Gaillot <kgaillot> | |
| Component: | pacemaker | Assignee: | Klaus Wenninger <kwenning> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | low | Docs Contact: | Steven J. Levine <slevine> | |
| Priority: | high | |||
| Version: | 8.0 | CC: | cfeist, cluster-maint, dpeess, jwboyer, kwalker, kwenning, lmiccini, ltamagno, michele, mkelly, mnovacek, msmazova, phagara, pzimek, sbradley, slevine, toneata | |
| Target Milestone: | rc | Keywords: | FutureFeature, Triaged, ZStream | |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ | |
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | pacemaker-2.1.0-8.el8 | Doc Type: | Enhancement | |
| Doc Text: | .Ability to configure watchdog-only SBD for fencing on subset of cluster nodes. Previously, to use a watchdog-only SBD configuration, all nodes in the cluster had to use SBD. That prevented using SBD in a cluster where some nodes support it but other nodes (often remote nodes) required some other form of fencing. Users can now configure a watchdog-only SBD setup using the new `fence_watchdog` agent, which allows cluster configurations where only some nodes use watchdog-only SBD for fencing and other nodes use other fencing types. A cluster may only have a single such device, and it must be named `watchdog`. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1757982 1988568 1993891 | Environment: | ||
| Last Closed: | 2021-11-09 18:44:49 UTC | Type: | Enhancement | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.2 | |
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1757982, 1988568, 1993891 | |||
Description
Ken Gaillot
2017-04-19 16:37:19 UTC
Just a few thoughts on the current state ... It is already possible to have other fencing methods configured in parallel to sbd-fencing, but there are limitations that might create undesired effects.

If the (currently global) cluster property stonith-watchdog-timeout is set, Pacemaker already enforces that the hardware watchdog on every node is set to a shorter timeout (it checks the SBD_WATCHDOG_TIMEOUT environment variable configured via /etc/sysconfig/sbd and terminates its own service otherwise; there is no check for an actually running daemon, though).

sbd-fencing via shared block device(s) is already used only for fencing nodes that have advertised themselves by writing their name into one of the slots on the shared block device(s). It is inherent that shared-block-device fencing can only be triggered against nodes that have access to the block device(s). As all cluster nodes in this scenario have the same sbd configuration, this is of no concern (remote nodes never trigger fencing anyway), although in a more generic scenario (some cluster nodes have sbd, others don't) it might be an issue.

Internally, sbd watchdog-fencing adds a fence device that successfully returns for all nodes after a configured timeout. This device, however, isn't externally visible and thus can't be properly integrated into a fencing-level hierarchy. So in the best case the alternative fencing method would be tried, but if it fails the cluster would always fall back to watchdog-fencing, which is fatal for nodes that don't have it working properly (the remote nodes in this scenario).

This probably boils down to either:
- adding a cluster property with a list of all nodes that do quorum-based watchdog-fencing, matching the enabling/disabling via the cluster property stonith-watchdog-timeout, or
- making the hidden device visible somehow and using the existing meta-attributes.

Due to capacity constraints, this is unlikely to be addressed in the 7.5 timeframe.

*** Bug 1449982 has been marked as a duplicate of this bug. ***

To take care of the issue raised by bz1449982 (never use watchdog-fencing with 2-node clusters), cluster nodes can be taken out of the list of watchdog-fenced nodes if the 2-node option is enabled. A proposed solution for discussion (until now, cluster nodes are not automatically excluded if the 2-node option is set): https://github.com/ClusterLabs/pacemaker/pull/1432

qa_ack+ if we get documentation or a howto on how to test this. Klaus: do we have something?

As I'm currently working on it, let me give you an outline of how I test the feature:
My cluster consists of node2, node3, node4, remote_node1.
As we've finally decided to go the route of making the fencing device visible, I have the following fencing resource (note that the name has to be watchdog):
Resource: watchdog (class=stonith type=fence_watchdog)
Attributes: pcmk_host_list="node2 node3"
Operations: monitor interval=60s (watchdog-monitor-interval-60s)
stonith-watchdog-timeout: 30
This config makes watchdog-fencing available for node2 & node3 while
node4 & remote_node1 don't support being fenced via watchdog-fencing.
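For reference, a configuration like the one above could be created with pcs commands along these lines (a sketch, not copied from the bug; the node names and the 30-second timeout are simply the values used in the example):

# create the externally visible watchdog fence device; the resource id must be "watchdog"
pcs stonith create watchdog fence_watchdog pcmk_host_list="node2 node3" op monitor interval=60s
# the timeout the cluster waits before assuming watchdog self-fencing has completed
pcs property set stonith-watchdog-timeout=30

pcmk_host_list restricts the device to the nodes that actually run sbd with a working watchdog.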
As long as you don't configure any other means of fencing, you can now easily test the feature by simply bringing one of the 4 nodes down ungracefully.
Doing that with node2 or node3 - while watching the cluster with crm_mon - should show the node as unclean for 30 seconds, after which it should go offline.
Doing the same with node4 or remote_node1 would make them stay unclean indefinitely.
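In practice the test can be driven with commands like these (a sketch; it mirrors what the QA verification below does):

# on the node to be failed ungracefully, e.g. node2
killall -9 corosync
# on a surviving node, run repeatedly (or use plain crm_mon) to see the failed node go UNCLEAN, then OFFLINE
crm_mon -1rfm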
Just a side note: Pacemaker 2 brings a feature to automatically set the timeout for watchdog-fencing by setting stonith-watchdog-timeout=-1. The first implementation is dangerous, as it assumes the value read from the environment to be the same on all nodes. Just as this feature requires all nodes to have a common understanding of which nodes do watchdog-fencing and which don't, there also has to be a common understanding of which node is using which timeout. Grabbing a common timeout from the environment of one of the nodes and having all other nodes check against it before starting resources sounds viable, but it might impose unnecessarily long timeouts in heterogeneous environments (if certain nodes can't set their watchdog timeout below a certain value - either because the watchdog isn't configurable or because crashdump takes long - that value has to apply to the whole cluster). So sharing a table of timeouts might rather be the way to go. Thus the per-node watchdog-fencing yes/no needed to satisfy this bz should be implemented in a way that can easily be extended to dealing with a list of different timeouts per node.

Bumping to 8.1 due to devel/QA capacity constraints.

qa_ack+ testcase in comment#9

Changing target to 8.4 due to developer time constraints.

Fixed upstream by commits b49f495 5dd1e44 53dd360

*** Bug 1995419 has been marked as a duplicate of this bug. ***

QA: The latest (-8) build fixes a small issue with the meta-data for the new fence_watchdog agent, so please also check that the meta-data output looks reasonable.

[root@virt-273 ~]# rpm -q pacemaker
pacemaker-2.1.0-8.el8.x86_64
[root@virt-273 ~]# rpm -q sbd
sbd-1.5.0-2.el8.x86_64
Check fence_watchdog meta-data:
[root@virt-273 ~]# pcs stonith describe fence_watchdog
fence_watchdog - Dummy watchdog fence agent
fence_watchdog just provides
meta-data - actual fencing is done by the pacemaker internal watchdog agent.
Stonith options:
nodename: Ignored
plug: Ignored
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names. Eg.
node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and 3 for node2
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device. Allowed values: dynamic-list (query
the device via the 'list' command), static-list (check the pcmk_host_list attribute), status (query
the device via the 'status' command), none (assume every device can fence every machine)
pcmk_delay_max: Enable a delay of no more than the time specified before executing fencing actions. Pacemaker
derives the overall delay by taking the value of pcmk_delay_base and adding a random delay value
such that the sum is kept below this maximum. This prevents double fencing when using slow devices
such as sbd. Use this to enable a random delay for fencing actions. The overall delay is derived
from this random delay value adding a static delay so that the sum is kept below the maximum delay.
pcmk_delay_base: Enable a base delay for fencing actions and specify base delay value. This prevents double fencing
when different delays are configured on the nodes. Use this to enable a static delay for fencing
actions. The overall delay is derived from a random delay value adding this static delay so that
the sum is kept below the maximum delay.
pcmk_action_limit: The maximum number of actions can be performed in parallel on this device Cluster property
concurrent-fencing=true needs to be configured first. Then use this to specify the maximum number
of actions can be performed in parallel on this device. -1 is unlimited.
Default operations:
monitor: interval=60s
[root@virt-273 ~]# crm_resource --show-metadata=stonith:fence_watchdog
<resource-agent name="fence_watchdog" shortdesc="Dummy watchdog fence agent">
<longdesc>fence_watchdog just provides
meta-data - actual fencing is done by the pacemaker internal watchdog agent.</longdesc>
<parameters>
<parameter name="action" required="0">
<getopt mixed="-o, --action=[action]"/>
<content type="string" default="metadata"/>
<shortdesc lang="en">Fencing Action</shortdesc>
</parameter>
<parameter name="nodename" required="0">
<getopt mixed="-N, --nodename"/>
<content type="string"/>
<shortdesc lang="en">Ignored</shortdesc>
</parameter>
<parameter name="plug" required="0">
<getopt mixed="-n, --plug=[id]"/>
<content type="string"/>
<shortdesc lang="en">Ignored</shortdesc>
</parameter>
<parameter name="version" required="0">
<getopt mixed="-V, --version"/>
<content type="boolean"/>
<shortdesc lang="en">Display version information and exit</shortdesc>
</parameter>
<parameter name="help" required="0">
<getopt mixed="-h, --help"/>
<content type="boolean"/>
<shortdesc lang="en">Display help and exit</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on"/>
<action name="off"/>
<action name="reboot"/>
<action name="monitor"/>
<action name="list"/>
<action name="metadata"/>
<action name="stop" timeout="20s"/>
<action name="start" timeout="20s"/>
</actions>
</resource-agent>
Setup 3-node cluster with 1 remote node:
[root@virt-273 ~]# pcs status --full
Cluster name: test_cluster
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (3) (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:18:04 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-273 (4) virt-274 (3) virt-275 (2) ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
Tickets:
PCSD Status:
virt-273: Online
virt-274: Online
virt-275: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
sbd: active/enabled
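For context, a comparable test cluster could be assembled with commands roughly like the ones below (a sketch, not part of the verification; it assumes the nodes are already authenticated to pcsd, that sbd with a watchdog has been enabled on virt-273 and virt-274 only, which is not shown here, and that the 30-second timeout from the earlier outline is reused):

# create the cluster and start it
pcs cluster setup test_cluster virt-273 virt-274 virt-275 --start
# add the remote node (runs pacemaker-remote, no sbd)
pcs cluster node add-remote virt-276
# watchdog fencing only for the two nodes that run sbd
pcs stonith create watchdog fence_watchdog pcmk_host_list="virt-273 virt-274" op monitor interval=60s
pcs property set stonith-watchdog-timeout=30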
Watchdog stonith is configured and sbd is running on only two of the cluster nodes ("virt-273 virt-274"):
[root@virt-273 ~]# pcs stonith config watchdog
Resource: watchdog (class=stonith type=fence_watchdog)
Attributes: pcmk_host_list="virt-273 virt-274"
Operations: monitor interval=60s (watchdog-monitor-interval-60s)
[root@virt-273 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
virt-273: YES | YES | YES
virt-275: YES | NO | NO
virt-274: YES | YES | YES
killall -9 corosync on the node virt-273, which has watchdog fencing:
[root@virt-273 ~]# killall -9 corosync
Node becomes unclean:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:19:42 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Node virt-273: UNCLEAN (offline)
* Online: [ virt-274 virt-275 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 pending: client=pacemaker-controld.57157, origin=virt-274
Node is fenced:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:19:52 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-274 virt-275 ]
* OFFLINE: [ virt-273 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
Node has rebooted and rejoined the cluster:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:20:49 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-273 virt-274 virt-275 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
killall -9 corosync on the node virt-275, which doesn't have watchdog fencing configured:
[root@virt-275 ~]# killall -9 corosync
Node is marked as UNCLEAN, fencing failed:
[root@virt-273 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:26:05 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Node virt-275: UNCLEAN (offline)
* Online: [ virt-273 virt-274 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started [ virt-275 virt-273 ]
Migration Summary:
Failed Fencing Actions:
* reboot of virt-275 failed: delegate=, client=pacemaker-controld.57157, origin=virt-274, last-failed='2021-08-30 20:26:02 +02:00'
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
Node stays in the UNCLEAN state.
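A node without a working fence device stays UNCLEAN until an administrator intervenes, typically by confirming the fencing manually once the node is known to be down (a sketch using the standard pcs command, not something the test itself exercises; depending on the pcs version it may prompt or require --force):

# only after verifying out of band that virt-275 really is down
pcs stonith confirm virt-275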
Verified as SanityOnly in pacemaker-2.1.0-8.el8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:4267