Bug 1443666
| Summary: | [RFE] Add ability to use sbd on cluster nodes but not remote nodes | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ken Gaillot <kgaillot> | |
| Component: | pacemaker | Assignee: | Klaus Wenninger <kwenning> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | low | Docs Contact: | Steven J. Levine <slevine> | |
| Priority: | high | |||
| Version: | 8.0 | CC: | cfeist, cluster-maint, dpeess, jwboyer, kwalker, kwenning, lmiccini, ltamagno, michele, mkelly, mnovacek, msmazova, phagara, pzimek, sbradley, slevine, toneata | |
| Target Milestone: | rc | Keywords: | FutureFeature, Triaged, ZStream | |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ | |
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | pacemaker-2.1.0-8.el8 | Doc Type: | Enhancement | |
| Doc Text: | .Ability to configure watchdog-only SBD for fencing on subset of cluster nodes. Previously, to use a watchdog-only SBD configuration, all nodes in the cluster had to use SBD. That prevented using SBD in a cluster where some nodes support it but other nodes (often remote nodes) required some other form of fencing. Users can now configure a watchdog-only SBD setup using the new `fence_watchdog` agent, which allows cluster configurations where only some nodes use watchdog-only SBD for fencing and other nodes use other fencing types. A cluster may only have a single such device, and it must be named `watchdog`. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1757982 1988568 1993891 | Environment: | ||
| Last Closed: | 2021-11-09 18:44:49 UTC | Type: | Enhancement | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.2 | |
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1757982, 1988568, 1993891 | |||
Description
Ken Gaillot
2017-04-19 16:37:19 UTC
Just a few thoughts on the current state ... It is already possible to have other fencing methods configured in parallel to sbd-fencing, but there are limitations that might create undesired effects.

If the (currently global) cluster property stonith-watchdog-timeout is set, Pacemaker already enforces that the hardware watchdog on every node is set to a shorter timeout (it checks the SBD_WATCHDOG_TIMEOUT environment variable configured via /etc/sysconfig/sbd and terminates its own service otherwise; there is no check for an actually running daemon, though).

sbd-fencing via shared block device(s) is already used only for fencing nodes that have advertised themselves by writing their name into one of the slots on the shared block device(s). It is inherent that shared-block-device fencing can only be triggered against nodes that have access to the block device(s). As all cluster nodes in this scenario have the same sbd configuration, this is of no concern (remote nodes never trigger fencing anyway), although in a more generic scenario (some cluster nodes have sbd, others don't) it might be an issue.

Internally, sbd watchdog-fencing adds a fence device that successfully returns for all nodes after a configured timeout. This device, however, isn't externally visible and thus can't be properly integrated into a fencing-level hierarchy. So in the best case the alternative fencing method would be tried, but if it fails the cluster would always fall back to watchdog-fencing, which is fatal for nodes that don't have it working properly (the remote nodes in this scenario).

This probably boils down to either:
- adding a cluster property with a list of all nodes that do quorum-based watchdog-fencing, matching the enabling/disabling via the cluster property stonith-watchdog-timeout, or
- making the hidden device visible somehow and using the existing meta-attributes.

Due to capacity constraints, this is unlikely to be addressed in the 7.5 timeframe.

*** Bug 1449982 has been marked as a duplicate of this bug. ***

To take care of the issue raised by bz1449982 (never use watchdog-fencing with 2-node clusters), cluster nodes can be taken out of the list of watchdog-fenced nodes if the 2-node option is enabled. A proposed solution for discussion (until now, cluster nodes are not automatically excluded if the 2-node option is set): https://github.com/ClusterLabs/pacemaker/pull/1432

qa_ack+ if we get documentation or a howto on how to test this. Klaus: do we have something?

As I'm currently working on it, let me give you an outline of how I test the feature:
My cluster consists of node2, node3, node4, remote_node1.
As we've finally decided to go the route of making the fencing device visible, I have the following fencing resource (note that the name has to be watchdog):
Resource: watchdog (class=stonith type=fence_watchdog)
Attributes: pcmk_host_list="node2 node3"
Operations: monitor interval=60s (watchdog-monitor-interval-60s)
stonith-watchdog-timeout: 30
This config makes watchdog-fencing available for node2 & node3 while
node4 & remote_node1 don't support being fenced via watchdog-fencing.
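For reference, a configuration like the one above could be created with pcs commands along these lines (a sketch, not copied from the bug; the node names and the 30-second timeout are simply the values used in the example):

# create the externally visible watchdog fence device; the resource id must be "watchdog"
pcs stonith create watchdog fence_watchdog pcmk_host_list="node2 node3" op monitor interval=60s
# the timeout the cluster waits before assuming watchdog self-fencing has completed
pcs property set stonith-watchdog-timeout=30

pcmk_host_list restricts the device to the nodes that actually run sbd with a working watchdog.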
As long as you don't configure any other means of fencing, you can now easily test the feature by simply bringing one of the 4 nodes down ungracefully.
Doing that with node2 or node3 - while watching the cluster with crm_mon - should show the node as unclean for 30 seconds, after which it should go offline.
Doing the same with node4 or remote_node1 would make them stay unclean indefinitely.
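In practice the test can be driven with commands like these (a sketch; it mirrors what the QA verification below does):

# on the node to be failed ungracefully, e.g. node2
killall -9 corosync
# on a surviving node, run repeatedly (or use plain crm_mon) to see the failed node go UNCLEAN, then OFFLINE
crm_mon -1rfm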
Just a side note: Pacemaker 2 brings a feature to automatically set the timeout for watchdog-fencing by setting stonith-watchdog-timeout=-1. The first implementation is dangerous, as it assumes the value read from the environment to be the same on all nodes. Just as this feature requires all nodes to have a common understanding of which nodes do watchdog-fencing and which don't, there also has to be a common understanding of which node is using which timeout. Grabbing a common timeout from the environment of one of the nodes and having all other nodes check against it before starting resources sounds viable, but it might impose unnecessarily long timeouts in heterogeneous environments (if certain nodes can't set their watchdog timeout below a certain value - either because the watchdog isn't configurable or because crashdump takes long - that value has to apply to the whole cluster). So sharing a table of timeouts might rather be the way to go. Thus the per-node watchdog-fencing yes/no needed to satisfy this bz should be implemented in a way that can easily be extended to dealing with a list of different timeouts per node.

Bumping to 8.1 due to devel/QA capacity constraints.

qa_ack+ testcase in comment#9

Changing target to 8.4 due to developer time constraints.

Fixed upstream by commits b49f495 5dd1e44 53dd360

*** Bug 1995419 has been marked as a duplicate of this bug. ***

QA: The latest (-8) build fixes a small issue with the meta-data for the new fence_watchdog agent, so please also check that the meta-data output looks reasonable.

[root@virt-273 ~]# rpm -q pacemaker
pacemaker-2.1.0-8.el8.x86_64
[root@virt-273 ~]# rpm -q sbd
sbd-1.5.0-2.el8.x86_64
Check fence_watchdog meta-data:
[root@virt-273 ~]# pcs stonith describe fence_watchdog
fence_watchdog - Dummy watchdog fence agent
fence_watchdog just provides
meta-data - actual fencing is done by the pacemaker internal watchdog agent.
Stonith options:
nodename: Ignored
plug: Ignored
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names. Eg.
node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and 3 for node2
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device. Allowed values: dynamic-list (query
the device via the 'list' command), static-list (check the pcmk_host_list attribute), status (query
the device via the 'status' command), none (assume every device can fence every machine)
pcmk_delay_max: Enable a delay of no more than the time specified before executing fencing actions. Pacemaker
derives the overall delay by taking the value of pcmk_delay_base and adding a random delay value
such that the sum is kept below this maximum. This prevents double fencing when using slow devices
such as sbd. Use this to enable a random delay for fencing actions. The overall delay is derived
from this random delay value adding a static delay so that the sum is kept below the maximum delay.
pcmk_delay_base: Enable a base delay for fencing actions and specify base delay value. This prevents double fencing
when different delays are configured on the nodes. Use this to enable a static delay for fencing
actions. The overall delay is derived from a random delay value adding this static delay so that
the sum is kept below the maximum delay.
pcmk_action_limit: The maximum number of actions can be performed in parallel on this device Cluster property
concurrent-fencing=true needs to be configured first. Then use this to specify the maximum number
of actions can be performed in parallel on this device. -1 is unlimited.
Default operations:
monitor: interval=60s
[root@virt-273 ~]# crm_resource --show-metadata=stonith:fence_watchdog
<resource-agent name="fence_watchdog" shortdesc="Dummy watchdog fence agent">
<longdesc>fence_watchdog just provides
meta-data - actual fencing is done by the pacemaker internal watchdog agent.</longdesc>
<parameters>
<parameter name="action" required="0">
<getopt mixed="-o, --action=[action]"/>
<content type="string" default="metadata"/>
<shortdesc lang="en">Fencing Action</shortdesc>
</parameter>
<parameter name="nodename" required="0">
<getopt mixed="-N, --nodename"/>
<content type="string"/>
<shortdesc lang="en">Ignored</shortdesc>
</parameter>
<parameter name="plug" required="0">
<getopt mixed="-n, --plug=[id]"/>
<content type="string"/>
<shortdesc lang="en">Ignored</shortdesc>
</parameter>
<parameter name="version" required="0">
<getopt mixed="-V, --version"/>
<content type="boolean"/>
<shortdesc lang="en">Display version information and exit</shortdesc>
</parameter>
<parameter name="help" required="0">
<getopt mixed="-h, --help"/>
<content type="boolean"/>
<shortdesc lang="en">Display help and exit</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on"/>
<action name="off"/>
<action name="reboot"/>
<action name="monitor"/>
<action name="list"/>
<action name="metadata"/>
<action name="stop" timeout="20s"/>
<action name="start" timeout="20s"/>
</actions>
</resource-agent>
Setup 3-node cluster with 1 remote node:
[root@virt-273 ~]# pcs status --full
Cluster name: test_cluster
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (3) (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:18:04 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-273 (4) virt-274 (3) virt-275 (2) ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
Tickets:
PCSD Status:
virt-273: Online
virt-274: Online
virt-275: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
sbd: active/enabled
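For context, a comparable test cluster could be assembled with commands roughly like the ones below (a sketch, not part of the verification; it assumes the nodes are already authenticated to pcsd, that sbd with a watchdog has been enabled on virt-273 and virt-274 only, which is not shown here, and that the 30-second timeout from the earlier outline is reused):

# create the cluster and start it
pcs cluster setup test_cluster virt-273 virt-274 virt-275 --start
# add the remote node (runs pacemaker-remote, no sbd)
pcs cluster node add-remote virt-276
# watchdog fencing only for the two nodes that run sbd
pcs stonith create watchdog fence_watchdog pcmk_host_list="virt-273 virt-274" op monitor interval=60s
pcs property set stonith-watchdog-timeout=30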
Watchdog stonith is configured and sbd is running on only two of the cluster nodes ("virt-273 virt-274"):
[root@virt-273 ~]# pcs stonith config watchdog
Resource: watchdog (class=stonith type=fence_watchdog)
Attributes: pcmk_host_list="virt-273 virt-274"
Operations: monitor interval=60s (watchdog-monitor-interval-60s)
[root@virt-273 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
virt-273: YES | YES | YES
virt-275: YES | NO | NO
virt-274: YES | YES | YES
killall -9 corosync on the node virt-273, which has watchdog fencing:
[root@virt-273 ~]# killall -9 corosync
Node becomes unclean:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:19:42 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Node virt-273: UNCLEAN (offline)
* Online: [ virt-274 virt-275 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 pending: client=pacemaker-controld.57157, origin=virt-274
Node is fenced:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:19:52 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-274 virt-275 ]
* OFFLINE: [ virt-273 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
Node has rebooted and rejoined the cluster:
[root@virt-275 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:20:49 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Online: [ virt-273 virt-274 virt-275 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started virt-275
Migration Summary:
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
killall -9 corosync on the node virt-275, which doesn't have watchdog fencing configured:
[root@virt-275 ~]# killall -9 corosync
Node is marked as UNCLEAN, fencing failed:
[root@virt-273 ~]# crm_mon -1rfm
Cluster Summary:
* Stack: corosync
* Current DC: virt-274 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon Aug 30 20:26:05 2021
* Last change: Mon Aug 30 19:47:07 2021 by root via cibadmin on virt-273
* 4 nodes configured
* 2 resource instances configured
Node List:
* Node virt-275: UNCLEAN (offline)
* Online: [ virt-273 virt-274 ]
* RemoteOnline: [ virt-276 ]
Full List of Resources:
* virt-276 (ocf::pacemaker:remote): Started virt-274
* watchdog (stonith:fence_watchdog): Started [ virt-275 virt-273 ]
Migration Summary:
Failed Fencing Actions:
* reboot of virt-275 failed: delegate=, client=pacemaker-controld.57157, origin=virt-274, last-failed='2021-08-30 20:26:02 +02:00'
Fencing History:
* reboot of virt-273 successful: delegate=virt-274, client=pacemaker-controld.57157, origin=virt-274, last-successful='2021-08-30 20:19:49 +02:00'
Node stays in the UNCLEAN state.
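A node without a working fence device stays UNCLEAN until an administrator intervenes, typically by confirming the fencing manually once the node is known to be down (a sketch using the standard pcs command, not something the test itself exercises; depending on the pcs version it may prompt or require --force):

# only after verifying out of band that virt-275 really is down
pcs stonith confirm virt-275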
Verified as SanityOnly in pacemaker-2.1.0-8.el8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:4267