To prevent manual editing of sbd config files, at least the ability to configure the reference to the shared block devices would have to be added to pcs. On top of that, it may make sense to add a shortcut for adding fence_sbd as a fencing resource.

+++ This bug was initially created as a clone of Bug #1413951 +++

Description of problem:
SBD provided with RHEL doesn't support usage of shared storage - just watchdog

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Klaus Wenninger on 2017-01-17 07:36:19 EST ---

Since we don't support sbd on remote-nodes, we won't support shared block devices with sbd there either. As with sbd support in general on remote-nodes, the use of shared block devices there is not explicitly disabled, and it in fact seems to work as expected if the parameter '-n {remote_node_name}' is added to the sbd config.
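For illustration only, a rough sketch of what such a remote-node entry in /etc/sysconfig/sbd might look like - the device path and node name are placeholders, and I'm assuming SBD_OPTS is the variable used to pass extra daemon options:

# hypothetical remote-node config
SBD_DEVICE="/dev/vdb"
SBD_OPTS="-n remote-node-1"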
One point I forgot that might be handy: with sbd + shared storage you would use the sbd-cmdline-tool for 3 purposes (rough sketch below):
- initialization of the messaging layout on the block device(s)
- manually checking for messages
- resetting messages after a node has been fenced

This can all be done from one node that has access to the block device(s), so no support from pcsd is needed to make management possible from a single node without having to ssh anywhere else. But as the reference to the block device(s) is needed in multiple places - initialization, creation of the fencing resource, and the config written to /etc/sysconfig/sbd (which has to be done on all nodes) - pcs might provide a shortcut here.
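Just to illustrate the three purposes above, a sketch of the corresponding sbd-cmdline calls, assuming /dev/vdb as the shared block device and node1 as the node whose slot is being handled (both placeholders):

sbd -d /dev/vdb create                # initialize the messaging layout on the device
sbd -d /dev/vdb list                  # check the slots and any pending messages
sbd -d /dev/vdb message node1 clear   # reset node1's slot after it has been fenced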
Regarding manual reset of the sbd message slots: it is definitely a feature that a manual reset is required. There is a cmdline parameter (-S) for the sbd daemon and SBD_STARTMODE in /etc/sysconfig/sbd. Not giving -S on the cmdline and not defining SBD_STARTMODE both default to starting regardless of which message was last conveyed via the sbd message slot; sbd startup then cleans the slot automatically. The default /etc/sysconfig/sbd, on the other hand, defines SBD_STARTMODE=clean, which requires manual cleaning. I don't know which use case was behind the latter, so I guess it would be easiest to alter the default /etc/sysconfig/sbd to get behaviour more similar to other fencing devices. As this is obviously a feature, it might still make sense to support setting SBD_STARTMODE via pcs, especially as it is something you would probably have to do on all cluster nodes - regardless of which default we go for.
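For clarity, a sketch of the two variants of the relevant line in /etc/sysconfig/sbd (not a recommendation for either one):

# shipped default: after a fence, refuse to start until the slot has been cleared manually
SBD_STARTMODE=clean

# alternative: start regardless of the last message; the slot is cleaned automatically on startup
SBD_STARTMODE=always

With the 'clean' variant the manual reset would be something like 'sbd -d <device> message <node> clear'.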
a few thoughts on the interface ...

The simple setup has a single set of shared block devices seen by all nodes. That means the devices have to be simultaneously accessible from 16 nodes (the supported node limit in RHEL, probably being raised to 32), so this is likely not an issue at the moment. However, with future support for remote-nodes this could look different: there can be hundreds or maybe even thousands of them, and that could become an issue for shared block devices. From what I've seen, one way to tackle this might be to have several sets of shared block devices and one fence_sbd instance for each set. A node - regardless of whether it is a remote-node or a cluster-node - would then access exactly one of these sets. At least 2 cluster-nodes per set probably makes sense, so that the remote-nodes can still be fenced even if one of those cluster-nodes is down, and so that the cluster-nodes can fence each other. So when introducing an admin interface via pcs it probably makes sense to keep a seamless extension to such a scenario (and the corresponding admin interface) in mind - see the sketch below.
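To make that direction a bit more concrete, a purely hypothetical sketch with two device sets (the device names are made up), each backed by its own fence_sbd instance, along the lines of what pcs already supports for a single set:

pcs stonith create sbd-fencing-setA fence_sbd devices=/dev/mapper/sbd-setA method=cycle
pcs stonith create sbd-fencing-setB fence_sbd devices=/dev/mapper/sbd-setB method=cycle

Each node's /etc/sysconfig/sbd would then reference only the device(s) of the set it belongs to.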
Upstream patch: https://github.com/ClusterLabs/pcs/commit/6666a61edb2c8fccf20dd719fc13f2930898

This adds support only to the CLI. We still need to make sure the GUI works properly.
additional patch: https://github.com/ClusterLabs/pcs/commit/fa47bdcb0a9c9ee46699a0fea2ba25846918

This patch allows specifying the SBD device when adding a node to the cluster from the web UI.
additional fix: https://github.com/ClusterLabs/pcs/commit/6555f41cb5def262435df6ec9f7c0cb8b67e

TEST:
2 node cluster: rhel74-node1, rhel74-node2

requirements:
- HW watchdog on all nodes (/dev/watchdog)
- shared device (/dev/vdb)
- sbd installed on all nodes

SBD is disabled:
[root@rhel74-node1 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
rhel74-node1: YES | NO | NO
rhel74-node2: YES | NO | NO

Create fence device for sbd:
[root@rhel74-node1 ~]# pcs stonith create sbd-fencing fence_sbd devices=/dev/vdb method=cycle
[root@rhel74-node1 ~]# pcs status
Cluster name: rhel74
Stack: corosync
Current DC: rhel74-node2 (version 1.1.16-6.el7-94ff4df) - partition with quorum
Last updated: Fri Apr  7 09:18:07 2017
Last change: Fri Apr  7 09:17:57 2017 by root via cibadmin on rhel74-node1

2 nodes configured
1 resource configured

Online: [ rhel74-node1 rhel74-node2 ]

Full list of resources:

 sbd-fencing    (stonith:fence_sbd):    Started rhel74-node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Initialize shared device:
[root@rhel74-node1 ~]# pcs stonith sbd device setup --device=/dev/vdb
WARNING: All current content on device(s) '/dev/vdb' will be overwritten. Are you sure you want to continue? [y/N] y
Initializing device(s) /dev/vdb...
Device(s) initialized successfuly

Enable SBD:
[root@rhel74-node1 ~]# pcs stonith sbd enable --device=/dev/vdb
Running SBD pre-enabling checks...
rhel74-node1: SBD pre-enabling checks done
rhel74-node2: SBD pre-enabling checks done
Distributing SBD config...
rhel74-node1: SBD config saved
rhel74-node2: SBD config saved
Enabling SBD service...
rhel74-node2: sbd enabled
rhel74-node1: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

Restart cluster:
[root@rhel74-node1 ~]# pcs cluster stop --all
rhel74-node2: Stopping Cluster (pacemaker)...
rhel74-node1: Stopping Cluster (pacemaker)...
rhel74-node1: Stopping Cluster (corosync)...
rhel74-node2: Stopping Cluster (corosync)...
[root@rhel74-node1 ~]# pcs cluster start --all
rhel74-node2: Starting Cluster...
rhel74-node1: Starting Cluster...

Check SBD status and config:
[root@rhel74-node1 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
rhel74-node1: YES | YES | YES
rhel74-node2: YES | YES | YES

Messages list on device '/dev/vdb':
0       rhel74-node2    clear
1       rhel74-node1    clear

[root@rhel74-node1 ~]# pcs stonith sbd config
SBD_WATCHDOG_TIMEOUT=5
SBD_STARTMODE=always
SBD_DELAY_START=no

Watchdogs:
  rhel74-node1: /dev/watchdog
  rhel74-node2: /dev/watchdog

Devices:
  rhel74-node1: "/dev/vdb"
  rhel74-node2: "/dev/vdb"

Disable SBD:
[root@rhel74-node1 ~]# pcs stonith sbd disable
Disabling SBD service...
rhel74-node1: sbd disabled
rhel74-node2: sbd disabled
Warning: Cluster restart is required in order to apply these changes.
additional fix: https://github.com/ClusterLabs/pcs/commit/0364ef4c4975c0e10b189c14c7aa530b9d5d
see comment 11
I added a brief description for the release notes, just so we have something. When the final Portal articles documenting SBD with this new feature are ready, I can add references to them.
I added a reference to the published Portal article on SBD fencing, which includes links to the other SBD articles.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1958