Bug 1413958 - Add support for shared storage with sbd
Summary: Add support for shared storage with sbd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ondrej Mular
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On: 1413951 1414053
Blocks:
 
Reported: 2017-01-17 12:43 UTC by Klaus Wenninger
Modified: 2017-08-01 18:26 UTC (History)
10 users

Fixed In Version: pcs-0.9.157-1.el7
Doc Type: Release Note
Doc Text:
Support for using SBD with shared storage: Support has been added for configuring SBD (Storage-Based Death) with shared storage using the "pcs" commands. For information on SBD fencing, see https://access.redhat.com/articles/2943361.
Clone Of: 1413951
Environment:
Last Closed: 2017-08-01 18:26:07 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1958 normal SHIPPED_LIVE pcs bug fix and enhancement update 2017-08-01 18:09:47 UTC

Description Klaus Wenninger 2017-01-17 12:43:12 UTC
To prevent manual editing of the sbd config files, pcs needs at least
the ability to configure references to the shared block devices.
On top of that, it may make sense to add a shortcut for adding
fence_sbd as a fencing resource.

+++ This bug was initially created as a clone of Bug #1413951 +++

Description of problem:
The SBD support provided with RHEL does not cover the use of shared storage, only watchdog-based operation.


Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Klaus Wenninger on 2017-01-17 07:36:19 EST ---

Since we don't support sbd on remote nodes, we won't support shared block devices
with sbd there either.
As with sbd support on remote nodes in general, the use of shared block devices
there is not explicitly disabled, and in fact it seems to work as expected if
the parameter '-n {remote_node_name}' is added to the sbd config.
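
For illustration, the remote-node workaround described above could look like this in /etc/sysconfig/sbd (a sketch; the device path and node name are placeholders, not taken from this bug):

```sh
# /etc/sysconfig/sbd on a remote node (hypothetical example)
SBD_DEVICE="/dev/disk/by-id/shared-disk"   # shared block device (placeholder path)
SBD_OPTS="-n remote1"                      # pass the remote node's name to sbd
```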

Comment 4 Klaus Wenninger 2017-01-24 11:03:10 UTC
One point I forgot that might be handy:

With sbd + shared storage you would use the sbd cmdline tool for 3 purposes:

- initializing the messaging layout on the block devices
- manually checking for messages
- resetting messages after a node has been fenced

This can all be done from any one node that has access to the block device(s),
so no support from pcsd is needed to enable management from a single node
without having to ssh anywhere else.

But since the reference to the block device(s) is needed in multiple places
(initialization, creation of the fencing resource, and the config written to
/etc/sysconfig/sbd, which has to be done on all nodes), pcs might provide a
shortcut here.
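
As a sketch, the three uses of the sbd cmdline tool listed above map to invocations like the following (the device path and node name are placeholders; run from any node with access to the device):

```sh
# 1. Initialize the messaging layout on the shared block device
sbd -d /dev/vdb create

# 2. Manually check for messages (lists the per-node message slots)
sbd -d /dev/vdb list

# 3. Reset a node's message slot after that node has been fenced
sbd -d /dev/vdb message rhel74-node1 clear
```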

Comment 6 Klaus Wenninger 2017-01-24 15:48:38 UTC
Regarding manual reset of the sbd message slots: requiring a manual reset
is definitely a feature.

There is a cmdline parameter (-S) to the sbd daemon and
SBD_STARTMODE in /etc/sysconfig/sbd.
If -S is not given on the cmdline and SBD_STARTMODE is not defined,
the default is to start regardless of which message was last conveyed via
the sbd message slot.
The sbd startup then cleans the slot automatically.

On the other hand, the default /etc/sysconfig/sbd defines
SBD_STARTMODE=clean, which requires manual cleaning.
I don't know which use case was behind the latter.
So I guess it would be easiest to alter the default
/etc/sysconfig/sbd to get behaviour more similar to other
fencing devices.

As this is obviously a feature, it might still make sense to support setting
SBD_STARTMODE via pcs, especially as this is something you would probably
have to do on all cluster nodes, regardless of which default we go for.
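
If pcs were to expose this, setting the start mode cluster-wide could look something like the following sketch (the exact option syntax is an assumption, and /dev/vdb is a placeholder):

```sh
# Hypothetical: enable SBD on all nodes with an explicit start mode, so pcs
# writes the same SBD_STARTMODE to /etc/sysconfig/sbd on every node
pcs stonith sbd enable --device=/dev/vdb SBD_STARTMODE=always
```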

Comment 7 Klaus Wenninger 2017-01-26 11:32:43 UTC
a few thoughts on the interface ...

The simple setup has a single set of shared block devices
that is seen by all of the nodes.
This means the devices have to be simultaneously accessible
from 16 nodes (the supported node limit in RHEL, probably
being raised to 32), so probably no issue at the moment.
However, with the future addition of remote-node support
this could look different, as there can be hundreds or
maybe even thousands of remote nodes, and that could be an
issue for shared block devices.

From what I've seen, one way to tackle this might be
to have several sets of shared block devices and one
fence_sbd instance for each set. A node, whether
remote node or cluster node, would then access one
of these sets. At least 2 cluster nodes per set probably
makes sense, so that the remote nodes can be fenced even if
one of the cluster nodes is down, and so that the
cluster nodes can fence each other.

So I guess when introducing an admin interface via pcs, it
probably makes sense to keep a seamless extension to such a
scenario (and with it the admin interface) in mind.
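
A multi-set layout like the one sketched above might look roughly as follows with the existing commands (the device paths and resource names here are hypothetical):

```sh
# Two independent sets of shared block devices, one fence_sbd instance per
# set; every node accesses exactly one of the sets
pcs stonith create sbd-fencing-set1 fence_sbd devices=/dev/mapper/setA method=cycle
pcs stonith create sbd-fencing-set2 fence_sbd devices=/dev/mapper/setB method=cycle
```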

Comment 9 Ondrej Mular 2017-03-31 13:58:52 UTC
Upstream patch:
https://github.com/ClusterLabs/pcs/commit/6666a61edb2c8fccf20dd719fc13f2930898

This adds support only to the CLI. We still need to make sure the GUI works properly.

Comment 10 Ondrej Mular 2017-04-06 15:01:28 UTC
additional patch:
https://github.com/ClusterLabs/pcs/commit/fa47bdcb0a9c9ee46699a0fea2ba25846918

This patch allows specifying an SBD device when adding a node to the cluster from the web UI.

Comment 11 Ondrej Mular 2017-04-07 11:09:38 UTC
additional fix:
https://github.com/ClusterLabs/pcs/commit/6555f41cb5def262435df6ec9f7c0cb8b67e

TEST:
2 node cluster: rhel74-node1, rhel74-node2
requirements:
 - HW watchdog on all nodes (/dev/watchdog)
 - shared device (/dev/vdb)
 - sbd installed on all nodes

SBD is disabled:
[root@rhel74-node1 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
rhel74-node1: YES |  NO |  NO
rhel74-node2: YES |  NO |  NO

Create fence device for sbd:
[root@rhel74-node1 ~]# pcs stonith create sbd-fencing fence_sbd devices=/dev/vdb method=cycle
[root@rhel74-node1 ~]# pcs status
Cluster name: rhel74
Stack: corosync
Current DC: rhel74-node2 (version 1.1.16-6.el7-94ff4df) - partition with quorum
Last updated: Fri Apr  7 09:18:07 2017
Last change: Fri Apr  7 09:17:57 2017 by root via cibadmin on rhel74-node1

2 nodes configured
1 resource configured

Online: [ rhel74-node1 rhel74-node2 ]

Full list of resources:

 sbd-fencing	(stonith:fence_sbd):	Started rhel74-node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled


Initialize shared device:
[root@rhel74-node1 ~]# pcs stonith sbd device setup --device=/dev/vdb
WARNING: All current content on device(s) '/dev/vdb' will be overwritten. Are you sure you want to continue? [y/N] y
Initializing device(s) /dev/vdb...
Device(s) initialized successfuly

Enable SBD:
[root@rhel74-node1 ~]# pcs stonith sbd enable --device=/dev/vdb
Running SBD pre-enabling checks...
rhel74-node1: SBD pre-enabling checks done
rhel74-node2: SBD pre-enabling checks done
Distributing SBD config...
rhel74-node1: SBD config saved
rhel74-node2: SBD config saved
Enabling SBD service...
rhel74-node2: sbd enabled
rhel74-node1: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

Restart cluster:
[root@rhel74-node1 ~]# pcs cluster stop --all
rhel74-node2: Stopping Cluster (pacemaker)...
rhel74-node1: Stopping Cluster (pacemaker)...
rhel74-node1: Stopping Cluster (corosync)...
rhel74-node2: Stopping Cluster (corosync)...
[root@rhel74-node1 ~]# pcs cluster start --all
rhel74-node2: Starting Cluster...
rhel74-node1: Starting Cluster...

Check SBD status and config:
[root@rhel74-node1 ~]# pcs stonith sbd status
SBD STATUS
<node name>: <installed> | <enabled> | <running>
rhel74-node1: YES | YES | YES
rhel74-node2: YES | YES | YES

Messages list on device '/dev/vdb':
0	rhel74-node2	clear	
1	rhel74-node1	clear	

[root@rhel74-node1 ~]# pcs stonith sbd config
SBD_WATCHDOG_TIMEOUT=5
SBD_STARTMODE=always
SBD_DELAY_START=no

Watchdogs:
  rhel74-node1: /dev/watchdog
  rhel74-node2: /dev/watchdog

Devices:
  rhel74-node1: "/dev/vdb"
  rhel74-node2: "/dev/vdb"

Disable SBD:
[root@rhel74-node1 ~]# pcs stonith sbd disable
Disabling SBD service...
rhel74-node1: sbd disabled
rhel74-node2: sbd disabled
Warning: Cluster restart is required in order to apply these changes.

Comment 13 Ivan Devat 2017-04-10 16:04:31 UTC
see comment 11

Comment 33 Steven J. Levine 2017-06-26 21:24:22 UTC
I added a brief description for the release notes, just so we have something.  When the final articles are ready on the Portal that document SBD with this new feature, I can add references to them.

Comment 34 Steven J. Levine 2017-07-26 15:23:02 UTC
I added a reference to the published Portal article on SBD fencing, which includes links to the other SBD articles.

Comment 35 errata-xmlrpc 2017-08-01 18:26:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958

