Bug 1718296

Summary:	Shutting down pacemaker on node in maintenance triggers sbd-reboot
Product:	Red Hat Enterprise Linux 8	Reporter:	Klaus Wenninger <kwenning>
Component:	sbd	Assignee:	Klaus Wenninger <kwenning>
Status:	CLOSED ERRATA	QA Contact:	Michal Mazourek <mmazoure>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	8.1	CC:	cfeist, cluster-maint, mlisik, mmazoure
Target Milestone:	rc
Target Release:	8.1
Hardware:	Unspecified
OS:	Unspecified
Fixed In Version:	sbd-1.4.0-10.el8	Doc Type:	If docs needed, set a value
Clones:	1718297 1718324		Environment:
Last Closed:	2019-11-05 20:46:42 UTC	Type:	Bug
Bug Blocks:	1718297, 1718324

Description Klaus Wenninger 2019-06-07 12:40:12 UTC

Description of problem:

When pacemaker is shut down on a node that was put into maintenance-mode, all
resources stay active. sbd detects this when the pacemaker-connection goes away,
concludes that the shutdown was ungraceful and thus expects pacemaker to
reconnect immediately (within the watchdog-timeout).


Version-Release number of selected component (if applicable):

sbd-1.4.0-8.el8

How reproducible:

100%

Steps to Reproduce:
1. Set up a cluster using sbd
2. Start the cluster and wait until some resources are started on the node you
   want to trigger the misbehaviour on
3. Put the node into maintenance-mode using
   'pcs node maintenance {your-test-node}'
4. Issue 'systemctl stop pacemaker' on {your-test-node}
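Steps 3 and 4 can be scripted roughly as follows. This is only a sketch: the function names and the dry-run default are illustrative assumptions, and it assumes the cluster from steps 1 and 2 is already running with resources active on the test node. By default it only prints the commands, since actually running them on an affected sbd version is expected to reboot the node.

```python
import shlex
import subprocess


def reproduction_commands(node):
    """Return the commands from steps 3 and 4 as (where, argv) pairs."""
    return [
        ("any cluster node", ["pcs", "node", "maintenance", node]),
        (node, ["systemctl", "stop", "pacemaker"]),
    ]


def run_steps(node, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for where, argv in reproduction_commands(node):
        line = f"[{where}] {shlex.join(argv)}"
        lines.append(line)
        print(line)
        if not dry_run:
            # Assumption: the script runs on the test node itself,
            # so both commands can be executed locally.
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    # Hypothetical node name; replace with {your-test-node}.
    run_steps("your-test-node")
```

Calling run_steps(node, dry_run=False) on the test node executes both commands in order.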

Actual results:

Node is rebooted by sbd

Expected results:

Sbd should detect that the pacemaker-shutdown is still graceful and not trigger
a reboot.

Additional info:

The correction would be needed e.g. in a pacemaker-upgrade-scenario where the
node is first set to maintenance-mode. Then pacemaker is shut down, upgraded and
restarted, and the node is brought back from maintenance-mode.
The scenario is delicate though, as the node, which then still runs resources
without pacemaker-control, could still be watchdog-fenced, of course without any
impact on the node. Preventing a node in maintenance-mode from being
watchdog-fenced sounds like a solution here.
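
The difference between the reported and the expected behaviour can be modelled with a small sketch. This is not the code from the sbd fix: the NodeState type, its two flags and the function names are hypothetical, and the maintenance-mode check simply mirrors the idea stated above.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    resources_active: bool  # resources still running when the connection is lost
    in_maintenance: bool    # node was put into maintenance-mode beforehand


def graceful_as_reported(state):
    """Behaviour described in this report: any leftover resource means 'ungraceful'."""
    return not state.resources_active


def graceful_as_proposed(state):
    """Proposed behaviour: leftovers on a node in maintenance-mode are tolerated."""
    return (not state.resources_active) or state.in_maintenance


def sbd_action(graceful):
    # After an ungraceful shutdown pacemaker has to reconnect within the
    # watchdog-timeout, otherwise the node is rebooted.
    if graceful:
        return "no reboot"
    return "reboot unless pacemaker reconnects within watchdog-timeout"


if __name__ == "__main__":
    state = NodeState(resources_active=True, in_maintenance=True)
    print("reported:", sbd_action(graceful_as_reported(state)))
    print("proposed:", sbd_action(graceful_as_proposed(state)))
```

In this model only the combination of active resources and maintenance-mode changes its outcome; a node that is not in maintenance-mode and still has active resources is treated as ungracefully shut down in both variants.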

Comment 3 Klaus Wenninger 2019-06-07 17:31:38 UTC

https://github.com/ClusterLabs/sbd/pull/84

Comment 7 errata-xmlrpc 2019-11-05 20:46:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3344