Bug 1660158

Summary: timeout-action triggered by sbd-daemon isn't configurable
Product: Red Hat Enterprise Linux 7 Reporter: Klaus Wenninger <kwenning>
Component: sbd Assignee: Klaus Wenninger <kwenning>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6 CC: aherr, cfeist, cluster-maint, cluster-qe, fdinitto, mlisik, oalbrigt
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sbd-1.4.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
The sbd-daemon always triggers a reboot when it detects a timeout. If a hardware watchdog is configured to trigger a poweroff, the watchdog's poweroff races with the daemon's reboot, which can result in an undesired reboot. When sbd is used without a hardware watchdog, there is no way to trigger a poweroff at all, even if that is the desired action, e.g. to accelerate an otherwise slow poweroff mechanism. Now that the configuration allows the desired action to be set, a reliable poweroff can be triggered.
Story Points: ---
Clone Of: 1660147
: 1666201 1666202 1666203 Environment:
Last Closed: 2019-08-06 12:47:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1660147    
Bug Blocks: 1666201, 1666202, 1666203    

Description Klaus Wenninger 2018-12-17 16:20:30 UTC
+++ This bug was initially created as a clone of Bug #1660147 +++

Description of problem:
With most hardware watchdog devices, the action performed when the watchdog expires can be configured to be either reboot or poweroff.
There is a race between the hardware watchdog and the timeout handling inside sbd-daemon.
If sbd-daemon is the first to detect the timeout, it always triggers a reboot, which leads to undesired effects if the action configured for the hardware watchdog is a poweroff.
If sbd is used without a hardware watchdog (or without a watchdog device in general; softdog would apply as well, though no inconsistency can arise there, as softdog does not support being configured to power off), the action performed on a timeout is not configurable at all.

Version-Release number of selected component (if applicable):
1.3.1-17.el8

How reproducible:
100%

Steps to Reproduce:
1. Configure sbd without a watchdog device and without a shared disk.
2. Configure a 3-node cluster.
3. Disconnect one of the nodes from the cluster so that the loss of quorum makes pacemaker-watcher stop triggering sbd-inquisitor.

Actual results:
sbd-inquisitor triggers a reboot


Expected results:
sbd-inquisitor triggers the action specified by the new configuration variable SBD_TIMEOUT_ACTION
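
For illustration only, a minimal sketch of how the new variable might be set to request a poweroff. It assumes the variable is read from sbd's sysconfig file (typically /etc/sysconfig/sbd on RHEL) and that "off" is the accepted value for poweroff, optionally combined with "flush" or "noflush". Neither the path nor the value syntax is stated in this report, so both should be checked against the comments in the sysconfig file shipped with the installed sbd package.

```shell
# /etc/sysconfig/sbd (path assumed, not stated in this report)
#
# Action taken when sbd-daemon itself detects a timeout.
# Assumed syntax: comma-separated combination of flush|noflush
# and reboot|off; verify against the packaged sysconfig file.
SBD_TIMEOUT_ACTION=flush,off
```

With a setting like this, the action triggered by sbd-daemon would match a hardware watchdog that is itself configured to power off, so the outcome no longer depends on which of the two detects the timeout first.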


Additional info:

Comment 10 errata-xmlrpc 2019-08-06 12:47:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2103