Bug 1449982 - SBD with Storage Integration Must NEVER Fall-Back to Quorum-Based-Watchdog-Self-Fencing on 2-node clusters: Data loss risk
Summary: SBD with Storage Integration Must NEVER Fall-Back to Quorum-Based-Watchdog-Self-Fencing on 2-node clusters: Data loss risk
Keywords:
Status: CLOSED DUPLICATE of bug 1443666
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-11 09:47 UTC by Daniel Peess
Modified: 2020-12-14 08:39 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-19 14:11:43 UTC
Target Upstream Version:
Embargoed:


Attachments: (none)

Description Daniel Peess 2017-05-11 09:47:21 UTC
Description of problem:

In #1449155, Klaus Wenninger warned me that SBD might fall back to quorum-based watchdog self-fencing if poison-pill fencing fails and 'stonith-watchdog-timeout' is (still) set.

For 2-node clusters without an auto-tie-breaker, this is a serious data loss risk:
a 2-node cluster without an additional arbitrator never loses quorum,
so both nodes might fail to deliver the poison pill,
wait out 'stonith-watchdog-timeout',
and then both take over the exclusive resources anyway: data loss.
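
For context, roughly what such a setup looks like; the shared-disk path, the timeout value, and the exact pcs syntax below are from memory of the 7.4 packages, so treat this as an example rather than my literal configuration:

# quorum section of /etc/corosync/corosync.conf on a typical 2-node cluster:
quorum {
    provider: corosync_votequorum
    two_node: 1
}

$ pcs stonith sbd enable --device=/dev/disk/by-id/example-shared-disk
$ pcs property set stonith-watchdog-timeout=10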

Even after I set:
$ pcs property unset stonith-watchdog-timeout;

I still get 'Relying on watchdog integration for fencing' in corosync.log.
Luckily my SBD fencing agents work properly and fence via poison pill before that happens.
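
For completeness, this is how I double-check that the property is really gone (pcs/pacemaker output details may vary by version):

$ pcs property show stonith-watchdog-timeout
$ crm_attribute --query --name stonith-watchdog-timeout --type crm_config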

Can we enforce that 2-node clusters never activate or fall back to watchdog-only self-fencing behaviour, regardless of whether stonith-watchdog-timeout is set?

Version-Release number of selected component (if applicable):
RHEL 7.3 with the RHEL 7.4 development SBD packages that include poison-pill storage integration.

How to reproduce:
-) Set up SBD with poison-pill fencing.
-) Mistakenly set stonith-watchdog-timeout as well (because the difference between the two SBD modes is not obvious).
-) Put both SBD fencing agents into stopped mode.
-) Check whether both nodes start their RAs on split-brain (a rough command sketch follows this list).
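
A rough sketch of those steps as commands; the stonith resource id 'sbd-fencing' is made up, and this should obviously only be tried on a test cluster:

$ pcs property set stonith-watchdog-timeout=10   # the mistaken setting
$ pcs stonith disable sbd-fencing                # stop the poison-pill fence agent(s)
# ...then break cluster communication between the two nodes (e.g. block the corosync ports)
$ pcs status
$ grep -i watchdog /var/log/cluster/corosync.log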

Comment 9 Klaus Wenninger 2017-10-19 14:11:43 UTC
If any implementation were done solely for this issue, it would probably have to be on the Pacemaker side.
But as said before, this is probably most effectively taken care of by making the mechanism described in bz1443666 automatically remove all (2) cluster nodes from the list of nodes that are fenced via watchdog-fencing when 2-node mode is enabled.
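
(For anyone wanting to check whether 2-node mode is active, the flag shows up in corosync-quorumtool; output line shown from memory:)

$ corosync-quorumtool -s | grep Flags
Flags:            2Node Quorate
$ grep -A3 'quorum {' /etc/corosync/corosync.conf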

*** This bug has been marked as a duplicate of bug 1443666 ***

