Bug 1718324 - Provide a cleaner way for sbd to detect a graceful pacemaker-shutdown
Summary: Provide a cleaner way for sbd to detect a graceful pacemaker-shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: pre-dev-freeze
Target Release: 8.3
Assignee: Klaus Wenninger
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On: 1718296 1743726
Blocks: 1718297 1873135
Reported: 2019-06-07 13:27 UTC by Klaus Wenninger
Modified: 2020-11-04 04:01 UTC
CC List: 8 users

Fixed In Version: pacemaker-2.0.4-5.el8
Doc Type: Enhancement
Doc Text:
.New `SBD_SYNC_RESOURCE_STARTUP` SBD configuration parameter to improve synchronization with Pacemaker

To better control synchronization between SBD and Pacemaker, the `/etc/sysconfig/sbd` file now supports the `SBD_SYNC_RESOURCE_STARTUP` parameter. When Pacemaker and SBD packages from RHEL 8.3 or later are installed and SBD is configured with `SBD_SYNC_RESOURCE_STARTUP=true`, SBD contacts the Pacemaker daemon for information about the daemon's state. In this configuration, the Pacemaker daemon waits until it has been contacted by SBD, both before starting its subdaemons and before its final exit. As a result, Pacemaker does not run resources if SBD cannot actively communicate with it, and Pacemaker does not exit until it has reported a graceful shutdown to SBD.

This prevents an unlikely problem during a graceful shutdown in which SBD fails to detect the brief moment when no resources are running before Pacemaker finally disconnects, which would trigger an unneeded reboot. Detecting a graceful shutdown through a defined handshake also works in maintenance mode. The previous method, which detected a graceful shutdown on the basis of no running resources being left, had to be disabled in maintenance mode because running resources are not touched on shutdown.

In addition, enabling this feature avoids the risk of a split-brain situation in a cluster when SBD and Pacemaker both start successfully but SBD is unable to contact Pacemaker, for example because of SELinux policies. In this situation, Pacemaker would assume that SBD is functioning when it is not. With this feature enabled, Pacemaker does not complete startup until SBD has contacted it.

Another advantage of this feature is that, when it is enabled, SBD contacts Pacemaker repeatedly using a heartbeat and can panic the node if Pacemaker stops responding at any time.
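The handshake described above can be pictured as a small state machine. The following is a toy model only, assuming invented names (`ToyPacemakerd`, `answer_sbd`, `sbd_should_panic`, the state strings, and a "consecutive missed heartbeats" panic rule); it is not the actual pacemakerd/SBD IPC interface, just an illustration of the ordering guarantees: no subdaemons before SBD makes contact, no final exit before SBD has seen the graceful shutdown, and a panic if heartbeats go unanswered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyPacemakerd:
    """Hypothetical model of the pacemakerd side of the handshake."""
    sync_resource_startup: bool        # mirrors SBD_SYNC_RESOURCE_STARTUP
    state: str = "waiting-for-sbd"     # invented state names
    sbd_contacted: bool = False
    shutdown_reported: bool = False

    def answer_sbd(self) -> str:
        """Handle one SBD heartbeat query and report the current state."""
        self.sbd_contacted = True
        if self.state == "shutdown-complete":
            self.shutdown_reported = True
        return self.state

    def try_start_subdaemons(self) -> bool:
        # With the handshake enabled, subdaemons (and thus resources)
        # start only after SBD has made contact.
        if self.state == "waiting-for-sbd" and (
            self.sbd_contacted or not self.sync_resource_startup
        ):
            self.state = "running"
        return self.state == "running"

    def finish_shutdown(self) -> None:
        # Resources are stopped here, or left alone in maintenance mode.
        self.state = "shutdown-complete"

    def may_exit(self) -> bool:
        # Final exit waits until SBD has seen the graceful shutdown.
        return self.state == "shutdown-complete" and (
            self.shutdown_reported or not self.sync_resource_startup
        )


def sbd_should_panic(answers: Sequence[Optional[str]], max_missed: int) -> bool:
    """Toy SBD watcher; None means a heartbeat query got no response."""
    missed = 0
    for answer in answers:
        if answer == "shutdown-complete":
            return False  # graceful shutdown reported: no reboot needed
        missed = missed + 1 if answer is None else 0
        if missed >= max_missed:
            return True   # pacemakerd stopped responding: panic the node
    return False
```

In this model, a node with `sync_resource_startup=True` stays in `waiting-for-sbd` until `answer_sbd()` is called once, and `may_exit()` only becomes true after SBD has queried the `shutdown-complete` state, which is exactly the window the old resource-counting method could miss.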
NOTE: If you have edited your `/etc/sysconfig/sbd` file or configured SBD through `pcs`, an RPM upgrade does not pull in the new `SBD_SYNC_RESOURCE_STARTUP` parameter. In these cases, to implement this feature, you must manually add it from the `/etc/sysconfig/sbd.rpmnew` file or follow the procedure described in the `Configuration via environment` section of the `sbd`(8) man page.
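Before copying settings over by hand, it can help to see which assignments exist in `sbd.rpmnew` but not in the edited file. The sketch below is a hypothetical helper (`parse_sysconfig`, `missing_settings`, and `sync_startup_enabled` are invented names) that works on the file contents as strings and writes nothing. It assumes simple `KEY=value` lines: shell quoting and expansion are not interpreted beyond stripping one pair of surrounding quotes, and only the value `true` (as used in the Doc Text, compared case-insensitively here) is treated as enabling the feature.

```python
import re
from typing import Dict

# One active shell-style assignment per line; commented lines do not match.
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_sysconfig(text: str) -> Dict[str, str]:
    """Collect active KEY=value assignments (simplified parser)."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            # Strip one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            settings[key] = value
    return settings


def missing_settings(current: str, rpmnew: str) -> Dict[str, str]:
    """Settings defined in the .rpmnew text but absent from the edited file."""
    have = parse_sysconfig(current)
    return {k: v for k, v in parse_sysconfig(rpmnew).items() if k not in have}


def sync_startup_enabled(current: str) -> bool:
    """Assumption: only 'true' counts as enabled in this sketch."""
    value = parse_sysconfig(current).get("SBD_SYNC_RESOURCE_STARTUP", "")
    return value.lower() == "true"
```

Reading `/etc/sysconfig/sbd` and `/etc/sysconfig/sbd.rpmnew` and passing their contents to `missing_settings` would list the candidates; deciding which of them to add, and with what value, is left to the administrator.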
Clone Of: 1718296
Environment:
Last Closed: 2020-11-04 04:00:53 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Comment 1 Klaus Wenninger 2019-08-02 08:31:54 UTC
Watching the CIB for still-running resources and doing a final check that those left are unmanaged
seems to be robust, but it requires repeated checking and might still be racy.
So adding a dedicated mechanism sounds reasonable.
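The heuristic described in this comment boils down to a predicate over a snapshot of resource state. A minimal sketch, assuming a hypothetical `ResourceSnapshot` record rather than real CIB parsing:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ResourceSnapshot:
    """Hypothetical view of one resource as read from the CIB status."""
    name: str
    running: bool
    managed: bool


def looks_like_graceful_shutdown(resources: Iterable[ResourceSnapshot]) -> bool:
    """True if no managed resource is still running.

    Any resource still running must be unmanaged. The answer is only valid
    for the moment the snapshot was taken.
    """
    return all(not r.running or not r.managed for r in resources)
```

Because the predicate only sees one snapshot, it has to be re-evaluated repeatedly, and pacemaker can disconnect between two evaluations; that gap is the race a dedicated handshake removes.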

Comment 2 Patrik Hagara 2019-09-20 10:01:01 UTC
qa_ack+, internal implementation improvement only -- to be verified as SanityOnly (ie. regression test suite run) since no reproducer is available

Comment 3 Klaus Wenninger 2019-09-20 15:27:10 UTC
(In reply to Patrik Hagara from comment #2)
> qa_ack+, internal implementation improvement only -- to be verified as
> SanityOnly (ie. regression test suite run) since no reproducer is available

A pity that we've meanwhile made the hacky solution good enough that the reproducer is gone ;-)

Comment 6 Patrik Hagara 2020-03-23 09:53:03 UTC
re-adding qa_ack+, see comment#2

Comment 8 Ken Gaillot 2020-07-24 18:17:31 UTC
Fixed upstream as of commit 567cb6e (when used in combination with a compatible sbd version)

Comment 9 Ken Gaillot 2020-07-27 14:57:00 UTC
QA: Due to the need to maintain backward compatibility, this feature is enabled only if SBD_SYNC_RESOURCE_STARTUP=true is set in /etc/sysconfig/sbd (8.3 versions of both pacemaker and sbd must be installed). As I understand it, we do not support mixed-version packages on a single node, so only the upgraded packages need to be tested, but just for background, here is how it should behave with mixed versions:

* Old pacemaker, old sbd, any SBD_SYNC_RESOURCE_STARTUP: old behavior (pacemaker will start even if sbd is blocked from contacting it by SELinux, and sbd will panic the node if pacemaker shuts down cleanly in maintenance mode while resources are still active).

* Old pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP false/missing: old behavior.

* Old pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP true: most likely sbd will crash (8.3 sbd packages have a Requires for the new pacemaker, so it shouldn't normally be possible)

* New pacemaker, any sbd, SBD_SYNC_RESOURCE_STARTUP false/missing: old behavior. Pacemaker will log a warning recommending turning the setting on if sbd supports it.

* New pacemaker, old sbd, SBD_SYNC_RESOURCE_STARTUP true: pacemaker will not start any subdaemons.

* New pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP true: new behavior (pacemaker starts subdaemons only if sbd can contact it, and sbd doesn't panic on clean shutdown even if resources are active).
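The compatibility matrix above can be restated as a lookup function, which makes it easy to check that every combination is covered. This is a sketch with invented names (`Behavior`, `expected_behavior`); `sync_startup=False` stands for both "false" and "missing".

```python
from enum import Enum


class Behavior(Enum):
    OLD = "old behavior"
    SBD_LIKELY_CRASHES = "sbd most likely crashes"
    NO_SUBDAEMONS = "pacemaker starts no subdaemons"
    NEW = "new handshake behavior"


def expected_behavior(new_pacemaker: bool, new_sbd: bool, sync_startup: bool) -> Behavior:
    """Outcome for a (pacemaker, sbd, SBD_SYNC_RESOURCE_STARTUP) combination."""
    if not sync_startup:
        # False or missing: old behavior with any package mix
        # (new pacemaker additionally logs a warning recommending the setting).
        return Behavior.OLD
    if not new_pacemaker:
        # Old pacemaker does not know the setting; new sbd expects the new
        # pacemaker and most likely crashes (normally prevented by Requires).
        return Behavior.SBD_LIKELY_CRASHES if new_sbd else Behavior.OLD
    # New pacemaker waits for sbd to contact it, which only new sbd does.
    return Behavior.NEW if new_sbd else Behavior.NO_SUBDAEMONS
```

Only the last row (new pacemaker, new sbd, setting true) is the supported, tested configuration; the other rows are background, as noted above.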

Comment 10 Ken Gaillot 2020-07-27 15:04:22 UTC
The sbd side of this feature is Bug 1743726

Comment 17 Ken Gaillot 2020-09-25 20:40:25 UTC
Sorry, meant 8.3 in the doc text

SBD_SYNC_RESOURCE_STARTUP is the only thing covered here. This bug is just the pacemaker side of it; Bug 1743726 is the sbd side, but it's the same feature.

Comment 32 errata-xmlrpc 2020-11-04 04:00:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4804

