Bug 1718324
| Summary: | Provide a cleaner way for sbd to detect a graceful pacemaker-shutdown | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Klaus Wenninger <kwenning> |
| Component: | pacemaker | Assignee: | Klaus Wenninger <kwenning> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | Steven J. Levine <slevine> |
| Priority: | medium | ||
| Version: | 8.1 | CC: | cfeist, cluster-maint, cluster-qe, kgaillot, lmanasko, lmiccini, phagara, rbednar |
| Target Milestone: | pre-dev-freeze | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.3 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-2.0.4-5.el8 | Doc Type: | Enhancement |
| Doc Text: |
.New `SBD_SYNC_RESOURCE_STARTUP` SBD configuration parameter to improve synchronization with Pacemaker
To better control synchronization between SBD and Pacemaker, the `/etc/sysconfig/sbd` file now supports the `SBD_SYNC_RESOURCE_STARTUP` parameter. When Pacemaker and SBD packages from RHEL 8.3 or later are installed and SBD is configured with `SBD_SYNC_RESOURCE_STARTUP=true`, SBD contacts the Pacemaker daemon for information about the daemon's state.
In this configuration, the Pacemaker daemon will wait until it has been contacted by SBD, both before starting its subdaemons and before final exit. As a result, Pacemaker will not run resources if SBD cannot actively communicate with it, and Pacemaker will not exit until it has reported a graceful shutdown to SBD. This prevents the unlikely situation that might occur during a graceful shutdown when SBD fails to detect the brief moment when no resources are running before Pacemaker finally disconnects, which would trigger an unneeded reboot. Detecting a graceful shutdown using a defined handshake works in maintenance mode as well. The previous method of detecting a graceful shutdown on the basis of no running resources left had to be disabled in maintenance mode since running resources would not be touched on shutdown.
In addition, enabling this feature avoids the risk of a split-brain situation in a cluster when SBD and Pacemaker both start successfully but SBD is unable to contact Pacemaker. This could happen, for example, due to SELinux policies. In this situation, Pacemaker would assume that SBD is functioning when it is not. With this new feature enabled, Pacemaker will not complete startup until SBD has contacted it. Another advantage of this new feature is that, when it is enabled, SBD contacts Pacemaker repeatedly, using a heartbeat, and can panic the node if Pacemaker stops responding at any time.
NOTE: If you have edited your `/etc/sysconfig/sbd` file or configured SBD through PCS, then an RPM upgrade will not pull in the new `SBD_SYNC_RESOURCE_STARTUP` parameter. In these cases, to implement this feature you must manually add the parameter from the `/etc/sysconfig/sbd.rpmnew` file or follow the procedure described in the `Configuration via environment` section of the `sbd`(8) man page.
|
Story Points: | --- |
| Clone Of: | 1718296 | Environment: | |
| Last Closed: | 2020-11-04 04:00:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1718296, 1743726 | ||
| Bug Blocks: | 1718297, 1873135 | ||
|
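The NOTE in the Doc Text says that an RPM upgrade does not add `SBD_SYNC_RESOURCE_STARTUP` to an already edited `/etc/sysconfig/sbd`. The following is a minimal sketch, not part of sbd or pacemaker, for checking whether sysconfig-style text enables the parameter. The function name and the set of accepted truthy spellings are assumptions; the report itself only shows `true`. The sketch assumes plain `KEY=value` lines and does not handle trailing inline comments.

```python
# Hypothetical helper; the truthy spellings other than "true" are an assumption.
TRUTHY = {"true", "yes", "on", "1"}

def sync_resource_startup_enabled(sysconfig_text: str) -> bool:
    """Return True if the last SBD_SYNC_RESOURCE_STARTUP assignment is truthy."""
    value = None
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "SBD_SYNC_RESOURCE_STARTUP":
            # A later assignment overrides an earlier one, as in a sourced shell file.
            value = val.strip().strip("'\"").lower()
    return value in TRUTHY
```

A caller would typically read the file contents first, for example `sync_resource_startup_enabled(open("/etc/sysconfig/sbd").read())`, and add the parameter manually if the result is `False`.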
Comment 1
Klaus Wenninger
2019-08-02 08:31:54 UTC
qa_ack+, internal implementation improvement only -- to be verified as SanityOnly (ie. regression test suite run) since no reproducer is available

(In reply to Patrik Hagara from comment #2)
> qa_ack+, internal implementation improvement only -- to be verified as
> SanityOnly (ie. regression test suite run) since no reproducer is available

A pity that we got the hacky solution quite good meanwhile so that the reproducer is gone ;-)

Fixed upstream as of commit 567cb6e (when used in combination with a compatible sbd version)

QA: Due to the need to maintain backward compatibility, this feature is enabled only if SBD_SYNC_RESOURCE_STARTUP=true is set in /etc/sysconfig/sbd (8.3 versions of both pacemaker and sbd must be installed).

As I understand it, we do not support mixed-version packages on a single node, so only the upgraded packages need to be tested, but just for background, here is how it should behave with mixed versions:

* Old pacemaker, old sbd, any SBD_SYNC_RESOURCE_STARTUP: old behavior (pacemaker will start even if sbd is blocked from contacting it by SELinux, and sbd will panic the node if pacemaker shuts down cleanly in maintenance mode while resources are still active).
* Old pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP false/missing: old behavior.
* Old pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP true: most likely sbd will crash (8.3 sbd packages have a Requires for the new pacemaker, so it shouldn't normally be possible)
* New pacemaker, any sbd, SBD_SYNC_RESOURCE_STARTUP false/missing: old behavior. Pacemaker will log a warning recommending turning the setting on if sbd supports it.
* New pacemaker, old sbd, SBD_SYNC_RESOURCE_STARTUP true: pacemaker will not start any subdaemons.
* New pacemaker, new sbd, SBD_SYNC_RESOURCE_STARTUP true: new behavior (pacemaker starts subdaemons only if sbd can contact it, and sbd doesn't panic on clean shutdown even if resources are active).
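The mixed-version cases listed above can be summarized as a small decision function. This is only an illustrative sketch for test planning: the function name, parameters, and returned strings are hypothetical and are not part of pacemaker or sbd.

```python
def expected_behavior(new_pacemaker: bool, new_sbd: bool, sync_startup: bool) -> str:
    """Map a package/setting combination to the behavior described in the comment.

    new_pacemaker / new_sbd: True for the 8.3 (or later) package, False for older.
    sync_startup: True if SBD_SYNC_RESOURCE_STARTUP=true is set, False if false/missing.
    """
    if not sync_startup:
        # With the setting false or missing, every combination keeps the old behavior.
        return "old behavior"
    if new_pacemaker and new_sbd:
        return "new behavior"
    if new_pacemaker:
        # New pacemaker waits for a contact that an old sbd never makes.
        return "pacemaker will not start any subdaemons"
    if new_sbd:
        # Should not normally be installable: 8.3 sbd Requires the new pacemaker.
        return "sbd will most likely crash"
    # Old pacemaker and old sbd ignore the setting entirely.
    return "old behavior"
```

For example, `expected_behavior(True, False, True)` corresponds to the "new pacemaker, old sbd" case, where pacemaker does not start its subdaemons.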
The sbd side of this feature is Bug 1743726

Sorry, meant 8.3 in the doc text

SBD_SYNC_RESOURCE_STARTUP is the only thing covered here, this is just the pacemaker side of it, Bug 1743726 is the sbd side of it but it's the same feature

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4804