Bug 1377724
| Summary: | pacemaker-remote restart cause watchdog-reboot with sbd and pacemaker-watcher | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Klaus Wenninger <kwenning> |
| Component: | sbd | Assignee: | Klaus Wenninger <kwenning> |
| Status: | CLOSED WONTFIX | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | cfeist, fdinitto, kgaillot, kwenning, mlisik |
| Target Milestone: | rc | | |
| Target Release: | 7.9 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1693262 | Environment: | |
| Last Closed: | 2020-12-15 07:46:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1693262 | | |
Description
Klaus Wenninger
2016-09-20 13:04:05 UTC
Using upstream pacemaker and sbd packages with systemd from RHEL 7.4, setting `KillMode=mixed` definitely doesn't solve the issue.

Using `PartOf=` in the systemd unit to make sbd_remote start together with pacemaker_remote leads to uncoordinated restarts of sbd_remote and pacemaker_remote (`systemctl restart pacemaker_remote`). sbd restarts so quickly that it still sees the pacemaker_remote instance from before the restart, only to lose the connection to it immediately afterwards when pacemaker_remote is restarted. Because sbd doesn't (and shouldn't) automatically reconnect to the new instance, a reboot is triggered.

One possible way out would be to order sbd_remote to start after pacemaker_remote. systemd then stops the two units in the opposite order, so the problem described above doesn't occur. On the other hand, because sbd_remote is then stopped before pacemaker_remote, it can no longer monitor the shutdown of pacemaker_remote and of all the services running under its control.

Better solutions would be:
- make systemd start sbd_remote after pacemaker_remote while still stopping it only after pacemaker_remote has been stopped
- make systemd, when restarting a service, first stop the service and its `PartOf=` services, and only afterwards start them all up again
- make sbd_remote watch the running pacemaker_remote instance (the one whose PID it has already grabbed) and stop only once that process is gone (a quick test implementation, probably with issues, can be found at https://github.com/ClusterLabs/sbd/pull/33)

BZ1593254 also deals with orchestrating the startup, stop, and restart of sbd-remote and pacemaker-remote. The two BZs should therefore get a coordinated solution rather than, for example, taking the route from the PR above, which only addresses the restart issue.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
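The third solution proposed in the description (sbd_remote tracking the pacemaker_remote PID it grabbed earlier and stopping only once that process is gone) can be illustrated with a minimal Python sketch. This is not the code from the linked PR (sbd itself is written in C); the function names `pid_alive` and `wait_for_exit` are hypothetical, and the sketch assumes a POSIX host where `os.kill(pid, 0)` probes whether a process exists without actually delivering a signal.

```python
import os
import time
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists.

    Signal 0 delivers nothing: the kernel only performs the existence and
    permission checks, so the watched process is not disturbed.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but we may not signal it
    return True


def wait_for_exit(pid: int, poll_interval: float = 0.1,
                  timeout: Optional[float] = None) -> bool:
    """Poll the PID recorded at startup until that process is gone.

    Hypothetical helper: returns True once the process has exited, or
    False if `timeout` seconds pass first (None means wait indefinitely).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while pid_alive(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

In this sketch a watcher would call `wait_for_exit(pid)` with the PID of the pacemaker_remote instance it attached to, and only begin its own shutdown once that returns True, so a restarted instance with a new PID is never mistaken for the old one. Plain PID polling is exposed to PID reuse; on Linux 5.3+ with Python 3.9+, `os.pidfd_open` gives a handle tied to one specific process and avoids that race.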