Bug 1377724 - pacemaker-remote restart cause watchdog-reboot with sbd and pacemaker-watcher
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sbd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 7.9
Assignee: Klaus Wenninger
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1693262
Reported: 2016-09-20 13:04 UTC by Klaus Wenninger
Modified: 2020-12-15 07:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1693262
Environment:
Last Closed: 2020-12-15 07:46:09 UTC
Target Upstream Version:
Embargoed:



Description Klaus Wenninger 2016-09-20 13:04:05 UTC
Description of problem:
When running pacemaker-remote with sbd and pacemaker-watcher,
once the cluster node is connected, a

  systemctl restart pacemaker_remote

triggers a watchdog reboot.

Version-Release number of selected component (if applicable):
sbd-1.2.1-21.el7

How reproducible:
100%

Steps to Reproduce:
1. Set up pacemaker-remote with sbd and pacemaker-watcher
2. Wait until the cluster node is connected
3. Issue 'systemctl restart pacemaker_remote'

Actual results:
watchdog-reboot

Expected results:
pacemaker-remote and sbd should both restart, and the
cluster node should be able to reconnect

Additional info:
This behaviour is due to the sbd-remote unit file being configured
to wait only for the inquisitor process of sbd to die before allowing
systemd to restart pacemaker-remote.

As a workaround you can do:

systemctl stop pacemaker_remote
sleep 10
systemctl start pacemaker_remote

This is not the reason why the package update in bz1372009 fails.

Setting KillMode=mixed in the sbd-remote unit file fixes the issue.

Comment 6 Klaus Wenninger 2017-11-03 12:23:34 UTC
Using upstream pacemaker & sbd packages with systemd from rhel-7.4, setting KillMode=mixed definitely doesn't solve the issue.

Using PartOf= in the systemd unit to make sbd_remote start with pacemaker_remote leads to uncoordinated restarts of sbd_remote & pacemaker_remote (systemctl restart pacemaker_remote).
The restart of sbd is so quick that it still sees the pacemaker_remote instance from before the restart, only to lose the connection immediately afterwards when pacemaker_remote is restarted. As it doesn't (and shouldn't) automatically reconnect to the new instance, a reboot is triggered.

A possible way out would be to specify that sbd_remote is started after pacemaker_remote.
That makes stopping happen in the opposite order, so the problems above don't occur.
On the other hand, when sbd_remote is stopped before pacemaker_remote, it can no longer monitor the shutdown of pacemaker_remote and of all the services running under control of pacemaker_remote.
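
To make the ordering variant concrete, here is a rough sketch of what such a drop-in could look like. The unit names sbd_remote.service and pacemaker_remote.service and the file name 10-ordering.conf are assumptions for illustration, and the file is written to a scratch directory rather than to /etc/systemd/system/sbd_remote.service.d/, so nothing on the system is changed. It has the drawback described above: systemd stops units in the reverse of their start order, so sbd_remote goes down first.

```shell
# Sketch only: generate a hypothetical drop-in for sbd_remote.service.
# DROPIN_DIR is an assumed override; by default a scratch directory is used.
dropin_dir="${DROPIN_DIR:-$(mktemp -d)}"

cat > "$dropin_dir/10-ordering.conf" <<'EOF'
[Unit]
# Propagate stop/restart of pacemaker_remote to this unit.
PartOf=pacemaker_remote.service
# Start after pacemaker_remote; systemd then stops this unit first.
After=pacemaker_remote.service
EOF

cat "$dropin_dir/10-ordering.conf"
```

On a real system the file would have to live in the unit's drop-in directory and be followed by 'systemctl daemon-reload' before it takes effect.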

Better solutions would be:
- make systemd start sbd_remote after pacemaker_remote while still stopping
  it after pacemaker_remote has been stopped
- make systemd, when restarting a service, first stop the service +
  PartOf services and only afterwards start them all up again
- make sbd_remote watch for a running pacemaker_remote (the one whose PID
  it has already grabbed) and only stop once that is gone
  (quick test implementation, probably with issues, can be found under
  https://github.com/ClusterLabs/sbd/pull/33)
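
The idea behind the last option can be illustrated with a small shell sketch. This is not the code from the PR; the function name wait_for_pid_exit and the timeout handling are made up for illustration. It polls a previously recorded PID with 'kill -0' and returns only once that process is gone or a timeout expires.

```shell
# Sketch only: block until a previously recorded PID has exited.
# Usage: wait_for_pid_exit PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Returns 0 once the process is gone, 1 if it is still there after the timeout.
# Note: 'kill -0' also fails when the caller may not signal the process,
# so this assumes sufficient privileges (e.g. running as root).
wait_for_pid_exit() {
    pid=$1
    timeout=${2:-30}
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A stop handler could call something like 'wait_for_pid_exit "$old_pid" 60' (with old_pid being the PID grabbed earlier) before letting sbd_remote go down, so that the old pacemaker_remote instance is monitored until it has really disappeared.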

Comment 8 Klaus Wenninger 2018-06-20 11:53:35 UTC
BZ1593254 deals with the orchestration of startup/stop/restart of sbd-remote & pacemaker-remote as well.
Thus the 2 BZs should get a coordinated solution instead of going, e.g., the route described in the PR above, which only takes care of the restart issue.

Comment 11 RHEL Program Management 2020-12-15 07:46:09 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

