This bug has been copied from bug #1288929 and has been proposed to be backported to 7.2 z-stream (EUS).
QA: To reproduce, configure a cluster with a Pacemaker Remote node, then run "systemctl stop pacemaker_remote" on the node while it is in the cluster. Previously, the node would be fenced; now, all resources are moved off the node and it stops gracefully. This should work the same for remote nodes (configured with an ocf:pacemaker:remote resource) and guest nodes (configured with the remote-node= attribute on a VM resource).
QA: I should have mentioned that after a graceful stop, the cluster will immediately try to connect to the remote node again. If the remote node is not accepting connections again before the start timeout, the start will fail (and the cluster will move on to another node if available, potentially timing out there, too). If the start times out on all nodes, the cluster will stop trying to reconnect. If a failure-timeout has been configured for the start operation, the cluster will begin retrying after that time. This is necessary because all remote connections must be initiated from the cluster side, so there is no way for a newly started remote node to signal to the cluster that it is available. This may change in the future, but for now, start failures are expected if the remote node is down for an extended time -- it is only the stopping that is graceful now.
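For QA planning, the reconnect behavior described above can be sketched as a small simulation. This is only an illustrative model, not Pacemaker code: the function name simulate_reconnect, its parameters, and the node names are hypothetical, times are in seconds counted from the end of the graceful stop, and it assumes a recorded start failure expires exactly failure_timeout seconds after it is recorded (the real cluster may re-evaluate later than that).

```python
from typing import Dict, List, Optional, Tuple

Attempt = Tuple[int, str, bool]  # (time the start began, cluster node, succeeded?)


def simulate_reconnect(
    nodes: List[str],
    remote_up_at: Optional[int],
    start_timeout: int,
    failure_timeout: Optional[int] = None,
    horizon: int = 3600,
) -> List[Attempt]:
    """Simplified model of the cluster reconnecting after a graceful stop.

    nodes           -- cluster nodes allowed to host the remote connection
    remote_up_at    -- time at which the remote node accepts connections
                       again (None means it never comes back)
    start_timeout   -- timeout of the connection resource's start operation
    failure_timeout -- how long a start failure is remembered (None = forever)
    horizon         -- stop simulating after this many seconds
    """
    attempts: List[Attempt] = []
    failed_at: Dict[str, int] = {}  # node -> time its start failure was recorded
    now = 0

    while now <= horizon:
        # Forget failures that have outlived failure_timeout (assumed exact).
        if failure_timeout is not None:
            failed_at = {
                n: t for n, t in failed_at.items() if now < t + failure_timeout
            }

        candidates = [n for n in nodes if n not in failed_at]
        if not candidates:
            if failure_timeout is None:
                break  # start failed everywhere: the cluster stops retrying
            # Otherwise wait until the oldest failure expires, then retry.
            now = min(failed_at.values()) + failure_timeout
            continue

        node = candidates[0]
        deadline = now + start_timeout
        # The cluster initiates the connection; it succeeds only if the remote
        # node is accepting connections before the start timeout expires.
        if remote_up_at is not None and remote_up_at < deadline:
            attempts.append((now, node, True))
            break

        attempts.append((now, node, False))
        failed_at[node] = deadline
        now = deadline

    return attempts
```

For example, simulate_reconnect(["node1", "node2"], remote_up_at=300, start_timeout=60) models a remote node that stays down for five minutes with no failure-timeout configured: the start fails on both cluster nodes and no further attempts are made, which matches the expected start failures mentioned above. Passing failure_timeout=120 lets the model retry once the recorded failures expire.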
An issue in the implementation that caused a second stop to hang was found and fixed upstream as of commit 942efa4. An updated build has been added to the errata.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0216.html