Bug 1299348
| Summary: | service pacemaker_remote stop causes node to be fenced | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Kurik <jkurik> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.2 | CC: | abeekhof, cfeist, cluster-maint, kgaillot, michele, mjuricek, mnavrati, oblaut, royoung, rscarazz, tlavigne |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-1.1.13-10.el7_2.2 | Doc Type: | Enhancement |
| Doc Text: | The pacemaker_remote service can now be stopped on an active Pacemaker Remote node, and the cluster gracefully migrates resources off the node before stopping the node. Previously, Pacemaker Remote nodes had to be taken out of the cluster before stopping the pacemaker_remote service, either explicitly or implicitly by commands such as "yum update"; otherwise the node was fenced. Now, software upgrades and other routine maintenance are significantly easier to perform on Pacemaker Remote nodes. Note that all nodes in the cluster must be upgraded to a version supporting this feature before the feature can be used on any node. | | |
| Story Points: | --- | | |
| Clone Of: | 1288929 | Environment: | |
| Last Closed: | 2016-02-16 11:18:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1288929 | | |
| Bug Blocks: | | | |
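As a hedged illustration of the workflow change described in the Doc Text (the resource name "remote1" and host "remote1.example.com" are hypothetical, and exact pcs syntax may vary by version), the pre-fix workaround of taking a remote node out of the cluster before maintenance might look like:

```shell
# Before this fix: take the remote node out of the cluster first, otherwise
# stopping pacemaker_remote (e.g. implicitly via "yum update") got the node fenced.
# "remote1" is a hypothetical ocf:pacemaker:remote connection resource.
pcs resource disable remote1          # cluster closes the remote connection cleanly
ssh remote1.example.com 'yum update -y && systemctl restart pacemaker_remote'
pcs resource enable remote1           # bring the node back into the cluster

# With pacemaker-1.1.13-10.el7_2.2 or later on all cluster nodes, the
# disable/enable steps are unnecessary: stopping the service itself triggers
# a graceful migration of resources off the node.
ssh remote1.example.com 'systemctl stop pacemaker_remote'
```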
Description
Jan Kurik 2016-01-18 07:50:34 UTC
QA: the reproducer is to configure a cluster with a Pacemaker Remote node, then run "systemctl stop pacemaker_remote" on the node while it is in the cluster. Previously, the node would be fenced; now, all resources are moved off the node and it stops gracefully. This should work the same for remote nodes (configured with an ocf:pacemaker:remote resource) and guest nodes (configured with the remote-node= attribute on a VM resource).

QA: I should have mentioned that after a graceful stop, the cluster will immediately try to connect to the remote node again. If the remote node is not accepting connections again before the start timeout, the start will fail (and move on to another node if available, potentially timing out there, too). If the start times out on all nodes, the cluster will stop trying to reconnect. If a failure-timeout has been configured for the start operation, the cluster will begin retrying again after that time. This is necessary because all remote connections must be initiated from the cluster side, so there is no way for a newly started remote node to signal the cluster that it is available. This may change in the future, but for now, start failures are expected if the remote node is down for an extended time; it is only the stopping that is graceful now.

An issue in the implementation, with the symptom of a second stop hanging, was found and fixed upstream as of commit 942efa4. An updated build has been added to the errata.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0216.html
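The QA reproducer and the reconnect caveat above can be sketched as follows; the resource name "remote1" and host name are hypothetical, and the pcs syntax is the RHEL 7-era form (verify against your pcs version):

```shell
# Hypothetical setup: add a Pacemaker Remote node via an ocf:pacemaker:remote resource.
pcs resource create remote1 ocf:pacemaker:remote server=remote1.example.com

# Reproducer: stop the service while the node is still in the cluster.
# Previously this fenced the node; now resources migrate off and it stops gracefully.
ssh remote1.example.com 'systemctl stop pacemaker_remote'

# The cluster immediately retries the connection. If the node stays down past the
# start timeout on every allowed node, reconnect attempts cease; a failure-timeout
# on the connection resource lets retries resume after the given interval.
pcs resource meta remote1 failure-timeout=60s

# Alternatively, clear the recorded start failures by hand to force a reconnect.
pcs resource cleanup remote1
```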