Bug 1297564

Summary: service pacemaker_remote stop causes node to be fenced
Product: Red Hat Enterprise Linux 6
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact: Steven J. Levine <slevine>
Priority: high
Version: 6.7
CC: abeekhof, cfeist, cluster-maint, cluster-qe, kwenning, michele, royoung, rscarazz, slevine, tlavigne
Target Milestone: rc
Target Release: 6.8
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.14-5.el6
Doc Type: Release Note
Doc Text:
Graceful migration of resources when the *pacemaker_remote* service is stopped on an active Pacemaker Remote node

If the *pacemaker_remote* service is stopped on an active Pacemaker Remote node, the cluster now gracefully migrates resources off the node before the node leaves the cluster. Previously, Pacemaker Remote nodes were fenced when the service was stopped (including by commands such as "yum update") unless the node had first been explicitly taken out of the cluster. Software upgrades and other routine maintenance procedures are now much easier to perform on Pacemaker Remote nodes. Note: all nodes in the cluster must be upgraded to a version supporting this feature before it can be used on any node.
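One way to act on the note above before relying on graceful shutdown is to compare each node's installed build with this bug's "Fixed In Version" (pacemaker-1.1.14-5.el6). This is a minimal sketch under stated assumptions, not a supported tool: the helper name version_ge is hypothetical, GNU "sort -V" is used as an approximation of RPM's own version comparison (they agree for the strings checked here but are not identical in general), and the rpm query shown in the comment assumes the package is named pacemaker.

```shell
#!/bin/sh
# Sketch: is an installed Pacemaker build at least the fixed version
# recorded in this bug (pacemaker-1.1.14-5.el6)?
# Assumption: GNU "sort -V" ordering approximates RPM version comparison
# well enough for these el6 version strings; it is not identical to rpmvercmp.

FIXED_VERSION="1.1.14-5.el6"

# version_ge A B (hypothetical helper): succeeds if A sorts at or after B
# in version order, i.e. B is the smaller (or equal) of the two.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, to be run on every cluster and remote node (package name assumed):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' pacemaker)
#   version_ge "$installed" "$FIXED_VERSION" || echo "upgrade needed"
```

Since the feature can only be used once every node supports it, the check has to pass on all nodes, not just on the remote node being maintained.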
Clone Of: 1288929
Clones: 1323259
Last Closed: 2016-05-10 23:52:33 UTC
Type: Bug
Bug Depends On: 1288929    
Bug Blocks: 1185030, 1323259, 1325009    

Description Ken Gaillot 2016-01-11 21:39:17 UTC
+++ This bug was initially created as a clone of Bug #1288929 +++

Description of problem:

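Stopping the pacemaker_remote service on an active Pacemaker Remote node does not shut the node down gracefully; the cluster fences the node instead.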

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. On an active Pacemaker Remote node, run: service pacemaker_remote stop

Actual results:

The remote node is fenced.

Expected results:

1. pacemaker_remote contacts/notifies the peer cluster
2. The peer cluster stops (or migrates) all resources running on the remote node
3. The peer cluster tells pacemaker_remote it can shut down
4. The peer cluster recognises that the remote node was expected to shut down, and does not fence it
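The expected sequence can be illustrated with a toy shell model. It is a hypothetical sketch, not Pacemaker's implementation: the function and variable names are invented for illustration, and the real exchange happens between pacemaker_remote and the cluster daemons rather than shell functions. What it shows is the decision in step 4: losing the connection counts as a clean shutdown only if the node announced it was leaving and its resources were already stopped; otherwise the model fences, matching the actual results above.

```shell
#!/bin/sh
# Hypothetical model of the expected graceful-shutdown handshake.
# NOT Pacemaker code; all names are illustrative only.

shutdown_expected=no   # has the remote node announced that it is leaving?
running_resources=""   # resources the cluster runs on the remote node

# Steps 1-3: pacemaker_remote notifies the cluster; the cluster stops the
# node's resources, then tells pacemaker_remote it may exit.
cluster_on_shutdown_request() {
  shutdown_expected=yes
  for r in $running_resources; do
    echo "stopping $r on remote node"
  done
  running_resources=""
  echo "ack: pacemaker_remote may exit"
}

# Step 4: when the connection drops, fence only if the loss was unexpected.
cluster_on_connection_lost() {
  if [ "$shutdown_expected" = yes ] && [ -z "$running_resources" ]; then
    echo "expected shutdown: no fencing"
  else
    echo "unexpected loss: fence remote node"
  fi
}
```

Calling cluster_on_connection_lost without a preceding cluster_on_shutdown_request, which is the situation described in this bug where nothing tells the cluster the stop was intentional, takes the fencing branch.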

Additional info:

Especially relevant for OpenStack Platform (OSP) upgrades.

--- Additional comment from Raoul Scarazzini on 2015-12-24 05:28:02 EST ---

As a side note, this is the sequence of commands we use as a workaround to avoid fencing:

1) Reboot the compute node from the console
2) Run nova stop <computenodeid> from the undercloud
3) Run nova start <computenodeid> from the undercloud
4) On one of the controller nodes, run a loop like this:
$ while true; do sudo pcs resource cleanup overcloud-novacompute-0; sleep 5; done
5) Once the machine is up, stop the loop from step 4
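The loop in step 4 has to be stopped by hand. A bounded variant could look like the following sketch. The helper retry_until and the example wrapper functions in the comment are hypothetical; in particular, using ping to decide that the machine is "up" is an assumption, and any other readiness check could be substituted.

```shell
#!/bin/sh
# Hypothetical bounded version of the manual cleanup loop from step 4.
# retry_until CHECK ACTION MAX_TRIES INTERVAL
#   Runs ACTION, then CHECK, every INTERVAL seconds until CHECK succeeds
#   or MAX_TRIES attempts have been made. CHECK and ACTION are command
#   names (e.g. shell functions) taking no arguments.
#   Returns 0 if CHECK succeeded, 1 on timeout.
retry_until() {
  check=$1 action=$2 max_tries=$3 interval=$4
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    "$action"                 # e.g. pcs resource cleanup
    if "$check"; then         # e.g. "is the machine up yet?"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  return 1
}

# Example (assumption: the node answers ping once it has booted):
#   node_up() { ping -c 1 -W 1 overcloud-novacompute-0 >/dev/null 2>&1; }
#   cleanup() { sudo pcs resource cleanup overcloud-novacompute-0; }
#   retry_until node_up cleanup 120 5 || echo "node did not come back"
```

Running the cleanup before each check mirrors the original loop, which keeps issuing pcs resource cleanup until someone sees the node come back; the attempt limit replaces the manual stop in step 5.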

--- Additional comment from Ken Gaillot on 2016-01-08 16:29:33 EST ---

Fixed upstream as of commit da17fd0

Comment 3 Klaus Wenninger 2016-01-29 15:12:37 UTC
QA on the RHEL-7.2 z-stream, using the same fixes as here, found some issues (see rhbz#1299348).
Fixed in Version: pacemaker-1.1.14-2.0.el6

Comment 6 Ken Gaillot 2016-03-18 17:38:26 UTC
Upstream commit cd10f0b is needed to support this feature on RHEL 6 because of its use of the legacy attrd, and will be backported.

Comment 7 Ken Gaillot 2016-03-18 19:19:16 UTC
Build has been updated.

Comment 18 errata-xmlrpc 2016-05-10 23:52:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0856.html