Bug 1388602 - Document rolling upgrades to 7.3 for remote nodes
Summary: Document rolling upgrades to 7.3 for remote nodes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: doc-Cluster_General
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Steven J. Levine
QA Contact: ecs-bugs
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-25 18:15 UTC by Steven J. Levine
Modified: 2019-03-06 00:58 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Upgrading pacemaker clusters with remote nodes requires special procedure

The version of Pacemaker in Red Hat Enterprise Linux 7.3 incorporates an increase in the version number of the remote node protocol. Because of this, cluster nodes running Pacemaker in Red Hat Enterprise Linux 7.3 and remote nodes running earlier versions of Red Hat Enterprise Linux are not able to communicate with each other. As a result, a rolling upgrade of a Pacemaker cluster that includes remote nodes requires a special procedure, as documented in https://access.redhat.com/node/2726571.
Clone Of:
Environment:
Last Closed: 2016-10-26 17:46:50 UTC
Target Upstream Version:



Description Steven J. Levine 2016-10-25 18:15:00 UTC
This needs to be documented. It's too late to get this into the 7.3 Pacemaker reference, so we can put out a portal article and then move the info to the product doc if we decide it needs to be there in the long term.

From Ken Gaillot:

Hi Steven,

We recently realized a potential issue with upgrades to 7.3.

The new version of pacemaker in 7.3 involves an increase in the version
number of the remote node protocol. This is the first time that has
happened, so we didn't anticipate the potential consequences for
customer upgrades.

Cluster nodes running 7.3 pacemaker will not be able to talk to any
remote nodes running an older version, and vice versa. This means that
rolling upgrades of a pacemaker cluster with remote nodes are tricky.

We haven't verified the issue or the fix yet, but this is what I expect
the upgrade process will need to look like in such a case (a rough
command sketch follows the steps below):

1. Upgrade half the cluster's corosync nodes like this:
* Add -INFINITY location constraints for each remote node or guest node
vs the corosync node being upgraded. This ensures that no remote nodes
connect to the node after it is upgraded.
* Stop and upgrade the corosync node, and return it to the cluster.

2. Upgrade all the remote/guest nodes like this:
* Stop the remote node or guest node. (Guest nodes are stopped by
disabling their resource.)
* Remove all the constraints added in step 1 for the remote node or
guest node, and add -INFINITY constraints for the remote node or guest
node vs all the *other* corosync nodes. This ensures that the upgraded
remote node will not connect to an older corosync node.
* Upgrade the node, and return it to the cluster. For guest nodes, this
involves bringing the virtual machine up outside cluster control,
upgrading it, stopping the virtual machine, then re-enabling the guest
node resource.

3. Upgrade the remaining corosync nodes like this:
* Upgrade the host as normal.
* Remove all constraints added for the node in step 2.
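As a rough sketch, the steps above might translate into pcs commands
like the following. This assumes a three-node cluster (node1, node2,
node3), one remote node whose connection resource is named remote1, and
one guest node whose VM resource is named guestvm1; all of these names
are illustrative, and the exact commands should come from the portal
article once the procedure has been verified.

    # Step 1, repeated for each corosync node upgraded first (here node1):
    # ban every remote/guest node from the node about to be upgraded
    # ("avoids" creates a -INFINITY location constraint by default)
    pcs constraint location remote1 avoids node1
    pcs constraint location guestvm1 avoids node1
    pcs cluster stop node1
    # upgrade the pacemaker packages on node1, then rejoin it
    pcs cluster start node1

    # Step 2, for the remote node remote1:
    pcs resource disable remote1
    # list constraint IDs, remove the ones added in step 1, then ban the
    # remote node from the corosync nodes that have NOT yet been upgraded
    pcs constraint --full
    pcs constraint remove <id-of-step-1-constraint>
    pcs constraint location remote1 avoids node2
    pcs constraint location remote1 avoids node3
    # upgrade pacemaker-remote on the remote host, then re-enable it
    pcs resource enable remote1
    # the same pattern applies to the guest node guestvm1, except the VM
    # is brought up outside cluster control (e.g. with virsh), upgraded,
    # shut down, and then its resource is re-enabled

    # Step 3, repeated for each remaining corosync node (here node2):
    pcs cluster stop node2
    # upgrade the pacemaker packages on node2, then rejoin it
    pcs cluster start node2
    pcs constraint remove <ids-of-step-2-constraints-for-node2>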

If the above process is not followed, the cluster will see failures of
the remote node connection resources and the guest node VM resources,
potentially involving very lengthy recovery times. Depending on the
order of upgrades, the remote nodes and guest nodes may eventually be
unable to run at all until more nodes have been upgraded.

This only affects clusters with remote nodes or guest nodes. Other
clusters may perform rolling upgrades in the usual manner.
-- 
Ken Gaillot <kgaillot>

Comment 1 Steven J. Levine 2016-10-25 18:57:01 UTC
Portal article will be here (currently in very basic draft state):

https://access.redhat.com/articles/2726571

Comment 2 Steven J. Levine 2016-10-25 19:15:44 UTC
Draft is now formatted and ready for review and testing:

https://access.redhat.com/articles/2726571

