This needs to be documented. It's too late to get this into the 7.3 Pacemaker reference, so we can put out a portal article and then move the info to the product doc if we decide it needs to be there in the long run.

From Ken Gaillot:

Hi Steven,

We recently realized a potential issue with upgrades to 7.3. The new version of Pacemaker in 7.3 increases the version number of the remote node protocol. This is the first time that has happened, so we didn't realize the potential consequences for customer upgrades. Cluster nodes running 7.3 Pacemaker will not be able to talk to any remote nodes running an older version, and vice versa. This means that rolling upgrades of a Pacemaker cluster with remote nodes are tricky. We haven't verified the issue or the fix yet, but this is what I expect the upgrade process will need to look like in such a case:

1. Upgrade half of the cluster's corosync nodes like this:
   * Add -INFINITY location constraints for each remote node or guest node vs. the corosync node being upgraded. This ensures that no remote nodes connect to the node after it is upgraded.
   * Stop and upgrade the corosync node, and return it to the cluster.

2. Upgrade all the remote/guest nodes like this:
   * Stop the remote node or guest node. (Guest nodes are stopped by disabling their resource.)
   * Remove all the constraints added in step 1 for the remote node or guest node, and add -INFINITY constraints for the remote node or guest node vs. all the *other* corosync nodes. This ensures that the upgraded remote node will not connect to an older corosync node.
   * Upgrade the node, and return it to the cluster. For guest nodes, this involves bringing the virtual machine up outside cluster control, upgrading it, stopping the virtual machine, and then re-enabling the guest node resource.

3. Upgrade the remaining corosync nodes like this:
   * Upgrade the host as normal.
   * Remove all constraints added for the node in step 2.
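The procedure above could be sketched with pcs commands roughly as follows. This is a hypothetical illustration, not a verified runbook: the node names (node1, node2), resource names (remote1, vm-guest1), and constraint IDs (ban-*) are all made up for the example, and the constraint commands have not been tested against the scenario described.

```shell
## Assumed names: corosync nodes node1/node2, remote node connection
## resource remote1, guest node VM resource vm-guest1. All hypothetical.

# Step 1: before upgrading node1, ban remote/guest nodes from it
pcs constraint location add ban-remote1-node1 remote1 node1 -INFINITY
pcs constraint location add ban-vm-guest1-node1 vm-guest1 node1 -INFINITY

# Stop node1, upgrade it, and return it to the cluster
pcs cluster stop node1
yum update pacemaker          # run on node1 itself
pcs cluster start node1

# Step 2: upgrade the remote node
pcs resource disable remote1                # stop the remote node connection
pcs constraint remove ban-remote1-node1     # drop the step-1 constraint
# ban the remote node from the *other*, not-yet-upgraded corosync node
pcs constraint location add ban-remote1-node2 remote1 node2 -INFINITY
# ...upgrade pacemaker-remote on the remote host, then reconnect it:
pcs resource enable remote1

# Step 3: upgrade the remaining corosync node, then clean up
pcs cluster stop node2
yum update pacemaker          # run on node2 itself
pcs cluster start node2
pcs constraint remove ban-remote1-node2     # drop the step-2 constraint
```

For guest nodes, step 2 would instead disable the VM resource, start and upgrade the VM outside cluster control, shut it down, and re-enable the resource, as described above.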
If the above process is not followed, the cluster will see failures of the remote node connection resources and guest node VM resources, potentially with very lengthy recovery times. Depending on the order of upgrades, the remote nodes and guest nodes may eventually be unable to run at all until more nodes have been upgraded. This only affects clusters with remote nodes or guest nodes; other clusters may perform rolling upgrades in the usual manner. -- Ken Gaillot <kgaillot>
Portal article will be here (currently in very basic draft state): https://access.redhat.com/articles/2726571
Draft is now formatted and ready for review and testing: https://access.redhat.com/articles/2726571