Bug 1388827

Summary: Repair rolling upgrades from 7.2 -> 7.3
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Beekhof <abeekhof>
Component: pacemaker Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.3 CC: abeekhof, cfeist, cluster-maint, cluster-qe, fdinitto, jpokorny, kgaillot, mjuricek, mkrcmari, phagara, plambri, snagar, tlavigne, tojeline
Target Milestone: rc Keywords: Regression, ZStream
Target Release: 7.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.15-12.el7 Doc Type: No Doc Update
Doc Text:
This issue was present in the 7.3 GA release of Pacemaker, but received an immediate 7.3.z-stream fix, so I do not think it is necessary to include this in the 7.4 release notes. For reference, the original 7.3.z documentation was: "The version of Pacemaker in Red Hat Enterprise Linux 7.3 incorporated an increase in the version number of the remote node protocol. Consequently, cluster nodes running Pacemaker in Red Hat Enterprise Linux 7.3 and remote nodes running earlier versions of Red Hat Enterprise Linux were not able to communicate with each other unless special precautions were taken. This update preserves the rolling upgrade capability."
Story Points: ---
Clone Of: 1212435
: 1389023 1389028 Environment:
Last Closed: 2017-08-01 17:54:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1212435    
Bug Blocks:    

Comment 1 Andrew Beekhof 2016-10-26 09:23:04 UTC
Here's what we need to do:

1. Back versioned attrs out of the 7.3 schema
2. Set LRMD_PROTOCOL_VERSION back to 1.0 in 7.3 builds

The only other API change is remote_proxy_check() but that will fail gracefully.
So going back to a value of "1.0" is safe[1] and would mean that
either side can be updated in any order - this time.
In the future, cluster nodes would _all_ need to be updated prior to
touching _any_ remote ones.
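
To illustrate the ordering constraint for future upgrades, here is a minimal Python sketch (not Pacemaker code). The function names and the dotted-version parsing are assumptions made for illustration; the check only verifies that a remote is never moved to a protocol version above that of any cluster node.

```python
def parse_version(text, width=3):
    """Turn a dotted version string such as "1.1" into a fixed-width tuple.

    Missing components are padded with zeros so "1.0" and "1.0.0" compare equal.
    """
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def can_upgrade_remote(cluster_versions, target_remote_version):
    """Return True only if every cluster node is already at or above the target.

    Hypothetical helper: it encodes the rule "update all cluster nodes
    before touching any remote node".
    """
    target = parse_version(target_remote_version)
    return all(parse_version(v) >= target for v in cluster_versions)
```

With one cluster node still at "1.0", `can_upgrade_remote(["1.1", "1.0"], "1.1")` is false; once all cluster nodes report "1.1" it becomes true.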

Aside: the reason we want the cluster version to be higher (even
though it's the client) is that pacemaker remote could potentially
be on a guest that cannot or should not be updated, e.g. one that hosts
a service requiring an old version of RHEL.

Then, once 7.3 is out the door:

3a. Either prevent versioned attributes from being used when old
remotes are around, or
3b. Implement a graceful failure mode (like using the version from the
lrmd instead of the remote)
4. Add versioned attrs back to the schema and bump LRMD_PROTOCOL_VERSION
5. Add a "review lrmd changes that might require bumping
LRMD_PROTOCOL_VERSION" item to the upstream release checklist
6. Ensure that new lrm features think about how to handle "old"
remotes in the future
7. Update the docs to indicate that the software version on remote
nodes must be less than or equal to the lowest version in the main cluster

[1] It's a possibility that I never intended to commit the "1.1"
change and it was actually a testing artefact.
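
Option 3a in the list above could look roughly like the following Python sketch. This is illustrative only, not Pacemaker's implementation: `VERSIONED_ATTRS_MIN_VERSION`, its value, and the function names are hypothetical.

```python
# Hypothetical: the lowest remote protocol version assumed to understand
# versioned attributes. The real threshold is not stated in this bug.
VERSIONED_ATTRS_MIN_VERSION = "1.1"


def parse_version(text, width=3):
    """Turn a dotted version string into a zero-padded tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def versioned_attrs_allowed(remote_versions):
    """Return True if no connected remote is older than the threshold.

    An empty list (no remotes connected) allows the feature.
    """
    required = parse_version(VERSIONED_ATTRS_MIN_VERSION)
    return all(parse_version(v) >= required for v in remote_versions)
```

A single old remote (for example one reporting "1.0") is enough to disable the feature cluster-wide in this sketch, which matches the "when old remotes are around" wording of 3a.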

Comment 8 Ken Gaillot 2016-10-26 16:03:53 UTC
Update: versioned attributes were not included in 7.3, and it has been determined that the protocol version change was not strictly necessary in any case, so Comment 1's step 2 is sufficient to fix the issue (steps 5-7 are still worthwhile for the future).

Comment 14 errata-xmlrpc 2017-08-01 17:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862