Bug 1389443

Summary: CIB upgrade does not check what version is possible to upgrade to clusterwise
Product: Red Hat Enterprise Linux 7
Reporter: Tomas Jelinek <tojeline>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: high
Version: 7.3
CC: cfeist, cluster-maint, idevat, omular, rsteiger, tojeline
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.9.156-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The user configures a new pacemaker feature (alerts in this case).
Consequence: Only the nodes that support the feature get the configuration. On the other nodes the CIB is unchanged and does not contain the new configuration, which makes the CIB inconsistent across the cluster.
Fix: Run the pacemaker tool for upgrading the CIB version in a way that checks which version is supported cluster-wide.
Result: If some nodes do not support the new feature, pcs prints an error message and does not change the current cluster configuration on any node.
Story Points: ---
Clone Of:
: 1397408
Environment:
Last Closed: 2017-08-01 18:24:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1397408    
Attachments:
Description     Flags
proposed fix    none
proposed fix    none

Description Tomas Jelinek 2016-10-27 15:15:02 UTC
This is based on the discussion in bz1305130.

How the CIB upgrade process works in pacemaker:
1. Pacemaker keeps track of what pacemaker version is running on each node.
2. Pacemaker elects the DC (Designated Controller) in such a way that the DC is always the node with the oldest pacemaker version in the cluster.
3. When the "cibadmin --upgrade" command is run, the request is sent to the DC.
4. The DC bumps the CIB schema version to the newest version supported by the DC itself and tells the other nodes about the change.
5. As a result, the CIB never gets upgraded to a schema version which is not supported by all nodes.
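
The net effect of steps 1-5 can be sketched as a small shell function. This is only an illustration, not pacemaker's code: common_schema is a hypothetical helper, and the node names and per-node maximum schema versions are made-up input.

```shell
# Hypothetical helper, not part of pacemaker or pcs.
# Reads "node max-schema" pairs on stdin and prints the newest schema version
# that every node supports, i.e. the lowest of the per-node maximums. This is
# the version the DC (the node with the oldest pacemaker) would upgrade to.
common_schema() {
    awk '
        {
            split($2, v, ".")
            major = v[1] + 0
            minor = v[2] + 0
            if (NR == 1 || major < low_major ||
                (major == low_major && minor < low_minor)) {
                low_major = major
                low_minor = minor
            }
        }
        END { if (NR > 0) print low_major "." low_minor }
    '
}

# Illustrative input: one upgraded node and two older ones.
printf '%s\n' 'node1 2.5' 'node2 2.3' 'node3 2.3' | common_schema   # prints 2.3
```

The version components are compared numerically, so 2.10 is treated as newer than 2.9.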

If the CIB upgrade is requested on a file, there is no communication in the cluster. The CIB schema version simply gets bumped to the newest version supported by that particular node.


When moving to the new pcs architecture, the CIB upgrade process was moved to the pcs library. To get rid of a side effect of the live upgrade, we switched from the live CIB upgrade to a file-based CIB upgrade. That bypasses all the checking done in pacemaker (see steps 1-5 above).

We need to switch back to the live upgrade (unless -f was specified on the command line) so that the correct upgrade procedure is used, and then deal with the resulting side effect.
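
A rough sketch of that decision, assuming the upgrade is performed by calling "cibadmin --upgrade" as in step 3. The function name cib_upgrade_command is hypothetical, and using pacemaker's CIB_file environment variable to point cibadmin at a file is an assumption about how the file-based path could be invoked; the actual pcs implementation may differ. The function only prints the command, so it can be inspected without a cluster.

```shell
# Hypothetical helper: prints the upgrade command that would be run.
# $1 is the CIB file given with -f, or empty when working on the live cluster.
cib_upgrade_command() {
    cib_file="$1"
    if [ -n "$cib_file" ]; then
        # File-based upgrade: no cluster communication, so the schema is
        # bumped to whatever the local node supports.
        # Assumption: CIB_file makes cibadmin operate on a file.
        printf 'CIB_file=%s cibadmin --upgrade\n' "$cib_file"
    else
        # Live upgrade: the request goes to the DC, which limits the
        # upgrade to a schema version all nodes support.
        printf 'cibadmin --upgrade\n'
    fi
}

cib_upgrade_command ""               # live cluster
cib_upgrade_command "/tmp/cib.xml"   # illustrative path
```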

Comment 1 Tomas Jelinek 2016-11-09 13:42:53 UTC
Created attachment 1218962 [details]
proposed fix

Test:

1) Setup:
- Have a cluster with no pacemaker alerts support (RHEL7.2).
- Upgrade pcs and pacemaker on one node to RHEL7.3 version with alerts support.
[root@rh72-node1:~]# rpm -q pcs
pcs-0.9.152-10.el7.x86_64
[root@rh72-node1:~]# rpm -q pacemaker
pacemaker-1.1.15-11.el7.x86_64
[root@rh72-node2:~]# rpm -q pacemaker
pacemaker-1.1.13-10.el7.x86_64
[root@rh72-node3:~]# rpm -q pacemaker
pacemaker-1.1.13-10.el7.x86_64

[root@rh72-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh72-node2:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh72-node3:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"


2) Before fix:
[root@rh72-node1:~]# pcs alert create path=/some/path
CIB has been upgraded to the latest schema version.
[root@rh72-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.5"
[root@rh72-node2:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh72-node3:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"


3) After fix:
[root@rh72-node1:~]# pcs alert create path=/some/path
Error: Upgrading of CIB to the latest schema failed: Call cib_upgrade failed (-62): Timer expired
[root@rh72-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh72-node2:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh72-node3:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
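
The per-node checks in steps 1-3 can be scripted. The following is a minimal sketch with two hypothetical helpers (schema_version and schemas_consistent are not pcs commands); how the CIB is collected from each node is left out, and the sample values are illustrative.

```shell
# Hypothetical helpers for scripting the checks above.

# Reads CIB XML on stdin (e.g. from "pcs cluster cib") and prints the schema
# version from the validate-with attribute, such as 2.3.
schema_version() {
    tr ' ' '\n' |
        sed -n 's/^validate-with="pacemaker-\([0-9.]*\)".*$/\1/p' |
        head -n 1
}

# Reads one schema version per line (one line per node) and succeeds only
# if all nodes report the same version.
schemas_consistent() {
    sort -u | awk 'END { exit (NR > 1) }'
}

# Illustrative input modelled on the "Before fix" output.
printf '%s\n' 2.5 2.3 2.3 | schemas_consistent ||
    echo "CIB schema differs between nodes"
```

On a live node the first helper would be used as "pcs cluster cib | schema_version".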

Note: pacemaker 1.1.13 exits with an error (timer expired) if the current CIB schema version matches the latest available version. This is fixed in newer builds of pacemaker. It has no effect on the pcs bug or its behavior, other than pcs printing a different error message instead of saying the CIB is already at the newest schema available.


Only the "pcs alert" and "pcs acl" commands are affected. The bug is in the new pcs library, which other commands do not use yet. ACLs require schema version 2.0, which is quite old, so the bug may not manifest there.

The bug was introduced in pcs-0.9.152-3.el7.

Comment 2 Tomas Jelinek 2016-12-09 15:06:24 UTC
Created attachment 1230027 [details]
proposed fix

Since we now upgrade the CIB in the live cluster before making the actually requested changes, we need to report that the CIB has been upgraded even if we do not make any changes due to an error.

Before fix:
[root@rh73-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh73-node1:~]# pcs alert create path=/some/path id=b@d
Error: invalid alert-id 'b@d', '@' is not a valid character for a alert-id
[root@rh73-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.5"

After fix:
[root@rh73-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[root@rh73-node1:~]# pcs alert create path=/some/path id=b@d
CIB has been upgraded to the latest schema version.
Error: invalid alert-id 'b@d', '@' is not a valid character for a alert-id
[root@rh73-node1:~]# pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.5"
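
The ordering behind the "After fix" output can be sketched as follows. Both upgrade_cib and apply_change are hypothetical stubs standing in for the real operations, and change_with_upgrade is an illustrative name, not a pcs function.

```shell
# Stand-ins for the real operations; both are hypothetical stubs.
upgrade_cib() { return 0; }      # pretend the live upgrade succeeds
apply_change() {                 # pretend the requested change is invalid
    echo "Error: invalid alert-id" >&2
    return 1
}

# Upgrade first, report the upgrade immediately, then try the change.
# The upgrade message is printed even when the change itself fails,
# because the live CIB has already been modified at that point.
change_with_upgrade() {
    if ! upgrade_cib; then
        echo "Error: Upgrading of CIB to the latest schema failed" >&2
        return 1
    fi
    echo "CIB has been upgraded to the latest schema version."
    apply_change
}

change_with_upgrade || echo "change failed with status $?"
```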

Comment 4 Ivan Devat 2017-02-20 07:40:37 UTC
Setup:

> Have a cluster with a sufficiently old pacemaker.

[vm-rhel72-1 ~] $ rpm -q pacemaker
pacemaker-1.1.13-10.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"

[vm-rhel72-3 ~] $ rpm -q pacemaker
pacemaker-1.1.13-10.el7.x86_64
[vm-rhel72-3 ~] $ pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"

> Upgrade pacemaker on one node.

[vm-rhel72-1 ~] $ rpm -q pacemaker
pacemaker-1.1.16-2.el7.x86_64


After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs alert create path=/some/path
Error: Upgrading of CIB to the latest schema failed: Call cib_upgrade failed (-62): Timer expired

[vm-rhel72-1 ~] $ pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"
[vm-rhel72-3 ~] $ pcs cluster cib | tr ' ' '\n' | grep validate-with
validate-with="pacemaker-2.3"

Comment 7 errata-xmlrpc 2017-08-01 18:24:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958