Bug 1611631 - 'cibadmin --upgrade' times out on non-DC nodes if schema is already the latest available
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: pacemaker
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jan Pokorný [poki]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-02 13:42 UTC by Tomas Jelinek
Modified: 2018-08-09 14:54 UTC (History)

Fixed In Version: pacemaker-2.0.0-2.fc29
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-09 14:54:03 UTC
Type: Bug
Embargoed:


Description Tomas Jelinek 2018-08-02 13:42:43 UTC
Description of problem:
When 'cibadmin --upgrade --force' is run on a node that is not currently the DC and the CIB schema is already the latest available, the command exits with a timeout error. When the same command is run on the DC, or when the schema is not the latest, everything works as expected.


Version-Release number of selected component (if applicable):
pacemaker-2.0.0-1.fc29.1.x86_64


How reproducible:
always, easily


Steps to Reproduce:
[root@fed28-node1:~]# crm_mon -1 | grep DC
Current DC: fed28-node2 (version 2.0.0-1.fc29.1-8cf3fe749e) - partition with quorum
[root@fed28-node1:~]# cibadmin --query | head -n 1
<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.1" epoch="12" num_updates="0" admin_epoch="2" cib-last-written="Thu Aug  2 12:12:50 2018" update-origin="fed28-node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@fed28-node1:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-62): Timer expired
[root@fed28-node1:~]# echo $?
124
[root@fed28-node1:~]# cibadmin --query | head -n 1
<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.1" epoch="12" num_updates="0" admin_epoch="2" cib-last-written="Thu Aug  2 12:12:50 2018" update-origin="fed28-node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
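To compare the schema before and after the upgrade attempt without reading the whole header by eye, the attributes of the first line can be pulled out with a small script. This is only a sketch: the function name cib_header_attrs is made up for illustration, and the header string is shortened from the output above.

```python
import re

def cib_header_attrs(first_line: str) -> dict:
    """Return the attributes of the opening <cib ...> tag as a dict."""
    # The first line from 'cibadmin --query | head -n 1' is an unclosed
    # element, so a regex is used instead of an XML parser.
    return dict(re.findall(r'([\w-]+)="([^"]*)"', first_line))

# Shortened copy of the header shown in the transcript above.
header = ('<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.1" '
          'epoch="12" num_updates="0" admin_epoch="2">')

attrs = cib_header_attrs(header)
print(attrs["validate-with"], attrs["epoch"])
```

If validate-with and epoch are identical before and after the command, the CIB was not changed by the upgrade request.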


Actual results:
[root@fed28-node1:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-62): Timer expired
[root@fed28-node1:~]# echo $?
124


Expected results:
It should not matter on which node the command is run; the result should be the same.
Results from DC with pacemaker-2.0.0-1.fc29.1.x86_64:
[root@fed28-node2:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-211): Schema is already the latest available
[root@fed28-node2:~]# echo $?
1


Additional info:
pacemaker.log on node1:
fed28-node1 pacemaker-based     [626] (cib_process_request)     info: Forwarding cib_upgrade operation for section 'all' to all (origin=local/cibadmin/2)

pacemaker.log on node2:
fed28-node2 pacemaker-based     [598] (cib_process_request)     warning: Completed cib_upgrade operation for section 'all': Schema is already the latest available (rc=-211, origin=fed28-node1/cibadmin/2, version=2.12.0)

Comment 1 Ken Gaillot 2018-08-02 17:25:26 UTC
The exit status of 1 on the DC is also a bug. It should be 0, and the message should be "Upgrade unnecessary: Schema is already the latest available". We can include that issue in this bz, too.
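As a rough illustration of the intended behavior (not pacemaker's actual code), the mapping from the CIB return code to the client's exit status and message could look like the sketch below. The return codes -62 and -211 and the exit status 124 come from the transcripts in this report; the function name, the constant names, and the exit status 1 for other errors are assumptions.

```python
# Return codes as they appear in the transcripts of this report.
RC_OK = 0
RC_TIMER_EXPIRED = -62
RC_SCHEMA_UNCHANGED = -211

def exit_status_for(rc: int, reason: str) -> tuple:
    """Map a cib_upgrade return code to (exit status, message). Hypothetical."""
    if rc == RC_OK:
        return 0, ""
    if rc == RC_SCHEMA_UNCHANGED:
        # Nothing to do is not a failure: report it and exit 0.
        return 0, f"Upgrade unnecessary: {reason}"
    message = f"Call cib_upgrade failed ({rc}): {reason}"
    if rc == RC_TIMER_EXPIRED:
        return 124, message  # exit status seen on the non-DC node
    return 1, message  # assumed generic error status

print(exit_status_for(-211, "Schema is already the latest available"))
```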

Comment 2 Ken Gaillot 2018-08-08 15:52:28 UTC
Upon investigation, I found the timeout issue has existed since at least upstream 1.1.11 (I reproduced as far back as 1.1.16) and possibly always.

When a non-DC node gets an upgrade request, it forwards it to the DC. When the DC gets it, if an upgrade is required, it resends the request to all nodes asking for an upgrade to a specific version, and all the nodes perform that upgrade locally, notifying any clients (such as cibadmin) of the result.

The problem is that if an upgrade is not required, the DC does not do anything further, so the non-DC nodes never do anything either, and the client doesn't get any notification.

We will need to change this so that the DC always sends a result to at least the requesting node, even if an upgrade is not required. The fix will only work once all nodes in a cluster are upgraded to a pacemaker version that includes it (that is, in a rolling upgrade, the fix will not take effect until all nodes are upgraded).
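The flow described above can be modeled with a small simulation. This is a simplified sketch for discussion only: it is not how pacemaker-based is implemented, and request_upgrade, all_nodes_fixed, and the "Upgraded to" message are invented for the example. A return value of None stands for the client never being notified, which is what cibadmin reports as a timeout.

```python
from typing import Optional

LATEST_SCHEMA = "pacemaker-3.1"  # latest schema in the transcript above
ALREADY_LATEST = "Schema is already the latest available"

def request_upgrade(origin: str, dc: str, schema: str,
                    all_nodes_fixed: bool) -> Optional[str]:
    """Return the result the client on `origin` receives, or None on timeout."""
    # A non-DC node forwards the request to the DC; the DC handles its own.
    if schema != LATEST_SCHEMA:
        # The DC asks every node to upgrade to a specific version; each node
        # upgrades locally and notifies its own clients.
        return f"Upgraded to {LATEST_SCHEMA}"
    if origin == dc:
        # The client is attached to the DC, so it sees the DC's result directly.
        return ALREADY_LATEST
    if all_nodes_fixed:
        # Proposed change: the DC sends the result back to the requesting node.
        return ALREADY_LATEST
    # Current behavior: the DC stays silent, so the remote client times out.
    return None

for fixed in (False, True):
    result = request_upgrade("fed28-node1", dc="fed28-node2",
                             schema=LATEST_SCHEMA, all_nodes_fixed=fixed)
    print(f"all_nodes_fixed={fixed}: {result or 'Timer expired'}")
```

With all_nodes_fixed=False the non-DC request gets no result, matching the report; with all_nodes_fixed=True the requesting node receives the same answer the DC gives locally.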

Comment 3 Ken Gaillot 2018-08-09 14:01:45 UTC
The timeout issue is fixed upstream by commit 1f05f5e2 and the exit status issue by commit f5e936fb.

Re-assigning to Jan Pokorný for release.

