Description of problem:
When 'cibadmin --upgrade --force' is run on a node that is not currently the DC and the CIB schema is already the latest available, the command exits with a timeout error. When the same command is run on the DC, or when the schema is not the latest, everything works as expected.

Version-Release number of selected component (if applicable):
pacemaker-2.0.0-1.fc29.1.x86_64

How reproducible:
always, easily

Steps to Reproduce:
[root@fed28-node1:~]# crm_mon -1 | grep DC
Current DC: fed28-node2 (version 2.0.0-1.fc29.1-8cf3fe749e) - partition with quorum
[root@fed28-node1:~]# cibadmin --query | head -n 1
<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.1" epoch="12" num_updates="0" admin_epoch="2" cib-last-written="Thu Aug 2 12:12:50 2018" update-origin="fed28-node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
[root@fed28-node1:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-62): Timer expired
[root@fed28-node1:~]# echo $?
124
[root@fed28-node1:~]# cibadmin --query | head -n 1
<cib crm_feature_set="3.1.0" validate-with="pacemaker-3.1" epoch="12" num_updates="0" admin_epoch="2" cib-last-written="Thu Aug 2 12:12:50 2018" update-origin="fed28-node1" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">

Actual results:
[root@fed28-node1:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-62): Timer expired
[root@fed28-node1:~]# echo $?
124

Expected results:
It should not matter on which node the command is run; the result should be the same. Results from the DC with pacemaker-2.0.0-1.fc29.1.x86_64:
[root@fed28-node2:~]# cibadmin --upgrade --force
Call cib_upgrade failed (-211): Schema is already the latest available
[root@fed28-node2:~]# echo $?
1

Additional info:
pacemaker.log on node1:
fed28-node1 pacemaker-based [626] (cib_process_request) info: Forwarding cib_upgrade operation for section 'all' to all (origin=local/cibadmin/2)

pacemaker.log on node2:
fed28-node2 pacemaker-based [598] (cib_process_request) warning: Completed cib_upgrade operation for section 'all': Schema is already the latest available (rc=-211, origin=fed28-node1/cibadmin/2, version=2.12.0)
The exit status of 1 on the DC is also a bug. It should be 0, and the message should be "Upgrade unnecessary: Schema is already the latest available". We can include that issue in this bz, too.
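The intended result handling can be sketched roughly as follows. This is an illustrative Python model, not cibadmin's actual C code: the function name and the constant name are hypothetical, only the -211 value and the message texts come from this report, and mapping every other negative rc to exit status 1 is a simplifying assumption (the timeout above actually exits with 124).

```python
# Hypothetical model of the desired result handling; not pacemaker code.
RC_SCHEMA_UNCHANGED = -211  # rc value from the log in this report; the name is made up


def report_upgrade_result(rc, reason):
    """Return (exit_status, message) for a completed cib_upgrade call."""
    if rc == RC_SCHEMA_UNCHANGED:
        # "Nothing to upgrade" is not a failure: say so and exit 0.
        return 0, "Upgrade unnecessary: " + reason
    if rc < 0:
        # Assumption: any other negative rc is reported as a generic failure.
        return 1, "Call cib_upgrade failed (%d): %s" % (rc, reason)
    # Success: no message needed in this sketch.
    return 0, ""
```

With this mapping, running the command on the DC in the scenario above would print "Upgrade unnecessary: Schema is already the latest available" and exit with 0 instead of 1.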
Upon investigation, I found the timeout issue has existed since at least upstream 1.1.11 (I reproduced as far back as 1.1.16) and possibly always. When a non-DC node gets an upgrade request, it forwards it to the DC. When the DC gets the request and an upgrade is required, it re-sends the request to all nodes, asking for an upgrade to a specific version; all the nodes then perform that upgrade locally and notify any clients (such as cibadmin) of the result. The problem is that if an upgrade is not required, the DC does nothing further, so the non-DC nodes never do anything either, and the client never gets a notification. We will need to change this so that the DC always sends a result to at least the requesting node, even if an upgrade is not required. This will only work once all nodes in the cluster are running a pacemaker version with the fix (that is, in a rolling upgrade, the fix will not take effect until all nodes are upgraded).
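The message flow described above can be modelled with a small toy simulation. This is only a sketch in Python, not the pacemaker-based implementation: the function names, the `dc_always_replies` switch and the notification strings are hypothetical, and "pacemaker-3.1" is assumed to be the latest schema because that is what the report shows.

```python
# Toy model of the cib_upgrade message flow; not pacemaker code.
LATEST = "pacemaker-3.1"  # assumed latest schema, taken from the report


def run_upgrade(origin, dc, nodes, current_schema, dc_always_replies):
    """Return, per node, the results that node would deliver to local clients."""
    notifications = {node: [] for node in nodes}

    # A non-DC origin only forwards the request to the DC; the DC then decides.
    if current_schema != LATEST:
        # Upgrade required: the DC re-sends the request to all nodes with a
        # target version, and every node upgrades locally and notifies clients.
        for node in nodes:
            notifications[node].append("upgraded to " + LATEST)
    else:
        result = "Schema is already the latest available"
        # The DC completes the request locally, so a client on the DC sees it.
        notifications[dc].append(result)
        # Proposed fix: also send the result back to the requesting node.
        if dc_always_replies and origin != dc:
            notifications[origin].append(result)
    return notifications


def client_outcome(origin, notifications):
    """What a client on the origin node sees; no notification means a timeout."""
    received = notifications[origin]
    return received[0] if received else "Timer expired"
```

For example, with `nodes = ["fed28-node1", "fed28-node2"]` and fed28-node2 as DC, a request from fed28-node1 against an already-latest schema yields "Timer expired" when `dc_always_replies` is False, and the "already the latest" result when it is True.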
The timeout issue is fixed upstream by commit 1f05f5e2, and the exit status issue by commit f5e936fb. Re-assigning to Jan Pokorný for release.