Red Hat Bugzilla – Bug 1475416
[RFE] Ensure that the controller node is rolled back into the cluster if minor update fails
Last modified: 2017-09-11 21:59:14 EDT
Description of problem:
While performing the minor update, the controller nodes are taken out of the cluster one at a time, stopping the cluster on each of them with 'pcs cluster stop'.
If for some reason the minor update fails on a controller node while it is not part of the cluster, the cluster is not started automatically on that node. It shows as 'OFFLINE' in 'pcs status', and we have to start the cluster on it manually with 'pcs cluster start'.
This is fine if the next update run picks the same controller node.
If the update procedure instead picks another controller that is 'ONLINE', the update fails with this error: "Error: stopping the node will cause a loss of quorum, use --force to override". I faced this issue once.
It would be good if, when the update fails on a controller node that has been taken out of the cluster, a rollback procedure ran 'pcs cluster start' on that controller to bring it back into the cluster.
Version-Release number of selected component (if applicable):
Workaround:
To fix this issue, bring the controller node back into the cluster by running 'pcs cluster start' on it, then re-run the deployment.
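The requested rollback could look roughly like the following sketch. This is not the actual update tooling; 'ensure_in_cluster' is a hypothetical helper name, and it assumes 'pcs' is on the PATH and that the controller's node name is known.

```shell
#!/bin/sh
# Hedged sketch of the rollback step requested above -- not the
# actual minor-update implementation. 'ensure_in_cluster' is a
# hypothetical helper; it assumes 'pcs' is available.

ensure_in_cluster() {
    node="$1"
    # 'pcs status nodes' lists cluster members under 'Online:' and
    # 'Offline:'. If the node is offline, start the cluster on it so
    # the next update attempt does not hit the loss-of-quorum error.
    if pcs status nodes | grep -q "Offline:.*${node}"; then
        pcs cluster start "${node}" || return 1
    fi
    return 0
}
```

After the node is back 'ONLINE', the deployment can simply be re-run as described in the workaround above.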