Bug 1475416 - [RFE]Ensure that the controller node is rolled back into the cluster if minor update fails
Status: NEW
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Carlos Camacho
QA Contact: Amit Ugol
Keywords: FutureFeature, Triaged
Depends On:
Reported: 2017-07-26 11:41 EDT by cshastri
Modified: 2017-09-11 21:59 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description cshastri 2017-07-26 11:41:19 EDT
Description of problem:
During the minor update, the controller nodes are taken out of the cluster one at a time by stopping the cluster on each of them with 'pcs cluster stop'.

If the minor update fails on a controller node while it is out of the cluster, the cluster is not restarted automatically on that node: it is shown as 'OFFLINE' in 'pcs status'. The cluster then has to be started manually on that controller node with 'pcs cluster start'.

This is okay if, on the next attempt, the update procedure picks the same controller node again.

But if it picks another controller that is still 'ONLINE', the update fails with: "Error: stopping the node will cause a loss of quorum, use --force to override". I hit this issue once.

It would be good if, when the update fails on a controller node that has been taken out of the cluster, a rollback procedure ran 'pcs cluster start' on that controller to bring it back into the cluster.
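A rollback step like the one requested could be sketched as a small shell helper that scans 'pcs status' output for nodes reported OFFLINE and starts the cluster on each of them. This is only a sketch: the function names are made up here, and the exact "OFFLINE: [ node ... ]" line format is an assumption based on typical pacemaker status output.

```shell
# Sketch of an automated rollback check (hypothetical helper names).
# Assumes pacemaker lists offline members on a line such as:
#   OFFLINE: [ overcloud-controller-1 ]

# Print every node name listed after "OFFLINE:"; reads status text on stdin.
offline_nodes() {
    awk '/OFFLINE:/ {
        for (i = 2; i <= NF; i++)
            if ($i != "[" && $i != "]") print $i
    }'
}

# Rollback: restart the cluster on each node the failed update left offline.
rollback_offline() {
    pcs status | offline_nodes | while read -r node; do
        pcs cluster start "$node"
    done
}
```

Running such a check after a failed update step would bring the controller back 'ONLINE' before the next node is chosen, avoiding the loss-of-quorum error above.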

Version-Release number of selected component (if applicable):

How reproducible:

Additional info:

To work around this issue, bring the controller node back into the cluster by running 'pcs cluster start' on it, then re-run the deployment.
