Description of problem:
Overcloud update attempts fail when they take longer than 90 minutes. This can generally be observed in the heat event list:

| overcloud | e661c9a8-b8f7-470e-b7a8-1fecbb79b23f | Stack UPDATE started | UPDATE_IN_PROGRESS | 2015-11-27T18:27:38Z |
| ControllerNodesPostDeployment | d951ac6f-a804-4d14-a13c-c54ff56257c9 | UPDATE aborted | UPDATE_FAILED | 2015-11-27T19:57:46Z |

None of the resources will be found in UPDATE_FAILED. We used to set a default timeout of 240 minutes [1] for new deployments but not for updates. I don't think updates should get less time than new deployments, as they can take even longer because of the yum upgrade / pcmk maintenance steps.

[1] https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L725

Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.10-19.el7ost.noarch
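The 90-minute figure can be checked against the two event timestamps quoted in the description. The snippet below is only an illustrative check, not part of any tool mentioned in this report; the helper name elapsed_minutes is made up for the example.

```python
from datetime import datetime

# Timestamp layout used by the heat event list quoted in the description.
FMT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_minutes(start, end):
    """Return the minutes between two event timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


started = "2015-11-27T18:27:38Z"  # Stack UPDATE started
aborted = "2015-11-27T19:57:46Z"  # UPDATE aborted

print(round(elapsed_minutes(started, aborted), 1))  # 90.1
```

The abort lands about eight seconds past the 90-minute mark, which is consistent with a timeout being hit rather than with a resource failing on its own.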
I've put a fix for this upstream.
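For readers following along, the idea behind the request (updates getting the same default timeout as new deployments) can be sketched as below. This is not the upstream patch. The function name stack_fields and the constant are hypothetical, and the timeout_mins field name is assumed to match what the Heat stack API expects.

```python
# 240 minutes is the default the description cites for new deployments.
DEFAULT_TIMEOUT_MINS = 240


def stack_fields(stack_name, timeout_mins=None):
    """Build request fields for a stack create OR update (hypothetical helper).

    The same default applies to both operations, so an update is never
    given less time than a fresh deployment.
    """
    if timeout_mins is None:
        timeout_mins = DEFAULT_TIMEOUT_MINS
    if timeout_mins <= 0:
        raise ValueError("timeout_mins must be a positive number of minutes")
    return {"stack_name": stack_name, "timeout_mins": timeout_mins}


# An update with no explicit timeout falls back to the deployment default.
print(stack_fields("overcloud"))
# An operator can still ask for more time on a long update.
print(stack_fields("overcloud", timeout_mins=300))
```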
dougal, thanks for the patch. I'm assigning this one to you :)
The assignee for this should still be dougal.
Verified.
Environment: openstack-tripleo-common-0.0.1.dev6-5.git49b57eb.el7ost.noarch
Was able to complete an update from 7.1 to 7.2 that took more than 90 minutes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:2651