Description of problem:
-----------------------
If a minor update of RHOS-11 is cancelled, there is no easy way to restart the update other than restarting heat-engine.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
puppet-mistral-10.3.0-0.20170213130808.a6e8ebc.el7ost.noarch
openstack-mistral-api-4.0.0-3.el7ost.noarch
python-mistralclient-3.0.0-0.20170208192040.a7bf138.el7ost.noarch
openstack-mistral-engine-4.0.0-3.el7ost.noarch
openstack-mistral-executor-4.0.0-3.el7ost.noarch
python-openstack-mistral-4.0.0-3.el7ost.noarch
openstack-mistral-common-4.0.0-3.el7ost.noarch
python-heatclient-1.8.0-0.20170208192329.17dd306.el7ost.noarch
openstack-heat-engine-8.0.0-4.el7ost.noarch
puppet-heat-10.3.0-0.20170210225040.920f4f9.el7ost.noarch
openstack-heat-common-8.0.0-4.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch
python-heat-agent-1.0.0-0.20170224185834.8e6dbb1.el7ost.noarch
openstack-heat-api-8.0.0-4.el7ost.noarch
openstack-heat-api-cfn-8.0.0-4.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch

Steps to Reproduce:
1. Deploy RHOS-11 (2017-03-14.2)
2. Set up latest repos on the undercloud and overcloud (2017-03-15.2)
3. Update the undercloud
4. Start an interactive overcloud update:
   openstack overcloud update stack -i overcloud
5. After at least one node has been updated, hit CTRL-C
6. Try to re-run the update:
   ERROR: Stack overcloud already has an action (UPDATE) in progress.

Actual results:
---------------
The update cannot be re-run.

Expected results:
-----------------
There is a way to reconnect to an existing update.

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph
One option is to abort the update:

    openstack overcloud update abort overcloud

This command does not block; you will need to monitor the stack status until it returns to 'ROLLBACK_COMPLETE'. I am still investigating continuing the update instead of aborting it.
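Since the abort command returns immediately, the monitoring step above has to be done by hand. A minimal polling sketch (the function name is illustrative; it assumes the 'openstack' CLI is available with stackrc sourced):

```shell
# Poll the stack status until it settles in a terminal state.
# Hypothetical helper, not part of the tripleo tooling.
wait_for_stack() {
    stack="$1"
    while :; do
        status=$(openstack stack show "$stack" -f value -c stack_status)
        echo "$stack: $status"
        case "$status" in
        *_COMPLETE|*_FAILED) return 0 ;;   # settled; stop polling
        esac
        sleep 30
    done
}
```

After an abort you would call `wait_for_stack overcloud` and wait for it to report 'ROLLBACK_COMPLETE'.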
Steps to continue an update:

1. openstack stack resource list -n 5 -f yaml --filter name=UpdateDeployment overcloud

This command will give you the resource that the hook is on. You will get output similar to this:

- physical_resource_id: 50da754b-2e09-45ff-8836-6a8b097337b6
  resource_name: UpdateDeployment
  resource_status: UPDATE_COMPLETE
  resource_type: OS::Heat::SoftwareDeployment
  stack_name: overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm
  updated_time: '2017-03-22T12:53:27Z'
- physical_resource_id: 24718cee-63fb-4620-955a-20fca5678316
  resource_name: UpdateDeployment
  resource_status: UPDATE_COMPLETE
  resource_type: OS::Heat::SoftwareDeployment
  stack_name: overcloud-Compute-mwb5lla6twbn-0-tuysaeiyisoi
  updated_time: '2017-03-22T12:58:04Z'

2. openstack stack event list --resource UpdateDeployment overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm
   openstack stack event list --resource UpdateDeployment overcloud-Compute-mwb5lla6twbn-0-tuysaeiyisoi

The last few lines of the output from these commands (you will run this for every UpdateDeployment resource) will look similar to this:

2017-03-22 12:32:31Z [UpdateDeployment]: UPDATE_COMPLETE  UPDATE paused until Hook pre-update is cleared

If you see that, then you know the breakpoint has been reached and needs to be cleared.

3. openstack stack hook clear overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm UpdateDeployment

This, as you can guess, will clear the breakpoint and allow the update to continue. If you run the event list again, you will see:

2017-03-22 12:53:27Z [UpdateDeployment]: UPDATE_COMPLETE  Hook pre-update is cleared
2017-03-22 12:53:27Z [UpdateDeployment]: UPDATE_IN_PROGRESS  state changed
2017-03-22 12:54:21Z [UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment 50da754b-2e09-45ff-8836-6a8b097337b6 succeeded
2017-03-22 12:54:22Z [UpdateDeployment]: UPDATE_COMPLETE  state changed

4. openstack stack list

Monitor the stack for completion.
Based on comments by zaneb, abort should not be used. In fact, he advocates for complete removal of that command, and I agree.
Assigning back after confirming with Brad. I hadn't noticed all your debugging work around this - thank you!
This bugzilla has been removed from the release since it has not been Triaged, and needs to be reviewed for targeting another release.