Bug 1434437 - [UPDATES] Reconnect to interrupted updates
Summary: [UPDATES] Reconnect to interrupted updates
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Adriano Petrich
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-21 14:01 UTC by Yurii Prokulevych
Modified: 2021-03-11 15:04 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-29 15:37:07 UTC
Target Upstream Version:
Embargoed:



Description Yurii Prokulevych 2017-03-21 14:01:05 UTC
Description of problem:
-----------------------
If a minor update of RHOS-11 is cancelled, there is no easy way to restart it other than restarting heat-engine.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
puppet-mistral-10.3.0-0.20170213130808.a6e8ebc.el7ost.noarch
openstack-mistral-api-4.0.0-3.el7ost.noarch
python-mistralclient-3.0.0-0.20170208192040.a7bf138.el7ost.noarch
openstack-mistral-engine-4.0.0-3.el7ost.noarch
openstack-mistral-executor-4.0.0-3.el7ost.noarch
python-openstack-mistral-4.0.0-3.el7ost.noarch
openstack-mistral-common-4.0.0-3.el7ost.noarch

python-heatclient-1.8.0-0.20170208192329.17dd306.el7ost.noarch
openstack-heat-engine-8.0.0-4.el7ost.noarch
puppet-heat-10.3.0-0.20170210225040.920f4f9.el7ost.noarch
openstack-heat-common-8.0.0-4.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch
python-heat-agent-1.0.0-0.20170224185834.8e6dbb1.el7ost.noarch
openstack-heat-api-8.0.0-4.el7ost.noarch
openstack-heat-api-cfn-8.0.0-4.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch

Steps to Reproduce:
1. Deploy RHOS-11 (2017-03-14.2)
2. Set up the latest repos on the undercloud and overcloud (2017-03-15.2)
3. Update the undercloud
4. Start an interactive overcloud update:
    openstack overcloud update stack -i overcloud
5. After at least one node has been updated, press CTRL-C
6. Try to re-run the update:
    ERROR: Stack overcloud already has an action (UPDATE) in progress.

Actual results:
---------------
The update cannot be re-run.

Expected results:
-----------------
There is a way to reconnect to an update that is already in progress.


Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 Ceph nodes

Comment 2 Brad P. Crochet 2017-03-21 19:17:52 UTC
One option is to abort the update:

openstack overcloud update abort overcloud

This command does not block. You will need to monitor the stack status until it reaches 'ROLLBACK_COMPLETE'.

I am still investigating continuing the update instead of aborting.

Comment 3 Brad P. Crochet 2017-03-22 13:06:02 UTC
Steps to continue an update:

1. openstack stack resource list -n 5 -f yaml --filter name=UpdateDeployment overcloud

This command lists the UpdateDeployment resource in each nested stack, which is where the pre-update hook sits. The output will look similar to this:

- physical_resource_id: 50da754b-2e09-45ff-8836-6a8b097337b6
  resource_name: UpdateDeployment
  resource_status: UPDATE_COMPLETE
  resource_type: OS::Heat::SoftwareDeployment
  stack_name: overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm
  updated_time: '2017-03-22T12:53:27Z'
- physical_resource_id: 24718cee-63fb-4620-955a-20fca5678316
  resource_name: UpdateDeployment
  resource_status: UPDATE_COMPLETE
  resource_type: OS::Heat::SoftwareDeployment
  stack_name: overcloud-Compute-mwb5lla6twbn-0-tuysaeiyisoi
  updated_time: '2017-03-22T12:58:04Z'

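For scripting, the nested stack names can be pulled out of that YAML. The following is a minimal bash/awk sketch, assuming the flat "key: value" layout shown above; update_deployment_stacks is a hypothetical helper name, not something shipped with python-tripleoclient.

```shell
#!/usr/bin/env bash
# Read the YAML produced by
#   openstack stack resource list -n 5 -f yaml --filter name=UpdateDeployment overcloud
# on stdin and print one nested stack name per line.
# Assumes the flat "key: value" layout shown above; this is NOT a general YAML parser.
update_deployment_stacks() {
    # "stack_name:" is either an indented key or, if the emitter puts it first,
    # the key that opens a "- " list item. The value is the last field.
    awk '$1 == "stack_name:" || ($1 == "-" && $2 == "stack_name:") { print $NF }'
}
```

Usage, with undercloud credentials sourced: openstack stack resource list -n 5 -f yaml --filter name=UpdateDeployment overcloud | update_deployment_stacks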
2. openstack stack event list --resource UpdateDeployment overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm
openstack stack event list --resource UpdateDeployment overcloud-Compute-mwb5lla6twbn-0-tuysaeiyisoi

Run this for every UpdateDeployment resource found in step 1. The last few lines of each command's output will look similar to this:

2017-03-22 12:32:31Z [UpdateDeployment]: UPDATE_COMPLETE  UPDATE paused until Hook pre-update is cleared

If you see that line, the breakpoint has been reached and needs to be cleared.
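The check in step 2 can be automated by looking at the newest hook-related event. This is a minimal bash sketch, assuming events are printed oldest first (as in the samples in this comment) and that the message text stays exactly as shown; hook_pending is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read "openstack stack event list --resource UpdateDeployment <stack>" output
# on stdin. Succeed (exit status 0) only if the newest "Hook pre-update" event
# still says the resource is paused, i.e. the breakpoint must be cleared.
# Assumes events are listed oldest first.
hook_pending() {
    local last
    # Keep only hook-related events and take the most recent one.
    last=$(grep 'Hook pre-update' | tail -n 1)
    [[ $last == *"paused until Hook pre-update is cleared"* ]]
}
```

For example: openstack stack event list --resource UpdateDeployment overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm | hook_pending && echo "breakpoint reached"
Looking at the newest hook event, rather than only the last line, avoids a false positive once a "Hook pre-update is cleared" event has been logged after the pause.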

3. openstack stack hook clear overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm UpdateDeployment

This clears the breakpoint and allows the update to continue. If you run the event list again, you will see:

2017-03-22 12:53:27Z [UpdateDeployment]: UPDATE_COMPLETE  Hook pre-update is cleared
2017-03-22 12:53:27Z [UpdateDeployment]: UPDATE_IN_PROGRESS  state changed
2017-03-22 12:54:21Z [UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment 50da754b-2e09-45ff-8836-6a8b097337b6 succeeded
2017-03-22 12:54:22Z [UpdateDeployment]: UPDATE_COMPLETE  state changed
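Steps 2 and 3 can be combined into one guarded call per nested stack. This is a minimal bash sketch, assuming the openstack CLI is installed, undercloud credentials (e.g. stackrc) are sourced, and events are listed oldest first; clear_hook_if_pending is a hypothetical helper name, and the hook clear command is the one from step 3.

```shell
#!/usr/bin/env bash
# Clear the pre-update breakpoint on ONE nested stack, but only if its
# UpdateDeployment resource is actually paused on it.
# Assumes the openstack CLI is installed and credentials are sourced.
# Returns 1 (and clears nothing) if no breakpoint is pending.
clear_hook_if_pending() {
    local stack=$1 last
    # Newest hook-related event for this resource.
    last=$(openstack stack event list --resource UpdateDeployment "$stack" \
        | grep 'Hook pre-update' | tail -n 1)
    if [[ $last != *"paused until Hook pre-update is cleared"* ]]; then
        echo "No pending pre-update hook on $stack" >&2
        return 1
    fi
    echo "Clearing pre-update hook on $stack" >&2
    # Same command as step 3.
    openstack stack hook clear "$stack" UpdateDeployment
}
```

For example: clear_hook_if_pending overcloud-Controller-cnu5246du7ej-0-qcu5iunuiqmm
The sketch deliberately handles a single nested stack per call, so nodes can still be released one at a time as in step 3 rather than all at once.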

4. openstack stack list

Monitor the stack for completion.
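Step 4 can also be scripted as a polling loop instead of re-running openstack stack list by hand. This is a minimal bash sketch, assuming a sourced stackrc and that any status ending in _COMPLETE or _FAILED is final; wait_for_stack and its 30-second default interval are illustrative choices, not part of tripleoclient.

```shell
#!/usr/bin/env bash
# Poll a stack until its status is terminal (anything ending in _COMPLETE
# or _FAILED), then print it. Arguments: stack name (default "overcloud")
# and poll interval in seconds (default 30, an arbitrary choice).
# Returns 0 on *_COMPLETE, 1 on *_FAILED, 2 if the status query fails.
wait_for_stack() {
    local stack=${1:-overcloud} interval=${2:-30} status
    while true; do
        status=$(openstack stack show "$stack" -f value -c stack_status) || return 2
        case $status in
            *_COMPLETE) echo "$stack: $status"; return 0 ;;
            *_FAILED)   echo "$stack: $status" >&2; return 1 ;;
            *)          sleep "$interval" ;;
        esac
    done
}
```

For example: wait_for_stack overcloud 60
While any breakpoint is still pending, the top-level stack stays UPDATE_IN_PROGRESS and this loop keeps polling, so clear the remaining hooks first. A ROLLBACK_COMPLETE also counts as finished here, so check the printed status.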

Comment 4 Brad P. Crochet 2017-03-23 12:38:25 UTC
Based on zaneb's comments, abort should not be used. In fact, he advocates removing that command completely, and I agree.

Comment 5 Julie Pichon 2017-03-29 17:05:13 UTC
Assigning back after confirming with Brad; I hadn't noticed all your debugging work on this. Thank you!

Comment 7 Red Hat Bugzilla Rules Engine 2017-08-16 12:15:00 UTC
This bugzilla has been removed from the release since it has not been Triaged, and needs to be reviewed for targeting another release.

