Description of problem:
=======================
Attempt to cancel interactive update fails:

openstack overcloud update stack -i overcloud
Started Mistral Workflow tripleo.package_update.v1.package_update_plan. Execution ID: 7110c476-2c50-4fb8-a654-0b0ac90b4a66
Waiting for messages on queue 'c7e482e5-145e-4166-9a8b-679815d0b052' with no timeout.
WAITING
not_started: [u'compute-2', u'controller-2', u'controller-1', u'controller-0', u'ceph-1']
on_breakpoint: [u'compute-0', u'ceph-2', u'ceph-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear ee4f677d-29aa-42fc-aaa9-2cd1d378977d), no=cancel update, C-c=quit interactive mode: compute-*
WAITING
on_breakpoint: [u'compute-2', u'ceph-2', u'controller-2', u'controller-1', u'compute-0', u'controller-0', u'ceph-1', u'ceph-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear ee4f677d-29aa-42fc-aaa9-2cd1d378977d), no=cancel update, C-c=quit interactive mode: compute-*
WAITING
on_breakpoint: [u'compute-2', u'ceph-2', u'controller-2', u'controller-1', u'compute-0', u'controller-0', u'ceph-1', u'ceph-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear ee4f677d-29aa-42fc-aaa9-2cd1d378977d), no=cancel update, C-c=quit interactive mode: ceph-1
WAITING
completed: [u'ceph-1']
on_breakpoint: [u'ceph-2', u'controller-2', u'controller-1', u'compute-0', u'controller-0', u'compute-2', u'ceph-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear ee4f677d-29aa-42fc-aaa9-2cd1d378977d), no=cancel update, C-c=quit interactive mode:
WAITING
completed: [u'ceph-0', u'ceph-1']
on_breakpoint: [u'ceph-2', u'controller-2', u'controller-1', u'compute-0', u'controller-0', u'compute-2']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 00986c9c-3c51-464d-a814-b7d37927132b), no=cancel update, C-c=quit interactive mode: ceph-2
WAITING
completed: [u'ceph-0', u'ceph-2', u'ceph-1']
on_breakpoint: [u'compute-2', u'controller-2', u'compute-0', u'controller-0', u'controller-1']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear 117c9587-9048-415a-bed8-4d98416b4e42), no=cancel update, C-c=quit interactive mode: no
canceling update, doing rollback
failed to remove breakpoint on compute-2: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [00986c9c-3c51-464d-a814-b7d37927132b] Stack "overcloud-Compute-yxwek6sqksrw-2-vdm7yd45llsw" [2892260b-d460-4dbd-b0c6-db04b3f56e88]
failed to remove breakpoint on controller-2: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [7d511f8a-0599-44c4-a80d-f6ef5e7d7891] Stack "overcloud-Controller-e5mjlgrmtkdz-2-oeuh7feoyysr" [e4137ee4-193a-4d45-94c5-dd20f5f5b78c]
failed to remove breakpoint on controller-0: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [543dae44-7779-4130-af7a-0117836b0b2a] Stack "overcloud-Controller-e5mjlgrmtkdz-0-5d2bx4yxppwl" [4d52f92a-292f-406d-b1d1-6e47d4fa192e]
failed to remove breakpoint on controller-1: ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment" [117c9587-9048-415a-bed8-4d98416b4e42] Stack "overcloud-Controller-e5mjlgrmtkdz-1-zqbobbye2nlz" [dcac9068-5241-42ba-9167-9bb9e301e73b]
WAITING
on_breakpoint: [u'compute-2', u'ceph-2', u'controller-2', u'controller-1', u'compute-0', u'controller-0', u'ceph-1', u'ceph-0']
Breakpoint reached, continue? Regexp or Enter=proceed (will clear ee4f677d-29aa-42fc-aaa9-2cd1d378977d), no=cancel update, C-c=quit interactive mode: no
canceling update, doing rollback
ERROR: Cancelling update when stack is ROLLBACK_IN_PROGRESS is not supported.

Patch from bz 1414779 already applied.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-mistral-10.3.0-0.20170213130808.a6e8ebc.el7ost.noarch
openstack-mistral-api-4.0.0-3.el7ost.noarch
python-mistralclient-3.0.0-0.20170208192040.a7bf138.el7ost.noarch
openstack-mistral-engine-4.0.0-3.el7ost.noarch
openstack-mistral-executor-4.0.0-3.el7ost.noarch
python-openstack-mistral-4.0.0-3.el7ost.noarch
openstack-mistral-common-4.0.0-3.el7ost.noarch
python-tripleoclient-6.1.0-1.el7ost.noarch
openstack-tripleo-common-6.0.1-0.20170307123121.2c9fa69.el7ost.noarch
python-heatclient-1.8.0-0.20170208192329.17dd306.el7ost.noarch
openstack-heat-engine-8.0.0-4.el7ost.noarch
puppet-heat-10.3.0-0.20170210225040.920f4f9.el7ost.noarch
openstack-heat-common-8.0.0-4.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch
python-heat-agent-1.0.0-0.20170224185834.8e6dbb1.el7ost.noarch
openstack-heat-api-8.0.0-4.el7ost.noarch
openstack-heat-api-cfn-8.0.0-4.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch

Steps to Reproduce:
-------------------
1. Install RHOS-11
2. Setup latest repos on oc and uc
3. Update uc
4. Start interactive update:
   openstack overcloud update stack -i overcloud
5. After few nodes got updated choose to cancel update

Actual results:
---------------
Failed to cancel update

Expected results:
-----------------
Update is cancelled

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph

TBH we should probably remove the "cancel update" option from tripleo-client. There's no way to just stop an update in Heat. The stack-cancel-update command used by tripleo-client here *rolls back* an update, and it should *never* be used with TripleO, because the TripleO templates are not designed to leave the stack in a good state after a rollback. It doesn't make sense to have a UI that encourages users to do rollbacks.

The error messages are probably caused by a timing improvement in Heat. I'm not totally sure why we were trying to clear the hooks at that point, but there's no correct way to implement this feature anyway. Starting with Newton, Heat can start a rollback of its in-progress nested stacks immediately when you start a rollback of the parent stack. Previously I think the child stacks would keep updating? I can't remember the details. It's possible/likely that this rollback operation has become more destructive for TripleO now that Heat does it correctly.
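
To illustrate why the second "no" in the description fails, here is a toy model. It assumes that a cancel is only accepted while the stack is UPDATE_IN_PROGRESS, which is what the ROLLBACK_IN_PROGRESS error in the description suggests. The function name and the status handling are illustrative only; this is not Heat's code.

```python
def cancel_update(stack_status):
    """Toy model of stack-cancel-update: return the new stack status.

    Assumption: a cancel is only accepted while an update is running,
    and accepting it starts a rollback rather than stopping in place.
    """
    if stack_status != "UPDATE_IN_PROGRESS":
        raise RuntimeError(
            "Cancelling update when stack is %s is not supported." % stack_status
        )
    return "ROLLBACK_IN_PROGRESS"


status = cancel_update("UPDATE_IN_PROGRESS")  # first "no": rollback starts
try:
    cancel_update(status)  # second "no": rejected, already rolling back
except RuntimeError as exc:
    print("ERROR: %s" % exc)
```

In this model there is no input that leaves the stack where it was, which is the point made above: the only thing "cancel" can do is trigger a rollback.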
I agree. Moving to the UI team to remove this from our client commands, given the lack of support for update/upgrade rollback. We have RFEs for this feature in coming releases.
*** Bug 1429342 has been marked as a duplicate of this bug. ***
Ideally this is removed in OSP11, since it is not expected to work. I will let the UI/CLI team decide whether it is a blocker or not (I don't think so).
Thanks to Brad, the command has been removed in Pike/OSP12. From what I can tell, it wouldn't be difficult to backport (very minor conflicts in both patches), though there might be concerns about suddenly removing a command in existing releases without a deprecation period. However, as indicated in comment 2, this likely never worked and has the potential for data loss, so it still sounds worthwhile doing.
The two patches documented in the external tracker were backported to stable/ocata in https://review.openstack.org/#/c/481679/ and https://review.openstack.org/#/c/481679/ and already verified as part of bug 1427153. https://review.openstack.org/#/c/481679/3/tripleo_common/_stack_update.py clearly removes the "no=cancel update" option. Closing.
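
For readers who have not seen the prompt, a rough sketch of how the remaining choices (regexp or Enter) could be handled once "no" is no longer special. This is not the tripleo-common implementation; `select_nodes` and its matching rules are hypothetical and only mirror the prompt text quoted in the description.

```python
import re


def select_nodes(answer, on_breakpoint):
    """Return the nodes whose breakpoints should be cleared.

    Hypothetical helper: empty input (Enter) proceeds with every waiting
    node, anything else is treated as a regexp matched against the start
    of each node name. "no" gets no special handling, so it is just
    another regexp.
    """
    answer = answer.strip()
    if not answer:
        return sorted(on_breakpoint)
    pattern = re.compile(answer)
    return sorted(n for n in on_breakpoint if pattern.match(n))


waiting = ["compute-0", "compute-2", "controller-0", "ceph-2"]
print(select_nodes("ceph-2", waiting))  # a single node
print(select_nodes("", waiting))        # Enter: all waiting nodes
print(select_nodes("no", waiting))      # no longer cancels anything
```

With the cancel branch gone, the only way out of the loop is C-c, which quits interactive mode without touching the stack.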