Bug 1389115 - cancelling "openstack overcloud update stack" during a breakpoint behaves badly
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 9.0 (Mitaka)
Assignee: Adriano Petrich
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-26 21:42 UTC by Matt Flusche
Modified: 2022-09-26 12:22 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 21:39:12 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1613063 | 0 | high | CLOSED | Entering 'no' to exit the minor update of the overcloud appears to actually complete the update | 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker | OSP-18933 | 0 | None | None | None | 2022-09-26 12:22:24 UTC

Internal Links: 1613063

Description Matt Flusche 2016-10-26 21:42:41 UTC
Description of problem:

The "no" option during an "openstack overcloud update stack" breakpoint seems to be broken.

  Breakpoint reached, continue? Regexp or Enter=proceed (will clear 12300056-ffff-dddd-1111-12345678ffff), no=cancel update, C-c=quit interactive mode:

When "no" is selected, a stack rollback occurs, and this actually causes all overcloud nodes to run yum updates in parallel (assuming patches are available). All controller nodes run pcs cluster stop at about the same time, which can cause fencing if STONITH is enabled. Obviously this is not the desired behavior.
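To make the prompt semantics concrete, here is a minimal sketch of how a reply could be mapped to an action. It is not the tripleo-common implementation: the function name breakpoint_action and the action strings are hypothetical, and treating any other reply as a regexp of node names to clear is an assumption read from the prompt text. C-c is not modelled, because the shell delivers it as SIGINT rather than as input.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the breakpoint prompt's reply handling.
# NOT the tripleo-common code; names and action strings are illustrative.
#   $1 = the user's reply
#   $2 = the breakpoint ID shown in the prompt ("will clear <id>")
breakpoint_action() {
    local reply="$1" bp_id="$2"
    case "$reply" in
        "")
            # Enter: proceed by clearing the breakpoint named in the prompt
            echo "clear $bp_id" ;;
        no)
            # "no": cancel the stack update; on OSP 9, Heat answers a
            # cancel with a rollback, which is the behavior reported here
            echo "cancel-update" ;;
        *)
            # Assumption: any other reply is a regexp selecting nodes to clear
            echo "clear-matching $reply" ;;
    esac
}

# Example: pressing Enter at the breakpoint from step 4 below
breakpoint_action "" fafe8cc9-e4d4-46d9-8dc1-57b62cf73b58
```

The point of the sketch is that "no" maps to a single Heat cancel, so whatever Heat does on cancel is what the user gets.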

Version-Release number of selected component (if applicable):
Current OSP 9 bits

How reproducible:
100% so far (once for a customer, once in a lab for me)

Steps to Reproduce:
1. Deploy OSP 9 via Director
2. Ensure nodes are registered or have update repos configured. 
3. Run the patching procedure
  openstack overcloud update stack overcloud -i \
  --templates -e [env file] -e [more env files] \
  ....

4. At first breakpoint cancel the update via "no"
  on_breakpoint: [u'mflusche-osd001', u'mflusche-osd000', u'mflusche-osd002', u'mflusche-compute001', u'mflusche-compute000']
  Breakpoint reached, continue? Regexp or Enter=proceed (will clear fafe8cc9-e4d4-46d9-8dc1-57b62cf73b58), no=cancel update, C-c=quit interactive mode: no
  canceling update, doing rollback
  canceling update

5. Log in to the overcloud nodes and observe the behavior.

  journalctl -u os-collect-config -f 
Oct 25 23:01:12 mflusche-control000.flusche.co os-collect-config[3848]: [2016-10-25 23:01:12,543] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/lib/heat-config/deployed/feffcf44-753b-4eaf-9cd0-7b9abd0272ff.json
Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: 1:openssl-libs-1.0.1e-51.el7_2.7.x86_64
Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: systemd-libs-219-19.el7_2.13.x86_64
Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: 1:librados2-0.94.9-3.el7cp.x86_64
...

  tail -f /var/log/yum.log

  Monitor on the controllers: pcs status
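One way to confirm the parallel behavior from the logs collected in step 5 is to compute, per host, the window between the first and last yum "Updated:" entry; overlapping windows on different nodes are the symptom described in this report. The sketch assumes journal lines in the format shown above (timestamp, hostname, yum[PID]:, Updated:) that have already been gathered from all nodes into one stream. It compares only HH:MM:SS, so it also assumes all entries fall on the same day. The function name yum_windows and the input file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<host> <first Updated: time> <last Updated: time>"
# for each host, reading journal lines such as
#   Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: systemd-libs-...
# Fields: $3 = HH:MM:SS, $4 = hostname, $5 = yum[PID]:, $6 = Updated:
# Assumes all lines are from the same day (times compared as strings).
yum_windows() {
    awk '$5 ~ /^yum\[/ && $6 == "Updated:" {
            host = $4; t = $3
            if (!(host in first) || t < first[host]) first[host] = t
            if (!(host in last)  || t > last[host])  last[host]  = t
         }
         END { for (h in first) print h, first[h], last[h] }' | sort
}

# Usage (hypothetical file holding journal output concatenated from all nodes):
#   yum_windows < all-nodes-journal.txt
```

Lines from other processes (for example os-collect-config) are ignored, so the full journal can be fed in unfiltered.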

Actual results:

The "no" option during a breakpoint appears to cause a parallel patch update on all overcloud nodes.

Expected results:

The update operation is cancelled.

Additional info:

Comment 4 Sofer Athlan-Guyot 2018-09-04 12:57:19 UTC
Hi,

This is still happening; see https://bugzilla.redhat.com/show_bug.cgi?id=1613063 for more information.

Comment 7 Zane Bitter 2018-10-22 16:35:05 UTC
I'm not sure why we ever allowed the user to cancel an update, because doing a rollback has never been safe in TripleO.

It wasn't until Queens (OSP13) that Heat offered a way for users to cancel a stack update without triggering a rollback: https://bugs.launchpad.net/heat/+bug/1709041
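For reference, here is a sketch of the two cancel requests involved, expressed as JSON bodies for Heat's stack actions endpoint (assumed to be POST /v1/{tenant_id}/stacks/{stack_name}/{stack_id}/actions). The action names cancel_update and cancel_without_rollback are assumed to match the Heat v1 API, with the latter arriving with the Queens change linked above; verify them against the Heat API reference for your release. The helper heat_cancel_body is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for a Heat stack "cancel" action.
# Assumed endpoint: POST /v1/{tenant_id}/stacks/{stack_name}/{stack_id}/actions
heat_cancel_body() {
    case "$1" in
        rollback)
            # Classic cancel: Heat rolls the stack back (never safe in TripleO)
            echo '{"cancel_update": null}' ;;
        no-rollback)
            # Queens and later (assumed): cancel the update without a rollback
            echo '{"cancel_without_rollback": null}' ;;
        *)
            echo "usage: heat_cancel_body rollback|no-rollback" >&2
            return 1 ;;
    esac
}

heat_cancel_body no-rollback
```

With a Queens-era python-heatclient, the no-rollback cancel is assumed to be exposed as "openstack stack cancel --no-rollback <stack>"; check "openstack stack cancel --help" on the installed client before relying on it.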

The code to cancel an update was removed from tripleo-common in Pike, and that removal was backported to Ocata:

https://review.openstack.org/#/q/I752e061979d667c1fb2b115c1a7339002e1824d5

So OSP 10 and earlier are presumably still affected, which is what the testing discussed above appears to show.

(Ironically, now that we can cancel without triggering a rollback, it would be useful to add cancellation back in, as long as it used the no-rollback variant.)

Comment 8 Alex Schultz 2018-11-19 21:39:12 UTC
Closing as WONTFIX: we have provided a way to cancel in Queens, and it is unlikely that we will be able to address this in any of the older versions before their EOL.

