Description of problem:
The "no" option at an "openstack overcloud update stack" breakpoint appears to be broken.

    Breakpoint reached, continue? Regexp or Enter=proceed (will clear 12300056-ffff-dddd-1111-12345678ffff), no=cancel update, C-c=quit interactive mode:

When "no" is selected, a stack rollback occurs, and this causes all overcloud nodes to run yum updates in parallel (assuming patches are available). All controller nodes run "pcs cluster stop" at about the same time, which can cause fencing if stonith is enabled. This is not the desired behavior.

Version-Release number of selected component (if applicable):
Current OSP 9 bits

How reproducible:
100% so far (once for a customer, once in a lab for me)

Steps to Reproduce:
1. Deploy OSP 9 via Director.
2. Ensure nodes are registered or have update repos configured.
3. Run the patching procedure:

    openstack overcloud update stack overcloud -i \
      --templates -e [env file] -e [more env files] \
      ....

4. At the first breakpoint, cancel the update via "no":

    on_breakpoint: [u'mflusche-osd001', u'mflusche-osd000', u'mflusche-osd002', u'mflusche-compute001', u'mflusche-compute000']
    Breakpoint reached, continue? Regexp or Enter=proceed (will clear fafe8cc9-e4d4-46d9-8dc1-57b62cf73b58), no=cancel update, C-c=quit interactive mode: no
    canceling update, doing rollback
    canceling update

5. Log in to the overcloud nodes and observe the behavior:

    journalctl -u os-collect-config -f
    Oct 25 23:01:12 mflusche-control000.flusche.co os-collect-config[3848]: [2016-10-25 23:01:12,543] (heat-config) [DEBUG] Running /var/lib/heat-config/hooks/script < /var/lib/heat-config/deployed/feffcf44-753b-4eaf-9cd0-7b9abd0272ff.json
    Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: 1:openssl-libs-1.0.1e-51.el7_2.7.x86_64
    Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: systemd-libs-219-19.el7_2.13.x86_64
    Oct 25 23:07:10 mflusche-control000 yum[17346]: Updated: 1:librados2-0.94.9-3.el7cp.x86_64
    ...

    tail -f /var/log/yum.log

   Monitor on controllers:

    pcs status

Actual results:
The "no" option at a breakpoint appears to cause a parallel patch update on all overcloud nodes.

Expected results:
The update operation is cancelled.

Additional info:
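To make the parallel update easier to spot, the journal lines from step 5 can be summarised per host. The following is only a rough sketch: `count_yum_updates` is a hypothetical helper name, and it assumes syslog-style lines in the format shown in the journal excerpt (month, day, time, host, `yum[PID]:`, `Updated:`). It does not apply to /var/log/yum.log, which has no host column.

```shell
# Hypothetical helper: read syslog-style journal lines on stdin and print
# "<host> <number of packages updated by yum>" for each host, sorted by host.
# Assumes the field layout of the journal excerpt in step 5:
#   $4 = host, $5 = "yum[PID]:", $6 = "Updated:"
count_yum_updates() {
    awk '$5 ~ /^yum\[/ && $6 == "Updated:" { n[$4]++ }
         END { for (h in n) print h, n[h] }' | sort
}
```

For example, journal output collected from several nodes into one file could be piped through it (`cat collected-journal.txt | count_yum_updates`, where the file name is a placeholder). Update counts on every controller within the same few minutes would be consistent with the behavior described above.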
Hi, this is still happening; see https://bugzilla.redhat.com/show_bug.cgi?id=1613063 for more information.
I'm not sure why we ever allowed the user to cancel an update, because doing a rollback has never been safe in TripleO. It wasn't until Queens (OSP13) that Heat offered a way for users to cancel a stack update without triggering a rollback: https://bugs.launchpad.net/heat/+bug/1709041

The code to cancel an update was removed from tripleo-common in Pike and backported to Ocata: https://review.openstack.org/#/q/I752e061979d667c1fb2b115c1a7339002e1824d5

So OSP 10 and earlier are presumably still affected, which is what the testing discussed above appears to show.

(Ironically, cancelling would be a useful thing to add back in now that we can cancel without triggering a rollback, as long as it used that no-rollback path.)
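As a rough illustration of the point above, a wrapper could refuse to cancel on releases where cancelling implies a rollback. This is only a sketch: `safe_cancel_cmd` is a hypothetical helper, not part of TripleO. The version cutoff of 13 is taken from the Queens (OSP13) statement above, and the `openstack stack cancel --no-rollback` form is an assumption about the Heat client that should be checked against `openstack stack cancel --help` on the undercloud. The function only prints the command and does not run it.

```shell
# Hypothetical helper: print the cancel command only when the release can
# cancel a stack update without rolling back.
# Assumption: OSP major version 13 (Queens) is the first such release.
safe_cancel_cmd() {
    osp_version="$1"   # OSP major version, e.g. 9, 10, 13
    stack="$2"         # Heat stack name, e.g. overcloud
    if [ "$osp_version" -ge 13 ]; then
        # Assumed CLI form; verify with "openstack stack cancel --help".
        echo "openstack stack cancel --no-rollback $stack"
    else
        echo "OSP $osp_version: cancelling triggers a rollback; not safe" >&2
        return 1
    fi
}
```

Calling `safe_cancel_cmd 9 overcloud` returns a non-zero status and prints a warning to stderr, which matches the conclusion that OSP 10 and earlier have no safe cancel.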
Closing as WONTFIX, as we have provided a way to cancel in Queens and it is unlikely that we will be able to address this in any of the older versions prior to their EOL.