"openstack overcloud delete overcloud" fails with "Error occurred during stack delete None" Environment: python-heat-agent-1.7.1-0.20180907213355.476aae2.el7ost.noarch python-heat-agent-apply-config-1.7.1-0.20180907213355.476aae2.el7ost.noarch openstack-heat-api-12.0.0-0.20180604085325.7d878a8.el7ost.noarch ansible-pacemaker-1.0.4-0.20180827141254.0e4d7c0.el7ost.noarch python2-mistral-lib-1.0.0-0.20180821152751.d1ccfd0.el7ost.noarch python2-heatclient-1.16.1-0.20180810081134.b5f3d34.el7ost.noarch python-heat-agent-json-file-1.7.1-0.20180907213355.476aae2.el7ost.noarch python-heat-agent-hiera-1.7.1-0.20180907213355.476aae2.el7ost.noarch openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch instack-undercloud-9.3.1-0.20180918171407.b0205ab.el7ost.noarch openstack-heat-common-12.0.0-0.20180604085325.7d878a8.el7ost.noarch python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch ansible-role-redhat-subscription-1.0.1-1.el7ost.noarch puppet-heat-13.3.1-0.20180831195745.28088f9.el7ost.noarch ansible-role-tripleo-modify-image-1.0.1-0.20180915144057.cb535e9.el7ost.noarch python-heat-agent-docker-cmd-1.7.1-0.20180907213355.476aae2.el7ost.noarch python-heat-agent-ansible-1.7.1-0.20180907213355.476aae2.el7ost.noarch python2-mistralclient-3.7.0-0.20180810140142.f0ee48f.el7ost.noarch openstack-heat-engine-12.0.0-0.20180604085325.7d878a8.el7ost.noarch ansible-2.5.7-1.el7ae.noarch ansible-role-container-registry-1.0.1-0.20180907005806.b33f893.el7ost.noarch puppet-mistral-13.3.1-0.20180831192741.bb0e35e.el7ost.noarch python-heat-agent-puppet-1.7.1-0.20180907213355.476aae2.el7ost.noarch openstack-heat-agents-1.7.1-0.20180907213355.476aae2.el7ost.noarch openstack-heat-monolith-12.0.0-0.20180604085325.7d878a8.el7ost.noarch ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch heat-cfntools-1.3.0-2.el7ost.noarch Steps to reproduce: Try to delete overcloud: (undercloud) [stack@undercloud ~]$ openstack overcloud delete overcloud Are 
you sure you want to delete this overcloud [y/N]? y Deleting stack overcloud... Waiting for messages on queue 'tripleo' with no timeout. Error occurred during stack delete None (undercloud) [stack@undercloud ~]$ Expected result: Successful deletion of overcloud.
Note that despite the error, the deletion does start:

(undercloud) [stack@undercloud ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| 4ae9d0a5-0f84-4754-951e-50049c1bbb1a | overcloud  | abadfa7e2fcc4f5489c4e8ac2d9b0a0d | CREATE_COMPLETE | 2018-10-03T15:25:55Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
(undercloud) [stack@undercloud ~]$ openstack overcloud status --plan overcloud
+-----------+---------------------+---------------------+-------------------+
| Plan Name | Created             | Updated             | Deployment Status |
+-----------+---------------------+---------------------+-------------------+
| overcloud | 2018-10-03 15:51:58 | 2018-10-03 15:51:58 | DEPLOY_SUCCESS    |
+-----------+---------------------+---------------------+-------------------+
(undercloud) [stack@undercloud ~]$ openstack overcloud delete overcloud
Are you sure you want to delete this overcloud [y/N]? y
Deleting stack overcloud...
Waiting for messages on queue 'tripleo' with no timeout.
Error occurred during stack delete None
(undercloud) [stack@undercloud ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status       | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| 4ae9d0a5-0f84-4754-951e-50049c1bbb1a | overcloud  | abadfa7e2fcc4f5489c4e8ac2d9b0a0d | DELETE_IN_PROGRESS | 2018-10-03T15:25:55Z | 2018-10-03T17:13:51Z |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
I haven't been able to reproduce this, and the only obvious error I see in the mistral logs is https://bugzilla.redhat.com/show_bug.cgi?id=1628319. Can you try restarting all of the mistral containers on the undercloud before running the overcloud delete and see if you can still reproduce?
sorry, should have said "the only obvious error I see IS in the mistral logs..."
I did actually see this today, and I happened to have two stacks deployed. After the first stack delete gave "Error occurred during stack delete None", I restarted the undercloud's mistral containers. Immediately after that, deleting the second stack still gave the same message, so the mistral log error is likely unrelated.
Created attachment 1490744: debug output
In stack_management.py, there's a call to base.wait_for_messages, which checks the status of the mistral execution of tripleo.stack.v1._heat_stacks_list. At the time of the call the status is RUNNING, not SUCCESS; in the tests I've done it changes to SUCCESS within ~5-10s. Because the status isn't yet SUCCESS, stack_management.delete_stack raises InvalidConfiguration[0] back to overcloud_delete._stack_delete, which raises the CommandError being sent to stdout. The payload[message] is None because there is no message: the workflow is still RUNNING.

Attaching a log with output of some debugging. Unfortunately I wasn't able to pin down a fix, and I'll be out for the next few weeks; hopefully this data helps.

[0] https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/workflows/stack_management.py#L44
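
To make the failure path above concrete, here is a minimal sketch of how a payload checked while the workflow is still RUNNING could produce the exact text seen on the console. This is not the tripleoclient code: `check_delete_result`, the payload dictionary layout, and the message format string are assumptions for illustration, and `InvalidConfiguration` is only a local stand-in for the exception of the same name.

```python
class InvalidConfiguration(Exception):
    """Local stand-in for the client exception named in the analysis."""


def check_delete_result(payload):
    # Hypothetical simplification: any status other than SUCCESS is
    # treated as a failure, including the non-terminal RUNNING state.
    if payload.get("status") != "SUCCESS":
        raise InvalidConfiguration(
            "Error occurred during stack delete {}".format(
                payload.get("message")))


# Assumed payload shape: a workflow that is still running has no message yet.
running_payload = {"status": "RUNNING", "message": None}

try:
    check_delete_result(running_payload)
except InvalidConfiguration as exc:
    error_text = str(exc)
    print(error_text)
```

Formatting a `None` message into the string is what yields the trailing "None" in the reported error, which would be consistent with the delete having started without the workflow having failed.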
This is an issue in how tripleoclient handles the messages coming back from the workflow. It could be a race condition, or a status other than 'SUCCESS' may need to be handled.
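
One way to handle a status other than 'SUCCESS' is to keep waiting while the workflow is in a non-terminal state and only judge the result once it finishes. The sketch below shows that idea in isolation; it is not the fix that shipped in the erratum. `wait_for_terminal_status`, `fetch_status`, and the set of terminal states are hypothetical names and assumptions, and the status source is a fake iterator rather than a real mistral client.

```python
import time

# Assumption: these are the states treated as final; adjust to the
# states the workflow service actually reports.
TERMINAL_STATES = {"SUCCESS", "ERROR", "CANCELLED"}


def wait_for_terminal_status(fetch_status, timeout=60.0, interval=1.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "workflow still {} after {}s".format(status, timeout))
        sleep(interval)


# Fake status source: RUNNING twice, then SUCCESS, mimicking the short
# delay described in the analysis.
statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
final_status = wait_for_terminal_status(
    lambda: next(statuses), sleep=lambda seconds: None)
print(final_status)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; a caller would raise an error only if `final_status` is something other than SUCCESS.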
Any news about that issue?
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0446