Description of problem:

I have an existing OSP 10 overcloud stack and ran "openstack overcloud deploy" to modify it (testing Controller replacement). The deployment gets to the point where it removes the existing plan and uploads the new plan via the tripleo.plan_management.v1.update_deployment_plan execution. The execution fails, and "mistral execution-get" reports a traceback in the "State info" field. The traceback ends with the following:

YaqlEvaluationException: Can not evaluate YAQL expression: $.status , data = {}; error: u'status'

I'll attach a copy of the "mistral execution-get" output to this BZ.

Version-Release number of selected component (if applicable):

OSP 10
openstack-mistral-api.noarch 3.0.0-0.20160929083341.c0a4501.el7ost
openstack-mistral-common.noarch 3.0.0-0.20160929083341.c0a4501.el7ost
openstack-mistral-engine.noarch 3.0.0-0.20160929083341.c0a4501.el7ost
openstack-mistral-executor.noarch 3.0.0-0.20160929083341.c0a4501.el7ost
puppet-mistral.noarch 9.4.0-0.20160928210707.3738e95.el7ost
python-mistral-tests.noarch 3.0.0-0.20160929083341.c0a4501.el7ost
python-mistralclient.noarch 2.1.1-0.20160915221844.a37b5fe.el7ost
python-openstack-mistral.noarch 3.0.0-0.20160929083341.c0a4501.el7ost

How reproducible:

Always

Steps to Reproduce:
1. Run "openstack overcloud deploy" and wait for a successful deployment
2. Update the Heat templates
3. Rerun "openstack overcloud deploy" and wait for the tripleo.plan_management.v1.update_deployment_plan execution to fail

Actual results:

Expected results:

Additional info:
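For reference, the "$.status" in that YAQL expression is evaluated against the dict of variables that earlier tasks in the workflow have published; "data = {}" means nothing was published before the expression ran. A minimal Python sketch of the failing lookup (an illustration only, not Mistral's actual code; evaluate_status is a hypothetical helper):

```python
# Sketch (not Mistral code): "<% $.status %>" behaves roughly like a key
# lookup on the dict of variables published by earlier workflow tasks.
def evaluate_status(context):
    """Approximate what evaluating '$.status' does against the task context."""
    if "status" not in context:
        # Mistral surfaces this as:
        # YaqlEvaluationException: Can not evaluate YAQL expression: $.status,
        # data = {}; error: u'status'
        raise KeyError("status")
    return context["status"]

# If a set_status_* task has published its variables, the lookup succeeds:
print(evaluate_status({"status": "SUCCESS", "message": ""}))  # SUCCESS

# If the publish step never ran, the context is empty and the lookup fails:
try:
    evaluate_status({})
except KeyError as exc:
    print("lookup failed:", exc)
```

So the YAQL error is a symptom: the interesting question is why the task that publishes "status" never ran or never published.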
Created attachment 1207255 [details] Output from "mistral execution-get"
Had a look at mistral's executor.log and found this around the same time as the execution attempt:

2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor [-] Failed to run action [action_ex_id=277e5da7-701e-4b15-add0-690e45d99e2a, action_cls='<class 'mistral.actions.action_factory.ZaqarAction'>', attributes='{u'client_method_name': u'queue_post'}', params='{u'queue_name': u'8d8c79ec-198e-4425-baf3-ceeccd5e7dfa', u'messages': {u'body': {u'type': u'tripleo.plan_management.v1.update_deployment_plan', u'payload': {u'status': u'SUCCESS', u'message': u'', u'execution': {u'input': {u'queue_name': u'8d8c79ec-198e-4425-baf3-ceeccd5e7dfa', u'container': u'overcloud'}, u'spec': {u'input': [u'container', {u'queue_name': u'tripleo'}], u'tasks': {u'update_plan': {u'name': u'update_plan', u'on-error': u'update_plan_set_status_failed', u'on-success': u'ensure_passwords_exist', u'version': u'2.0', u'action': u'tripleo.plan.update container=<% $.container %>', u'type': u'direct'}, u'ensure_passwords_exist': {u'name': u'ensure_passwords_exist', u'on-error': u'ensure_passwords_exist_set_status_failed', u'on-success': u'process_templates', u'version': u'2.0', u'action': u'tripleo.parameters.generate_passwords container=<% $.container %>', u'type': u'direct'}, u'set_status_success': {u'version': u'2.0', u'type': u'direct', u'name': u'set_status_success', u'publish': {u'status': u'SUCCESS', u'message': u'<% task(update_plan).result %>'}, u'on-success': u'notify_zaqar'}, u'update_plan_set_status_failed': {u'version': u'2.0', u'type': u'direct', u'name': u'update_plan_set_status_failed', u'publish': {u'status': u'FAILED', u'message': u'<% task(update_plan).result %>'}, u'on-success': u'notify_zaqar'}, u'process_templates_set_status_failed': {u'version': u'2.0', u'type': u'direct', u'name': u'process_templates_set_status_failed', u'publish': {u'status': u'FAILED', u'message': u'<% task(process_templates).result %>'}, u'on-success': u'notify_zaqar'}, u'notify_zaqar': {u'retry': u'count=5 delay=1', u'name': u'notify_zaqar', u'version': u'2.0', u'action': u'zaqar.queue_post', u'input': {u'queue_name': u'<% $.queue_name %>', u'messages': {u'body': {u'type': u'tripleo.plan_management.v1.update_deployment_plan', u'payload': {u'status': u'<% $.status %>', u'message': u"<% $.message or '' %>", u'execution': u'<% execution() %>'}}}}, u'type': u'direct'}, u'process_templates': {u'name': u'process_templates', u'on-error': u'process_templates_set_status_failed', u'on-success': u'set_status_success', u'version': u'2.0', u'action': u'tripleo.templates.process container=<% $.container %>', u'type': u'direct'}, u'ensure_passwords_exist_set_status_failed': {u'version': u'2.0', u'type': u'direct', u'name': u'ensure_passwords_exist_set_status_failed', u'publish': {u'status': u'FAILED', u'message': u'<% task(ensure_passwords_exist).result %>'}, u'on-success': u'notify_zaqar'}}, u'name': u'update_deployment_plan', u'version': u'2.0'}, u'params': {}, u'id': u'94912446-aabf-43b4-a7ed-8622e68721db', u'name': u'tripleo.plan_management.v1.update_deployment_plan'}}}}}'] ZaqarAction.queue_post failed: <class 'requests.exceptions.ConnectionError'>: HTTPConnectionPool(host='192.168.201.5', port=8888): Max retries exceeded with url: /v2/queues/8d8c79ec-198e-4425-baf3-ceeccd5e7dfa/messages (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x6d90dd0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor Traceback (most recent call last):
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor   File "/usr/lib/python2.7/site-packages/mistral/engine/default_executor.py", line 90, in run_action
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor     result = action.run()
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor   File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 142, in run
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor     (self.__class__.__name__, self.client_method_name, e_str)
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor ActionException: ZaqarAction.queue_post failed: <class 'requests.exceptions.ConnectionError'>: HTTPConnectionPool(host='192.168.201.5', port=8888): Max retries exceeded with url: /v2/queues/8d8c79ec-198e-4425-baf3-ceeccd5e7dfa/messages (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x6d90dd0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2016-10-05 01:33:47.596 1388 ERROR mistral.engine.default_executor

It seems to be an issue with connecting to zaqar. I restarted both the mistral and zaqar services and the deployment went ahead with no problems.
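The root failure in that traceback is a TCP connect to the Zaqar API at 192.168.201.5:8888 being refused ([Errno 111] ECONNREFUSED), i.e. nothing was listening on that port. As a quick triage step, that condition can be checked outside of Mistral with a plain socket probe; probe_zaqar below is a hypothetical helper, not part of any OpenStack tooling:

```python
# Hypothetical triage helper: reproduce the executor's "[Errno 111]
# ECONNREFUSED" condition with a bare TCP connect to the Zaqar API port.
import errno
import socket

def probe_zaqar(host, port=8888, timeout=2.0):
    """Return 'open', 'refused', or the errno name for any other failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # Same condition requests reported via NewConnectionError: nothing is
        # listening, i.e. zaqar-server is down or has crashed.
        return "refused"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()

# e.g. on the undercloud: probe_zaqar("192.168.201.5", 8888)
```

"refused" here matches what the ZaqarAction saw, and is consistent with zaqar-server having died (see the zaqar.log findings in the next comment).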
Okay, this appears to be a problem with zaqar. I experienced a Broken Pipe error. I've noticed several Broken Pipe errors at certain points, which explains why I've been having other deployment issues with OSP 10. All the broken pipe errors follow the same format:

1) Warning: Failed to increment the message counter for queue
2) Broken pipe

For example:

2016-10-04 15:20:11.532 1404 WARNING zaqar.storage.mongodb.queues [(None,) fa56eb6a60fb4291a0242e482b8d7c0d e1b8d2ae29b947cc946383a45f5a680c - - -] [project_id:e1b8d2ae29b947cc946383a45f5a680c] Failed to increment the message counter for queue a585b0fc-8a93-441f-8e16-7c8218f34190 and project e1b8d2ae29b947cc946383a45f5a680c
2016-10-04 15:20:15.405 1404 CRITICAL zaqar [(None,) fa56eb6a60fb4291a0242e482b8d7c0d e1b8d2ae29b947cc946383a45f5a680c - - -] [project_id:e1b8d2ae29b947cc946383a45f5a680c] IOError: [Errno 32] Broken pipe
2016-10-04 15:20:15.405 1404 ERROR zaqar Traceback (most recent call last):
2016-10-04 15:20:15.405 1404 ERROR zaqar   File "/usr/bin/zaqar-server", line 10, in <module>
2016-10-04 15:20:15.405 1404 ERROR zaqar     sys.exit(run())
2016-10-04 15:20:15.405 1404 ERROR zaqar   File "/usr/lib/python2.7/site-packages/zaqar/common/cli.py", line 58, in _wrapper
2016-10-04 15:20:15.405 1404 ERROR zaqar     _fail(1, ex)
2016-10-04 15:20:15.405 1404 ERROR zaqar   File "/usr/lib/python2.7/site-packages/zaqar/common/cli.py", line 36, in _fail
2016-10-04 15:20:15.405 1404 ERROR zaqar     print(ex, file=sys.stderr)
2016-10-04 15:20:15.405 1404 ERROR zaqar IOError: [Errno 32] Broken pipe
2016-10-04 15:20:15.405 1404 ERROR zaqar

I'll attach the full zaqar.log file to this BZ.
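For context on that IOError: [Errno 32] is EPIPE, raised when a process writes to a pipe whose reading end has gone away. In the traceback above, zaqar-server hits it on print(ex, file=sys.stderr), which suggests whatever was consuming the daemon's stderr had disappeared. A generic Python illustration of the mechanism (not zaqar code; Python 3 spells the error BrokenPipeError where Python 2 raised IOError):

```python
# Illustration (generic, not zaqar code): writing to a pipe after the reader
# has closed raises EPIPE. A daemon sees exactly this on stderr when the
# process collecting its output dies or is restarted out from under it.
import errno
import os

read_end, write_end = os.pipe()
os.close(read_end)  # simulate the log consumer going away

try:
    os.write(write_end, b"IOError: [Errno 32] Broken pipe\n")
except BrokenPipeError as exc:
    # CPython ignores SIGPIPE by default, so we get an exception with
    # errno EPIPE rather than the process being killed by the signal.
    print("got EPIPE:", exc.errno == errno.EPIPE)
finally:
    os.close(write_end)
```

That would be consistent with zaqar itself being healthy and something on the logging side being the trigger.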
Followed https://github.com/systemd/systemd/issues/2236 and performed:

systemctl restart systemd-journald

After that, no mistral errors appeared in the logs or from the "openstack overcloud deploy" command.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2653