Bug 1647514
| Summary: | [UPGRADES][14] Failed to re-run upgrade: Connection is already closed.: WebSocketConnectionClosedException | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Emilien Macchi <emacchi> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Gurenko Alex <agurenko> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 14.0 (Rocky) | CC: | beth.white, hbrock, jfrancoa, jslagle, jstransk, mbracho, mburns, mcornea, therve |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-27 13:41:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Yurii Prokulevych, 2018-11-07 16:27:37 UTC)
From /var/log/containers/mistral/engine.log:

...
2018-11-07 10:27:28.794 1 INFO workflow_trace [req-e0566c49-1913-41cf-b482-e3cee408f5d3 - - - - -] Workflow 'tripleo.package_update.v1.update_nodes' [RUNNING -> ERROR, msg=Failed to run task [error=Can not evaluate YAQL expression [expression=execution().name, error=WorkflowExecution not found [id=bb5827dc-7643-49e4-85ec-f89a956c2a47], data={}], wf=tripleo.package_update.v1.update_nodes, task=send_message]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 63, in run_task
    task.run()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 390, in run
    self._run_new()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 419, in _run_new
    self._schedule_actions()
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 483, in _schedule_actions
    input_dict = self._get_action_input()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 514, in _get_action_input
    input_dict = self._evaluate_expression(self.task_spec.get_input(), ctx)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 540, in _evaluate_expression
    ctx_view
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 79, in _evaluate_item
    return evaluate(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 71, in evaluate
    return evaluator.evaluate(expression, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 159, in evaluate
    cls).evaluate(trim_expr, data_context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 113, in evaluate
    ", data=%s]" % (expression, str(e), data_context)
YaqlEvaluationException: Can not evaluate YAQL expression [expression=execution().name, error=WorkflowExecution not found [id=bb5827dc-7643-49e4-85ec-f89a956c2a47], data={}]
] (execution_id=bb5827dc-7643-49e4-85ec-f89a956c2a47)

Most of the errors reported there are unfortunately unrelated to the problem. The websocket connection error should probably be caught and ignored. The "WorkflowExecution not found" error is due to the mistral executor being restarted while the ansible action is still running.

The core issue is that the ansible action is not reporting the error. Looking at package_update.log, which seems to be its output, the last lines say:

TASK [Ensure all online data migrations for Cinder have been applied]

It's supposed to run "cinder-manage db online_data_migrations" on controller-0, but nothing happens.
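As a rough illustration of that suggestion (this is not the actual tripleoclient code; drain_messages and the URL below are made up), the side that holds the Zaqar websocket could treat a dropped connection as end-of-stream instead of letting the exception bubble up and hide the real Ansible failure:

```python
# Minimal sketch, assuming the websocket-client library that raises the
# WebSocketConnectionClosedException seen above. Not the actual tripleoclient
# code; drain_messages and the URL below are illustrative only.
import websocket


def drain_messages(ws):
    """Yield messages until the peer closes the connection."""
    while True:
        try:
            yield ws.recv()
        except websocket.WebSocketConnectionClosedException:
            # The undercloud side dropped the connection (e.g. mistral
            # executor restart); stop reading instead of failing the command.
            return


if __name__ == "__main__":
    ws = websocket.create_connection("ws://192.0.2.1:9000/")  # placeholder URL
    for message in drain_messages(ws):
        print(message)
```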
We managed to reproduce the same issue running an upgrade from OSP13 to OSP14:

u'TASK [Ensure all online data migrations for Cinder have been applied] **********',
u'Friday 16 November 2018 11:18:12 +0000 (0:00:00.393) 0:00:38.246 ******* ',
u'skipping: [overcloud-controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}',
u'skipping: [overcloud-controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}']
[u'Failed to update nodes - Controller, please see the logs.']
2018-11-16 13:22:15.373 338774 ERROR openstack [-] Connection is already closed.: WebSocketConnectionClosedException: Connection is already closed.
2018-11-16 13:22:15.376 338774 INFO osc_lib.shell [-] END return value: 1

When investigating some of the containers on the overcloud controller nodes, we observed that they had lost connectivity with mysql, and the cluster was down. This caused the following ansible task to hang on the bootstrap controller node:

https://github.com/openstack/tripleo-heat-templates/blob/89d13583f7c706d5c7fbd5052fbedec53474f248/docker/services/cinder-api.yaml#L263

Because the ansible-playbook process was blocked, mistral eventually timed out and we got the WebSocketConnectionClosedException. So we should ensure the cluster status is healthy before running the upgrade.

When we tried to bring the cluster back via a start, we observed that some of the resources didn't succeed:

Failed Actions:
* rabbitmq-bundle-docker-1_monitor_60000 on overcloud-controller-1 'not running' (7): call=84, status=complete, exitreason='', last-rc-change='Fri Nov 16 14:53:31 2018', queued=0ms, exec=0ms
* galera-bundle-docker-1_start_0 on overcloud-controller-1 'unknown error' (1): call=80, status=complete, exitreason='failed to pull image 172.31.0.1:8787/rhosp14/openstack-mariadb:pcmklatest', last-rc-change='Fri Nov 16 14:52:30 2018', queued=0ms, exec=367ms
* galera-bundle-docker-2_start_0 on overcloud-controller-1 'unknown error' (1): call=94, status=complete, exitreason='failed to pull image 172.31.0.1:8787/rhosp14/openstack-mariadb:pcmklatest', last-rc-change='Fri Nov 16 14:52:31 2018', queued=0ms, exec=249ms
* redis-bundle-docker-1_monitor_60000 on overcloud-controller-1 'not running' (7): call=89, status=complete, exitreason='', last-rc-change='Fri Nov 16 14:53:31 2018', queued=0ms, exec=0ms
* rabbitmq-bundle-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=10, status=Error, exitreason='', last-rc-change='Fri Nov 16 14:53:02 2018', queued=0ms, exec=0ms
* redis-bundle-1_monitor_60000 on overcloud-controller-1 'unknown error' (1): call=13, status=Error, exitreason='', last-rc-change='Fri Nov 16 14:53:17 2018', queued=0ms, exec=0ms
* rabbitmq_start_0 on rabbitmq-bundle-0 'unknown error' (1): call=15, status=complete, exitreason='', last-rc-change='Fri Nov 16 14:52:50 2018', queued=0ms, exec=25797ms
* rabbitmq-bundle-docker-2_monitor_60000 on overcloud-controller-2 'not running' (7): call=85, status=complete, exitreason='', last-rc-change='Fri Nov 16 14:53:31 2018', queued=0ms, exec=0ms
* galera-bundle-docker-1_start_0 on overcloud-controller-2 'unknown error' (1): call=94, status=complete, exitreason='failed to pull image 172.31.0.1:8787/rhosp14/openstack-mariadb:pcmklatest', last-rc-change='Fri Nov 16 14:52:31 2018', queued=0ms, exec=277ms
* galera-bundle-docker-2_start_0 on overcloud-controller-2 'unknown error' (1): call=80, status=complete, exitreason='failed to pull image 172.31.0.1:8787/rhosp14/openstack-mariadb:pcmklatest', last-rc-change='Fri Nov 16 14:52:30 2018', queued=0ms, exec=333ms
* redis-bundle-docker-2_monitor_60000 on overcloud-controller-2 'not running' (7): call=90, status=complete, exitreason='', last-rc-change='Fri Nov 16 14:53:31 2018', queued=0ms, exec=0ms
* rabbitmq-bundle-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=10, status=Error, exitreason='', last-rc-change='Fri Nov 16 14:53:02 2018', queued=0ms, exec=0ms
* redis-bundle-2_monitor_60000 on overcloud-controller-2 'unknown error' (1): call=13, status=Error, exitreason='', last-rc-change='Fri Nov 16 14:53:17 2018', queued=0ms, exec=0ms

The galera-bundle resources are failing to start the mariadb container even though the pcmklatest-tagged image is present locally on both controllers:

[root@overcloud-controller-2 ~]# docker image list | grep pcmklatest
172.31.0.1:8787/rhosp14/openstack-cinder-volume pcmklatest 55d9bcaff11d 9 days ago 1.02 GB
172.31.0.1:8787/rhosp14/openstack-redis pcmklatest ec1e56a5d997 9 days ago 606 MB
172.31.0.1:8787/rhosp14/openstack-rabbitmq pcmklatest a0971285d430 10 days ago 641 MB
172.31.0.1:8787/rhosp14/openstack-haproxy pcmklatest 05e996dd8d0e 10 days ago 604 MB
172.31.0.1:8787/rhosp14/openstack-mariadb pcmklatest 1b590b262ae3 10 days ago 762 MB

[root@overcloud-controller-1 ~]# docker image list | grep pcmklatest
172.31.0.1:8787/rhosp14/openstack-cinder-volume pcmklatest 55d9bcaff11d 9 days ago 1.02 GB
172.31.0.1:8787/rhosp14/openstack-redis pcmklatest ec1e56a5d997 9 days ago 606 MB
172.31.0.1:8787/rhosp14/openstack-rabbitmq pcmklatest a0971285d430 10 days ago 641 MB
172.31.0.1:8787/rhosp14/openstack-haproxy pcmklatest 05e996dd8d0e 10 days ago 604 MB
172.31.0.1:8787/rhosp14/openstack-mariadb pcmklatest 1b590b262ae3 10 days ago 762 MB

Instead of taking the image that is present locally, it tries to retrieve it from the registry on the undercloud, where it is not present:

(undercloud) [stack@c01-h01-r620 ~]$ docker image list | grep pcmklatest
(undercloud) [stack@c01-h01-r620 ~]$

The first upgrade attempt failed on the task:

TASK [take new os-net-config parameters into account now]

This probably created a networking issue on the controllers, and right after that this task failed because the machines were not reachable:

2018-11-15 15:00:14,525 p=24270 u=mistral | TASK [Update all packages] *****************************************************
2018-11-15 15:00:14,525 p=24270 u=mistral | Thursday 15 November 2018 15:00:14 +0000 (0:00:02.148) 0:29:09.822 *****
2018-11-15 15:06:59,449 p=24270 u=mistral | fatal: [overcloud-controller-2]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"172.31.0.9\". Make sure this host can be reached over ssh", "unreachable": true}
2018-11-15 15:06:59,557 p=24270 u=mistral | fatal: [overcloud-controller-1]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"172.31.0.15\". Make sure this host can be reached over ssh", "unreachable": true}
2018-11-15 15:07:17,498 p=24270 u=mistral | fatal: [overcloud-controller-0]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"172.31.0.14\". Make sure this host can be reached over ssh", "unreachable": true}

This left the overcloud with stopped containers and we proceeded to retry with the same command.
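Before retrying, it is worth confirming that the pacemaker cluster is actually healthy and that the pcmklatest tags exist locally, as discussed above. A rough pre-check sketch, to be run on a controller; it assumes the pcs and docker CLIs are available, is not part of tripleo-heat-templates, and uses made-up helper names:

```python
# Rough pre-check sketch (assumptions noted above), run on a controller
# before retrying "openstack overcloud upgrade run".
import subprocess


def cluster_is_healthy():
    # "pcs status" exits non-zero when the cluster is down and prints a
    # "Failed Actions:" section when resources like galera/rabbitmq failed.
    try:
        out = subprocess.check_output(["pcs", "status"])
    except (OSError, subprocess.CalledProcessError):
        return False
    return b"Failed Actions" not in out


def pcmklatest_images_present(names=("mariadb", "rabbitmq", "redis", "haproxy")):
    # Every pacemaker-managed service needs its :pcmklatest tag locally,
    # since the registry on the undercloud does not carry that tag.
    out = subprocess.check_output(["docker", "image", "list"]).decode()
    tagged = [line for line in out.splitlines() if "pcmklatest" in line]
    return all(any(name in line for line in tagged) for name in names)


if __name__ == "__main__":
    if not (cluster_is_healthy() and pcmklatest_images_present()):
        raise SystemExit("Cluster unhealthy or pcmklatest images missing; "
                         "fix this before re-running the upgrade")
```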
When the services on the overcloud are already stopped and we want to recover, we must also skip validations and pre-upgrade tasks, so we should re-run like this:

openstack overcloud upgrade run --roles Controller --skip-tags validation,pre-upgrade

I got this error once more during the Controller node upgrade run. The log from the command shows:

u'TASK [Update all packages] *****************************************************',
u'Thursday 22 November 2018 10:00:01 +0000 (0:00:05.012) 2:38:45.134 ***** ',
<LOOONG LOG FOR THE TASK>
2018-11-22 11:07:10.319 839625 ERROR openstack [-] Connection is already closed.: WebSocketConnectionClosedException: Connection is already closed.
2018-11-22 11:07:10.324 839625 INFO osc_lib.shell [-] END return value: 1

But when checking package_update.log, Ansible had continued running tasks much further (I can't paste the log, as it got overwritten by a second upgrade retry). Right after 11:07:10.31 we can see the following logs in the mistral executor:

2018-11-22 11:21:13.898 1 WARNING mistral.executors.default_executor [req-a1249437-143f-4cd3-b5cf-3b65a8141a3c b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] The action raised an exception [action_ex_id=4aeac79c-0830-4efc-9cde-2492166b1b87, action_cls='<class 'mistral.actions.action_factory.AnsiblePlaybookAction'>', attributes='{}', params='{u'remote_user': u'tripleo-admin', u'tags': u'', u'become_user': u'root', u'inventory': u'Undercloud:\n hosts:\n undercloud: {}\n vars:\n ansible_connection: local\n ansible_host: localhost\n ansible_remote_tmp: /tmp/ansible-${USER}\n auth_url: https://172.31.0.2:13000/\n cacert: null\n os_auth_token: gAAAAABb9ljM6pq3ld-AsQCJUohZwkd8WzJtS_V4fXNZF0GM_PjsmyfOdoAOi0lHdpZlA9hLxCOvQZLSw1Ytd85q_HoFW0zifPPBsbsc3vyR0ovOQUriS_7cb-z2jWJ-U2YCpn2ppQhOzfvH4GFZUGgcJQYJdYuL5b23ONI3rQoo5EXNYxPWdxE\n overcloud_admin_password: nt4PAG86Wj2XFtyWbhY2attVZ\n overcloud_horizon_url: http://172.21.0.10:80/dashboard\n overcloud_keystone_url: http://172.21.0.10:5000\n plan: overcloud\n project_name: admin\n undercloud_service_list: [openstack-nova-compute, openstack-heat-engine, openstack-ironic-conductor,\n openstack-swift-container, openstack-swift-object, openstack-mistral-engine]\n undercloud_swift_url: https://172.31.0.2:13808/v1/AUTH_fce1111634fc4bd4b4e91236abc9838d\n username: admin\nR620Compute:\n hosts:\n overcloud-620rcompute-0:\n ansible_host: 172.31.0.28\n ctlplane_ip: 172.31.0.28\n deploy_server_id: 32bb82a1-0511-40c5-af67-c279876a2293\n enabled_networks: [management, storage, ctlplane, external, internal_api, storage_mgmt,\n tenant]\n external_ip: 172.31.0.28\n internal_api_ip: 172.26.0.24\n management_ip: 172.31.0.28\n storage_ip: 172.28.0.8\n storage_mgmt_ip: 172.31.0.28\n tenant_ip: 172.27.0.8\n ....more log...
Error response from Zaqar. Code: 401. Text: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}.: UnauthorizedError: Error response from Zaqar. Code: 401. Text: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}.
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/ansible.py", line 562, in run
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     self.post_message(queue, ''.join(lines))
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/ansible.py", line 431, in post_message
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     queue.post(self.format_message(message_part))
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/queues.py", line 170, in post
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     self._name, messages)
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/core.py", line 242, in message_post
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     resp = transport.send(request)
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/transport/http.py", line 114, in send
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor     raise self.http_to_zaqar[resp.status_code](**kwargs)
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor UnauthorizedError: Error response from Zaqar. Code: 401. Text: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}.
2018-11-22 11:21:13.898 1 ERROR mistral.executors.default_executor
2018-11-22 11:21:14.084 1 INFO mistral.executors.executor_server [req-a1249437-143f-4cd3-b5cf-3b65a8141a3c b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Received RPC request 'run_action'[action_ex_id=9597ee02-2f78-42d2-a088-2d04e96c498b, action_cls_str=mistral.actions.std_actions.NoOpAction, action_cls_attrs={}, params={}, timeout=None]
---few more logs---
2018-11-22 11:21:15.011 1 INFO mistral.executors.executor_server [req-a1249437-143f-4cd3-b5cf-3b65a8141a3c b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Received RPC request 'run_action'[action_ex_id=c5c07141-f33a-49fb-ba72-7514c29cf59d, action_cls_str=mistral.actions.openstack.actions.ZaqarAction, action_cls_attrs={u'client_method_name': u'queue_post'}, params={queue_name: upgrade, messages: {u'body': {u'type': u'tripleo.package_update.v1.update_nodes', u'..., timeout=None]
2018-11-22 11:21:15.014 1 INFO mistral.actions.std_actions [req-aa8744bf-730c-4cc7-a46b-970c93e282e1 b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - - -] Running no-op action
2018-11-22 11:21:15.989 1 WARNING mistral.actions.openstack.base [req-a1249437-143f-4cd3-b5cf-3b65a8141a3c b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 117, in run
    result = method(**self._kwargs_for_run)
  File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/actions.py", line 472, in wrap
    return method(client, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/actions.py", line 518, in queue_post
    return queue.post(messages)
  File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/queues.py", line 170, in post
    self._name, messages)
  File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/core.py", line 242, in message_post
    resp = transport.send(request)
  File "/usr/lib/python2.7/site-packages/zaqarclient/transport/http.py", line 114, in send
    raise self.http_to_zaqar[resp.status_code](**kwargs)
UnauthorizedError: Error response from Zaqar. Code: 401. Text: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}.

And a similar log in the mistral engine log:

2018-11-22 11:21:13.991 1 INFO mistral.engine.engine_server [req-e9c6ee09-7468-4b4d-ad3b-f339e0360b7a b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Received RPC request 'on_action_complete'[action_ex_id=4aeac79c-0830-4efc-9cde-2492166b1b87, result=Result [data=None, error=The action raised an exception [action_ex_id=4aeac79c-0830-4efc-9cde-2492166b1b87, action_cls='<clas..., cancel=False]]
2018-11-22 11:21:14.011 1 INFO workflow_trace [req-e9c6ee09-7468-4b4d-ad3b-f339e0360b7a b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Action 'tripleo.ansible-playbook' (4aeac79c-0830-4efc-9cde-2492166b1b87)(task=node_update) [RUNNING -> ERROR, error = The action raised an exception [action_ex_id=4aeac79c-0830-4efc-9cde-2492166b1b87, action_cls='<clas...]
2018-11-22 11:21:14.024 1 INFO workflow_trace [req-e9c6ee09-7468-4b4d-ad3b-f339e0360b7a b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Task 'node_update' (36c11201-6f04-401e-8b10-e375a855421b) [RUNNING -> ERROR, msg=The action raised an exception [action_ex_id=4aeac79c-0830-4efc-9cde-2492166b1b87, action_cls='<class 'mistral.actions.action_factory.AnsiblePlaybookAction'>', attributes='{}', params='{u'remote_user': u'tripleo-admin', u'tags': u'', u'become_user': u'root', u'inventory': u'Undercloud:\n hosts:\n undercloud: {}\n vars:\n ansible_connection: local\n ansible_host: localhost\n ansible_remote_tmp: /tmp/ansible-${USER}\n auth_url: https://172.31.0.2:13000/\n cacert: null\n os_auth_token: gAAAAABb9ljM6pq3ld-AsQCJUohZwkd8WzJtS_V4fXNZF0GM_PjsmyfOdoAOi0lHdpZlA9hLxCOvQZLSw1Ytd85q_HoFW0zifPPBsbsc3vyR0ovOQUriS_7cb-z2jWJ-U2YCpn2ppQhOzfvH4GFZUGgcJQYJdYuL5b23ONI3rQoo5EXNYxPWdxE\n overcloud_admin_password: nt4PAG86Wj2XFtyWbhY2attVZ\n overcloud_horizon_url: http://172.21.0.10:80/dashboard\n overcloud_keystone_url: http://172.21.0.10:5000\n plan: overcloud\n project_name: admin\n undercloud_service_list: [openstack-nova-
---more of this log---
Error response from Zaqar. Code: 401. Text: {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}.] (execution_id=af0e9156-1481-4f45-844e-bc9b7303c666)
2018-11-22 11:21:14.095 1 INFO mistral.engine.engine_server [req-e9c6ee09-7468-4b4d-ad3b-f339e0360b7a b6dab204393e4546b90cfa7de1abbae0 fce1111634fc4bd4b4e91236abc9838d - default default] Received RPC request 'report_running_actions'[action_ex_ids=[u'9597ee02-2f78-42d2-a088-2d04e96c498b']]

Checking the zaqar-server logs at that very same time:

2018-11-22 07:20:53.124 1 INFO zaqar.transport.websocket.protocol [-] Response: API v2 txt, 200. Request: action "authenticate", body {}.
2018-11-22 07:20:54.494 1 INFO zaqar.transport.websocket.protocol [-] Response: API v2 txt, 204. Request: action "queue_create", body {"queue_name": "upgrade"}.
2018-11-22 07:20:54.668 1 INFO zaqar.transport.websocket.protocol [-] Response: API v2 txt, 201. Request: action "subscription_create", body {"queue_name": "upgrade", "ttl": 10000}.
2018-11-22 11:07:10.868 1 INFO zaqar.transport.websocket.protocol [-] WebSocket connection closed: connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake)
2018-11-22 16:27:48.374 1 INFO zaqar.transport.websocket.protocol [-] Client connecting: tcp:172.31.0.1:53348
2018-11-22 16:27:48.374 1 INFO zaqar.transport.websocket.protocol [-] WebSocket connection open.
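As an aside, one way to keep this kind of reporting failure from marking the whole AnsiblePlaybookAction as failed while the playbook is still running would be to make the Zaqar post best-effort. This is an illustrative sketch only, not the actual tripleo_common code: the helper name is made up, and it assumes the exception classes seen in the tracebacks above live in zaqarclient.transport.errors:

```python
# Illustrative sketch only (assumptions noted above): swallow a failed
# progress post to Zaqar (401 once the token has gone stale during a
# multi-hour playbook, or a transient 503) instead of failing the action.
import logging

from zaqarclient.transport import errors as zaqar_errors

LOG = logging.getLogger(__name__)


def post_message_best_effort(queue, message):
    """Post a progress message to Zaqar without aborting on failure."""
    try:
        queue.post(message)
    except (zaqar_errors.UnauthorizedError,
            zaqar_errors.ServiceUnavailableError) as exc:
        # The full playbook output still lands in package_update.log, so a
        # lost progress message is only a cosmetic problem.
        LOG.warning("Could not post progress message to Zaqar: %s", exc)
```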
I also got this log in a second retry I ran:

Error response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Messages could not be enqueued. Please try again in a few seconds..: ServiceUnavailableError: Error response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Messages could not be enqueued. Please try again in a few seconds..

2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/ansible.py", line 562, in run
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     self.post_message(queue, ''.join(lines))
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/ansible.py", line 431, in post_message
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     queue.post(self.format_message(message_part))
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/queues.py", line 170, in post
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     self._name, messages)
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/queues/v1/core.py", line 242, in message_post
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     resp = transport.send(request)
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/zaqarclient/transport/http.py", line 114, in send
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor     raise self.http_to_zaqar[resp.status_code](**kwargs)
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor ServiceUnavailableError: Error response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Messages could not be enqueued. Please try again in a few seconds..
2018-11-23 14:06:10.200 1 ERROR mistral.executors.default_executor

(In reply to Jose Luis Franco from comment #8)
> I got also this log in a second retry I ran:

Anything in the zaqar logs?

The system was running out of space and swift had troubles. This caused zaqar to fail. Some space was freed up and it's being retried again.

The system was running out of space and that caused unexpected behavior in swift, which impacted zaqar. As this was a corner case, let's close it and reopen if it's faced again.
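For completeness, the root cause here (the undercloud filling up, which broke Swift and therefore Zaqar) is the kind of thing a trivial pre-flight check would have surfaced before retrying. A sketch, with an arbitrary mount point and threshold:

```python
# Trivial pre-flight sketch: fail fast if the filesystem backing Swift/Zaqar
# on the undercloud is nearly full. The mount point "/" and the 10 GiB
# threshold are assumptions, not project code.
import shutil

free_gb = shutil.disk_usage("/").free / 2**30
if free_gb < 10:
    raise SystemExit("Only %.1f GiB free; free up space before retrying the upgrade" % free_gb)
```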