Bug 1705694 - openstack overcloud deploy command fails with socket.timeout: timed out
Summary: openstack overcloud deploy command fails with socket.timeout: timed out
Keywords:
Status: CLOSED DUPLICATE of bug 1700096
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-02 19:06 UTC by Sai Sindhur Malleni
Modified: 2019-05-10 20:50 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-06 20:24:23 UTC
Target Upstream Version:
Embargoed:


Description Sai Sindhur Malleni 2019-05-02 19:06:39 UTC
Description of problem:
Deploying an overcloud with the following command:
time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e ~/templates/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml -e ~/containers-prepare-parameters.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml -e ~/templates/osp15.yaml --ntp-server clock.redhat.com


After about 20 minutes, it fails with the following error:
Creating Swift container to store the plan
Creating plan from template files in: /tmp/tripleoclient-9lci373i/tripleo-heat-templates
Timed out waiting for messages from Execution (ID: 75f7f917-67ab-4b9c-8ad8-210f16660c99, State: ERROR). The Workflow errored and no messages were received.
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/websocket/_socket.py", line 81, in recv
    bytes_ = sock.recv(bufsize)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 153, in wait_for_messages
    message = self.recv()
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 131, in recv
    return json.loads(self._ws.recv())
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 310, in recv
    opcode, data = self.recv_data()
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 327, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 340, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 374, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 361, in recv_frame
    self.recv_header()
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 309, in recv_header
    header = self.recv_strict(2)
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 396, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 449, in _recv
    return recv(self.sock, bufsize)
  File "/usr/lib/python3.6/site-packages/websocket/_socket.py", line 84, in recv
    raise WebSocketTimeoutException(message)
websocket._exceptions.WebSocketTimeoutException: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 919, in take_action
    self._deploy_tripleo_heat_templates_tmpdir(stack, parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 374, in _deploy_tripleo_heat_templates_tmpdir
    new_tht_root, tht_root)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 407, in _deploy_tripleo_heat_templates
    validate_stack=False)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/plan_management.py", line 174, in create_plan_from_templates
    validate_stack=validate_stack)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/plan_management.py", line 87, in create_deployment_plan
    **workflow_input)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/plan_management.py", line 77, in _create_update_deployment_plan
    _WORKFLOW_TIMEOUT):
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/base.py", line 61, in wait_for_messages
    for payload in websocket.wait_for_messages(timeout=timeout):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 158, in wait_for_messages
    raise exceptions.WebSocketTimeout()
tripleoclient.exceptions.WebSocketTimeout
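
The three stacked tracebacks above are one failure reported at three layers: the socket read times out, the websocket library re-raises it as its own timeout, and tripleoclient re-raises that as WebSocketTimeout. Per the "Timed out waiting for messages from Execution (... State: ERROR)" line, the timeout is a symptom of the workflow erroring without sending a message, not the root cause. The sketch below reproduces only the chaining pattern; the two exception classes and all three functions are hypothetical stand-ins, not the real websocket or tripleoclient code.

```python
import socket


class WebSocketTimeoutException(Exception):
    """Stand-in for websocket._exceptions.WebSocketTimeoutException."""


class WebSocketTimeout(Exception):
    """Stand-in for tripleoclient.exceptions.WebSocketTimeout."""


def low_level_recv():
    # Simulates sock.recv() giving up because no data arrived in time.
    raise socket.timeout("timed out")


def ws_recv():
    try:
        return low_level_recv()
    except socket.timeout as exc:
        # Raising inside an except block records the original exception
        # in __context__; Python then prints "During handling of the
        # above exception, another exception occurred".
        raise WebSocketTimeoutException(str(exc))


def wait_for_messages():
    try:
        return ws_recv()
    except WebSocketTimeoutException:
        raise WebSocketTimeout()


def exception_chain(exc):
    """Return exception class names from the outermost error to the root."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__ or exc.__context__
    return names
```

Walking the chain from the last exception back to the first is a quick way to confirm that only the innermost error (here, the missing message from the errored workflow) needs investigating.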


Version-Release number of selected component (if applicable):
OSP15
(undercloud) [stack@f16-h10-000-1029p ~]$ sudo rpm -qa | grep tripleo
openstack-tripleo-puppet-elements-10.3.1-0.20190420090433.9ba1438.el8ost.noarch
openstack-tripleo-image-elements-10.4.1-0.20190420043237.7d6edd9.el8ost.noarch
python3-tripleoclient-heat-installer-11.4.1-0.20190423085110.290ac95.el8ost.noarch
openstack-tripleo-validations-10.4.1-0.20190420030347.9d08e89.el8ost.noarch
python3-tripleo-common-10.7.1-0.20190423083511.2199eeb.el8ost.noarch
python3-tripleoclient-11.4.1-0.20190423085110.290ac95.el8ost.noarch
ansible-tripleo-ipsec-9.1.1-0.20190422122014.8c1fdab.el8ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20190422122515.f1dfdc6.el8ost.noarch
openstack-tripleo-common-10.7.1-0.20190423083511.2199eeb.el8ost.noarch
openstack-tripleo-heat-templates-10.5.1-0.20190423085106.3f148c4.el8ost.noarch
openstack-tripleo-common-containers-10.7.1-0.20190423083511.2199eeb.el8ost.noarch
puppet-tripleo-10.4.1-0.20190420063733.7fc5500.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy undercloud and introspect overcloud nodes
2. Run the overcloud deploy command

Actual results:
Command exits with failure

Expected results:
Deploy should succeed

Additional info:
The mistral engine logs on the undercloud show:
2019-05-02 18:40:15.028 1 ERROR mistral.engine.task_handler [req-6a5a1e26-0287-4424-b5b0-9485fc25152e a76551fbe21c42dd8ea80ac74eeedd76 5018fa8b4e8144dc901c4e04cd0a624b - default default] Failed to run task [error=Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=['validate']], wf=tripleo.plan_management.v1.create_deployment_plan, task=add_root_stack_name]:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task
    task.run()
  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run
    self._run_new()
  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new
    self._schedule_actions()
  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 563, in _schedule_actions
    action.validate_input(input_dict)
  File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 336, in validate_input
    self.action_def.action_class
  File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 66, in validate_input
    raise exc.InputException(msg % tuple(msg_props))
mistral.exceptions.InputException: Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=['validate']]
: mistral.exceptions.InputException: Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=['validate']]
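
The InputException above says the workflow passed a 'validate' input that the loaded UpdateParametersAction class does not accept. A minimal sketch of that kind of check follows, assuming the validation compares the workflow's input keys against the action's constructor signature. Both action classes and their signatures are hypothetical, chosen only to show an older and a newer variant; the real mistral and tripleo-common code may differ.

```python
import inspect


class InputException(Exception):
    """Stand-in for mistral.exceptions.InputException."""


class OldUpdateParametersAction:
    """Hypothetical older action: constructor has no 'validate' argument."""

    def __init__(self, parameters, container="overcloud"):
        self.parameters = parameters
        self.container = container


class NewUpdateParametersAction:
    """Hypothetical newer action: constructor accepts 'validate'."""

    def __init__(self, parameters, container="overcloud", validate=True):
        self.parameters = parameters
        self.container = container
        self.validate = validate


def validate_input(action_class, input_dict):
    """Raise InputException if input_dict has keys the action cannot take."""
    signature = inspect.signature(action_class.__init__)
    accepted = {name for name in signature.parameters if name != "self"}
    unexpected = sorted(set(input_dict) - accepted)
    if unexpected:
        raise InputException(
            "Invalid input [class=%s, unexpected=%s]"
            % (action_class.__name__, unexpected)
        )
```

Under this assumption, a workflow definition that sends 'validate' passes against the newer class and fails against the older one, which is the pattern to expect when the workflow and the action code come from different builds.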

Comment 1 Alex Schultz 2019-05-06 20:24:23 UTC
I believe this is a duplicate of Bug 1700044. Please let us know if it's still occurring after the fix for 1700044 has been applied.

*** This bug has been marked as a duplicate of bug 1700044 ***

Comment 2 Sai Sindhur Malleni 2019-05-06 21:37:35 UTC
Hi Alex,

To apply the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1700044, please advise whether the following two steps are enough:

1. On the undercloud, install python3-oslo-rootwrap using dnf install python3-oslo-rootwrap
2. Patch tripleo-common on the undercloud at /usr/lib/python3.6/site-packages/tripleo_common/actions/ansible.py

Comment 3 Alex Schultz 2019-05-06 22:47:52 UTC
No, you have to patch the mistral container. The fix needs to be applied inside the mistral-engine container, and then the container needs to be restarted.

Comment 4 Sai Sindhur Malleni 2019-05-10 18:12:04 UTC
Hi Alex,
I patched the mistral container with https://review.opendev.org/#/c/657090/1/tripleo_common/actions/ansible.py and ran podman restart mistral_engine.

The overcloud deploy still fails, but much faster:

(undercloud) [stack@f16-h10-000-1029p ~]$ time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e ~/templates/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovn-ha.yaml -e ~/containers-prepare-parameters.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml -e ~/templates/osp15.yaml --ntp-server clock.redhat.com
Removing the current plan files
Uploading new plan files
{'result': 'Failed to run task [error=Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']], wf=tripleo.swift_backup.v1.create_swift_backup_container_plan, task=set_tempurl]:\nTraceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task\n    task.run()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run\n    self._run_new()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new\n    self._schedule_actions()\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 563, in _schedule_actions\n    action.validate_input(input_dict)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 336, in validate_input\n    self.action_def.action_class\n  File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 66, in validate_input\n    raise exc.InputException(msg % tuple(msg_props))\nmistral.exceptions.InputException: Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']]\n'}
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 919, in take_action
    self._deploy_tripleo_heat_templates_tmpdir(stack, parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 374, in _deploy_tripleo_heat_templates_tmpdir
    new_tht_root, tht_root)
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 400, in _deploy_tripleo_heat_templates
    validate_stack=False)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/plan_management.py", line 238, in update_plan_from_templates
    validate_stack=validate_stack)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/plan_management.py", line 122, in update_deployment_plan
    'Exception updating plan: {}'.format(payload['message']))
tripleoclient.exceptions.WorkflowServiceError: Exception updating plan: {'result': 'Failed to run task [error=Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']], wf=tripleo.swift_backup.v1.create_swift_backup_container_plan, task=set_tempurl]:\nTraceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task\n    task.run()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run\n    self._run_new()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new\n    self._schedule_actions()\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 563, in _schedule_actions\n    action.validate_input(input_dict)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 336, in validate_input\n    self.action_def.action_class\n  File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 66, in validate_input\n    raise exc.InputException(msg % tuple(msg_props))\nmistral.exceptions.InputException: Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']]\n'}
Exception updating plan: {'result': 'Failed to run task [error=Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']], wf=tripleo.swift_backup.v1.create_swift_backup_container_plan, task=set_tempurl]:\nTraceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task\n    task.run()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run\n    self._run_new()\n  File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper\n    result = f(*args, **kwargs)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new\n    self._schedule_actions()\n  File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 563, in _schedule_actions\n    action.validate_input(input_dict)\n  File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 336, in validate_input\n    self.action_def.action_class\n  File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 66, in validate_input\n    raise exc.InputException(msg % tuple(msg_props))\nmistral.exceptions.InputException: Invalid input [name=tripleo.parameters.update, class=tripleo_common.actions.parameters.UpdateParametersAction, unexpected=[\'validate\']]\n'}

real	0m24.002s
user	0m4.088s
sys	0m6.133s

Comment 5 Alex Schultz 2019-05-10 19:36:17 UTC
That error points to a mismatch between the containers and tripleo-common on the undercloud. What containers are you using? See Bug 1700096.

*** This bug has been marked as a duplicate of bug 1700096 ***

Comment 6 Sai Sindhur Malleni 2019-05-10 20:20:35 UTC
Tag is 20190306.1 (passed_phase1)

Comment 7 Sai Sindhur Malleni 2019-05-10 20:20:58 UTC
Tag is 20190306.1 (passed_phase1)

Comment 8 Alex Schultz 2019-05-10 20:50:04 UTC
That's way too old. You need to use a newer version of the containers that matches the tripleo-common you have installed. Containers from at least May 9th (the most recent pass of phase1) should be available.
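
As a rough way to spot this kind of mismatch, the date stamps already visible in this report can be compared: the container tag 20190306.1 and the python3-tripleo-common release 0.20190423083511. The sketch below is a heuristic only, assuming both strings embed a YYYYMMDD build date; the helper names are made up for illustration, and this is not an official compatibility check.

```python
import re
from datetime import date


def build_date(text):
    """Return the first YYYYMMDD stamp (years 20xx) found in text as a date."""
    match = re.search(r"(?<!\d)(20\d{6})", text)
    if match is None:
        raise ValueError("no YYYYMMDD stamp found in %r" % text)
    stamp = match.group(1)
    return date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))


def container_lag_days(container_tag, rpm_name):
    """Days by which the container build predates the installed package."""
    return (build_date(rpm_name) - build_date(container_tag)).days


lag = container_lag_days(
    "20190306.1",
    "python3-tripleo-common-10.7.1-0.20190423083511.2199eeb.el8ost.noarch",
)
print("containers are %d days older than tripleo-common" % lag)
```

A large positive gap like this one suggests the workflows on the undercloud and the action code inside the containers were built from different snapshots.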

