DescriptionSai Sindhur Malleni
2019-08-29 17:48:35 UTC
Description of problem:
Trying to scale up a 103 node overcloud to 207 nodes, on a multiple attempts we see the heat stack update finish successfully, but the deploy fail due to timeout of ssh enablement workflow.
ssh admin enabssh admin enablement workflow - TIMED OUT.
We saw the same result even on bumping ENABLE_SSH_ADMIN_TIMEOUT and ENABLE_SSH_ADMIN_SSH_PORT_TIMEOUT to 600 from 300
James Slagle Looked at the ansible log file, andit appears ansible itself succeeded but in the logs we see
2019-08-29 16:40:59.867 409069 ERROR mistral.db.utils [req-704d2539-505b-4259-b8a3-9f82c1ffe4da 2a6d10bddc274e00b00ad4d4adeffda5 c67ce78faf0643708bc7b067eb7525bd - default default] DB error detected, operation will be retried: <function on_action_complete at 0x7f1332047140>: DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(32, 'Broken pipe'))") [SQL: u'UPDATE action_executions_v2 SET updated_at=%(updated_at)s, state=%(state)s, accepted=%(accepted)s, output=%(output)s WHERE action_executions_v2.id = %(action_executions_v2_id)s'] [parameters: {'output': '{"result": {"log_path": "/tmp/ansible-mistral-actionwGCOhN/ansible.log", "stderr": "ansible-playbook 2.6.11\\n config file = /tmp/ansible-mistral-ac ... (24673890 characters truncated) ... +0000 (0:00:20.791) 0:02:14.533 ******* \\n=============================================================================== \\n", "stdout": ""}}', 'state': 'SUCCESS', 'accepted': 1, 'updated_at': datetime.datetime(2019, 8, 29, 16, 40, 59), 'action_executions_v2_id': u'6712d0f7-0c20-4239-b03d-b4560193bf46'}] (Background on this error at: http://sqlalche.me/e/e3q8)
James feels this could be related to the stdout geenrated by the command.
Version-Release number of selected component (if applicable):
13
How reproducible:
100% on an overcloud of this size
Steps to Reproduce:
1. deploy a large overcloud using config-donwload
2.
3.
Actual results:
Deploy fails after successful heat stack create/update but fails during ssh enablement workflow.
Expected results:
SSh enablement should succeed as well as overcloud deployment.
Additional info:
Comment 1Sai Sindhur Malleni
2019-08-29 18:10:16 UTC