Description of problem:
While trying to scale up a 103 node overcloud to 207 nodes, on multiple attempts we see the heat stack update finish successfully, but the deploy fails because the ssh enablement workflow times out:

ssh admin enablement workflow - TIMED OUT.

We saw the same result even after bumping ENABLE_SSH_ADMIN_TIMEOUT and ENABLE_SSH_ADMIN_SSH_PORT_TIMEOUT from 300 to 600.

James Slagle looked at the ansible log file, and it appears ansible itself succeeded, but in the logs we see:

2019-08-29 16:40:59.867 409069 ERROR mistral.db.utils [req-704d2539-505b-4259-b8a3-9f82c1ffe4da 2a6d10bddc274e00b00ad4d4adeffda5 c67ce78faf0643708bc7b067eb7525bd - default default] DB error detected, operation will be retried: <function on_action_complete at 0x7f1332047140>: DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(32, 'Broken pipe'))") [SQL: u'UPDATE action_executions_v2 SET updated_at=%(updated_at)s, state=%(state)s, accepted=%(accepted)s, output=%(output)s WHERE action_executions_v2.id = %(action_executions_v2_id)s'] [parameters: {'output': '{"result": {"log_path": "/tmp/ansible-mistral-actionwGCOhN/ansible.log", "stderr": "ansible-playbook 2.6.11\\n config file = /tmp/ansible-mistral-ac ... (24673890 characters truncated) ... +0000 (0:00:20.791) 0:02:14.533 ******* \\n=============================================================================== \\n", "stdout": ""}}', 'state': 'SUCCESS', 'accepted': 1, 'updated_at': datetime.datetime(2019, 8, 29, 16, 40, 59), 'action_executions_v2_id': u'6712d0f7-0c20-4239-b03d-b4560193bf46'}] (Background on this error at: http://sqlalche.me/e/e3q8)

James feels this could be related to the stdout generated by the command.

Version-Release number of selected component (if applicable):
13

How reproducible:
100% on an overcloud of this size

Steps to Reproduce:
1. Deploy a large overcloud using config-download.
Actual results:
The heat stack create/update succeeds, but the deploy fails during the ssh enablement workflow.

Expected results:
SSH enablement should succeed, and so should the overcloud deployment.

Additional info:
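To illustrate the suspected cause: the failing UPDATE carries the whole ansible output (roughly 24 million characters in the log above) in the `output` column. One way to keep such a write small is to cap the captured output before it is persisted. The following is only a minimal Python 3 sketch of that idea. The names `MAX_OUTPUT_CHARS`, `truncate_tail`, and `shrink_action_result` are hypothetical, the cap value is arbitrary, and this is not the Mistral or tripleo-common code, nor necessarily what the referenced fix does.

```python
import json

# Hypothetical cap, chosen only for illustration; not a Mistral,
# tripleo-common, or MySQL default.
MAX_OUTPUT_CHARS = 10 ** 6


def truncate_tail(text, limit=MAX_OUTPUT_CHARS):
    """Keep only the last `limit` characters, noting how many were dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return "... ({} characters truncated) ...\n{}".format(dropped, text[-limit:])


def shrink_action_result(result, limit=MAX_OUTPUT_CHARS):
    """Return a copy of an action result dict with stdout/stderr capped.

    Other keys (for example log_path) are left untouched, so the full
    output can still be read from the log file on disk.
    """
    shrunk = dict(result)
    for key in ("stdout", "stderr"):
        value = shrunk.get(key)
        if isinstance(value, str):
            shrunk[key] = truncate_tail(value, limit)
    return shrunk


if __name__ == "__main__":
    # Synthetic data standing in for a large playbook run.
    big = {
        "log_path": "/tmp/example/ansible.log",  # hypothetical path
        "stderr": "x" * 5000000,
        "stdout": "",
    }
    small = shrink_action_result(big)
    print(len(json.dumps(big)), "->", len(json.dumps(small)))
```

The tail is kept rather than the head because the end of an ansible run holds the play recap and task timing summary, which is usually the most useful part when the full log is still available at `log_path`.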
Created attachment 1609627 [details] ansible-log
https://review.opendev.org/#/c/679481/ should also be backported to queens
*** Bug 1746953 has been marked as a duplicate of this bug. ***
*** Bug 1703618 has been marked as a duplicate of this bug. ***
According to our records, this should be resolved by openstack-tripleo-common-8.6.8-16.el7ost. This build is available now.
*** Bug 1780687 has been marked as a duplicate of this bug. ***
*** Bug 1786063 has been marked as a duplicate of this bug. ***