Description of problem:
FFU: "openstack overcloud upgrade run --roles Controller --skip-tags validation" gets stuck and doesn't exit properly. It appears that the ansible playbook commands finished successfully, but the client doesn't exit, which breaks automation:

TASK [Debug output for task which failed: Run docker-puppet tasks (bootstrap tasks) for step 5] ***
skipping: [192.168.24.18] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [192.168.24.13] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [192.168.24.20] => {"changed": false, "skip_reason": "Conditional result was False"}

PLAY [Server Post Deployments] *************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

PLAY [External deployment Post Deploy tasks] ***********************************
skipping: no hosts matched

PLAY RECAP *********************************************************************
192.168.24.13 : ok=153 changed=42 unreachable=0 failed=0
192.168.24.18 : ok=153 changed=42 unreachable=0 failed=0
192.168.24.20 : ok=156 changed=43 unreachable=0 failed=0

Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controller, 2 compute, and 3 Ceph OSD nodes.
2. Run the FFU procedure.

Actual results:
"openstack overcloud upgrade run --roles Controller --skip-tags validation" gets stuck after the playbooks run and never exits.

Expected results:
The command exits with the correct return code and doesn't get stuck.

Additional info:
Attaching sosreport and the output of "openstack overcloud upgrade run --roles Controller --skip-tags validation".
Created attachment 1429184 [details] overcloud_upgrade_Controller.log
/var/log/mistral/engine.log shows several messages like:

2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')
2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters
2018-05-02 14:46:26.709 1382 ERROR oslo_messaging.rpc.server [req-b39b46d0-a3c6-4cff-951b-571b1a84f541 e192aaec52134496a20a46a326c791b4 0af62c87be6a411587ac86644fcd6134 - - -] Exception during message handling: DBError: (pymysql.err.InternalError) (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')
[SQL: u'UPDATE action_executions_v2 SET updated_at=%(updated_at)s, state=%(state)s, accepted=%(accepted)s, output=%(output)s WHERE action_executions_v2.id = %(action_executions_v2_id)s']
[parameters: {'output': '{"result": {"returncode": 0, "stderr": "", "stdout": "Using /tmp/ansible-mistral-actionR8Ak5T/ansible.cfg as config file\\n [WARNING]: Skipping unexp ... (12696382 characters truncated) ... : ok=163 changed=42 unreachable=0 failed=0 \\n192.168.24.19 : ok=163 changed=42 unreachable=0 failed=0 \\n\\n"}}', 'state': 'SUCCESS', 'accepted': 1, 'updated_at': datetime.datetime(2018, 5, 2, 18, 46, 26), 'action_executions_v2_id': u'27418c8b-fd94-4192-8a97-58aed3720586'}]
(Background on this error at: http://sqlalche.me/e/2j85)
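For reference, the numbers in that traceback are consistent with the 1118 error: the truncated stdout blob is roughly 12.7 MB, while InnoDB rejects any single-transaction BLOB/TEXT write larger than 10% of the total redo log. A back-of-the-envelope check, assuming the MySQL/MariaDB defaults of innodb_log_file_size = 48 MiB and innodb_log_files_in_group = 2 (the actual values on the overcloud controllers may differ):

```python
# Rough arithmetic behind the InnoDB 1118 error above.
# Assumption (hedged): default innodb_log_file_size (48 MiB) and
# innodb_log_files_in_group (2); the real deployment may use other values.
LOG_FILE_SIZE = 48 * 1024 * 1024
LOG_FILES_IN_GROUP = 2
redo_log_total = LOG_FILE_SIZE * LOG_FILES_IN_GROUP

# InnoDB refuses BLOB/TEXT writes larger than 10% of the redo log.
blob_limit = redo_log_total // 10

# Size of the 'output' column value, taken from the traceback
# ("12696382 characters truncated").
stdout_blob = 12696382

print("limit  =", blob_limit, "bytes")
print("insert =", stdout_blob, "bytes")
print("rejected:", stdout_blob > blob_limit)  # True with these defaults
```

With the default redo log the cap works out to about 10 MiB, so the ~12.7 MB ansible output can never be stored, which is why every run of the Controller upgrade hits this.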
Created attachment 1430251 [details] mistral.tar.gz
As mentioned on IRC, the DB error here is straightforward: we either need to raise innodb_log_file_size, or reduce the size of the stdout value being inserted into that row.
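Of the two options, shrinking the value before it reaches the database is the more robust, since any sufficiently verbose playbook run would eventually outgrow a bigger redo log too. A sketch of a truncation approach (illustrative only, not the actual patch; the function name and the size cap are made up for the example):

```python
# Illustrative only: cap captured stdout before it is stored in the
# Mistral DB, keeping the head and tail so the PLAY RECAP at the end
# of the ansible output survives. The helper name and the cap are
# hypothetical, not taken from the real fix.
MAX_STORED_OUTPUT = 10 * 1024  # hypothetical cap, in bytes

_MARKER = "\n... [truncated] ...\n"

def truncate_output(text, limit=MAX_STORED_OUTPUT):
    """Return text unchanged if it fits, else keep the start and end."""
    if len(text) <= limit:
        return text
    half = (limit - len(_MARKER)) // 2
    return text[:half] + _MARKER + text[-half:]

# Roughly the blob size from the traceback above.
big = "x" * 12696382
print(len(truncate_output(big)) <= MAX_STORED_OUTPUT)  # True
```

Keeping both ends matters here: the tail of the stdout is where ansible prints the PLAY RECAP, which is the part operators actually need when debugging a failed upgrade.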
Can you please check if the review fixes this, or if we need something more?
(In reply to Marios Andreou from comment #6)
> can you please check if review fixes or if we need something more

I tested the attached patch and wasn't able to reproduce the initial issue, so I think we're good to go; we just need the change in the downstream build.
I have cherry-picked the change back to stable/queens.
*** Bug 1579500 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086