Bug 1573496 - FFU: openstack overcloud upgrade run --roles Controller --skip-tags validation gets stuck and doesn't exit properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Brad P. Crochet
QA Contact: Marius Cornea
URL:
Whiteboard:
Duplicates: 1579500
Depends On:
Blocks: 1561169
 
Reported: 2018-05-01 14:17 UTC by Marius Cornea
Modified: 2018-06-27 13:55 UTC
CC: 14 users

Fixed In Version: openstack-tripleo-common-8.6.1-14.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:54:52 UTC
Target Upstream Version:


Attachments (Terms of Use)
overcloud_upgrade_Controller.log (14.02 MB, text/plain)
2018-05-01 14:20 UTC, Marius Cornea
mistral.tar.gz (5.21 MB, application/x-gzip)
2018-05-02 19:08 UTC, Marius Cornea


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 565900 None MERGED Add a 'trash_output' flag to ansible playbook action 2020-03-30 18:57:58 UTC
OpenStack gerrit 566884 None MERGED Add a 'trash_output' flag to ansible playbook action 2020-03-30 18:57:58 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:55:51 UTC

Description Marius Cornea 2018-05-01 14:17:36 UTC
Description of problem:
FFU: openstack overcloud upgrade run --roles Controller --skip-tags validation gets stuck and doesn't exit properly:

It appears that the ansible playbook commands finished successfully, but the client does not exit, which breaks automation:

 u'TASK [Debug output for task which failed: Run docker-puppet tasks (bootstrap tasks) for step 5] ***',
 u'skipping: [192.168.24.18] => {"changed": false, "skip_reason": "Conditional result was False"}',
 u'skipping: [192.168.24.13] => {"changed": false, "skip_reason": "Conditional result was False"}',
 u'skipping: [192.168.24.20] => {"changed": false, "skip_reason": "Conditional result was False"}',
 u'',
 u'PLAY [Server Post Deployments] *************************************************',
 u'',
 u'TASK [include] *****************************************************************',
 u'',
 u'TASK [include] *****************************************************************',
 u'',
 u'TASK [include] *****************************************************************',
 u'',
 u'TASK [include] *****************************************************************',
 u'',
 u'TASK [include] *****************************************************************',
 u'',
 u'PLAY [External deployment Post Deploy tasks] ***********************************',
 u'skipping: no hosts matched',
 u'',
 u'PLAY RECAP *********************************************************************',
 u'192.168.24.13              : ok=153  changed=42   unreachable=0    failed=0   ',
 u'192.168.24.18              : ok=153  changed=42   unreachable=0    failed=0   ',
 u'192.168.24.20              : ok=156  changed=43   unreachable=0    failed=0   ',
 u'']


Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 compute + 3 ceph osd nodes
2. Run FFU procedure

Actual results:
openstack overcloud upgrade run --roles Controller --skip-tags validation gets stuck after the playbooks run and never exits

Expected results:
Command exits with the correct return code and doesn't get stuck

Additional info:
Attaching sosreport and output of openstack overcloud upgrade run --roles Controller --skip-tags validation.

Comment 1 Marius Cornea 2018-05-01 14:20:45 UTC
Created attachment 1429184 [details]
overcloud_upgrade_Controller.log

Comment 3 Marius Cornea 2018-05-02 19:03:14 UTC
/var/log/mistral/engine.log shows several messages like:

2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')
2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters 
2018-05-02 14:46:26.709 1382 ERROR oslo_messaging.rpc.server [req-b39b46d0-a3c6-4cff-951b-571b1a84f541 e192aaec52134496a20a46a326c791b4 0af62c87be6a411587ac86644fcd6134 - - -] Exception during message handling: DBError: (pymysql.err.InternalError) (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.') [SQL: u'UPDATE action_executions_v2 SET updated_at=%(updated_at)s, state=%(state)s, accepted=%(accepted)s, output=%(output)s WHERE action_executions_v2.id = %(action_executions_v2_id)s'] [parameters: {'output': '{"result": {"returncode": 0, "stderr": "", "stdout": "Using /tmp/ansible-mistral-actionR8Ak5T/ansible.cfg as config file\\n [WARNING]: Skipping unexp ... (12696382 characters truncated) ...         : ok=163  changed=42   unreachable=0    failed=0   \\n192.168.24.19              : ok=163  changed=42   unreachable=0    failed=0   \\n\\n"}}', 'state': 'SUCCESS', 'accepted': 1, 'updated_at': datetime.datetime(2018, 5, 2, 18, 46, 26), 'action_executions_v2_id': u'27418c8b-fd94-4192-8a97-58aed3720586'}] (Background on this error at: http://sqlalche.me/e/2j85)
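The arithmetic behind the InnoDB error is straightforward: MySQL rejects any single transaction whose BLOB/TEXT payload exceeds 10% of the total redo log size, and the Mistral action tried to persist the entire ~12.7 MB ansible stdout in one UPDATE. A minimal sketch of that check, assuming a 50 MB `innodb_log_file_size` with two log files in the group (these defaults are an assumption for illustration, not confirmed in this report):

```python
# Why the UPDATE in the traceback fails: InnoDB caps the BLOB/TEXT data
# written in one transaction at 10% of the combined redo log size.

# Assumed redo log configuration (not confirmed in this bug report):
innodb_log_file_size = 50 * 1024 * 1024   # 50 MB per log file
innodb_log_files_in_group = 2

redo_log_bytes = innodb_log_file_size * innodb_log_files_in_group
blob_limit = redo_log_bytes // 10          # 10% threshold enforced by InnoDB

# The truncated 'output' parameter in the traceback was ~12.7 MB of stdout.
stdout_bytes = 12_696_382

print(blob_limit)                  # 10485760 (~10 MB) under these assumptions
print(stdout_bytes > blob_limit)   # True: the payload exceeds the limit
```

Under these assumed defaults the limit works out to roughly 10 MB, which the ~12.7 MB stdout comfortably exceeds, matching error 1118 above.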

Comment 4 Marius Cornea 2018-05-02 19:08:30 UTC
Created attachment 1430251 [details]
mistral.tar.gz

Comment 5 Michael Bayer 2018-05-02 19:14:32 UTC
As mentioned on IRC, the DB error here is straightforward, so we either need to change that setting, or reduce the size of the stdout value being inserted into that row.
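The second option is the one the linked reviews take: a `trash_output` flag on the ansible playbook action that discards the bulky stdout before the result is persisted. A minimal sketch of that idea, using simplified stand-in names (the class and method here are hypothetical, not the actual tripleo-common code):

```python
# Hedged sketch of the 'trash_output' approach from the linked reviews:
# drop the playbook stdout before the action result reaches the DB, so
# the row written to action_executions_v2 stays far below the InnoDB
# BLOB/TEXT limit. Names below are illustrative stand-ins.

class AnsiblePlaybookResult:
    def __init__(self, returncode, stdout, stderr, trash_output=False):
        self.returncode = returncode
        self.stderr = stderr
        # When trash_output is set, replace stdout with an empty string
        # instead of persisting megabytes of ansible log text.
        self.stdout = "" if trash_output else stdout

    def to_db_payload(self):
        return {"returncode": self.returncode,
                "stdout": self.stdout,
                "stderr": self.stderr}

big_log = "x" * 12_000_000          # ~12 MB of playbook output
kept = AnsiblePlaybookResult(0, big_log, "")
trashed = AnsiblePlaybookResult(0, big_log, "", trash_output=True)
print(len(kept.to_db_payload()["stdout"]))     # 12000000 bytes retained
print(len(trashed.to_db_payload()["stdout"]))  # 0 bytes persisted
```

The trade-off is losing the stored playbook log in Mistral; the log remains available on disk (e.g. the attached overcloud_upgrade_Controller.log), so discarding it from the DB row is a reasonable fix.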

Comment 6 Marios Andreou 2018-05-07 12:42:01 UTC
can you please check if review fixes or if we need something more

Comment 7 Marius Cornea 2018-05-07 18:03:23 UTC
(In reply to Marios Andreou from comment #6)
> can you please check if review fixes or if we need something more

I tested the attached patch and I wasn't able to reproduce the initial issue, so I think we're good to go; we just need the change in the downstream build.

Comment 8 Brad P. Crochet 2018-05-08 13:21:56 UTC
I have cherry-picked the change back to stable/queens.

Comment 9 James Slagle 2018-05-17 19:43:43 UTC
*** Bug 1579500 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2018-06-27 13:54:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

