Description of problem:
FFU: "openstack overcloud upgrade run --roles Controller --skip-tags validation" gets stuck and doesn't exit properly. It appears that the ansible playbook commands finished successfully, but the client doesn't exit, which breaks automation:

TASK [Debug output for task which failed: Run docker-puppet tasks (bootstrap tasks) for step 5] ***
skipping: [192.168.24.18] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [192.168.24.13] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [192.168.24.20] => {"changed": false, "skip_reason": "Conditional result was False"}

PLAY [Server Post Deployments] *************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

TASK [include] *****************************************************************

PLAY [External deployment Post Deploy tasks] ***********************************
skipping: no hosts matched

PLAY RECAP *********************************************************************
192.168.24.13 : ok=153 changed=42 unreachable=0 failed=0
192.168.24.18 : ok=153 changed=42 unreachable=0 failed=0
192.168.24.20 : ok=156 changed=43 unreachable=0 failed=0

Version-Release number of selected component (if applicable):
python-tripleoclient-9.2.1-3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controller, 2 compute, and 3 Ceph OSD nodes.
2. Run the FFU procedure.

Actual results:
"openstack overcloud upgrade run --roles Controller --skip-tags validation" gets stuck after the playbooks run and never exits.

Expected results:
The command exits with the correct return code and doesn't get stuck.

Additional info:
Attaching sosreport and the output of "openstack overcloud upgrade run --roles Controller --skip-tags validation".
Created attachment 1429184 [details] overcloud_upgrade_Controller.log
/var/log/mistral/engine.log shows several messages like:

2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')
2018-05-02 14:46:26.665 1382 ERROR oslo_db.sqlalchemy.exc_filters
2018-05-02 14:46:26.709 1382 ERROR oslo_messaging.rpc.server [req-b39b46d0-a3c6-4cff-951b-571b1a84f541 e192aaec52134496a20a46a326c791b4 0af62c87be6a411587ac86644fcd6134 - - -] Exception during message handling: DBError: (pymysql.err.InternalError) (1118, u'The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size.')
[SQL: u'UPDATE action_executions_v2 SET updated_at=%(updated_at)s, state=%(state)s, accepted=%(accepted)s, output=%(output)s WHERE action_executions_v2.id = %(action_executions_v2_id)s']
[parameters: {'output': '{"result": {"returncode": 0, "stderr": "", "stdout": "Using /tmp/ansible-mistral-actionR8Ak5T/ansible.cfg as config file\\n [WARNING]: Skipping unexp ... (12696382 characters truncated) ... : ok=163 changed=42 unreachable=0 failed=0 \\n192.168.24.19 : ok=163 changed=42 unreachable=0 failed=0 \\n\\n"}}', 'state': 'SUCCESS', 'accepted': 1, 'updated_at': datetime.datetime(2018, 5, 2, 18, 46, 26), 'action_executions_v2_id': u'27418c8b-fd94-4192-8a97-58aed3720586'}]
(Background on this error at: http://sqlalche.me/e/2j85)
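For reference, the numbers in that traceback are consistent with the 1118 error: the truncated stdout blob is roughly 12.7 MB, while InnoDB rejects any single-transaction BLOB/TEXT write larger than 10% of the total redo log. A back-of-the-envelope check, assuming the MySQL/MariaDB defaults of innodb_log_file_size = 48 MiB and innodb_log_files_in_group = 2 (the actual values on the overcloud controllers may differ):

```python
# Rough arithmetic behind the InnoDB 1118 error above.
# Assumption (hedged): default innodb_log_file_size (48 MiB) and
# innodb_log_files_in_group (2); the real deployment may use other values.
LOG_FILE_SIZE = 48 * 1024 * 1024
LOG_FILES_IN_GROUP = 2
redo_log_total = LOG_FILE_SIZE * LOG_FILES_IN_GROUP

# InnoDB refuses BLOB/TEXT writes larger than 10% of the redo log.
blob_limit = redo_log_total // 10

# Size of the 'output' column value, taken from the traceback
# ("12696382 characters truncated").
stdout_blob = 12696382

print("limit  =", blob_limit, "bytes")
print("insert =", stdout_blob, "bytes")
print("rejected:", stdout_blob > blob_limit)  # True with these defaults
```

With the default redo log the cap works out to about 10 MiB, so the ~12.7 MB ansible output can never be stored, which is why every run of the Controller upgrade hits this.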
Created attachment 1430251 [details] mistral.tar.gz
As mentioned on IRC, the DB error here is straightforward: we either need to raise innodb_log_file_size, or reduce the size of the stdout value being inserted into that row.
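Of the two options, shrinking the value before it reaches the database is the more robust, since any sufficiently verbose playbook run would eventually outgrow a bigger redo log too. A sketch of a truncation approach (illustrative only, not the actual patch; the function name and the size cap are made up for the example):

```python
# Illustrative only: cap captured stdout before it is stored in the
# Mistral DB, keeping the head and tail so the PLAY RECAP at the end
# of the ansible output survives. The helper name and the cap are
# hypothetical, not taken from the real fix.
MAX_STORED_OUTPUT = 10 * 1024  # hypothetical cap, in bytes

_MARKER = "\n... [truncated] ...\n"

def truncate_output(text, limit=MAX_STORED_OUTPUT):
    """Return text unchanged if it fits, else keep the start and end."""
    if len(text) <= limit:
        return text
    half = (limit - len(_MARKER)) // 2
    return text[:half] + _MARKER + text[-half:]

# Roughly the blob size from the traceback above.
big = "x" * 12696382
print(len(truncate_output(big)) <= MAX_STORED_OUTPUT)  # True
```

Keeping both ends matters here: the tail of the stdout is where ansible prints the PLAY RECAP, which is the part operators actually need when debugging a failed upgrade.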
Can you please check if the review fixes this, or if we need something more?
(In reply to Marios Andreou from comment #6)
> can you please check if review fixes or if we need something more

I tested the attached patch and wasn't able to reproduce the initial issue, so I think we're good to go; we just need the change in the downstream build.
I have cherry-picked the change back to stable/queens.
*** Bug 1579500 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086