Description of problem:
-----------------------
Running the following command failed:

openstack overcloud external-upgrade run \
    --stack QE-Cloud-0 2>&1
...
ERROR openstack [-] Update failed with: {u'status': u'RUNNING', u'message': u'ason\\": \\"Conditional result was False\\"}", "", "TASK [ceph-osd : include common.yml]
...
(The log of the command is attached with the sosreports.)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
python-tripleoclient-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade UC to RHOS-14
2. Upgrade OC to RHOS-14
3. Start upgrade of ceph
   openstack overcloud external-upgrade run

Actual results:
---------------
Upgrade failed with no obvious reason logged
Based on our earlier investigation, it seemed that ceph-ansible finished fine and only the CLI command failed. It should be fine to ignore the failure and continue with the upgrade procedure, but this needs fixing before release. I'll triage this to high/high, but I'll keep the blocker flag.
Yuri, is this reproducible repeatedly? I ran the external-upgrade command and can't reproduce the error; the command itself seems to work fine for me. I wonder if the ceph-ansible output is too big and somehow breaks the Zaqar message processing. That's the only sensible explanation I can think of. The task "run ceph-ansible" was successful, then we see incomplete output from it, and the CLI command crashes. That suggests nothing was broken in Ansible, but something perhaps went over a limit when communicating the log output. I'll add DFG:Ceph too, since we'll likely want to address this in the Ceph composable service Ansible tasks, perhaps by cutting the output into smaller chunks somehow. Looking into this.
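To illustrate the "smaller chunks" idea, here is a minimal Python sketch. It assumes the limit being hit is on the byte size of a single message, which is still unconfirmed. The function name chunk_output and the 64 KiB default are hypothetical, not taken from tripleo-common, Zaqar, or the proposed patch.

```python
def chunk_output(lines, max_bytes=64 * 1024):
    """Group log lines into chunks whose UTF-8 size stays within max_bytes.

    max_bytes is a hypothetical limit chosen for illustration only.
    A single line larger than max_bytes is not split; it ends up alone
    in its own (oversized) chunk.
    """
    chunks = []
    current = []
    current_size = 0
    for line in lines:
        # Count the encoded line plus one byte for a newline separator.
        size = len(line.encode("utf-8")) + 1
        # Start a new chunk if adding this line would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Simulated long ceph-ansible output; each chunk would be sent
    # as its own message instead of one large payload.
    output = ["TASK [ceph-osd : include common.yml]"] * 1000
    for index, chunk in enumerate(chunk_output(output, max_bytes=1024)):
        print("message %d: %d lines" % (index, len(chunk)))
```

Each chunk could then be posted as a separate message, so no single payload depends on the total length of the ceph-ansible log.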
We still don't know the root cause of this bug with certainty, but if my above guess is correct, we may be able to solve it this way: https://review.openstack.org/#/c/607302/
Merged to master, merging to stable/rocky.
I'll cancel needinfo on Yurii. The patch we have now is our best shot anyway :)
Verified with:
- openstack-tripleo-heat-templates-9.0.1-0.20181013060906.el7ost.noarch
- ceph-ansible-3.1.10-1.el7cp.noarch
- python-tripleoclient-10.6.1-0.20181010222412.8c8f259.el7ost.noarch

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
 u'PLAY RECAP *********************************************************************',
 u'ceph-0 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'ceph-1 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'ceph-2 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'compute-0 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'compute-1 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'controller-0 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'controller-1 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'controller-2 : ok=2 changed=0 unreachable=0 failed=0 ',
 u'undercloud : ok=40 changed=18 unreachable=0 failed=0 ',
 u'',
 u'Monday 17 December 2018 08:20:23 -0500 (0:00:00.024) 0:12:33.132 ******* ',
 u'=============================================================================== ']
[u'Updated nodes - all']
Success
2018-12-17 08:20:25.807 531797 INFO tripleoclient.v1.overcloud_external_upgrade.ExternalUpgradeRun [-] Completed Overcloud External Upgrade Run.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045