Bug 1631382 - [UPGRADES][14] external-upgrade failed during ceph upgrade
Summary: [UPGRADES][14] external-upgrade failed during ceph upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Jiri Stransky
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-20 13:00 UTC by Yurii Prokulevych
Modified: 2023-02-22 23:02 UTC
CC List: 11 users

Fixed In Version: python-tripleoclient-10.6.1-0.20180929200237.1d8dcb6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:11 UTC
Target Upstream Version:
Embargoed:



Links
System                    ID               Last Updated
Launchpad                 1795689          2018-10-02 15:32:26 UTC
OpenStack gerrit          607588           2018-10-05 10:32:14 UTC
Red Hat Product Errata    RHEA-2019:0045   2019-01-11 11:53:20 UTC

Description Yurii Prokulevych 2018-09-20 13:00:21 UTC
Description of problem:
-----------------------
Running the following command failed:
openstack overcloud external-upgrade run \
    --stack QE-Cloud-0 2>&1 
...
ERROR openstack [-] Update failed with: {u'status': u'RUNNING', u'message': u'ason\\": \\"Conditional result was False\\"}", "", "TASK [ceph-osd : include common.yml]
... (the full command log is attached with the sosreports)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
python-tripleoclient-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180901082351.6d7aa74.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud (UC) to RHOS-14
2. Upgrade the overcloud (OC) to RHOS-14
3. Start the Ceph upgrade:
    openstack overcloud external-upgrade run
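When scripting the reproduction, it can help to capture the exit status and the combined stdout/stderr of the command (the equivalent of `2>&1`) rather than relying on console scrollback. The Python sketch below is illustrative only: the helper names are hypothetical, it uses only the `--stack` and `--tags` options that appear in this report, and it assumes the `openstack` client is installed and the undercloud credentials are already loaded in the environment.

```python
import subprocess
from typing import List, Optional, Tuple


def external_upgrade_argv(stack: str, tags: Optional[str] = None) -> List[str]:
    """Build the argv for the external-upgrade command used in this report.

    Hypothetical helper; only --stack and --tags are taken from the report.
    """
    argv = ["openstack", "overcloud", "external-upgrade", "run", "--stack", stack]
    if tags:
        argv += ["--tags", tags]
    return argv


def run_external_upgrade(stack: str, tags: Optional[str] = None) -> Tuple[int, str]:
    """Run the command with stderr folded into stdout, like `2>&1` in a shell.

    Assumes the openstack client is on PATH and credentials are in the environment.
    """
    proc = subprocess.run(
        external_upgrade_argv(stack, tags),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # return text, not bytes
    )
    return proc.returncode, proc.stdout
```

For example, `run_external_upgrade("QE-Cloud-0")` mirrors step 3 above, and the returned output can be saved alongside the sosreports.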

Actual results:
---------------
The upgrade failed with no obvious reason logged.

Comment 2 Jiri Stransky 2018-10-01 12:49:12 UTC
Based on our earlier investigation, it seemed that ceph-ansible finished fine; it was just the CLI command that failed.

It should be fine to ignore the failure and continue with the upgrade procedure, but this needs fixing before release. I'll triage this to high/high, but I'll keep the blocker flag.

Comment 4 Jiri Stransky 2018-10-02 14:22:14 UTC
Yurii, is this reproducible? I ran the external-upgrade command and can't reproduce the error; the command itself seems to work fine for me.

I wonder whether the ceph-ansible output is too big and somehow breaks the Zaqar message processing. That's the only sensible explanation I can think of: the task "run ceph-ansible" was successful, then we see incomplete output from it, and then the CLI command crashes. That suggests nothing was broken in Ansible; perhaps something exceeded a size limit while the log output was being communicated...

I'll add DFG:Ceph too, since we'll likely want to address this in the Ceph composable service Ansible tasks, perhaps by cutting the output into smaller chunks somehow. Looking into this.
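The "smaller chunks" idea can be sketched as follows. This is only an illustration of grouping log lines so that each JSON-encoded group stays under a byte budget; it is not derived from the patch discussed in this bug. The 4096-byte budget and the function name are hypothetical, because the actual limit on the Zaqar message path is not identified in this report.

```python
import json
from typing import Iterable, List

# Hypothetical per-message budget in bytes; the real limit (if any) on the
# message path is not stated in this report.
ASSUMED_MAX_MESSAGE_BYTES = 4096


def chunk_log_lines(lines: Iterable[str],
                    max_bytes: int = ASSUMED_MAX_MESSAGE_BYTES) -> List[List[str]]:
    """Split log lines into ordered chunks whose JSON encoding fits max_bytes.

    json.dumps() escapes non-ASCII by default, so its string length equals
    its byte length. A single line that is larger than max_bytes on its own
    is placed in a chunk by itself, so that one chunk can still exceed it.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 2  # length of the enclosing "[]"
    for line in lines:
        item = len(json.dumps(line))
        extra = item + 2 if current else item  # ", " separator between items
        if current and size + extra > max_bytes:
            chunks.append(current)  # flush the full chunk and start a new one
            current, size, extra = [], 2, item
        current.append(line)
        size += extra  # size always equals len(json.dumps(current))
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_log_lines(lines, max_bytes=256)` on fifty copies of the line "TASK [ceph-osd : include common.yml]" yields chunks of at most six lines each. Tracking the encoded size incrementally avoids re-encoding the whole chunk for every appended line; each chunk could then be sent as a separate message.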

Comment 5 Jiri Stransky 2018-10-02 16:18:47 UTC
We still don't know the root cause of this bug with certainty, but if my guess above is correct, we may be able to solve it this way:

https://review.openstack.org/#/c/607302/

Comment 6 Jiri Stransky 2018-10-04 14:13:38 UTC
Merged to master, merging to stable/rocky.

Comment 7 Jiri Stransky 2018-10-04 14:15:11 UTC
I'll cancel needinfo on Yurii. The patch we have now is our best shot anyway :)

Comment 14 Yurii Prokulevych 2018-12-17 14:19:48 UTC
Verified with:
- openstack-tripleo-heat-templates-9.0.1-0.20181013060906.el7ost.noarch
- ceph-ansible-3.1.10-1.el7cp.noarch
- python-tripleoclient-10.6.1-0.20181010222412.8c8f259.el7ost.noarch

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
 u'PLAY RECAP *********************************************************************',
 u'ceph-0                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-1                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-2                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-0                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-1                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-0               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-1               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-2               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'undercloud                 : ok=40   changed=18   unreachable=0    failed=0   ',
 u'',
 u'Monday 17 December 2018  08:20:23 -0500 (0:00:00.024)       0:12:33.132 ******* ',
 u'=============================================================================== ']
[u'Updated nodes - all']
Success
2018-12-17 08:20:25.807 531797 INFO tripleoclient.v1.overcloud_external_upgrade.ExternalUpgradeRun [-] Completed Overcloud External Upgrade Run.
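
Because Comment 2 observed that ceph-ansible could finish fine while the CLI command still failed, it can be useful to check the Ansible PLAY RECAP independently of the CLI exit status. A minimal Python sketch, assuming the log lines are available as plain strings (as in the list the client prints above); the function names are hypothetical:

```python
import re
from typing import Dict, Iterable

# One host line of an Ansible "PLAY RECAP" block, e.g.
# "ceph-0                     : ok=2    changed=0    unreachable=0    failed=0"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")
COUNTER = re.compile(r"(\w+)=(\d+)")


def parse_play_recap(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Return {host: {counter: value}} for the host lines after "PLAY RECAP"."""
    recap: Dict[str, Dict[str, int]] = {}
    in_recap = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line)
        if match is None:
            continue  # skip blank lines, timing footers and "====" rules
        recap[match.group("host")] = {
            key: int(value) for key, value in COUNTER.findall(match.group("stats"))
        }
    return recap


def recap_is_clean(recap: Dict[str, Dict[str, int]]) -> bool:
    """True if at least one host was seen and none failed or was unreachable."""
    return bool(recap) and all(
        stats.get("failed", 0) == 0 and stats.get("unreachable", 0) == 0
        for stats in recap.values()
    )
```

Any key=value counter is accepted, so recaps from newer Ansible releases, which also report counters such as skipped, rescued and ignored, parse the same way.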

Comment 16 errata-xmlrpc 2019-01-11 11:53:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

