Bug 1642462 - [UPGRADES][14] Failed to upgrade ceph: SwiftFetchDirGetTempurl is not set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-24 13:09 UTC by Yurii Prokulevych
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-tripleo-common-9.4.1-0.20181012010873.67bab16.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:54:26 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1799945 0 None None None 2018-10-25 12:30:26 UTC
OpenStack gerrit 613373 0 None MERGED Run Mistral workflow to make temporary Swift URLs on upgrade 2020-05-04 21:15:31 UTC
OpenStack gerrit 614801 0 None MERGED Run Mistral workflow to make temporary Swift URLs on upgrade 2020-05-04 21:15:31 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:54:32 UTC

Description Yurii Prokulevych 2018-10-24 13:09:59 UTC
Description of problem:
-----------------------
Upgrade of the Ceph cluster failed:

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
 u'TASK [set facts for swift back up of ceph-ansible fetch directory] *************',
 u'Wednesday 24 October 2018  08:15:57 -0400 (0:00:00.044)       0:00:35.267 ***** ',
 u'ok: [undercloud] => {"ansible_facts": {"new_ceph_ansible_tarball_name": "temporary_dir_new.tar.gz", "old_ceph_ansible_tarball_name": "temporary_dir_old.tar.gz", "swift_get_url": "", "swift_put_url": ""}, "changed": false}',
 u'',
 u'TASK [attempt download of fetch directory tarball from swift backup] ***********',
 u'Wednesday 24 October 2018  08:15:57 -0400 (0:00:00.066)       0:00:35.333 ***** ',
 u' [WARNING]: Consider using the get_url or uri module rather than running curl.',
 u'If you need to use command because get_url or uri is insufficient you can add',
 u'warn=False to this command task or set command_warnings=False in ansible.cfg to',
 u'get rid of this message.',
 u'fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "curl -s -o /tmp/temporary_dir_old.tar.gz -w \'%{http_code}\' -X GET \\"\\"", "delta": "0:00:00.712603", "end": "2018-10-24 08:15:58.510813", "msg": "non-zero return code", "rc": 3, "start": "2018-10-24 08:15:57.798210", "stderr": "", "stderr_lines": [], "stdout": "000", "stdout_lines": ["000"]}',
 u'...ignoring',
 u'',
 u'TASK [ensure we create a new fetch_directory or use the old fetch_directory] ***',
 u'Wednesday 24 October 2018  08:15:58 -0400 (0:00:00.923)       0:00:36.257 ***** ',
 u'fatal: [undercloud]: FAILED! => {"changed": false, "msg": "Received HTTP: 000 when attempting to GET from "}',
 u'',
 u'NO MORE HOSTS LEFT *************************************************************',
 u'',
 u'PLAY RECAP *********************************************************************',
 u'ceph-0                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-1                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-2                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-0                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-1                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-0               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-1               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-2               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'undercloud                 : ok=32   changed=12   unreachable=0    failed=1   ',
 u'',
 u'Wednesday 24 October 2018  08:15:58 -0400 (0:00:00.052)       0:00:36.310 ***** ',
 u'=============================================================================== ']
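
Reading the log above: both swift_get_url and swift_put_url are empty strings, so curl is invoked with an empty URL. curl exit code 3 means "URL malformed", and the 000 written by -w '%{http_code}' means no HTTP response was received at all. The following is only a minimal sketch of a pre-flight check that would surface the unset value before curl runs. The function name check_tempurl is hypothetical and is not part of tripleo-common or ceph-ansible.

```python
from urllib.parse import urlparse


def check_tempurl(name, url):
    """Return an error message if url is unusable, else None.

    Hypothetical helper for illustration only.
    """
    if not url:
        return f"{name} is not set"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"{name} is not a valid HTTP(S) URL: {url!r}"
    return None


# Values exactly as reported by the "set facts" task in the log.
facts = {"swift_get_url": "", "swift_put_url": ""}

errors = [
    msg
    for msg in (check_tempurl(key, value) for key, value in facts.items())
    if msg
]
for msg in errors:
    print(msg)
```

With the facts from this report, the check flags both URLs as not set, which points at the missing parameter rather than at a generic HTTP failure.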



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060859.ffbe879.el7ost.noarch
ceph-ansible-3.1.5-1.el7cp.noarch
python-tripleoclient-10.6.1-0.20181010222401.8c8f259.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud (UC) to RHOS-14
2. Upgrade all the overcloud nodes
3. Try to perform the Ceph upgrade

Actual results:
---------------
Ceph upgrade fails because SwiftFetchDirGetTempurl is not set, which leaves swift_get_url and swift_put_url empty

Expected results:
-----------------
Ceph upgrade succeeds

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 Ceph nodes

Comment 1 John Fulton 2018-10-24 20:03:03 UTC
Two problems:

A. The workflow that creates SwiftFetchDirGetTempurl [1] didn't run [2].

B. Even if you run the workflow manually to generate SwiftFetchDirGetTempurl [3], you still need a stack update before the ansible playbook can read the value via get_param [4]. After the manual run it is in the deployment plan [5] but not in the heat stack.

[1] https://review.openstack.org/#/c/597221/8/workbooks/plan_management.yaml
[2] http://paste.openstack.org/show/732982/
[3] http://paste.openstack.org/show/732978/ (workaround attempt)
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L502
[5] http://paste.openstack.org/show/732981/
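
To make the difference between problems A and B concrete, here is a small sketch that classifies a parameter by where it is set. It works on plain dictionaries standing in for the deployment plan parameters and the heat stack parameters; the function name classify_params and the placeholder URL are hypothetical, and nothing here calls the real Mistral, Swift, or Heat APIs.

```python
def classify_params(required, plan_params, stack_params):
    """Report, per required parameter, whether the stack can see it.

    Hypothetical helper; both arguments are plain dicts used as stand-ins.
    """
    report = {}
    for name in required:
        if stack_params.get(name):
            report[name] = "ok"
        elif plan_params.get(name):
            # Problem B: generated in the plan, not yet in the stack.
            report[name] = "in plan only; stack update needed"
        else:
            # Problem A: the workflow never produced a value.
            report[name] = "not generated; workflow did not run"
    return report


required = ["SwiftFetchDirGetTempurl"]

# State after running the workflow manually (placeholder URL, an assumption).
plan = {"SwiftFetchDirGetTempurl": "https://swift.example.com/placeholder"}
stack = {}

print(classify_params(required, plan, stack))
```

An empty plan and an empty stack correspond to problem A; a value present only in the plan corresponds to problem B.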

Comment 2 Jiri Stransky 2018-10-25 10:54:36 UTC
Regarding problem B, we should be fine: after the plan is updated, we run another workflow that updates the stack outputs:

https://github.com/openstack/tripleo-common/blob/7ff0d42c001e028f14b4d57a6471b3841830dbc5/workbooks/package_update.yaml#L8-L57

Comment 4 John Fulton 2018-10-26 16:25:30 UTC
Testing indicates that the patch had the desired effect: the workflow is now executed during upgrade. However, the workflow itself failed [1] [2] when it called the rename workflow [3].

[1] http://ix.io/1q5s 
[2] http://paste.openstack.org/show/733147/
[3] https://github.com/openstack/tripleo-common/commit/9cb8175139cfe29e83a9273705de9be297414a7d

Comment 12 Yogev Rabl 2018-12-13 14:39:29 UTC
Verified

Comment 15 errata-xmlrpc 2019-01-11 11:54:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

