1642462 – [UPGRADES][14] Failed to upgrade ceph: SwiftFetchDirGetTempurl is not set

Summary: [UPGRADES][14] Failed to upgrade ceph: SwiftFetchDirGetTempurl is not set

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-common
Sub Component:
Version:	14.0 (Rocky)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	beta
Target Release:	14.0 (Rocky)
Assignee:	John Fulton
QA Contact:	Yogev Rabl
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-10-24 13:09 UTC by Yurii Prokulevych
Modified:	2023-02-22 23:02 UTC
CC List:	10 users
Fixed In Version:	openstack-tripleo-common-9.4.1-0.20181012010873.67bab16.el7ost
Clone Of:
Environment:
Last Closed:	2019-01-11 11:54:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1799945	None	None	None	2018-10-25 12:30:26 UTC
OpenStack gerrit	613373	None	MERGED	Run Mistral workflow to make temporary Swift URLs on upgrade	2020-05-04 21:15:31 UTC
OpenStack gerrit	614801	None	MERGED	Run Mistral workflow to make temporary Swift URLs on upgrade	2020-05-04 21:15:31 UTC
Red Hat Product Errata	RHEA-2019:0045	None	None	None	2019-01-11 11:54:32 UTC

Description Yurii Prokulevych 2018-10-24 13:09:59 UTC

Description of problem:
-----------------------
Upgrade of ceph cluster failed:

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
 u'TASK [set facts for swift back up of ceph-ansible fetch directory] *************',
 u'Wednesday 24 October 2018  08:15:57 -0400 (0:00:00.044)       0:00:35.267 ***** ',
 u'ok: [undercloud] => {"ansible_facts": {"new_ceph_ansible_tarball_name": "temporary_dir_new.tar.gz", "old_ceph_ansible_tarball_name": "temporary_dir_old.tar.gz", "swift_get_url": "", "swift_put_url": ""}, "changed": false}',
 u'',
 u'TASK [attempt download of fetch directory tarball from swift backup] ***********',
 u'Wednesday 24 October 2018  08:15:57 -0400 (0:00:00.066)       0:00:35.333 ***** ',
 u' [WARNING]: Consider using the get_url or uri module rather than running curl.',
 u'If you need to use command because get_url or uri is insufficient you can add',
 u'warn=False to this command task or set command_warnings=False in ansible.cfg to',
 u'get rid of this message.',
 u'fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "curl -s -o /tmp/temporary_dir_old.tar.gz -w \'%{http_code}\' -X GET \\"\\"", "delta": "0:00:00.712603", "end": "2018-10-24 08:15:58.510813", "msg": "non-zero return code", "rc": 3, "start": "2018-10-24 08:15:57.798210", "stderr": "", "stderr_lines": [], "stdout": "000", "stdout_lines": ["000"]}',
 u'...ignoring',
 u'',
 u'TASK [ensure we create a new fetch_directory or use the old fetch_directory] ***',
 u'Wednesday 24 October 2018  08:15:58 -0400 (0:00:00.923)       0:00:36.257 ***** ',
 u'fatal: [undercloud]: FAILED! => {"changed": false, "msg": "Received HTTP: 000 when attempting to GET from "}',
 u'',
 u'NO MORE HOSTS LEFT *************************************************************',
 u'',
 u'PLAY RECAP *********************************************************************',
 u'ceph-0                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-1                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'ceph-2                     : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-0                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'compute-1                  : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-0               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-1               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'controller-2               : ok=2    changed=0    unreachable=0    failed=0   ',
 u'undercloud                 : ok=32   changed=12   unreachable=0    failed=1   ',
 u'',
 u'Wednesday 24 October 2018  08:15:58 -0400 (0:00:00.052)       0:00:36.310 ***** ',
 u'=============================================================================== ']



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060859.ffbe879.el7ost.noarch
ceph-ansible-3.1.5-1.el7cp.noarch
python-tripleoclient-10.6.1-0.20181010222401.8c8f259.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade UC to RHOS-14
2. Upgrade all the overcloud nodes
3. Try to perform the Ceph upgrade

Actual results:
---------------
Ceph upgrade failed due to unset variable
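
In the log above, swift_get_url is an empty string, so curl is invoked with an empty URL, exits with rc 3 (curl's "URL malformed" code), and reports HTTP code 000. A minimal sketch of a pre-flight check that would surface the unset parameter before any download is attempted might look like the following. The helper name check_tempurl is hypothetical and is not part of tripleo-common or the playbook; it only illustrates the idea.

```python
from urllib.parse import urlparse


def check_tempurl(name, value):
    """Return value if it looks like an http(s) URL, else raise ValueError.

    Hypothetical helper: 'name' is only used to make the error message
    point at the parameter that was left unset.
    """
    parsed = urlparse(value or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name} is not set or is not a valid URL: {value!r}")
    return value


if __name__ == "__main__":
    # The empty value mirrors "swift_get_url": "" from the log.
    try:
        check_tempurl("SwiftFetchDirGetTempurl", "")
    except ValueError as err:
        print(err)
```

Failing on the parameter name gives a clearer error than the "Received HTTP: 000" message, which only hints at the cause.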

Expected results:
-----------------
Ceph upgrade succeeds

Additional info:
----------------
Virtual environment: 3controllers + 2computes + 3ceph

Comment 1 John Fulton 2018-10-24 20:03:03 UTC

Two problems:

A. The workflow to create the SwiftFetchDirGetTempurl [1] didn't run [2]

B. Even if you run the workflow manually to generate the SwiftFetchDirGetTempurl [3], you still need a stack update before the ansible playbook can see it, because the playbook reads the value via get_param [4] (the parameter is in the deployment plan [5] but not in the heat stack)

[1] https://review.openstack.org/#/c/597221/8/workbooks/plan_management.yaml
[2] http://paste.openstack.org/show/732982/
[3] http://paste.openstack.org/show/732978/ (workaround attempt)
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L502
[5] http://paste.openstack.org/show/732981/
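
For context on what the workflow in [1] has to produce: a Swift temporary URL is an object URL with a signature and an expiry time appended as query parameters. The sketch below follows the HMAC scheme described in the Swift TempURL middleware documentation, using SHA1 as was common at the time; newer Swift deployments may require a different digest. The function name, endpoint, path, and key are illustrative assumptions and do not reproduce the Mistral workflow itself.

```python
import hmac
from hashlib import sha1
from time import time


def make_temp_url(base_url, path, key, method="GET", ttl_seconds=3600, now=None):
    """Build a Swift-style temporary URL for an object path.

    base_url: scheme and host of the Swift endpoint (assumed value).
    path:     object path such as /v1/AUTH_<account>/<container>/<object>.
    key:      the account or container temp-URL key.
    now:      optional fixed timestamp, useful for reproducible output.
    """
    start = time() if now is None else now
    expires = int(start + ttl_seconds)
    # The signed message is "<METHOD>\n<expires>\n<path>".
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return f"{base_url}{path}?temp_url_sig={sig}&temp_url_expires={expires}"


if __name__ == "__main__":
    # All values below are placeholders, not taken from the bug.
    print(make_temp_url("https://swift.example.com",
                        "/v1/AUTH_test/container/temporary_dir.tar.gz",
                        "secret-key"))
```

A GET URL and a PUT URL need separate signatures because the method is part of the signed message, which is consistent with the playbook carrying both swift_get_url and swift_put_url.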

Comment 2 Jiri Stransky 2018-10-25 10:54:36 UTC

Regarding problem B, we should be good on that aspect because after the plan is updated, we run another workflow to update the stack outputs:

https://github.com/openstack/tripleo-common/blob/7ff0d42c001e028f14b4d57a6471b3841830dbc5/workbooks/package_update.yaml#L8-L57

Comment 4 John Fulton 2018-10-26 16:25:30 UTC

Testing indicates that the patch achieved the desired effect in that it caused the workflow to be executed during upgrade. However, the workflow itself failed [1] [2] when it called the rename workflow [3].

[1] http://ix.io/1q5s 
[2] http://paste.openstack.org/show/733147/
[3] https://github.com/openstack/tripleo-common/commit/9cb8175139cfe29e83a9273705de9be297414a7d

Comment 12 Yogev Rabl 2018-12-13 14:39:29 UTC

Verified

Comment 15 errata-xmlrpc 2019-01-11 11:54:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045
