Description of problem:
-----------------------
Upgrade of ceph cluster failed:

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
u'TASK [set facts for swift back up of ceph-ansible fetch directory] *************',
u'Wednesday 24 October 2018 08:15:57 -0400 (0:00:00.044) 0:00:35.267 ***** ',
u'ok: [undercloud] => {"ansible_facts": {"new_ceph_ansible_tarball_name": "temporary_dir_new.tar.gz", "old_ceph_ansible_tarball_name": "temporary_dir_old.tar.gz", "swift_get_url": "", "swift_put_url": ""}, "changed": false}',
u'',
u'TASK [attempt download of fetch directory tarball from swift backup] ***********',
u'Wednesday 24 October 2018 08:15:57 -0400 (0:00:00.066) 0:00:35.333 ***** ',
u' [WARNING]: Consider using the get_url or uri module rather than running curl.',
u'If you need to use command because get_url or uri is insufficient you can add',
u'warn=False to this command task or set command_warnings=False in ansible.cfg to',
u'get rid of this message.',
u'fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "curl -s -o /tmp/temporary_dir_old.tar.gz -w \'%{http_code}\' -X GET \\"\\"", "delta": "0:00:00.712603", "end": "2018-10-24 08:15:58.510813", "msg": "non-zero return code", "rc": 3, "start": "2018-10-24 08:15:57.798210", "stderr": "", "stderr_lines": [], "stdout": "000", "stdout_lines": ["000"]}',
u'...ignoring',
u'',
u'TASK [ensure we create a new fetch_directory or use the old fetch_directory] ***',
u'Wednesday 24 October 2018 08:15:58 -0400 (0:00:00.923) 0:00:36.257 ***** ',
u'fatal: [undercloud]: FAILED! => {"changed": false, "msg": "Received HTTP: 000 when attempting to GET from "}',
u'',
u'NO MORE HOSTS LEFT *************************************************************',
u'',
u'PLAY RECAP *********************************************************************',
u'ceph-0 : ok=2 changed=0 unreachable=0 failed=0 ',
u'ceph-1 : ok=2 changed=0 unreachable=0 failed=0 ',
u'ceph-2 : ok=2 changed=0 unreachable=0 failed=0 ',
u'compute-0 : ok=2 changed=0 unreachable=0 failed=0 ',
u'compute-1 : ok=2 changed=0 unreachable=0 failed=0 ',
u'controller-0 : ok=2 changed=0 unreachable=0 failed=0 ',
u'controller-1 : ok=2 changed=0 unreachable=0 failed=0 ',
u'controller-2 : ok=2 changed=0 unreachable=0 failed=0 ',
u'undercloud : ok=32 changed=12 unreachable=0 failed=1 ',
u'',
u'Wednesday 24 October 2018 08:15:58 -0400 (0:00:00.052) 0:00:36.310 ***** ',
u'=============================================================================== ']

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060859.ffbe879.el7ost.noarch
ceph-ansible-3.1.5-1.el7cp.noarch
python-tripleoclient-10.6.1-0.20181010222401.8c8f259.el7ost.noarch

Steps to Reproduce:
-------------------
1. Upgrade UC to RHOS-14
2. Upgrade all the overcloud nodes
3. Try to perform ceph upgrade

Actual results:
---------------
Ceph upgrade failed due to an unset variable (swift_get_url is empty in the log above)

Expected results:
-----------------
Ceph upgrade succeeds

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 ceph
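
To make the failure mode in the log easier to follow: curl was run with an empty URL, so it exited with rc 3 (URL malformed) and printed 000 as the HTTP code, and the next task then failed on that code. The Python sketch below models that decision with an explicit check for the empty URL. The function name and the 200/404 policy are assumptions made for illustration; the report does not show the actual playbook condition.

```python
def classify_fetch_result(swift_get_url, http_code):
    """Decide what to do after trying to download the fetch-directory tarball.

    Assumed policy (not confirmed by this report): 200 means reuse the old
    tarball, 404 means start a fresh fetch directory, anything else is an error.
    """
    if not swift_get_url:
        # The case in this report: the temp URL was never set, so curl had
        # nothing to request and reported http_code 000.
        return "error: swift_get_url is empty; the temp URL was never generated"
    if http_code == "200":
        return "use_old"
    if http_code == "404":
        return "create_new"
    return "error: received HTTP %s when attempting to GET from %s" % (
        http_code, swift_get_url)


# Values taken from the failing run: empty URL, http_code "000".
print(classify_fetch_result("", "000"))
```

Checking for the empty URL first would report the root cause (the parameter was never populated) rather than an HTTP error with a blank target.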
Two problems:

A. The workflow to create the SwiftFetchDirGetTempurl [1] didn't run [2].

B. Even if you run the workflow manually to generate the SwiftFetchDirGetTempurl [3], you still need a stack update before the ansible playbook can read it via get_param [4] (it's in the deployment plan [5] but not in the heat stack).

[1] https://review.openstack.org/#/c/597221/8/workbooks/plan_management.yaml
[2] http://paste.openstack.org/show/732982/
[3] http://paste.openstack.org/show/732978/ (workaround attempt)
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/ceph-ansible/ceph-base.yaml#L502
[5] http://paste.openstack.org/show/732981/
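
Problem B is a mismatch between two parameter sets: the value exists in the deployment plan but has not reached the heat stack. A minimal Python sketch of that comparison follows. It works on plain dicts; the helper name and the example URL are hypothetical, and fetching the real plan and stack parameters is left out.

```python
def params_missing_from_stack(plan_params, stack_params, names):
    """Return the names that have a value in the plan but none in the stack."""
    return [name for name in names
            if plan_params.get(name) and not stack_params.get(name)]


# Hypothetical values: the plan has the temp URL, the stack still has none.
plan_params = {"SwiftFetchDirGetTempurl": "https://swift.example.invalid/get-temp-url"}
stack_params = {"SwiftFetchDirGetTempurl": ""}

print(params_missing_from_stack(
    plan_params, stack_params, ["SwiftFetchDirGetTempurl"]))
```

Any name this returns would still resolve to an empty value through get_param until the stack is updated.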
Regarding problem B, this should already be covered: after the plan is updated, we run another workflow to update the stack outputs:

https://github.com/openstack/tripleo-common/blob/7ff0d42c001e028f14b4d57a6471b3841830dbc5/workbooks/package_update.yaml#L8-L57
Testing indicates that the patch achieved the desired effect in that it caused the workflow to be executed during upgrade. However, the workflow itself failed [1] [2] when it called the rename workflow [3].

[1] http://ix.io/1q5s
[2] http://paste.openstack.org/show/733147/
[3] https://github.com/openstack/tripleo-common/commit/9cb8175139cfe29e83a9273705de9be297414a7d
Verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045