Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1769719

Summary: remove ceph-ansible fetch directory management
Product: Red Hat OpenStack
Reporter: Federico Iezzi <fiezzi>
Component: openstack-tripleo-heat-templates
Assignee: John Fulton <johfulto>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: low
Docs Contact:
Priority: low
Version: 15.0 (Stein)
CC: chrisbro, fpantano, gfidente, johfulto, mburns
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200428015016.d5442cd.el8ost.noarch.rpm, tripleo-ansible-0.5.1-0.20200425093424.add11b7.el8ost.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1809602
Environment:
Last Closed: 2020-05-14 12:15:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1622688
Bug Blocks: 1809602

Description Federico Iezzi 2019-11-07 09:56:19 UTC
Description of problem:

Suppose ceph-ansible first runs through Mistral and then runs again through config-download without a Mistral workflow. The second run fails with a permission-denied error on the original fetch directory tarball.

See the following excerpt:

############
TASK [attempt download of fetch directory tarball from swift backup] ***************************************************************************************************************************************************
[WARNING]: Consider using the get_url or uri module rather than running 'curl'.  If you need to use command because get_url or uri is insufficient you can add 'warn: false' to this command task or set
'command_warnings=False' in ansible.cfg to get rid of this message.

fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "curl -s -o /tmp/temporary_dir_old.tar.gz -w '%{http_code}' -X GET \"https://192.168.111.2:13808/v1/AUTH_459a22f0d4d44957a587c514db597ff6/edge0-compute_ceph_ansible_fetch_dir/temporary_dir.tar.gz?temp_url_sig=357745cf28f026deb6c36f5cb94065bce6ed4f40&temp_url_expires=1573201513\"", "delta": "0:00:00.066594", "end": "2019-11-07 03:35:48.000145", "msg": "non-zero return code", "rc": 23, "start": "2019-11-07 03:35:47.933551", "stderr": "", "stderr_lines": [], "stdout": "200", "stdout_lines": ["200"]}
...ignoring

TASK [ensure we create a new fetch_directory or use the old fetch_directory] *******************************************************************************************************************************************
skipping: [undercloud]

TASK [unpack downloaded ceph-ansible fetch tarball to fetch directory] *************************************************************************************************************************************************
fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "/usr/bin/gtar --gzip --extract --file /tmp/temporary_dir_old.tar.gz -C /home/stack/config-download/ceph-ansible/fetch_dir", "delta": "0:00:00.009019", "end": "2019-11-07 03:35:48.462157", "msg": "non-zero return code", "rc": 2, "start": "2019-11-07 03:35:48.453138", "stderr": "\ngzip: stdin: not in gzip format\n/usr/bin/gtar: Child returned status 1\n/usr/bin/gtar: Error is not recoverable: exiting now", "stderr_lines": ["", "gzip: stdin: not in gzip format", "/usr/bin/gtar: Child returned status 1", "/usr/bin/gtar: Error is not recoverable: exiting now"], "stdout": "", "stdout_lines": []}
############

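As the log above shows, curl exits with status 23, which means “Write error. Curl couldn't write data to a local filesystem or similar.” The server still answered with HTTP 200, but the download task's failure is ignored. The unpack task then runs against /tmp/temporary_dir_old.tar.gz, and gtar fails with "gzip: stdin: not in gzip format".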
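In the log, the HTTP status alone (200) hid a local write failure (rc 23). A stricter gate would check both signals and confirm the file is really gzip before extracting it. The sketch below shows one way to do that under stated assumptions:

- `download_ok` and `looks_like_gzip` are hypothetical helper names.
- The gzip check relies on the two RFC 1952 magic bytes (0x1f 0x8b).
- This is not the tripleo-heat-templates task logic. The actual fix removed fetch directory management altogether.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # RFC 1952 ID1/ID2 header bytes


def download_ok(rc: int, http_code: str) -> bool:
    """Hypothetical gate: trust the download only if curl itself succeeded
    (rc 0) AND the server answered 200. rc 23 is a local write error, which
    can happen even when the HTTP status is 200, as in the log above."""
    return rc == 0 and http_code.strip() == "200"


def looks_like_gzip(path: str) -> bool:
    """Return True only if the file can be opened and starts with the gzip
    magic bytes. An unreadable file (e.g. owned by another user) or an
    empty/stale file returns False instead of reaching gtar."""
    try:
        with open(path, "rb") as f:
            return f.read(2) == GZIP_MAGIC
    except OSError:
        return False


if __name__ == "__main__":
    # rc and http_code taken from the failed task in the log above.
    print(download_ok(23, "200"))  # False: curl could not write the file
    with tempfile.TemporaryDirectory() as d:
        good = os.path.join(d, "good.tar.gz")
        with gzip.open(good, "wb") as f:
            f.write(b"payload")
        bad = os.path.join(d, "bad.tar.gz")
        open(bad, "wb").close()  # empty file, not gzip
        print(looks_like_gzip(good), looks_like_gzip(bad))  # True False
```

Checking the exit code and the payload separately keeps the two distinct failure modes apart: transport or write failure versus a bad archive.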


Version-Release number of selected component (if applicable):
OSP15

How reproducible:
 - Deploy OSP15, running config-download through Mistral
 - Re-run config-download manually with plain Ansible [1]

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/ansible_config_download.html#manual-config-download

Comment 2 John Fulton 2019-11-13 14:23:40 UTC
This will be fixed by removing the fetch directory management code after we verify bz 1622688.

Comment 3 John Fulton 2020-02-18 22:39:32 UTC
As per bz 1622688, the fetch directory is no longer needed to replace a Ceph monitor on any controller node.
The Fixed In Version of bz 1622688 is ceph-ansible-4.0.7-1.el8cp.
The released ceph-ansible 4 builds containing the fix are 4.0.14-1.el8cp and 4.0.14-1.el7cp (the fix is NOT in 4.0.0-0.1.rc9.el8cp).
OSP15/16 customers should use 4.0.14-1.el8cp or newer.
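The version rule above can be checked mechanically. The sketch below makes these assumptions:

- The input is a VERSION-RELEASE string, such as the output of `rpm -q --qf '%{VERSION}-%{RELEASE}' ceph-ansible`.
- `meets_minimum` is a hypothetical helper that tests against the recommended minimum, 4.0.14-1.
- It is a simplified comparison, not a full rpmvercmp. It compares the numeric VERSION fields, then only the leading number of RELEASE, and ignores dist tags such as el8cp.

```python
import re


def _numeric_parts(text: str) -> tuple:
    """Extract the integer fields in order, e.g. '4.0.14' -> (4, 0, 14)."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def meets_minimum(installed: str, minimum: str = "4.0.14-1.el8cp") -> bool:
    """Simplified VERSION-RELEASE check (NOT rpmvercmp).

    VERSION fields are compared numerically first; if they are equal, only
    the leading number of RELEASE is compared, so dist tags are ignored."""
    inst_ver, _, inst_rel = installed.partition("-")
    min_ver, _, min_rel = minimum.partition("-")
    iv, mv = _numeric_parts(inst_ver), _numeric_parts(min_ver)
    if iv != mv:
        return iv > mv
    return _numeric_parts(inst_rel)[:1] >= _numeric_parts(min_rel)[:1]


if __name__ == "__main__":
    for build in ("4.0.14-1.el8cp", "4.0.14-1.el7cp", "4.0.0-0.1.rc9.el8cp"):
        print(build, meets_minimum(build))
```

With the builds named in this comment, 4.0.14-1.el8cp and 4.0.14-1.el7cp pass, and 4.0.0-0.1.rc9.el8cp fails. For anything beyond these simple cases, use rpm's own comparison.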

We'll track fetch-directory management code removal in TripleO upstream with:

 https://bugs.launchpad.net/tripleo/+bug/1863809

Comment 8 John Fulton 2020-03-01 17:01:29 UTC
- The patches have landed in master.
- For this bug, targeted at 16.1, the master patches have since been cherry-picked into Train; those backports need to merge before this bug can move to POST.

https://review.opendev.org/#/q/topic:no_fetch_dir

Comment 9 John Fulton 2020-03-03 13:19:43 UTC
The bug will be fixed in an OSP16 update, as tracked by bz 1769719.
The bug will be fixed in an OSP15 update, as tracked by bz 1809602.

Comment 10 John Fulton 2020-03-10 17:23:15 UTC
*** Bug 1613742 has been marked as a duplicate of this bug. ***

Comment 11 John Fulton 2020-05-11 14:21:30 UTC
*** Bug 1831379 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2020-05-14 12:15:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114