Created attachment 1553598 [details]
ansible.log

Description of problem:
During OSP14 deployment, mistral fails during ceph-ansible execution on the task "TASK [create temporary ceph-ansible fetch directory tarball for swift backup]":

fatal: [undercloud]: FAILED! => {"changed": false, "expanded_paths": "", "msg": "Error, no source paths were found", "path": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir/*", "state": "absent"}

which makes the whole overcloud fail to deploy.

# ansible.log attached

Version-Release number of selected component (if applicable):
OSP14, happening since puddle 2019-04-05.1, only on the ceph-external topology

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP14 using InfraRed with a topology that includes an external ceph instance as storage (-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml included in the overcloud deployment script, also attached privately)
2. Overcloud deployment fails

Additional info:
# docker
ab7023b6fa20 192.168.24.1:8787/rhosp14/openstack-mistral-executor:2019-03-28.1 "kolla_start" 58 minutes ago Up 58 minutes (healthy) mistral_executor

# undercloud package
ceph-ansible.noarch 3.2.8-1.el7cp @rhelosp-ceph-3-tools
In this job, ansible 2.6.11 and THT 9.3.1 [0] are backing up the fetch directory in Swift. The task "create temporary ceph-ansible fetch directory tarball for swift backup" [1] uses the ansible archive module [2], and it looks like it fails to expand the path "{{playbook_dir}}/ceph-ansible/fetch_dir/*" [3], as per the logs [4].

For context, note this is the second run through of the job, but I think that can be explained by tripleo steps. I assume that the fetch directory isn't being restored and that instead this is a new deployment.

Perhaps this was introduced by the following:

https://github.com/openstack/tripleo-heat-templates/commit/5baa88d94e25f887da7a2f7f8103d52795b340ec

which passed in CI upstream. Note, however, that upstream CI uses standalone, which uses the local fetch directory and not the swift fetch directory, so that might be the reason we're seeing this in the new puddle.

[0] THT Version: openstack-tripleo-heat-templates-9.3.1-0.20190314162753.d0a6cb1.el7ost
[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/rocky/docker/services/ceph-ansible/ceph-base.yaml#L591
[2] https://docs.ansible.com/ansible/2.6/modules/archive_module.html
[3] https://github.com/openstack/tripleo-heat-templates/blob/stable/rocky/docker/services/ceph-ansible/ceph-base.yaml#L593
[4]
2019-04-08 04:01:35,889 p=1075 u=mistral | TASK [create ceph-ansible fetch directory tarball in local backup] *************
2019-04-08 04:01:35,889 p=1075 u=mistral | Monday 08 April 2019 04:01:35 -0400 (0:00:00.062) 0:16:28.246 **********
2019-04-08 04:01:35,911 p=1075 u=mistral | skipping: [undercloud] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-04-08 04:01:35,941 p=1075 u=mistral | TASK [create temporary ceph-ansible fetch directory tarball for swift backup] ***
2019-04-08 04:01:35,942 p=1075 u=mistral | Monday 08 April 2019 04:01:35 -0400 (0:00:00.052) 0:16:28.299 **********
2019-04-08 04:01:36,274 p=1075 u=mistral | fatal: [undercloud]: FAILED! => {"changed": false, "expanded_paths": "", "msg": "Error, no source paths were found", "path": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir/*", "state": "absent"}

[5]
[fultonj@skagra down]$ grep -A 3 "create temporary ceph-ansible fetch directory tarball for swift backup" ansible.log
2019-04-08 03:49:23,451 p=1075 u=mistral | TASK [create temporary ceph-ansible fetch directory tarball for swift backup] ***
2019-04-08 03:49:23,451 p=1075 u=mistral | Monday 08 April 2019 03:49:23 -0400 (0:00:00.044) 0:04:15.808 **********
2019-04-08 03:49:23,468 p=1075 u=mistral | skipping: [undercloud] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-04-08 03:49:23,496 p=1075 u=mistral | TASK [backup temporary ceph-ansible fetch directory tarball in swift] **********
--
2019-04-08 04:01:35,941 p=1075 u=mistral | TASK [create temporary ceph-ansible fetch directory tarball for swift backup] ***
2019-04-08 04:01:35,942 p=1075 u=mistral | Monday 08 April 2019 04:01:35 -0400 (0:00:00.052) 0:16:28.299 **********
2019-04-08 04:01:36,274 p=1075 u=mistral | fatal: [undercloud]: FAILED! => {"changed": false, "expanded_paths": "", "msg": "Error, no source paths were found", "path": "/var/lib/mistral/overcloud/ceph-ansible/fetch_dir/*", "state": "absent"}
2019-04-08 04:01:36,275 p=1075 u=mistral | NO MORE HOSTS LEFT *************************************************************
[fultonj@skagra down]$
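The failure mode can be reproduced outside of ansible: glob expansion of a '*' pattern over an empty directory yields an empty list, which is the condition archive.py treats as fatal. A minimal sketch of that behavior (not the module's actual code):

```python
import glob
import os
import tempfile

# Minimal reproduction of the failure condition: the archive module
# glob-expands its 'path' option and fails with
# "Error, no source paths were found" when the expansion is empty.
fetch_dir = tempfile.mkdtemp()  # stands in for .../ceph-ansible/fetch_dir

expanded_paths = glob.glob(os.path.join(fetch_dir, '*'))
print(expanded_paths)  # [] -- an empty fetch_dir expands to no source paths

if not expanded_paths:
    # mirrors the module's failure branch
    print("Error, no source paths were found")
```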
The '*' on line 593 of docker/services/ceph-ansible/ceph-base.yaml, introduced by [1], hits line 229 of ansible/lib/ansible/modules/files/archive.py [2]. The expanded-paths array is empty, so the module raises an error. However, it's OK with us for the array to be empty.

The '*' was added in [1] to create a tar.gz containing all the fetch_dir content without including the parent directory in the resulting path. This worked as planned; we just need to avoid the situation where the module tries to archive an empty directory. Lines 589 and 614 [3] just need an additional AND condition so that neither task runs if that directory is empty.

[1] https://github.com/openstack/tripleo-heat-templates/commit/5baa88d94e25f887da7a2f7f8103d52795b340ec
[2] https://github.com/ansible/ansible/blob/v2.6.11/lib/ansible/modules/files/archive.py#L229
[3] https://github.com/openstack/tripleo-heat-templates/blob/5baa88d94e25f887da7a2f7f8103d52795b340ec/docker/services/ceph-ansible/ceph-base.yaml#L589
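The additional AND could look something like the following sketch (illustrative only, not the merged patch; the find task, the register variable name, and the existing `when` condition shown here are assumptions):

```yaml
# Hypothetical guard for the swift backup task; names are illustrative.
- name: check for ceph-ansible fetch directory contents
  find:
    paths: "{{ playbook_dir }}/ceph-ansible/fetch_dir"
  register: fetch_dir_files

- name: create temporary ceph-ansible fetch directory tarball for swift backup
  archive:
    path: "{{ playbook_dir }}/ceph-ansible/fetch_dir/*"
    dest: "{{ playbook_dir }}/temporary_fetch_dir.tar.gz"
  when:
    - swift_backup|default(false)|bool   # assumed existing condition
    - fetch_dir_files.matched > 0        # additional AND: skip when empty
```

The `find` module reports the number of matched files in `matched`, so `matched > 0` is a straightforward way to express "the directory is not empty" without shelling out.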
Verified by CI: openstack-tripleo-heat-templates-9.3.1-0.20190314162756.d0a6cb1.el7ost is present on the undercloud and the ceph-external job now passes as expected. Links attached.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878