Bug 1648968

Summary:	An empty ceph-ansible fetch directory should not fail a stack update
Product:	Red Hat OpenStack	Reporter:	John Fulton <johfulto>
Component:	openstack-tripleo-common	Assignee:	John Fulton <johfulto>
Status:	CLOSED ERRATA	QA Contact:	Yogev Rabl <yrabl>
Severity:	high	Docs Contact:
Priority:	medium
Version:	13.0 (Queens)	CC:	akaris, gfidente, jdurgin, lhh, mariel, mburns, nlevine, slinaber
Target Milestone:	z4	Keywords:	Triaged, ZStream
Target Release:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-common-8.6.6-8.el7ost.noarch.rpm	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-01-16 17:55:25 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description John Fulton 2018-11-12 15:35:26 UTC

If OSP13 has been configured to use an external ceph cluster via puppet-ceph and is then updated so that ceph-ansible configures it as a ceph client, the stack update fails as described below:

When enabling ceph-ansible:

    -  OS::TripleO::Services::CephExternal: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml
    +  OS::TripleO::Services::CephExternal: /usr/share/openstack-tripleo-heat-templates/docker/services/ceph-ansible/ceph-external.yaml

The stack update fails with:

    2018-11-07 15:59:38Z [overcloud-AllNodesDeploySteps-orpuuizd5xal.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
    2018-11-07 15:59:38Z [overcloud-AllNodesDeploySteps-orpuuizd5xal]: UPDATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
    2018-11-07 15:59:39Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
    2018-11-07 15:59:39Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

     Stack overcloud UPDATE_FAILED

    overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
      resource_type: OS::TripleO::WorkflowSteps
      physical_resource_id: a56a3276-651b-4ba9-80ab-b201549f7101
      status: CREATE_FAILED
      status_reason: |
        resources.WorkflowTasks_Step2_Execution: ERROR
    Heat Stack update failed.
    Heat Stack update failed.

In mistral engine.log I see:

    2018-11-07 10:59:37.243 14181 INFO workflow_trace [req-12be969e-4c9f-48c8-9521-b5f084e72445 c424161fc5c74afdab621c35c5a361f2 dbf7e0522d59438a936b0ccc0527b44a - default default] Workflow 'tripleo.overcloud.workflow_tasks.step2' [RUNNING -> ERROR, msg=Failure caused by error in tasks: ceph_base_ansible_workflow

      ceph_base_ansible_workflow [task_ex_id=6391a9bb-2d50-417a-8743-580205f777f3] -> Failure caused by error in tasks: restore_fetch_directory

      restore_fetch_directory [task_ex_id=2c2c1c95-c418-4a0f-8214-45ae392c4b24] -> {msg: 0 objects found in container: overcloud_ceph_ansible_fetch_dir but one object was expected.}
        [action_ex_id=bb238f30-dfa9-4ef1-9919-e55cf74b6771, idx=0]: {u'msg': u'0 objects found in container: overcloud_ceph_ansible_fetch_dir but one object was expected.'}

        [wf_ex_id=f972e6fc-c93d-499d-ac97-bb0975200748, idx=0]: Failure caused by error in tasks: restore_fetch_directory

      restore_fetch_directory [task_ex_id=2c2c1c95-c418-4a0f-8214-45ae392c4b24] -> {msg: 0 objects found in container: overcloud_ceph_ansible_fetch_dir but one object was expected.}
        [action_ex_id=bb238f30-dfa9-4ef1-9919-e55cf74b6771, idx=0]: {u'msg': u'0 objects found in container: overcloud_ceph_ansible_fetch_dir but one object was expected.'}

    ] (execution_id=a56a3276-651b-4ba9-80ab-b201549f7101)

In this case, it is possible to create the fetch directory manually and the deployment should proceed:

 https://access.redhat.com/solutions/3676921

However, this BZ tracks updating OSP13 so that the above procedure is not necessary.
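
The behavior this BZ asks for can be sketched as a small decision on the number of objects in the overcloud_ceph_ansible_fetch_dir container. This is only an illustration, not the tripleo-common code: the function name fetch_dir_action and its return strings are hypothetical, and treating more than one object as an error is an assumption based on the "one object was expected" message above.

```shell
#!/bin/bash
# Hypothetical helper, NOT the actual tripleo-common fix: map the number of
# objects found in the fetch-directory container to the action to take.
fetch_dir_action() {
    local count="$1"
    case "$count" in
        0) echo "start-empty" ;;  # nothing saved yet: continue with an empty fetch directory
        1) echo "restore" ;;      # exactly one tarball: restore it as before
        *) echo "error" ;;        # assumption: more than one object is ambiguous
    esac
}

# Example with a hard-coded count. On an undercloud with credentials sourced,
# the count could come from something like:
#   swift list overcloud_ceph_ansible_fetch_dir | wc -l
action_for_empty="$(fetch_dir_action 0)"
echo "$action_for_empty"
```

With this shape, the empty container that currently fails restore_fetch_directory would instead let the workflow continue.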

Comment 2 John Fulton 2018-11-14 13:13:51 UTC

(In reply to John Fulton from comment #0)
> If OSP13 has been configured to use an external ceph cluster via puppet-ceph
...
> In this case, it is possible to create the fetch directory manually and the
> deployment should proceed:
> 
>  https://access.redhat.com/solutions/3676921

The above workaround is for an internal ceph deployment and assumes access to the ceph monitor. This bug was reported for an external ceph cluster, where the OpenStack admin may not have direct access to that cluster. For an external ceph deployment, it is still possible to create the fetch directory manually, after which the deployment should proceed:

 https://access.redhat.com/solutions/3690471

In short, when only ceph clients are being configured, an empty fetch directory tarball is sufficient. You can create one (in the name format that Mistral expects) with the following command and then upload it with swift to the overcloud_ceph_ansible_fetch_dir container.

 tar cvfz temporary_dir-$(date +%Y%m%d-%H%M%S).tar.gz --files-from /dev/null
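
As a rough sketch of the full workaround, the script below wraps the tar command in a function, checks that the resulting archive really has no members, and shows the upload step. The function name make_empty_fetch_tarball is made up for illustration, GNU tar is assumed, and the swift upload line is left commented out because it assumes the undercloud credentials are sourced and the swift client is installed.

```shell
#!/bin/bash
set -eu

# Work in a scratch directory so the sketch leaves nothing behind in $PWD.
cd "$(mktemp -d)"

# Hypothetical helper: create an empty tarball using the timestamped name
# format from the tar command above, and print the file name.
make_empty_fetch_tarball() {
    local name
    name="temporary_dir-$(date +%Y%m%d-%H%M%S).tar.gz"
    # --files-from /dev/null gives tar an empty file list, so the archive
    # is valid but contains no members (GNU tar assumed).
    tar czf "$name" --files-from /dev/null
    echo "$name"
}

tarball="$(make_empty_fetch_tarball)"

# Listing an empty archive prints nothing, so the member count is 0.
member_count="$(tar tzf "$tarball" | wc -l)"
echo "created $tarball with $member_count members"

# Upload step (assumption: stackrc is sourced on the undercloud):
# swift upload overcloud_ceph_ansible_fetch_dir "$tarball"
```

The upload puts a single object in the container, which is what the restore_fetch_directory task expects to find.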

This bug continues to track making the above workaround unnecessary.

Comment 5 John Fulton 2018-11-20 21:51:32 UTC

*** Bug 1645701 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2019-01-16 17:55:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0068