rhosp-director: HA overcloud with ceph and IPV6 fails

Environment:
puppet-ceph-2.3.1-0.20170805094345.868e6d6.el7ost.noarch
instack-undercloud-7.2.1-0.20170729010706.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc3.el7cp.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch

Steps to reproduce:
Attempt to deploy overcloud with IPV6:

openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /home/stack/rhos12.yaml

Result:
(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: e652d2b4-acc5-408f-9dbe-3339a5bc98a7
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

(undercloud) [stack@undercloud-0 ~]$ heat resource-list -n5 overcloud |grep -v COMPLE
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+-------------------------------+--------------------------------------+-------------------------------+-----------------+----------------------+--------------------------------------------+
| resource_name                 | physical_resource_id                 | resource_type                 | resource_status | updated_time         | stack_name                                 |
+-------------------------------+--------------------------------------+-------------------------------+-----------------+----------------------+--------------------------------------------+
| AllNodesDeploySteps           | c56e3d5f-0316-4aac-914d-2f2411043482 | OS::TripleO::PostDeploySteps  | CREATE_FAILED   | 2017-08-23T00:08:51Z | overcloud                                  |
| WorkflowTasks_Step2_Execution | e652d2b4-acc5-408f-9dbe-3339a5bc98a7 | OS::Mistral::ExternalResource | CREATE_FAILED   | 2017-08-23T00:21:56Z | overcloud-AllNodesDeploySteps-yf77rtka7ojp |
+-------------------------------+--------------------------------------+-------------------------------+-----------------+----------------------+--------------------------------------------+
Created attachment 1317142
ansible_run.log

This looks like a legitimate error in a ceph-ansible task when IPv6 is used, probably caused by hardcoded references to IPv4 addressing. I am attaching the Ansible execution log.
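To make the suspicion above concrete, here is a minimal Python sketch of how a hardcoded IPv4 lookup can fail on an interface that only has an IPv6 address. It is an illustration only, not code from ceph-ansible and not the fix that was later merged: the fact layout (an `ipv4` dict, an `ipv6` list of dicts), the interface name `vlan30`, the sample address, and the helper `monitor_address` are all assumptions made for this example.

```python
# Illustrative only: a simplified stand-in for per-interface facts.
# The shapes below (ipv4 as a dict, ipv6 as a list of dicts) and all
# names are assumptions for this sketch, not code from ceph-ansible.

def monitor_address(iface_facts, ip_version="ipv4"):
    """Return the first address of the requested family, or raise ValueError."""
    entry = iface_facts.get(ip_version)
    if not entry:
        raise ValueError(f"interface has no {ip_version} address")
    # IPv4 facts are modelled as one dict, IPv6 facts as a list of dicts.
    if isinstance(entry, list):
        entry = entry[0]
    return entry["address"]


# Hypothetical interface that only carries an IPv6 address.
ipv6_only = {
    "device": "vlan30",
    "ipv6": [{"address": "fd00:fd00:fd00:3000::12", "prefix": "64"}],
}

try:
    hardcoded = ipv6_only["ipv4"]["address"]  # hardcoded IPv4 lookup
except KeyError as exc:
    hardcoded = None
    print("hardcoded lookup failed, missing key:", exc)

# Selecting the family explicitly avoids the missing-key failure.
print(monitor_address(ipv6_only, ip_version="ipv6"))
```

The point of the sketch is only that the address family has to be a parameter of the lookup rather than a fixed key; how ceph-ansible actually resolves this is defined by the upstream change, not by this example.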
Fix here: https://github.com/ceph/ceph-ansible/pull/1798
Was able to deploy using the fix in comment #4
Thanks Alex for the quick test. Moving this to POST.
Sébastien would you please tag and announce a new upstream ceph-ansible version that includes this change? Then we can rebase to that downstream.
done: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.0rc5
Ken, looks like this can be moved to ON_QA?
The issue doesn't reproduce for me using ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch.
I notice that this version of the RPM already includes this patch: https://github.com/ceph/ceph-ansible/pull/1798

Successfully deployed and populated the overcloud with IPv6 (docker{,-ha}.yaml are included by default now):

openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
-e /home/stack/virt/network/network-environment-v6.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/rhos12.yaml

Verifying.
I am seeing the same error with ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch. Am I missing something else?

...
2017-10-11 14:02:25Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE state changed
2017-10-11 14:02:25Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS state changed
2017-10-11 14:05:20Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2017-10-11 14:05:21Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-10-11 14:05:22Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-10-11 14:05:22Z [overcloud]: CREATE_FAILED Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 42021a2f-93c5-4643-a620-2dcae0165fe2
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

$ rpm -qa|grep ansible
ansible-2.3.2.0-2.el7.noarch
ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch
From the ceph-install-workflow.log:

...
2017-10-11 10:05:15,981 p=29317 u=mistral | skipping: [192.168.24.53]
2017-10-11 10:05:15,992 p=29317 u=mistral | TASK [ceph-mon : delete populate-kv-store docker] ******************************
2017-10-11 10:05:16,006 p=29317 u=mistral | skipping: [192.168.24.53]
2017-10-11 10:05:16,018 p=29317 u=mistral | TASK [ceph-mon : generate systemd unit file] ***********************************
2017-10-11 10:05:16,345 p=29317 u=mistral | fatal: [192.168.24.53]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'ipv4'"}
2017-10-11 10:05:16,345 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : copy mon restart script] **********************
2017-10-11 10:05:16,346 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph mon daemon(s)] *******************
2017-10-11 10:05:16,346 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : copy osd restart script] **********************
2017-10-11 10:05:16,346 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : restart containerized ceph osds daemon(s)] ****
2017-10-11 10:05:16,347 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : restart non-containerized ceph osds daemon(s)] ***
2017-10-11 10:05:16,347 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph mdss] ****************************
2017-10-11 10:05:16,347 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph rgws] ****************************
2017-10-11 10:05:16,348 p=29317 u=mistral | PLAY RECAP *********************************************************************
2017-10-11 10:05:16,348 p=29317 u=mistral | 192.168.24.53 : ok=27 changed=2 unreachable=0 failed=1
2017-10-11 10:05:16,348 p=29317 u=mistral | 192.168.24.57 : ok=1 changed=0 unreachable=0 failed=0
2017-10-11 10:05:16,348 p=29317 u=mistral | 192.168.24.58 : ok=1 changed=0 unreachable=0 failed=0
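For anyone triaging a similar log, the failing task can be pulled out mechanically. The following is a small Python sketch, assuming the line layout seen in this excerpt (a `TASK [...]` banner followed later by a `fatal: [host]: FAILED! => {...}` line with a JSON payload). The function name `find_failures` and the regular expressions are made up for this example and may need adjusting for other Ansible versions or log callbacks.

```python
import json
import re

# Patterns based on the log excerpt in this report; other log formats may differ.
TASK_RE = re.compile(r"\|\s+TASK \[(?P<task>[^\]]+)\]")
FATAL_RE = re.compile(r"\|\s+fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<detail>.*)$")


def find_failures(lines):
    """Yield (task, host, message) for each fatal line, using the last TASK banner seen."""
    current_task = None
    for line in lines:
        match = TASK_RE.search(line)
        if match:
            current_task = match.group("task")
            continue
        match = FATAL_RE.search(line)
        if match:
            detail = match.group("detail")
            try:
                # The payload in this excerpt is JSON with a "msg" field.
                message = json.loads(detail).get("msg", detail)
            except ValueError:
                message = detail  # fall back to the raw text if it is not JSON
            yield current_task, match.group("host"), message


sample = """\
2017-10-11 10:05:16,006 p=29317 u=mistral | skipping: [192.168.24.53]
2017-10-11 10:05:16,018 p=29317 u=mistral | TASK [ceph-mon : generate systemd unit file] ***********************************
2017-10-11 10:05:16,345 p=29317 u=mistral | fatal: [192.168.24.53]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute u'ipv4'"}
2017-10-11 10:05:16,345 p=29317 u=mistral | RUNNING HANDLER [ceph-defaults : copy mon restart script] **********************
""".splitlines()

failures = list(find_failures(sample))
for task, host, message in failures:
    print(f"{host}: {task}: {message}")
```

In this excerpt the only failure is in the `ceph-mon : generate systemd unit file` task on 192.168.24.53, and the message points at a missing `ipv4` attribute.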
Tim, this was fixed in version ceph-ansible-3.0.0-0.1.rc5.el7cp
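As a quick sanity check against the fixed build named above, the installed package string from `rpm -qa` can be compared with it. This is a rough Python sketch under stated assumptions: the helper names `parse_build` and `has_fix` are hypothetical, the comparison only understands names shaped like `ceph-ansible-<version>-<release>.rcN.<dist>`, and it is not a general RPM version comparison.

```python
import re

# Only handles names shaped like "ceph-ansible-3.0.0-0.1.rc5.el7cp[.noarch]".
BUILD_RE = re.compile(
    r"^ceph-ansible-(?P<version>\d+(?:\.\d+)*)-(?P<release>[\d.]+)\.rc(?P<rc>\d+)\."
)


def parse_build(name):
    """Return ((major, minor, ...), rc_number) for a release-candidate build name."""
    match = BUILD_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised package name: {name}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("rc"))


def has_fix(installed, fixed_in="ceph-ansible-3.0.0-0.1.rc5.el7cp"):
    """True if the installed build is at or after the build reported as fixed."""
    return parse_build(installed) >= parse_build(fixed_in)


for build in (
    "ceph-ansible-3.0.0-0.1.rc3.el7cp.noarch",
    "ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch",
    "ceph-ansible-3.0.0-0.1.rc5.el7cp.noarch",
):
    print(build, "->", "has fix" if has_fix(build) else "predates fix")
```

Final (non-rc) builds are deliberately rejected with a `ValueError` rather than guessed at; for those, the package changelog is the more reliable source.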
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387