Description of problem:
OSP12 deployment with external ceph fails on a baremetal environment with 3 controllers and 2 computes (but succeeds without the external ceph).

Version-Release number of selected component (if applicable):
$ rhos-release -L
Installed repositories (rhel-7.4):
  12
  ceph-2
  ceph-osd-2
  rhel-7.4

Steps to Reproduce:
1. Deploy OSP12 on baremetal with external ceph

Actual results:
Deployment fails with the following message in overcloud_install.log:

Stack overcloud CREATE_FAILED
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 548f9fc0-9ee3-4853-b155-80f25f6a93df
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Additional info:
From mistral/ceph-install-workflow.log:

2018-03-07 19:11:23,896 p=13129 u=mistral | TASK [ceph-defaults : set_fact docker_exec_cmd] ********************************
2018-03-07 19:11:23,932 p=13129 u=mistral | fatal: [192.168.24.11]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:23,963 p=13129 u=mistral | fatal: [192.168.24.8]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:23,992 p=13129 u=mistral | fatal: [192.168.24.12]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,025 p=13129 u=mistral | fatal: [192.168.24.15]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,039 p=13129 u=mistral | fatal: [192.168.24.7]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,041 p=13129 u=mistral | PLAY RECAP *********************************************************************

$ mistral execution-list | grep -v SUCCESS
| ID | Workflow ID | Workflow name | Description | Task Execution ID | State | State info | Created at | Updated at |
| 0a25644c-3384-41bc-b49e-eebe0a3a8b9d | 3258fa8f-fa27-4f07-bcbc-01c68ffb28d8 | tripleo.baremetal.v1.cellv2_discovery | sub-workflow execution | 4b8fd697-d07b-4be5-af65-bec9267a775a | ERROR | None | 2018-03-07 15:11:39 | 2018-03-07 15:11:49 |
| 548f9fc0-9ee3-4853-b155-80f25f6a93df | 9a67b540-1761-493d-93a5-79a6fe4fcb2a | tripleo.overcloud.workflow_tasks.step2 | Heat managed | <none> | ERROR | Failure caused by error i... | 2018-03-07 17:09:25 | 2018-03-07 17:11:27 |
| 7e715b7f-4bba-4e77-8b72-260b7517b5ee | 799d3307-76a5-4982-9568-d9cba8fde8cb | tripleo.storage.v1.ceph-install | sub-workflow execution | 27d1e34d-03c0-4354-a5ac-8558c66f331e | ERROR | Failure caused by error i... | 2018-03-07 17:09:26 | 2018-03-07 17:11:25 |

SOSreports from the undercloud, controller, and compute nodes, plus the deployment files used, are all available at:
https://drive.google.com/drive/folders/14SGfFF9NDVGxEB7CIuYX2BWwkmwEdnm3?usp=sharing
Your overcloud controller node seems to have more than just Ceph problems; for example, all of the containers are down except memcached [0].

Per ceph-install-workflow.log [1], the deployment failed on the following task (ceph-ansible-3.0.26-1.el7cp confirmed):

https://github.com/ceph/ceph-ansible/blob/v3.0.26/roles/ceph-defaults/tasks/facts.yml#L14-L19

It may be that the following Ansible variable could not be resolved:

  hostvars[groups[mon_group_name][0]]['ansible_hostname']

[fultonj@skagra sosreport-pkomarov-20180308090054]$ cat hostname
controller-0
[fultonj@skagra sosreport-pkomarov-20180308090054]$

Please re-run the deployment, adding '-e debug.yaml' to your 'openstack overcloud deploy ...' command, where debug.yaml contains the following:

parameter_defaults:
  CephAnsiblePlaybookVerbosity: 3

Then, after the deployment runs, update this bugzilla with:

- /var/log/mistral/ceph-install-workflow.log from your undercloud
- a tarball containing /tmp/ansible-mistral-action* from your undercloud
- the exact 'openstack overcloud deploy ...' command you ran
- the output of `ansible -m setup localhost` when run on your overcloud controller

Thanks,
John

[0] All containers died on the overcloud controller node except memcached:

[fultonj@skagra docker]$ cat docker_ps_-a
CONTAINER ID  IMAGE                                                       COMMAND                 CREATED       STATUS                   PORTS  NAMES
77a8fd42873d  192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4    "/bin/bash -c '/usr/b"  15 hours ago  Exited (0) 15 hours ago         mysql_image_tag
35eb34d3a61f  192.168.24.1:8787/rhosp12/openstack-memcached:2018-02-27.4  "/bin/bash -c 'source"  15 hours ago  Up 15 hours                     memcached
2d7ddbf5f9bd  192.168.24.1:8787/rhosp12/openstack-haproxy:2018-02-27.4    "/bin/bash -c '/usr/b"  15 hours ago  Exited (0) 15 hours ago         haproxy_image_tag
05936166dd67  192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4    "bash -ecx 'if [ -e /"  15 hours ago  Exited (0) 15 hours ago         mysql_bootstrap
81a520488572  192.168.24.1:8787/rhosp12/openstack-redis:2018-02-27.4      "/bin/bash -c '/usr/b"  15 hours ago  Exited (0) 15 hours ago         redis_image_tag
1783e86df69e  192.168.24.1:8787/rhosp12/openstack-rabbitmq:2018-02-27.4   "/bin/bash -c '/usr/b"  15 hours ago  Exited (0) 15 hours ago         rabbitmq_image_tag
1bfc375b1c1b  192.168.24.1:8787/rhosp12/openstack-rabbitmq:2018-02-27.4   "kolla_start"           15 hours ago  Exited (0) 15 hours ago         rabbitmq_bootstrap
d0aa0a689d67  192.168.24.1:8787/rhosp12/openstack-memcached:2018-02-27.4  "/bin/bash -c 'source"  15 hours ago  Exited (0) 15 hours ago         memcached_init_logs
d77cec9718ef  192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4    "chown -R mysql: /var"  15 hours ago  Exited (0) 15 hours ago         mysql_data_ownership
[fultonj@skagra docker]$

[1] [fultonj@skagra mistral]$ tail -30 ceph-install-workflow.log
2018-03-07 19:11:23,334 p=13129 u=mistral | TASK [ceph-defaults : remove ceph nfs ganesha socket if exists and not used by a process] ***
2018-03-07 19:11:23,360 p=13129 u=mistral | skipping: [192.168.24.11]
2018-03-07 19:11:23,405 p=13129 u=mistral | skipping: [192.168.24.8]
2018-03-07 19:11:23,428 p=13129 u=mistral | skipping: [192.168.24.12]
2018-03-07 19:11:23,429 p=13129 u=mistral | skipping: [192.168.24.15]
2018-03-07 19:11:23,446 p=13129 u=mistral | skipping: [192.168.24.7]
2018-03-07 19:11:23,478 p=13129 u=mistral | TASK [ceph-defaults : set_fact monitor_name ansible_hostname] ******************
2018-03-07 19:11:23,654 p=13129 u=mistral | ok: [192.168.24.11]
2018-03-07 19:11:23,683 p=13129 u=mistral | ok: [192.168.24.8]
2018-03-07 19:11:23,705 p=13129 u=mistral | ok: [192.168.24.12]
2018-03-07 19:11:23,736 p=13129 u=mistral | ok: [192.168.24.15]
2018-03-07 19:11:23,751 p=13129 u=mistral | ok: [192.168.24.7]
2018-03-07 19:11:23,767 p=13129 u=mistral | TASK [ceph-defaults : set_fact monitor_name ansible_fqdn] **********************
2018-03-07 19:11:23,793 p=13129 u=mistral | skipping: [192.168.24.11]
2018-03-07 19:11:23,815 p=13129 u=mistral | skipping: [192.168.24.8]
2018-03-07 19:11:23,836 p=13129 u=mistral | skipping: [192.168.24.12]
2018-03-07 19:11:23,859 p=13129 u=mistral | skipping: [192.168.24.15]
2018-03-07 19:11:23,872 p=13129 u=mistral | skipping: [192.168.24.7]
2018-03-07 19:11:23,896 p=13129 u=mistral | TASK [ceph-defaults : set_fact docker_exec_cmd] ********************************
2018-03-07 19:11:23,932 p=13129 u=mistral | fatal: [192.168.24.11]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:23,963 p=13129 u=mistral | fatal: [192.168.24.8]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:23,992 p=13129 u=mistral | fatal: [192.168.24.12]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,025 p=13129 u=mistral | fatal: [192.168.24.15]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,039 p=13129 u=mistral | fatal: [192.168.24.7]: FAILED! => {"msg": "list object has no element 0"}
2018-03-07 19:11:24,041 p=13129 u=mistral | PLAY RECAP *********************************************************************
2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.11 : ok=2 changed=0 unreachable=0 failed=1
2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.12 : ok=2 changed=0 unreachable=0 failed=1
2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.15 : ok=2 changed=0 unreachable=0 failed=1
2018-03-07 19:11:24,042 p=13129 u=mistral | 192.168.24.7 : ok=2 changed=0 unreachable=0 failed=1
2018-03-07 19:11:24,042 p=13129 u=mistral | 192.168.24.8 : ok=2 changed=0 unreachable=0 failed=1
[fultonj@skagra mistral]$
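For context on the "list object has no element 0" failure: it is the error Ansible reports when a template indexes the first element of a group that resolved to an empty list, which is consistent with `groups[mon_group_name][0]` failing because the monitor group is empty. A minimal Python sketch of the equivalent lookup (the `first_mon_hostname` helper and the inventory data below are invented for illustration; in a real deployment `groups` and `hostvars` come from the Ansible inventory):

```python
# Sketch of the lookup that ceph-defaults performs when gathering facts.
# Helper name and inventory data are hypothetical, for illustration only.

def first_mon_hostname(groups, hostvars, mon_group_name="mons"):
    """Mimics hostvars[groups[mon_group_name][0]]['ansible_hostname']."""
    mon_hosts = groups.get(mon_group_name, [])
    if not mon_hosts:
        # This is the situation behind the fatal message: indexing [0]
        # on an empty monitor group has no element to return.
        raise IndexError("list object has no element 0")
    return hostvars[mon_hosts[0]]["ansible_hostname"]

# A healthy inventory resolves normally:
groups = {"mons": ["192.168.24.11"]}
hostvars = {"192.168.24.11": {"ansible_hostname": "controller-0"}}
print(first_mon_hostname(groups, hostvars))  # controller-0

# An empty mons group reproduces the failure mode:
try:
    first_mon_hostname({"mons": []}, {})
except IndexError as exc:
    print(exc)  # list object has no element 0
```

This is why the debug re-run was requested: the verbose Ansible output shows how the monitor group was populated (or not) when the facts were gathered.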
I have not received the needinfo requested two weeks ago. It looks like a local environment issue, but I asked for that info to be sure. Closing for now. Please re-open if you have the requested data, or if you can reproduce the issue and provide the requested data.
*** This bug has been marked as a duplicate of bug 1552327 ***
Until a version of ceph-ansible > 3.0.29 becomes available, the workaround is to deploy using environments/puppet-ceph-external.yaml
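As a sketch of the workaround, the deployment would pass the puppet-ceph-external environment file and supply the external cluster's details in a custom environment file. The parameter names below are the ones the puppet-based external Ceph integration expects in the OSP12-era tripleo-heat-templates; all values are placeholders, not taken from this report:

```yaml
# custom-ceph-external.yaml -- hypothetical example; substitute your
# external cluster's real FSID, client key, and monitor addresses.
parameter_defaults:
  CephClusterFSID: 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
  CephClientKey: 'AQ...=='
  CephExternalMonHost: '172.16.1.7,172.16.1.8,172.16.1.9'
```

This file would then be passed on the deploy command line together with
-e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml.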