Bug 1553196
Field | Value | Field | Value |
---|---|---|---|
Summary: | osp12 deployment with external ceph fails | | |
Product: | Red Hat OpenStack | Reporter: | pkomarov |
Component: | ceph | Assignee: | John Fulton <johfulto> |
Status: | CLOSED DUPLICATE | QA Contact: | Yogev Rabl <yrabl> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | | |
Version: | 12.0 (Pike) | CC: | gfidente, jdurgin, johfulto, lhh, nalmond, nlevine, pkomarov, srevivo |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-08-13 23:09:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
pkomarov
2018-03-08 13:35:21 UTC
Your overcloud controller node seems to have more than ceph problems, e.g. all of the containers are down except memcached [0]. As per ceph-install-workflow.log [1], the deployment failed on the following task (ceph-ansible-3.0.26-1.el7cp confirmed):

https://github.com/ceph/ceph-ansible/blob/v3.0.26/roles/ceph-defaults/tasks/facts.yml#L14-L19

It may be that the following ansible variable didn't return:

    hostvars[groups[mon_group_name][0]]['ansible_hostname']

    [fultonj@skagra sosreport-pkomarov-20180308090054]$ cat hostname
    controller-0
    [fultonj@skagra sosreport-pkomarov-20180308090054]$

Please re-run the deployment but add -e debug.yaml to your deploy command ('openstack overcloud deploy ... -e debug.yaml'), where debug.yaml contains the following:

    parameter_defaults:
      CephAnsiblePlaybookVerbosity: 3

Then, after the deployment runs, update this bugzilla with:

- /var/log/mistral/ceph-install-workflow.log from your undercloud
- A tarball containing /tmp/ansible-mistral-action* from your undercloud
- the exact 'openstack overcloud deploy ...' command you ran
- the output of `ansible -m setup localhost` when run on your overcloud controller

Thanks,
John

[0] All containers died on overcloud controller node except memcached:

    [fultonj@skagra docker]$ cat docker_ps_-a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    77a8fd42873d 192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4 "/bin/bash -c '/usr/b" 15 hours ago Exited (0) 15 hours ago mysql_image_tag
    35eb34d3a61f 192.168.24.1:8787/rhosp12/openstack-memcached:2018-02-27.4 "/bin/bash -c 'source" 15 hours ago Up 15 hours memcached
    2d7ddbf5f9bd 192.168.24.1:8787/rhosp12/openstack-haproxy:2018-02-27.4 "/bin/bash -c '/usr/b" 15 hours ago Exited (0) 15 hours ago haproxy_image_tag
    05936166dd67 192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4 "bash -ecx 'if [ -e /" 15 hours ago Exited (0) 15 hours ago mysql_bootstrap
    81a520488572 192.168.24.1:8787/rhosp12/openstack-redis:2018-02-27.4 "/bin/bash -c '/usr/b" 15 hours ago Exited (0) 15 hours ago redis_image_tag
    1783e86df69e 192.168.24.1:8787/rhosp12/openstack-rabbitmq:2018-02-27.4 "/bin/bash -c '/usr/b" 15 hours ago Exited (0) 15 hours ago rabbitmq_image_tag
    1bfc375b1c1b 192.168.24.1:8787/rhosp12/openstack-rabbitmq:2018-02-27.4 "kolla_start" 15 hours ago Exited (0) 15 hours ago rabbitmq_bootstrap
    d0aa0a689d67 192.168.24.1:8787/rhosp12/openstack-memcached:2018-02-27.4 "/bin/bash -c 'source" 15 hours ago Exited (0) 15 hours ago memcached_init_logs
    d77cec9718ef 192.168.24.1:8787/rhosp12/openstack-mariadb:2018-02-27.4 "chown -R mysql: /var" 15 hours ago Exited (0) 15 hours ago mysql_data_ownership
    [fultonj@skagra docker]$

[1]

    [fultonj@skagra mistral]$ tail -30 ceph-install-workflow.log
    2018-03-07 19:11:23,334 p=13129 u=mistral | TASK [ceph-defaults : remove ceph nfs ganesha socket if exists and not used by a process] ***
    2018-03-07 19:11:23,360 p=13129 u=mistral | skipping: [192.168.24.11]
    2018-03-07 19:11:23,405 p=13129 u=mistral | skipping: [192.168.24.8]
    2018-03-07 19:11:23,428 p=13129 u=mistral | skipping: [192.168.24.12]
    2018-03-07 19:11:23,429 p=13129 u=mistral | skipping: [192.168.24.15]
    2018-03-07 19:11:23,446 p=13129 u=mistral | skipping: [192.168.24.7]
    2018-03-07 19:11:23,478 p=13129 u=mistral | TASK [ceph-defaults : set_fact monitor_name ansible_hostname] ******************
    2018-03-07 19:11:23,654 p=13129 u=mistral | ok: [192.168.24.11]
    2018-03-07 19:11:23,683 p=13129 u=mistral | ok: [192.168.24.8]
    2018-03-07 19:11:23,705 p=13129 u=mistral | ok: [192.168.24.12]
    2018-03-07 19:11:23,736 p=13129 u=mistral | ok: [192.168.24.15]
    2018-03-07 19:11:23,751 p=13129 u=mistral | ok: [192.168.24.7]
    2018-03-07 19:11:23,767 p=13129 u=mistral | TASK [ceph-defaults : set_fact monitor_name ansible_fqdn] **********************
    2018-03-07 19:11:23,793 p=13129 u=mistral | skipping: [192.168.24.11]
    2018-03-07 19:11:23,815 p=13129 u=mistral | skipping: [192.168.24.8]
    2018-03-07 19:11:23,836 p=13129 u=mistral | skipping: [192.168.24.12]
    2018-03-07 19:11:23,859 p=13129 u=mistral | skipping: [192.168.24.15]
    2018-03-07 19:11:23,872 p=13129 u=mistral | skipping: [192.168.24.7]
    2018-03-07 19:11:23,896 p=13129 u=mistral | TASK [ceph-defaults : set_fact docker_exec_cmd] ********************************
    2018-03-07 19:11:23,932 p=13129 u=mistral | fatal: [192.168.24.11]: FAILED! => {"msg": "list object has no element 0"}
    2018-03-07 19:11:23,963 p=13129 u=mistral | fatal: [192.168.24.8]: FAILED! => {"msg": "list object has no element 0"}
    2018-03-07 19:11:23,992 p=13129 u=mistral | fatal: [192.168.24.12]: FAILED! => {"msg": "list object has no element 0"}
    2018-03-07 19:11:24,025 p=13129 u=mistral | fatal: [192.168.24.15]: FAILED! => {"msg": "list object has no element 0"}
    2018-03-07 19:11:24,039 p=13129 u=mistral | fatal: [192.168.24.7]: FAILED! => {"msg": "list object has no element 0"}
    2018-03-07 19:11:24,041 p=13129 u=mistral | PLAY RECAP *********************************************************************
    2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.11 : ok=2 changed=0 unreachable=0 failed=1
    2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.12 : ok=2 changed=0 unreachable=0 failed=1
    2018-03-07 19:11:24,041 p=13129 u=mistral | 192.168.24.15 : ok=2 changed=0 unreachable=0 failed=1
    2018-03-07 19:11:24,042 p=13129 u=mistral | 192.168.24.7 : ok=2 changed=0 unreachable=0 failed=1
    2018-03-07 19:11:24,042 p=13129 u=mistral | 192.168.24.8 : ok=2 changed=0 unreachable=0 failed=1
    [fultonj@skagra mistral]$

I have not received the needinfo requested two weeks ago. It looks like a local environment issue, but I asked for that info to be sure. Closing for now. Re-open if you have the requested data, or can reproduce the problem and provide the requested data.

*** This bug has been marked as a duplicate of bug 1552327 ***

Until a version of ceph-ansible > 3.0.29 becomes available, the workaround is to deploy using environments/puppet-ceph-external.yaml.
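
For readers unfamiliar with the "list object has no element 0" message in the log above: it is the kind of error a template raises when it indexes `[0]` into an empty list, which fits the suspicion that `hostvars[groups[mon_group_name][0]]['ansible_hostname']` did not return. The following is a minimal sketch in plain Python, not ceph-ansible's code. It assumes (this report does not confirm it) that the monitor group in the inventory was empty. The `groups` and `hostvars` dictionaries, the host names, the `"mons"` group name used as the default, and both helper functions are hypothetical stand-ins for illustration only.

```python
# Hypothetical stand-ins for Ansible's `groups` and `hostvars` variables.
# Assumption (not confirmed in this report): the monitor group is empty.
groups = {"mons": [], "clients": ["host-a", "host-b"]}
hostvars = {
    "host-a": {"ansible_hostname": "host-a"},
    "host-b": {"ansible_hostname": "host-b"},
}


def unguarded_mon_hostname(groups, hostvars, mon_group_name="mons"):
    """Mirror the expression quoted in the comment: index [0] with no guard."""
    return hostvars[groups[mon_group_name][0]]["ansible_hostname"]


def first_mon_hostname(groups, hostvars, mon_group_name="mons"):
    """Return the first monitor's hostname, or None if the group is empty."""
    mons = groups.get(mon_group_name, [])
    if not mons:
        return None
    return hostvars[mons[0]]["ansible_hostname"]


if __name__ == "__main__":
    try:
        unguarded_mon_hostname(groups, hostvars)
    except IndexError as exc:
        # Plain Python raises IndexError here; the wording differs from
        # the message in the log, but the cause is the same kind of lookup.
        print("unguarded lookup failed:", exc)
    print("guarded lookup returned:", first_mon_hostname(groups, hostvars))
```

The guarded variant only illustrates the general idea of checking that a group is non-empty before taking its first element; how the fixed ceph-ansible release actually handles this case is not described in this report.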