Created attachment 1398911
ceph-install-workflow log

Description of problem:
ceph-ansible did not report failures in deploying and running the OSDs in the Ceph cluster.

The summary of the cluster deployment in /var/log/mistral/ceph-install-workflow.log is:

2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.10 : ok=109 changed=17 unreachable=0 failed=0
2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.14 : ok=58 changed=6 unreachable=0 failed=0
2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.15 : ok=37 changed=3 unreachable=0 failed=0
2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.17 : ok=56 changed=6 unreachable=0 failed=0
2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.6 : ok=56 changed=6 unreachable=0 failed=0

Of these nodes, 192.168.24.6, 192.168.24.14 and 192.168.24.17 are the Ceph storage nodes. The state of the OSD disks on those nodes is:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    252:0    0   20G  0 disk
├─vda1 252:1    0    1M  0 part
└─vda2 252:2    0   20G  0 part /
vdb    252:16   0   40G  0 disk
├─vdb1 252:17   0 39.5G  0 part
└─vdb2 252:18   0  512M  0 part
vdc    252:32   0   40G  0 disk
├─vdc1 252:33   0 39.5G  0 part
└─vdc2 252:34   0  512M  0 part

(The disks were partitioned.)

From journalctl:

Feb 21 16:33:41 ceph-0 systemd[1]: ceph-osd failed.
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: command: Running command: /usr/bin/ceph-detect-init --default sysvinit
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: command: Running command: /usr/bin/ceph-detect-init --default sysvinit
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: activate: Marking with init system none
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: activate: Marking with init system none
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.NdxHnQ/none
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.NdxHnQ/none
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.NdxHnQ/none
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.NdxHnQ/none
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: activate: ceph osd.1 data dir is ready at /var/lib/ceph/tmp/mnt.NdxHnQ
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: move_mount: Moving mount to final location...
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/osd/ceph-1
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: activate: ceph osd.1 data dir is ready at /var/lib/ceph/tmp/mnt.NdxHnQ
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: move_mount: Moving mount to final location...
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/osd/ceph-1
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.NdxHnQ
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.NdxHnQ
Feb 21 16:33:41 ceph-0 ceph-osd-run.sh[156961]: 2018-02-21 16:33:41 /entrypoint.sh: SUCCESS
Feb 21 16:33:41 ceph-0 dockerd-current[14904]: 2018-02-21 16:33:41 /entrypoint.sh: SUCCESS
Feb 21 16:33:42 ceph-0 ceph-osd-run.sh[156961]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 21 16:33:42 ceph-0 dockerd-current[14904]: starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: 2018-02-21 16:33:43.263267 7f6516cd3d00 -1 osd.1 28 log_to_monitors {default=true}
Feb 21 16:33:43 ceph-0 ceph-osd-run.sh[156961]: 2018-02-21 16:33:43.263267 7f6516cd3d00 -1 osd.1 28 log_to_monitors {default=true}
Feb 21 16:33:43 ceph-0 ceph-osd-run.sh[156961]: 2018-02-21 16:33:43.272350 7f6516cd3d00 -1 osd.1 28 init authentication failed: (1) Operation not permitted
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: 2018-02-21 16:33:43.272350 7f6516cd3d00 -1 osd.1 28 init authentication failed: (1) Operation not permitted
Feb 21 16:33:43 ceph-0 kernel: XFS (vdb1): Unmounting Filesystem
Feb 21 16:33:43 ceph-0 oci-systemd-hook[158052]: systemdhook <debug>: 13fa4832bdd2: Skipping as container command is /entrypoint.sh, not init or systemd
Feb 21 16:33:43 ceph-0 oci-umount[158055]: umounthook <debug>: 13fa4832bdd2: only runs in prestart stage, ignoring
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.435399957-05:00" level=debug msg="containerd: process exited" id=13fa4832bdd2aa52d4eca7e52bf5ba487d10157ed77c4063f519d
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.443381118-05:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 13fa4832bdd2aa52d4
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.444025747-05:00" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"exit\", Id:\"13fa4832b
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.444272070-05:00" level=debug msg="attach: stdout: end"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.444288848-05:00" level=debug msg="attach: stderr: end"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.444361565-05:00" level=debug msg="AuthZ response using plugin rhel-push-plugin"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.479011114-05:00" level=debug msg="Calling GET /_ping"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.479252948-05:00" level=debug msg="{Action=_ping, Username=heat-admin, LoginUID=1000, PID=158062}"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.479472039-05:00" level=debug msg="AuthZ request using plugin rhel-push-plugin"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.480531419-05:00" level=debug msg="AuthZ response using plugin rhel-push-plugin"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.481473631-05:00" level=debug msg="Calling GET /v1.26/containers/json"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.481648277-05:00" level=debug msg="{Action=json, Username=heat-admin, LoginUID=1000, PID=158062}"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.481677925-05:00" level=debug msg="AuthZ request using plugin rhel-push-plugin"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.484338982-05:00" level=warning msg="13fa4832bdd2aa52d4eca7e52bf5ba487d10157ed77c4063f519d2c3fc3a966c cleanup: failed t
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.538930367-05:00" level=debug msg="AuthZ response using plugin rhel-push-plugin"
Feb 21 16:33:43 ceph-0 dockerd-current[14904]: time="2018-02-21T11:33:43.544054834-05:00" level=debug msg="Removing volume reference: driver local, name 245738349e633015d779c1d85fa78b5998e6f3
Feb 21 16:33:43 ceph-0 systemd[1]: ceph-osd: main process exited, code=exited, status=1/FAILURE

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch
python2-mistral-lib-0.3.3-0.20180109062152.8986ce9.el7ost.noarch
python2-mistralclient-3.1.4-0.20171117092239.291501a.el7ost.noarch
openstack-mistral-engine-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
python-mistral-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
openstack-mistral-common-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
puppet-mistral-12.2.0-0.20180119074354.379b7ce.el7ost.noarch
openstack-mistral-api-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
openstack-mistral-executor-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180117094122.02d0985.el7ost.noarch
openstack-tripleo-ui-8.1.1-0.20180122135122.aef02d8.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20180117092204.120eca8.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180122224017.el7ost.noarch
openstack-tripleo-validations-8.1.1-0.20180119231917.2ff3c79.el7ost.noarch
openstack-tripleo-common-8.3.1-0.20180123050219.el7ost.noarch
ansible-tripleo-ipsec-0.0.1-0.20180119094817.5e80d4f.el7ost.noarch
python-tripleoclient-9.0.1-0.20180119233147.el7ost.noarch
openstack-tripleo-common-containers-8.3.1-0.20180123050219.el7ost.noarch

How reproducible:
unknown

Steps to Reproduce:
1. Deploy an overcloud with 1 dedicated node running the Ceph monitor and mgr, 3 controller nodes, 1 compute node and 3 Ceph storage nodes, each with 2 OSDs

Actual results:
The cluster is not functional, and the deployment doesn't report the failure.

Expected results:
1) the OSDs are running and the Ceph cluster is functional
2) in case of an error, the deployment should fail with the Ceph cluster error

Additional info:
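
To make the mismatch between the play recap and the journal concrete, here is a minimal Python sketch that cross-checks the two. It is only an illustration, not part of ceph-ansible or TripleO: the function names (parse_recap, failed_units, recap_hides_failure) are hypothetical, and the two sample strings are copied from the recap and journalctl output quoted in this report. It assumes the log line formats shown above.

```python
import re

# Ansible play recap line, in the format seen in ceph-install-workflow.log.
RECAP_RE = re.compile(
    r"(?P<host>\d+\.\d+\.\d+\.\d+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)

# systemd message logged when a unit's main process exits with a failure status.
UNIT_FAIL_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): main process exited, "
    r"code=exited, status=(?P<status>\d+)/FAILURE"
)

# Sample input copied from this report (two of the five recap lines).
RECAP_SAMPLE = (
    "2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.10 : ok=109 changed=17 unreachable=0 failed=0\n"
    "2018-02-21 11:14:37,429 p=7359 u=mistral | 192.168.24.6 : ok=56 changed=6 unreachable=0 failed=0\n"
)
JOURNAL_SAMPLE = (
    "Feb 21 16:33:43 ceph-0 systemd[1]: ceph-osd: main process exited, "
    "code=exited, status=1/FAILURE\n"
)


def parse_recap(log_text):
    """Return {host: {"ok": n, "changed": n, "unreachable": n, "failed": n}}."""
    fields = ("ok", "changed", "unreachable", "failed")
    return {
        m.group("host"): {name: int(m.group(name)) for name in fields}
        for m in RECAP_RE.finditer(log_text)
    }


def failed_units(journal_text):
    """Return the sorted names of units whose main process exited with FAILURE."""
    return sorted({m.group("unit") for m in UNIT_FAIL_RE.finditer(journal_text)})


def recap_hides_failure(log_text, journal_text):
    """True when every host reports failed=0 and unreachable=0, yet the journal shows a failed unit."""
    recap = parse_recap(log_text)
    clean = bool(recap) and all(
        counts["failed"] == 0 and counts["unreachable"] == 0
        for counts in recap.values()
    )
    return clean and bool(failed_units(journal_text))


if __name__ == "__main__":
    print(parse_recap(RECAP_SAMPLE))
    print(failed_units(JOURNAL_SAMPLE))
    print(recap_hides_failure(RECAP_SAMPLE, JOURNAL_SAMPLE))
```

A check along these lines only compares log text after the fact. The expected behaviour described above is still that the deployment itself fails when the OSDs do not come up.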