Description of problem:

When ceph containers are configured using ansible, it fails with the following error.

Snippet of stdout:

TASK: [ceph-osd | check if a partition named 'ceph' exists (autodiscover disks)] ***
skipping: [10.70.37.211] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u'616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.43] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u'616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.211] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u'616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.43] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.211] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.43] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.211] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.43] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})

TASK: [ceph-osd | check the partition status of the journal devices] **********
fatal: [10.70.37.211] => with_items expects a list or a set
fatal: [10.70.37.43] => with_items expects a list or a set
fatal: [10.70.37.89] => with_items expects a list or a set

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/site-docker.retry

10.70.37.211               : ok=22   changed=0    unreachable=1    failed=0
10.70.37.43                : ok=22   changed=0    unreachable=1    failed=0
10.70.37.89                : ok=22   changed=0    unreachable=1    failed=0

Version-Release number of selected component (if applicable):

[root@dhcp42-15 ceph-ansible]# git rev-parse HEAD
5298de3ef5e45296278df10a91ff43ced4dbb033

-bash-4.2# docker version
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

The group_vars/all file is attached.

How reproducible:
2/2

Steps to Reproduce:
1. Follow the steps in https://access.redhat.com/articles/2429471#ansible to configure a ceph cluster using ansible.
2. Choose 3 MONs and 3 OSDs.
3. Update all necessary files as required [refer to the attached files].

Actual results:
Configuration fails at TASK: [ceph-osd | check the partition status of the journal devices].

Expected results:
No failures should be seen.

Additional info:
- The disks provisioned to the OSDs are new and don't have any partitions.
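For context, on ansible 1.9.x the message "with_items expects a list or a set" typically means the loop expression resolved to something that is not a list, e.g. the registered result of a task that was skipped. A minimal, hypothetical sketch of the failure pattern follows; these are not the actual ceph-ansible tasks, and the commands and variable names are illustrative only:

# Hypothetical repro of the failure mode on ansible 1.9.x.
- hosts: osds
  tasks:
    - name: probe the journal device            # stand-in task, not ceph-ansible code
      command: parted --script /dev/vdb print
      register: journal_partition_status
      when: raw_multi_journal                   # false here, so the task is skipped

    # A skipped task leaves its register as a plain dict like
    # {'skipped': True, 'changed': False} with no 'results' list, so the
    # loop below aborts with "with_items expects a list or a set".
    - name: check the partition status of the journal devices
      command: echo {{ item }}                  # stand-in command
      with_items: journal_partition_status.results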
Which ansible version? I believe this only happens on some ansible versions.
I had done a git clone, and this is the hash of the current commit:

[root@dhcp42-15 ceph-ansible]# git rev-parse HEAD
5298de3ef5e45296278df10a91ff43ced4dbb033
I'm not able to find that hash in either the ansible git repo or the ceph-ansible git repo. Where did you clone it from?
I had cloned it from here, as mentioned in the deployment guide: git clone https://github.com/ceph/ceph-ansible
I don't see any attachment on this BZ, is it only me? The playbook shouldn't go through that step, so I suspect this is a configuration issue of ceph-ansible. This task is expected to get skipped because it is part of the non-containerized deployment. Please share your variable file. Thanks.
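For background: on ansible 1.9.x the when: condition of a looped task is evaluated per item (hence the per-item "skipping" lines in the log above), so the loop expression must still resolve to a list even when every item ends up skipped. A defensive pattern looks roughly like the sketch below; the variable names are illustrative, and this is not necessarily the fix that landed in ceph-ansible:

# Sketch: keep a task that should be skipped from evaluating a bad loop on 1.9.x.
- name: check the partition status of the journal devices
  command: parted --script {{ item }} print                          # stand-in command
  with_items: "{{ journal_partition_status.results | default([]) }}" # fall back to an empty list
  when: not osd_containerized_deployment                             # variable name illustrative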
Okay, sorry, I meant the ansible version, not the ceph-ansible version.
Created attachment 1202483 [details] site-docker.yml
Created attachment 1202484 [details] group_vars/all file used
I've attached 'group_vars/all' for your reference.

Current ansible version:

# ansible --version
ansible 1.9.4

I see that there is an update available now, '1.9.4-1.el7_2'. I'll give it a try with that version.
Okay, that's the same version of ansible I've seen this issue on (although on Fedora). It didn't happen on older or newer ansible versions that I've tried.
I see the issue with ansible version '1.9.4-1.el7_2' too. Is there any newer version I should try? Do you see anything amiss with the group_vars/all file? I've skipped the configuration of 'rgw' and 'nfs', which I think shouldn't matter.
group_vars/all looks good to me. Can you send a full trace of the play? I really want to see what's happening here. Can you bump to 2.0.0.1 and see if you still have the issue? Thanks!
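If you don't want to touch the RPM-installed ansible, a virtualenv is a low-risk way to try a specific release; a sketch, assuming pip and virtualenv are available on the admin node:

# Install ansible 2.0.0.1 in isolation, without replacing the system package.
virtualenv ~/ansible-2.0.0.1
source ~/ansible-2.0.0.1/bin/activate
pip install ansible==2.0.0.1
ansible --version    # should now report 2.0.0.1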
@ken, if I can get a full play trace, I'd be able to get a better understanding of what's happening. @krishnaram, can I get more logs please? Thanks
Created attachment 1203603 [details] ansible_playbook_console_redirect
No worries, would you mind testing this branch and seeing how it goes? https://github.com/ceph/ceph-ansible/pull/994 Thanks!
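If it helps, GitHub exposes every pull request as a fetchable ref, so you can check out the PR branch directly; 'pr-994' below is just a local branch name of your choosing:

# Fetch and check out PR #994 as a local branch for testing.
cd ceph-ansible
git fetch origin pull/994/head:pr-994
git checkout pr-994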
Ansible fails with the same error reported earlier, even with the above patch cherry-picked.
Created attachment 1204103 [details] logs with the patch in comment#23
Are you running ceph-master?
Sorry, I meant ceph-ansible master.
Yes. From ceph-ansible master, I had cherry-picked the patch with the fix.

<<<<<<< Snippet of git log >>>>>>>

commit 21d217e89098c102f9eb9de232698b1661003f38
Author: Sébastien Han <seb>
Date:   Thu Sep 22 16:41:06 2016 +0200

    fix non skipped task for ansible v1.9.x

    please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1376283

    Signed-off-by: Sébastien Han <seb>

commit d7699672a8b3fa078b8a5a681ef04121bc6a681a
Merge: efe4235 e756214
Author: Leseb <seb>
Date:   Thu Sep 22 11:54:41 2016 +0200

    Merge pull request #988 from batrick/linode-dockerfile

    docker: add Dockerfile for Linode cluster dev env
I just pushed new changes to the PR (https://github.com/ceph/ceph-ansible/pull/994/). Can you try again? Thanks
I still see the same issue with your patch.
Thanks for sharing your setup, it's way easier to debug :). I successfully fixed the problem with this PR: https://github.com/ceph/ceph-ansible/pull/994

Please purge the full setup and test again with both 1.9.4 and 2.x ;)

PS: I have a branch with the patch, but please start over with the upstream branch I linked above. Thanks again.
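A purge-and-redeploy cycle would look roughly like this; note that the purge playbook name and location are from memory and may differ between ceph-ansible releases:

# Wipe the previous deployment, then redeploy with the fixed branch.
ansible-playbook purge-cluster.yml -i <your-inventory>   # purge playbook path is an assumption
ansible-playbook site-docker.yml -i <your-inventory>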
Yay! It worked. I'm able to set up the ceph cluster with ansible with the fix now; tried with both 1.9.4 and 2.1.1.0.
Thanks a lot for testing. I'm going to publish a 1.0.7 version of upstream ceph-ansible soon.
This is a blocker for the Ceph container release. Target => 2.2.
This looks to have made it into the stable-2.1 branch of ceph-ansible upstream.
Verified with the below versions:

ceph-ansible-2.2.7-1.el7scon.noarch
ansible-2.2.3.0-1.el7.noarch

Followed the downstream doc for installation and it didn't fail. Able to bring up the cluster, hence moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1496