Bug 1376283 - [containers]: configuration of ceph-clusters with ansible fails at task - 'check the partition status of the journal devices'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: Sébastien Han
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks: 1315538 1371113
 
Reported: 2016-09-15 03:18 UTC by krishnaram Karthick
Modified: 2017-06-19 13:15 UTC
CC List: 16 users

Fixed In Version: ceph-ansible-2.2.1-1.el7scon
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-19 13:15:26 UTC
Embargoed:


Attachments
site-docker.yml (405 bytes, text/x-vhdl) - 2016-09-19 13:37 UTC, krishnaram Karthick
groups_vars/all file used (2.29 KB, text/x-vhdl) - 2016-09-19 13:38 UTC, krishnaram Karthick
ansible_playbook_console_redirect (103.96 KB, text/plain) - 2016-09-22 06:38 UTC, krishnaram Karthick
logs with the patch in comment#23 (104.09 KB, text/plain) - 2016-09-23 11:16 UTC, krishnaram Karthick


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1496 0 normal SHIPPED_LIVE ceph-installer, ceph-ansible, and ceph-iscsi-ansible update 2017-06-19 17:14:02 UTC

Description krishnaram Karthick 2016-09-15 03:18:40 UTC
Description of problem:

When ceph containers are configured using ansible, the playbook fails with the following error.

snippet of stdout:

TASK: [ceph-osd | check if a partition named 'ceph' exists (autodiscover disks)] ***
skipping: [10.70.37.211] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u
'616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.43] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u's
upport_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u'
616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.211] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u
'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vda', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'41943040', u'host': u'', u'rotational': u'1', u'removable': u'0', u's
upport_discard': u'0', u'model': None, u'size': u'20.00 GB', u'holders': [], u'partitions': {u'vda1': {u'start': u'2048', u'sectorsize': 512, u'sectors': u'614400', u'size': u'300.00 MB'}, u'vda2': {u'start': u'
616448', u'sectorsize': 512, u'sectors': u'41326592', u'size': u'19.71 GB'}}}})
skipping: [10.70.37.43] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.211] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u
'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdc', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.43] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.211] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u
'support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdb', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.43] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})
skipping: [10.70.37.89] => (item={'key': u'vdd', 'value': {u'scheduler_mode': u'', u'sectorsize': u'512', u'vendor': u'0x1af4', u'sectors': u'314572800', u'host': u'', u'rotational': u'1', u'removable': u'0', u'
support_discard': u'0', u'model': None, u'size': u'150.00 GB', u'holders': [], u'partitions': {}}})

TASK: [ceph-osd | check the partition status of the journal devices] **********
fatal: [10.70.37.211] => with_items expects a list or a set
fatal: [10.70.37.43] => with_items expects a list or a set
fatal: [10.70.37.89] => with_items expects a list or a set

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/site-docker.retry

10.70.37.211               : ok=22   changed=0    unreachable=1    failed=0
10.70.37.43                : ok=22   changed=0    unreachable=1    failed=0
10.70.37.89                : ok=22   changed=0    unreachable=1    failed=0
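
For illustration, a minimal playbook of the following shape sketches how this class of failure can arise on Ansible 1.9.x (this is only a sketch; the task and variable names are illustrative and are not the actual ceph-ansible tasks):

---
# illustrative reproducer sketch, not the ceph-ansible playbook
- hosts: osds
  tasks:
    - name: probe journal devices
      command: parted --script {{ item }} print
      with_items:
        - /dev/vdb
        - /dev/vdc
      when: false            # stands in for "skip this on containerized deployments"
      register: journal_check
      changed_when: false
      failed_when: false

    - name: check the partition status of the journal devices
      debug: msg="{{ item }}"
      # On some 1.9.x versions the register of a fully skipped task is a plain
      # dict rather than a list of per-item results, so the loop below can
      # receive a non-list and fail with "with_items expects a list or a set".
      with_items: journal_check.results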


Version-Release number of selected component (if applicable):

[root@dhcp42-15 ceph-ansible]# git rev-parse HEAD
5298de3ef5e45296278df10a91ff43ced4dbb033

-bash-4.2# docker version
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.10.x86_64
 Go version:      go1.6.2
 Git commit:      2a93377-unsupported
 Built:           Fri Jul 29 13:45:25 2016
 OS/Arch:         linux/amd64

The group_vars/all file is attached.

How reproducible:
2/2

Steps to Reproduce:
1. Follow the steps in https://access.redhat.com/articles/2429471#ansible to configure ceph cluster using ansible
2. choose 3 MONs and 3 OSDs
3. update all necessary files as required [refer to the attached files]

Actual results:
configuration fails at TASK: [ceph-osd | check the partition status of the journal devices]

Expected results:
No failures should be seen

Additional info:
 - disks provisioned to the OSDs are new and do not have any partitions

Comment 6 Daniel Gryniewicz 2016-09-15 13:09:04 UTC
Which ansible version?  I believe this only happens on some ansible versions.

Comment 7 krishnaram Karthick 2016-09-16 01:10:27 UTC
I had done a git clone and this is the hash for the current commit.

[root@dhcp42-15 ceph-ansible]# git rev-parse HEAD
5298de3ef5e45296278df10a91ff43ced4dbb033

Comment 8 Daniel Gryniewicz 2016-09-16 12:23:03 UTC
I'm not able to find that hash in either the ansible or the ceph-ansible git repositories. Where did you clone it from?

Comment 9 krishnaram Karthick 2016-09-17 05:19:07 UTC
I cloned it from https://github.com/ceph/ceph-ansible (git clone https://github.com/ceph/ceph-ansible), as mentioned in the deployment guide.

Comment 10 seb 2016-09-19 08:14:40 UTC
I don't see any attachment to this BZ; is it only me?
The playbook shouldn't go through that step, so I suspect this is a ceph-ansible configuration issue.
This task is expected to be skipped because it is part of the non-containerized deployment.

Please share your variable file.
Thanks.
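
For reference, the "expected to be skipped" behaviour described above is the usual Ansible pattern of guarding a bare-metal-only task with a condition on the deployment mode. A sketch of that pattern (the variable and task names are illustrative and may not match the exact ceph-ansible ones):

# sketch only: the journal check should only run for non-containerized installs
- name: check the partition status of the journal devices
  shell: parted --script {{ item }} print > /dev/null 2>&1
  with_items: raw_journal_devices          # illustrative variable name
  register: journal_partition_status
  changed_when: false
  failed_when: false
  when: not osd_containerized_deployment   # illustrative flag for the deployment mode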

Comment 11 Daniel Gryniewicz 2016-09-19 12:06:13 UTC
Okay, sorry, I meant the ansible version, not the ceph-ansible version.

Comment 12 krishnaram Karthick 2016-09-19 13:37:30 UTC
Created attachment 1202483 [details]
site-docker.yml

Comment 13 krishnaram Karthick 2016-09-19 13:38:05 UTC
Created attachment 1202484 [details]
groups_vars/all file used

Comment 14 krishnaram Karthick 2016-09-19 14:04:06 UTC
I've attached 'group_vars/all' for your reference.

Current ansible version,

#ansible --version
ansible 1.9.4

I see that there is an update available now, '1.9.4-1.el7_2'. I'll give it a try with this version.

Comment 15 Daniel Gryniewicz 2016-09-19 14:11:34 UTC
Okay, that's the same version of ansible I've seen this issue on (although on Fedora). It didn't happen on older or newer ansible versions that I've tried.

Comment 16 krishnaram Karthick 2016-09-19 14:17:23 UTC
I see the issue with ansible version '1.9.4-1.el7_2' too. Is there a newer version I should try?

Do you see anything amiss with the group_vars/all file? I've skipped the configuration of 'rgw' and 'nfs', which I think shouldn't matter.

Comment 17 seb 2016-09-19 14:20:40 UTC
group_vars/all looks good to me.

Can you send a full trace of the play? I really want to see what's happening here.

Can you bump to 2.0.0.1 and see if you still have the issue?
Thanks!

Comment 20 seb 2016-09-20 08:28:37 UTC
@ken, if I can get a full play trace, I'll be able to get a better understanding of what's happening.

@krishnaram, can I get more logs please? Thanks

Comment 21 krishnaram Karthick 2016-09-22 06:38:10 UTC
Created attachment 1203603 [details]
ansible_playbook_console_redirect

Comment 23 seb 2016-09-22 14:42:33 UTC
No worries, would you mind testing this branch and seeing how it goes?
https://github.com/ceph/ceph-ansible/pull/994
Thanks!

Comment 24 krishnaram Karthick 2016-09-23 11:14:58 UTC
Ansible fails with the same error reported earlier, even with the above patch cherry-picked.

Comment 25 krishnaram Karthick 2016-09-23 11:16:34 UTC
Created attachment 1204103 [details]
logs with the patch in comment#23

Comment 26 seb 2016-09-23 12:08:48 UTC
Are you running ceph-master?

Comment 27 seb 2016-09-23 12:10:21 UTC
Sorry, I meant ceph-ansible master.

Comment 28 krishnaram Karthick 2016-09-26 03:21:44 UTC
Yes. From ceph-ansible master, I had cherry-picked the patch with the fix.

<<<<<<<Snippet of git log>>>>>>>

commit 21d217e89098c102f9eb9de232698b1661003f38
Author: Sébastien Han <seb>
Date:   Thu Sep 22 16:41:06 2016 +0200

    fix non skipped task for ansible v1.9.x
    
    please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1376283
    
    Signed-off-by: Sébastien Han <seb>

commit d7699672a8b3fa078b8a5a681ef04121bc6a681a
Merge: efe4235 e756214
Author: Leseb <seb>
Date:   Thu Sep 22 11:54:41 2016 +0200

    Merge pull request #988 from batrick/linode-dockerfile
    
    docker: add Dockerfile for Linode cluster dev env
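
For context, the general shape of a workaround for this kind of 1.9.x failure is to make sure with_items always receives a list, even when the registering task was skipped. This is only an illustrative sketch with made-up variable names; the authoritative change is the one in PR #994:

# sketch: coerce a possibly-skipped register into a list before looping over it
- name: check the partition status of the journal devices
  shell: parted --script {{ item.item }} print > /dev/null 2>&1
  with_items: "{{ journal_check.results | default([]) }}"   # journal_check is illustrative
  changed_when: false
  failed_when: false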

Comment 29 seb 2016-09-30 13:37:17 UTC
I just pushed new changes to https://github.com/ceph/ceph-ansible/pull/994/; can you try again?
Thanks

Comment 31 krishnaram Karthick 2016-10-03 05:06:31 UTC
I still see the same issue with your patch.

Comment 33 seb 2016-10-03 16:16:41 UTC
Thanks for sharing your setup, it's way easier to debug :).
I successfully fixed the problem with this PR: https://github.com/ceph/ceph-ansible/pull/994

Please purge the full setup and test again with both 1.9.4 and 2.x ;)

ps: I have a branch with the patch, but please start over with the upstream branch I linked above.

Thanks again.

Comment 34 krishnaram Karthick 2016-10-04 08:11:05 UTC
yay! it worked.

I'm able to set up the ceph cluster with ansible now with the fix; I tried with both ansible 1.9.4 and 2.1.1.0.

Comment 35 seb 2016-10-04 09:41:48 UTC
Thanks a lot for testing. I'm going to publish a 1.0.7 version of upstream ceph-ansible soon.

Comment 37 Federico Lucifredi 2016-10-17 13:16:17 UTC
This is a blocker for the Ceph container release. Target => 2.2.

Comment 38 Andrew Schoen 2017-03-03 16:11:58 UTC
This looks to have made it into the stable-2.1 branch of ceph-ansible upstream.

Comment 41 Rachana Patel 2017-05-25 19:19:50 UTC
Verified with the below versions:
ceph-ansible-2.2.7-1.el7scon.noarch
ansible-2.2.3.0-1.el7.noarch


Followed the downstream documentation for installation and it didn't fail. Able to bring up the cluster, hence moving to VERIFIED.

Comment 43 errata-xmlrpc 2017-06-19 13:15:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1496

