Bug 1482067 - [support] "ansible-playbook site.yml --limit <osds|clients|rgws>" will not create proper content of /etc/ceph/ceph.conf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 2.3
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: 3.0
Assignee: Sébastien Han
QA Contact: Madhavi Kasturi
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1494421
 
Reported: 2017-08-16 11:58 UTC by Tomas Petr
Modified: 2021-03-11 15:36 UTC
CC List: 14 users

Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc4.el7cp Ubuntu: ceph-ansible_3.0.0~rc4-2redhat1
Doc Type: Bug Fix
Doc Text:
.Using the `site.yml` playbook with the `--limit` option works as expected

When using the `site.yml` playbook with the `--limit` option set to `osd`, `clients`, or `rgws` to deploy a cluster, the playbook created an incorrect configuration file with missing values. The playbook now uses the `delegate_facts` option that allows the playbook to instruct hosts to get information from other hosts that are not part of the current play, in this case Monitor hosts. As a result, the playbook creates a proper configuration file in the described scenario.
Clone Of:
Environment:
Last Closed: 2017-12-05 23:39:47 UTC
Embargoed:


Attachments
ansible-playbook site.yml -vvvvv | tee /tmp/site.yml.1482067 (193.13 KB, text/plain)
2017-08-25 09:48 UTC, Tomas Petr


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 1801 0 None closed site: delegate fact to all the hosts 2021-02-05 11:05:08 UTC
Red Hat Product Errata RHBA-2017:3387 0 normal SHIPPED_LIVE Red Hat Ceph Storage 3.0 bug fix and enhancement update 2017-12-06 03:03:45 UTC

Description Tomas Petr 2017-08-16 11:58:04 UTC
Description of problem:

The ansible-playbook executed with the "--limit" flag for any node type except mons (re)creates an incomplete /etc/ceph/ceph.conf:

/etc/ceph/ceph.conf after deployment with command "ansible-playbook site.yml"
-----------------
[root@rhscaosd5 ~]# cat /etc/ceph/ceph.conf 
# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
fsid = a93b8b8c-a7fe-4103-8434-6a490f641a66
max open files = 131072
mon initial members = rhscamon1,rhscamon2,rhscamon3
mon host = 192.168.66.157,192.168.66.251,192.168.66.149
public network = 192.168.66.0/24
cluster network = 192.168.66.0/24

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 1024
-----------------


/etc/ceph/ceph.conf after re-run with command "ansible-playbook site.yml --limit osds" , where node rhscaosd5 is an osd node listed in /etc/ansible/hosts under tag [osds]
-----------------
[root@rhscaosd5 ~]# cat /etc/ceph/ceph.conf 
# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
fsid = a93b8b8c-a7fe-4103-8434-6a490f641a66
max open files = 131072
mon initial members = ,,
mon host = ,,
public network = 192.168.66.0/24
cluster network = 192.168.66.0/24

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 1024
-----------------


Version-Release number of selected component (if applicable):
ceph-ansible-2.2.11-1.el7scon.noarch
ansible-2.2.3.0-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy ceph cluster with ceph-ansible: "ansible-playbook site.yml"
2. re-run deployment with flag "--limit" and option other than "mons", like "ansible-playbook site.yml --limit osds"
3. Observe /etc/ceph/ceph.conf on the nodes from the group used in step 2.

Actual results:
/etc/ceph/ceph.conf is (re)created with missing values:
mon initial members = ,,
mon host = ,,

Expected results:
/etc/ceph/ceph.conf has correct values

Additional info:

Comment 2 seb 2017-08-22 10:09:00 UTC
Unfortunately, this is expected. We cannot use 'limit', since ceph.conf needs to know the monitors in order to be built correctly.

I'm closing this as won't fix. This behaviour cannot be changed; it is an Ansible limitation. Thanks for your understanding.

Comment 4 seb 2017-08-24 07:21:25 UTC
My bad, I wasn't aware of this feature, let me have a look into this.
Thanks!

Comment 7 seb 2017-08-24 09:30:38 UTC
See, if they can try it out it'd be even better :)

Comment 8 Tomas Petr 2017-08-24 11:26:39 UTC
(In reply to seb from comment #7)
> See, if they can try it out it'd be even better :)

I will give it a try, but there are a lot of changes between the downstream version and upstream, so a simple copy-paste of the site.yml.sample from https://github.com/ceph/ceph-ansible/pull/1801 fails right away.

Comment 9 seb 2017-08-24 12:30:31 UTC
Even if there are a lot of changes, that is not a problem. What's the failure?

Comment 10 Tomas Petr 2017-08-24 12:52:50 UTC
(In reply to seb from comment #9)
> Even if there are a lot of changes, that is not a problem. What's the failure?

This is a run with site.yml.sample + ceph-ansible-2.2.11-1.el7scon.noarch, ansible-2.2.3.0-1.el7.noarch:
# ansible-playbook site.BZ1482067.yml --limit osds
ERROR! the role 'ceph-defaults' was not found in /usr/share/ceph-ansible/roles:/usr/share/ceph-ansible/roles:/usr/share/ceph-ansible

The error appears to have been in '/usr/share/ceph-ansible/site.BZ1482067.yml': line 58, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  roles:
    - ceph-defaults
      ^ here
There is no such role in
/usr/share/ceph-ansible/roles/

-----------------------------------------------
If I create Frankenstein's monster and just add/remove the lines in the commit to the site.yml.sample delivered with the downstream version, it fails with:
$ ansible-playbook site.yml.sample --limit osds
ERROR! Syntax Error while loading YAML.


The error appears to have been in '/usr/share/ceph-ansible/site.yml.sample': line 32, column 6, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


     - name: install python2 for Fedora
     ^ here
-------
I haven't tried that further

Comment 11 seb 2017-08-25 09:07:07 UTC
Right, ceph-defaults doesn't exist (yet) for you. So yes, remove all the occurrences.
About the error "     - name: install python2 for Fedora": could this be an indentation issue? Can you try to debug further?

Thanks!

Comment 12 Tomas Petr 2017-08-25 09:48:07 UTC
(In reply to seb from comment #11)
> Right, ceph-defaults doesn't exist (yet) for you. So yes, remove all the
> occurrences.
> About the error "     - name: install python2 for Fedora": could this be an
> indentation issue? Can you try to debug further?
> 
> Thanks!

Ok, that was a mistake on my side; I had messed up the site.yml content.
------


Anyway, a new try with just these changes:
[root@rhsca ceph-ansible]# diff  site.yml site.yml.origin 
35c35
<     - name: gather and delegate facts
---
>     - name: gathering facts
37,39d36
<       delegate_to: "{{ item }}"
<       delegate_facts: True
<       with_items: "{{ groups['all'] }}"
42,44c39
<       when: 
<         - ansible_distribution == 'Fedora'
<         - ansible_distribution_major_version|int >= 23
---
>       when: ansible_distribution == 'Fedora' and ansible_distribution_major_version|int >= 23

....

TASK [install required packages for Fedora > 23] *******************************
task path: /usr/share/ceph-ansible/site.yml:40
fatal: [rhscaosd5]: FAILED! => {
    "failed": true, 
    "msg": "The conditional check 'ansible_distribution == 'Fedora'' failed. The error was: error while evaluating conditional (ansible_distribution == 'Fedora'): 'ansible_distribution' is undefined\n\nThe error appears to have been in '/usr/share/ceph-ansible/site.yml': line 40, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n      with_items: \"{{ groups['all'] }}\"\n    - name: install required packages for Fedora > 23\n      ^ here\n"
}

Comment 13 Tomas Petr 2017-08-25 09:48:41 UTC
Created attachment 1318073 [details]
ansible-playbook  site.yml -vvvvv | tee /tmp/site.yml.1482067

Comment 14 Andrew Schoen 2017-08-25 11:12:42 UTC
> ....
> 
> TASK [install required packages for Fedora > 23]
> *******************************
> task path: /usr/share/ceph-ansible/site.yml:40
> fatal: [rhscaosd5]: FAILED! => {
>     "failed": true, 
>     "msg": "The conditional check 'ansible_distribution == 'Fedora'' failed.
> The error was: error while evaluating conditional (ansible_distribution ==
> 'Fedora'): 'ansible_distribution' is undefined\n\nThe error appears to have
> been in '/usr/share/ceph-ansible/site.yml': line 40, column 7, but may\nbe
> elsewhere in the file depending on the exact syntax problem.\n\nThe
> offending line appears to be:\n\n      with_items: \"{{ groups['all'] }}\"\n
> - name: install required packages for Fedora > 23\n      ^ here\n"
> }

The changes added to site.yml.sample for the facts delegation now require that we use ansible >= 2.3. I believe what you're seeing above is a bug in ansible 2.2 and should go away when using ansible 2.3.

Thanks,
Andrew

Comment 15 Tomas Petr 2017-08-31 12:52:56 UTC
(In reply to Andrew Schoen from comment #14)
> > ....
> The changes added to site.yml.sample for the facts delegation now require
> that we use ansible >= 2.3. I believe what you're seeing above is a bug in
> ansible 2.2 and should go away when using ansible 2.3.
> 
> Thanks,
> Andrew

Hi Andrew,
you are right, thank you for pointing that out.
So I updated ansible and gave it another try:
# rpm -qa | grep ansible
ceph-ansible-2.2.11-1.el7scon.noarch
ansible-2.3.1.0-3.el7.noarch

# diff site.yml.BZ1482067 site.yml.origin
35c35
<     - name: gather and delegate facts
---
>     - name: gathering facts
37,39d36
<       delegate_to: "{{ item }}"
<       delegate_facts: True
<       with_items: "{{ groups['all'] }}"
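Reconstructed from the diff above, the fixed fact-gathering play would look roughly like the sketch below (the play header, `hosts: all` and `gather_facts: false`, is assumed from the upstream site.yml and may differ downstream). Gathering facts for every host up front and delegating them means a later `--limit osds` run still has the monitor facts available to the ceph.conf template; as noted in comment 14, this needs ansible >= 2.3.

```yaml
# Sketch of the fixed play; header values are assumptions, the task body
# matches the diff above. Facts are gathered once for every inventory
# host and stored under that host via delegate_facts, so plays limited
# to a subset of hosts can still read the monitors' facts.
- hosts: all
  gather_facts: false
  tasks:
    - name: gather and delegate facts
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups['all'] }}"
```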

# ansible-playbook  site.yml.BZ1482067 --limit osds

And it worked well; ceph.conf has the correct values:
[root@rhscaosd5 ~]# cat /etc/ceph/ceph.conf 
# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
fsid = a93b8b8c-a7fe-4103-8434-6a490f641a66
max open files = 131072
mon initial members = rhscamon1,rhscamon2,rhscamon3
mon host = 192.168.66.157,192.168.66.251,192.168.66.149
public network = 192.168.66.0/24
cluster network = 192.168.66.0/24

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 1024

Comment 16 Ken Dreyer (Red Hat) 2017-10-12 22:56:25 UTC
(weird issue: GitHub says https://github.com/ceph/ceph-ansible/pull/1801 was merged in 02d849d2371afee242a6913473805f5e7522c9ae. I can't find that commit when I fetch today.)

git tag --contains 5bda515d7ca185e8feedc031624ff4b073caa728 says this has been fixed since 3.0.0rc4.

Comment 20 Sébastien Han 2017-10-24 15:03:24 UTC
lgtm

Comment 21 Madhavi Kasturi 2017-10-25 05:33:36 UTC
The configuration file displays proper values with "ansible-playbook site.yml --limit osds|clients|rgws".

Moving this BZ to verified.

Comment 24 errata-xmlrpc 2017-12-05 23:39:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

