Description of problem:
If ceph_conf_overrides is used to set cluster_network, instead of setting it under the OSD options in the group_vars/all.yml file, the default of cluster_network == {{ public_network }} is still applied, resulting in two conflicting cluster_network entries in the generated ceph.conf.

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-46.el7scon

How reproducible:
Always

Steps to Reproduce:
1) Using the ceph-ansible playbook, modify group_vars/all.sample to the desired settings.
2) When configuring cluster_network, *do not* use the OSD options parameter (leave it commented out); instead specify it in the CONFIG OVERRIDE section:

   ceph_conf_overrides:
     global:
       cluster_network: 10.10.100.0/24

3) Run the playbook.
4) Once complete, look at the resulting ceph.conf on one of the configured hosts. It looks like this:

# cat /etc/ceph/ceph.conf
[global]
cluster_network = 10.10.100.0/24   <---
max open files = 131072
fsid = 78a15451-fe2b-4627-a99d-9e060d0aecf1

[mon.mon2]
host = mon2
mon addr = 192.168.100.22

[mon.mon3]
host = mon3
mon addr = 192.168.100.23

[mon.mon1]
host = mon1
mon addr = 192.168.100.21

[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[mon]

[osd]
osd mount options xfs = noatime,largeio,inode64,swalloc
osd mkfs options xfs = -f -i size=2048
public_network = 192.168.100.0/24
cluster_network = 192.168.100.0/24   <---
osd mkfs type = xfs
osd journal size = 1024

Actual results:
Ansible has included cluster_network settings in both the [global] and [osd] sections of the ceph.conf, with two differing values.
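For reference, a minimal group_vars/all.yml fragment that reproduces this (a sketch using the addresses from the example above; the OSD options block is left commented out, exactly as in the reproduction steps):

```yaml
## OSD options -- left commented out, so the role default
## cluster_network: "{{ public_network }}" is applied instead
public_network: 192.168.100.0/24
#cluster_network: 10.10.100.0/24

## CONFIG OVERRIDE -- dumped verbatim into the [global] section,
## producing the second, conflicting cluster_network entry
ceph_conf_overrides:
  global:
    cluster_network: 10.10.100.0/24
```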
In a customer's environment, we saw that removing the duplicate cluster_network entry (and changing it to the correct IP) fixed his attempts to import the existing cluster into Red Hat Storage Console; but we're unsure what effects (if any) the duplicate may have on performance, or what problems it may cause within a cluster serving data.

The problem is twofold. First, the cluster_network setting in [osd] gets set by roles/ceph-common/defaults/main.yml to:

   cluster_network: "{{ public_network }}"

since it was never uncommented in the all file. Then roles/ceph-common/tasks/main.yml comes in and dumps the config overrides into ceph.conf:

   config_overrides: "{{ ceph_conf_overrides }}"

which results in the [global] entry, which is actually the correct option here.

---------

I think this is all fine. The cluster_network var works as expected; it's more an issue with conf overrides. The implementation isn't smart in any way: we just dump the override values into the conf. Perhaps we need syntax checks? Or perhaps it's just something we need to fix in the docs, and conf_overrides is behaving as designed.

Expected results:
This is kind of a question mark for me; it depends on how you wish to handle it. See my comments upstream for more on this: https://github.com/ceph/ceph-ansible/issues/1262

Additional info:
This behavior appears to also occur upstream.
The upstream [global] section shows the same duplication:

[global]
mon initial members = mon1,mon2,mon3
cluster network = 192.168.100.0/24
mon host = 192.168.100.21,192.168.100.22,192.168.100.23
public network = 192.168.100.0/24
cluster_network = 10.10.100.0/24

I've opened an issue to track this as well: https://github.com/ceph/ceph-ansible/issues/1262

In addition, from a purely documentation standpoint, I'm not sure why we recommend using the config override option over the OSD options located in 'all':

## OSD options
#public_network: 192.168.100.0/24
#cluster_network: 10.10.100.0/24

In Step 11 of the documentation located in Section 3.2.2 of https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/installation-guide-for-red-hat-enterprise-linux/chapter-3-storage-cluster-installation we specify setting the public_network ("Set the public_network setting:"), but we never actually discuss setting cluster_network as an option. This leads users who follow the documentation per our instruction to reach for ceph_conf_overrides to complete this task. I feel there should be a Step 12 with an 'Optional' note on setting a cluster_network if desired; otherwise, the note should say to leave the 'cluster_network' option commented out (which defaults it to public_network) and advise against placing this option inside the ceph_conf_overrides section.

In Section 3.2.5 of the same documentation we provide the following ceph_conf_overrides example:

ceph_conf_overrides:
  global:
    osd_pool_default_size: 2
    osd_pool_default_min_size: 1
    cluster_network: 10.0.0.1/24
  client.rgw.rgw1:
    log_file: /var/log/ceph/ceph-rgw-rgw1.log

which contains 'cluster_network' in the example. This results in further confusion for customers.
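For the documentation fix, the pattern to recommend would presumably be the inverse of the current example: set cluster_network alongside public_network under the OSD options in group_vars/all.yml, and keep it out of ceph_conf_overrides. A sketch, reusing the addresses and the Section 3.2.5 override values quoted above:

```yaml
## OSD options
public_network: 192.168.100.0/24
# Uncomment only if a separate cluster network is desired;
# otherwise it defaults to {{ public_network }}.
cluster_network: 10.10.100.0/24

## ceph_conf_overrides with cluster_network removed
ceph_conf_overrides:
  global:
    osd_pool_default_size: 2
    osd_pool_default_min_size: 1
  client.rgw.rgw1:
    log_file: /var/log/ceph/ceph-rgw-rgw1.log
```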
Please see my reply in https://github.com/ceph/ceph-ansible/issues/1262
(In reply to seb from comment #2)
> Please see my reply in https://github.com/ceph/ceph-ansible/issues/1262

So, after the discussion upstream, we'll just move forward with making this a purely documentation bug.