Description of problem:
When using ceph-ansible to define a CRUSH rule, it is possible to mark it as the default rule, but the default rule ID is not enforced.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.5-1.el7cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure:
   crush_rule_config: true
   crush_rules:
     - name: standard
       root: standard_root
       type: rack
       default: true
     - name: fast
       root: fast_root
       type: rack
       default: false
2. Create a pool
3. Check the pool's crush_rule value

Actual results:
The rules are created but the default rule is not enforced.

Expected results:
The rule configured as default should become the default rule.

Additional info:

# ceph osd pool create get_schwifty 64
# ceph osd pool get get_schwifty crush_rule
crush_rule: replicated_rule

The rules are created:

# ceph osd crush rule ls
replicated_rule
standard
fast

Looking at the running mon config, the default rule is set to -1, which means "pick the rule with the lowest numerical ID and use that":

# ceph --admin-daemon /var/run/ceph/ceph-mon.lab-controller01.asok config get osd_pool_default_crush_rule
{
    "osd_pool_default_crush_rule": "-1"
}

Looking at the ceph.conf file, it should contain the osd_pool_default_crush_rule = 1 parameter [1], but it does not:

# grep osd_pool_default_crush_rule /etc/ceph/ceph.conf
-> empty

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/crush_rules.yml#L54

Looking at the ceph-ansible logs, we can see that the mons' running config seems to be correctly updated:

2018-10-10 08:56:44,348 p=32237 u=mistral | TASK [ceph-mon : insert new default crush rule into daemon to prevent restart] ***
2018-10-10 08:56:44,348 p=32237 u=mistral | Wednesday 10 October 2018 08:56:44 -0400 (0:00:00.083) 0:03:41.021 *****
2018-10-10 08:56:44,828 p=32237 u=mistral | ok: [172.16.0.25 -> 172.16.0.22] => (item=172.16.0.22) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller01", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller01", 
"config", "set", "osd_pool_defau lt_crush_rule", "1"], "delta": "0:00:00.210402", "end": "2018-10-10 12:56:46.953167", "failed": false, "item": "172.16.0.22", "rc": 0, "start": "2018-10-10 12:56:46.742765", "stderr": "", "stderr_lines": [], "stdout": "{\n \"success\": \"osd_pool_default_crush_rule = ' 1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", " \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]} 2018-10-10 08:56:45,287 p=32237 u=mistral | ok: [172.16.0.25 -> 172.16.0.24] => (item=172.16.0.24) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller03", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller03", "config", "set", "osd_pool_defau lt_crush_rule", "1"], "delta": "0:00:00.211802", "end": "2018-10-10 12:56:47.413902", "failed": false, "item": "172.16.0.24", "rc": 0, "start": "2018-10-10 12:56:47.202100", "stderr": "", "stderr_lines": [], "stdout": "{\n \"success\": \"osd_pool_default_crush_rule = ' 1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", " \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]} 2018-10-10 08:56:45,752 p=32237 u=mistral | ok: [172.16.0.25 -> 172.16.0.25] => (item=172.16.0.25) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller02", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller02", "config", "set", "osd_pool_defau lt_crush_rule", "1"], "delta": "0:00:00.227841", "end": "2018-10-10 12:56:47.878438", "failed": false, "item": "172.16.0.25", "rc": 0, "start": "2018-10-10 12:56:47.650597", "stderr": "", "stderr_lines": [], "stdout": "{\n \"success\": \"osd_pool_default_crush_rule = ' 1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", " \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]} The "add new default crush rule to ceph.conf" task appears to run 
fine, but the option is not present in ceph.conf:

2018-10-10 08:56:45,793 p=32237 u=mistral | TASK [ceph-mon : add new default crush rule to ceph.conf] **********************
2018-10-10 08:56:45,793 p=32237 u=mistral | Wednesday 10 October 2018 08:56:45 -0400 (0:00:01.444) 0:03:42.466 *****
2018-10-10 08:56:46,335 p=32237 u=mistral | changed: [172.16.0.25 -> 172.16.0.22] => (item=172.16.0.22) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.22", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}
2018-10-10 08:56:46,564 p=32237 u=mistral | changed: [172.16.0.25 -> 172.16.0.24] => (item=172.16.0.24) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.24", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}
2018-10-10 08:56:46,811 p=32237 u=mistral | changed: [172.16.0.25 -> 172.16.0.25] => (item=172.16.0.25) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.25", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}

So my guess is that the default rule ID is indeed added to the mons' running config, but adding it to ceph.conf somehow fails. The mons are then restarted, so they fall back to osd_pool_default_crush_rule = -1.
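The -1 fallback behaviour described above ("pick the rule with the lowest numerical ID") can be sketched as follows. This is an illustrative model, not the actual Ceph source; the rule names and IDs mirror the cluster in this report, and resolve_default_crush_rule is a hypothetical helper.

```python
def resolve_default_crush_rule(rules, configured_id):
    """Return the CRUSH rule ID a new pool would get.

    rules: mapping of rule ID -> rule name
    configured_id: value of osd_pool_default_crush_rule (-1 = unset)
    """
    if configured_id >= 0 and configured_id in rules:
        return configured_id
    # -1 (or an unknown ID): fall back to the lowest-numbered rule
    return min(rules)

# Rules as listed by "ceph osd crush rule ls" above
rules = {0: "replicated_rule", 1: "standard", 2: "fast"}

# With the option missing from ceph.conf (-1), new pools get replicated_rule,
# which is exactly what "ceph osd pool get get_schwifty crush_rule" shows:
assert rules[resolve_default_crush_rule(rules, -1)] == "replicated_rule"
# With osd_pool_default_crush_rule = 1 persisted, pools would get "standard":
assert rules[resolve_default_crush_rule(rules, 1)] == "standard"
```

This is why the pool silently ends up on replicated_rule (ID 0) whenever the option is lost from ceph.conf before a mon restart.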
The issue is still present in ceph-ansible stable-3.2, but whether it shows up depends on the Ceph cluster architecture.

As Greg reported, the osd_pool_default_crush_rule parameter seems to be applied correctly to the ceph.conf file (according to the Ansible logs) but is ultimately not present. In fact, the parameter is correctly set during the mons' configuration (crush_rules.yml tasks [1]), but the mgrs' configuration (which is executed after the mons') erases it by reapplying the default ceph.conf template plus overrides [2] (via the ceph-config role).

So when I try with a mon and a mgr collocated on the same host, I hit the same issue as Greg. I suppose this comes from a TripleO environment on his side. Can you confirm this point?

When I try with the mon and mgr on dedicated hosts, this works as expected:

# ceph osd pool create get_schwifty 64
pool 'get_schwifty' created
# ceph osd pool get get_schwifty crush_rule
crush_rule: standard
# ceph osd crush rule ls
replicated_rule
standard
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon0.asok config get osd_pool_default_crush_rule
{
    "osd_pool_default_crush_rule": "1"
}
# grep osd_pool_default_crush_rule /etc/ceph/ceph.conf
osd_pool_default_crush_rule = 1

[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-mon/tasks/crush_rules.yml
[2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-config/tasks/main.yml#L173-L189
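The overwrite sequence can be modelled in a few lines. This is a minimal sketch (the template context, keys, and helper names are hypothetical, not the actual ceph-ansible code): the ceph-mon role edits ceph.conf in place, like Ansible's ini_file module, and the ceph-config role later re-renders the whole file from a template, which drops any key that is not part of the template context.

```python
import configparser
import io

def ini_set(conf_text, section, key, value):
    """In-place edit of an existing config, like an ini_file task."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    cp.set(section, key, value)
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()

def render_template(context):
    """Full rewrite from a template, like the ceph-config role."""
    lines = ["[global]"] + [f"{k} = {v}" for k, v in context.items()]
    return "\n".join(lines) + "\n"

# Hypothetical template context; osd_pool_default_crush_rule is NOT in it
template_context = {"fsid": "deadbeef", "mon_host": "172.16.0.22"}

conf = render_template(template_context)                            # initial ceph.conf
conf = ini_set(conf, "global", "osd_pool_default_crush_rule", "1")  # mon role's edit
assert "osd_pool_default_crush_rule" in conf                        # "option added"
conf = render_template(template_context)                            # mgr run re-renders
assert "osd_pool_default_crush_rule" not in conf                    # option erased
```

This reproduces both observations: the ini edit reports "option added", yet the final grep on ceph.conf comes back empty once the later role has re-rendered the file. On dedicated mon hosts the second render never runs there, which is why the option survives.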
Yes, it comes from a TripleO environment. We talked about it with Seb some time ago, and it indeed seems the config gets erased when the manager config gets written. IIRC the outcome was to avoid using an "ini_file" task.
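The direction hinted at above (avoid an after-the-fact ini_file edit) can be sketched like this. A minimal, hypothetical illustration, not the actual ceph-ansible patch: if the default rule ID is part of the template context itself, every re-render of ceph.conf emits it, so a later role rewriting the file can no longer lose it.

```python
def render_ceph_conf(context):
    """Render ceph.conf entirely from one context dict (template-style)."""
    lines = ["[global]"] + [f"{k} = {v}" for k, v in context.items()]
    return "\n".join(lines) + "\n"

# Hypothetical context: the crush rule default is a first-class template var
context = {"fsid": "deadbeef", "osd_pool_default_crush_rule": "1"}

conf = render_ceph_conf(context)
assert "osd_pool_default_crush_rule = 1" in conf
# A later role re-rendering from the same context keeps the option intact:
assert "osd_pool_default_crush_rule = 1" in render_ceph_conf(context)
```

The design point is that there is a single source of truth for the file's contents; in-place edits made outside that source are what the template rewrite silently discards.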
Working fine with ceph-ansible-3.2.13-1.el7cp.noarch.

Moving to VERIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911