Bug 1638092 - Default crush rule is not enforced

Summary: Default crush rule is not enforced

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	3.0
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	medium
Target Milestone:	z2
Target Release:	3.2
Assignee:	Dimitri Savineau
QA Contact:	Vasishta
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1578730

Reported:	2018-10-10 16:03 UTC by Gregory Charot
Modified:	2019-10-24 11:46 UTC
CC List:	14 users
Fixed In Version:	RHEL: ceph-ansible-3.2.10-1.el7cp Ubuntu: ceph-ansible_3.2.10-2redhat1
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-04-30 15:56:43 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible pull 3697	'None'	closed	Set the default crush rule in ceph.conf	2020-11-04 23:17:46 UTC
Github	ceph ceph-ansible pull 3711	'None'	closed	Automatic backport of pull request #3697	2020-11-04 23:18:03 UTC
Red Hat Product Errata	RHSA-2019:0911	None	None	None	2019-04-30 15:57:00 UTC

Description Gregory Charot 2018-10-10 16:03:40 UTC

Description of problem:

When ceph-ansible is used to define crush rules, one of them can be flagged as the default rule. However, the default rule ID is not enforced.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.5-1.el7cp.noarch

How reproducible:

Always

Steps to Reproduce:
1. Configure the following crush rules:
    crush_rule_config: true
    crush_rules:
      - name: standard
        root: standard_root
        type: rack
        default: true
      - name: fast
        root: fast_root
        type: rack
        default: false
2. Create a pool
3. Check the pool crush_rule value
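In step 1, `default: true` has to be turned into the numeric rule ID that ends up in osd_pool_default_crush_rule (the value 1 in the logs below). A minimal Python sketch of that lookup, assuming a hypothetical, trimmed `ceph osd crush rule dump --format json` output whose IDs are made up to match this report; it is not ceph-ansible's implementation:

```python
import json

# Step 1's crush_rules list, expressed as Python data.
CRUSH_RULES = [
    {"name": "standard", "root": "standard_root", "type": "rack", "default": True},
    {"name": "fast", "root": "fast_root", "type": "rack", "default": False},
]

# Hypothetical `ceph osd crush rule dump --format json` output, trimmed to the
# two fields used here. The IDs are assumed, not read from the reporter's cluster.
RULE_DUMP = json.loads("""
[
  {"rule_id": 0, "rule_name": "replicated_rule"},
  {"rule_id": 1, "rule_name": "standard"},
  {"rule_id": 2, "rule_name": "fast"}
]
""")


def default_rule_id(crush_rules, rule_dump):
    """Numeric ID of the rule flagged `default: true`, or None if none is."""
    defaults = [rule["name"] for rule in crush_rules if rule.get("default")]
    if not defaults:
        return None
    if len(defaults) > 1:
        # Design choice for this sketch: refuse ambiguous configurations.
        raise ValueError(f"more than one default rule: {defaults}")
    ids = {rule["rule_name"]: rule["rule_id"] for rule in rule_dump}
    return ids[defaults[0]]


print(default_rule_id(CRUSH_RULES, RULE_DUMP))  # 1
```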

Actual results:

The rules are created, but the default rule is not enforced: a newly created pool still uses replicated_rule.

Expected results:

The rule configured with default: true should become the default crush rule for new pools.

Additional info:

# ceph osd pool create get_schwifty 64

# ceph osd pool get get_schwifty crush_rule
crush_rule: replicated_rule

Rules are created:
# ceph osd crush rule ls
replicated_rule
standard
fast

The running mon config shows the default rule set to -1, which means "pick the rule with the lowest numerical ID and use that":

# ceph --admin-daemon  /var/run/ceph/ceph-mon.lab-controller01.asok config get osd_pool_default_crush_rule
{
    "osd_pool_default_crush_rule": "-1"
}
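With -1, the rule a new pool gets depends only on the rule IDs, which is why get_schwifty ended up on replicated_rule. A simplified Python model of that documented behaviour (it ignores rule type; the ID assignments are an assumption based on creation order, not read from the cluster):

```python
from typing import Dict

# Hypothetical rule table matching `ceph osd crush rule ls` above; the numeric
# IDs are assumed (creation order).
RULES: Dict[int, str] = {0: "replicated_rule", 1: "standard", 2: "fast"}


def effective_default_rule(osd_pool_default_crush_rule: int, rules: Dict[int, str]) -> str:
    """Name of the rule a new replicated pool would get (simplified model)."""
    if osd_pool_default_crush_rule == -1:
        # -1: "pick the rule with the lowest numerical ID and use that"
        return rules[min(rules)]
    # Any other value is treated as an explicit rule ID (KeyError if unknown).
    return rules[osd_pool_default_crush_rule]


print(effective_default_rule(-1, RULES))  # replicated_rule
print(effective_default_rule(1, RULES))   # standard
```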

The ceph.conf file should contain the osd_pool_default_crush_rule = 1 parameter [1], but it does not:
# grep osd_pool_default_crush_rule /etc/ceph/ceph.conf
(no output)

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mon/tasks/crush_rules.yml#L54

The ceph-ansible logs show that the mons' running config appears to be updated correctly:

2018-10-10 08:56:44,348 p=32237 u=mistral |  TASK [ceph-mon : insert new default crush rule into daemon to prevent restart] ***
2018-10-10 08:56:44,348 p=32237 u=mistral |  Wednesday 10 October 2018  08:56:44 -0400 (0:00:00.083)       0:03:41.021 *****
2018-10-10 08:56:44,828 p=32237 u=mistral |  ok: [172.16.0.25 -> 172.16.0.22] => (item=172.16.0.22) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller01", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller01", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.210402", "end": "2018-10-10 12:56:46.953167", "failed": false, "item": "172.16.0.22", "rc": 0, "start": "2018-10-10 12:56:46.742765", "stderr": "", "stderr_lines": [], "stdout": "{\n    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", "    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]}
2018-10-10 08:56:45,287 p=32237 u=mistral |  ok: [172.16.0.25 -> 172.16.0.24] => (item=172.16.0.24) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller03", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller03", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.211802", "end": "2018-10-10 12:56:47.413902", "failed": false, "item": "172.16.0.24", "rc": 0, "start": "2018-10-10 12:56:47.202100", "stderr": "", "stderr_lines": [], "stdout": "{\n    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", "    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]}
2018-10-10 08:56:45,752 p=32237 u=mistral |  ok: [172.16.0.25 -> 172.16.0.25] => (item=172.16.0.25) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-lab-controller02", "ceph", "--cluster", "ceph", "daemon", "mon.lab-controller02", "config", "set", "osd_pool_default_crush_rule", "1"], "delta": "0:00:00.227841", "end": "2018-10-10 12:56:47.878438", "failed": false, "item": "172.16.0.25", "rc": 0, "start": "2018-10-10 12:56:47.650597", "stderr": "", "stderr_lines": [], "stdout": "{\n    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"\n}", "stdout_lines": ["{", "    \"success\": \"osd_pool_default_crush_rule = '1' (not observed, change may require restart) \"", "}"]}

The "add new default crush rule to ceph.conf" task appears to succeed ("option added"), but the option is not present in ceph.conf afterwards:

2018-10-10 08:56:45,793 p=32237 u=mistral |  TASK [ceph-mon : add new default crush rule to ceph.conf] **********************
2018-10-10 08:56:45,793 p=32237 u=mistral |  Wednesday 10 October 2018  08:56:45 -0400 (0:00:01.444)       0:03:42.466 *****
2018-10-10 08:56:46,335 p=32237 u=mistral |  changed: [172.16.0.25 -> 172.16.0.22] => (item=172.16.0.22) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.22", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}
2018-10-10 08:56:46,564 p=32237 u=mistral |  changed: [172.16.0.25 -> 172.16.0.24] => (item=172.16.0.24) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.24", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}
2018-10-10 08:56:46,811 p=32237 u=mistral |  changed: [172.16.0.25 -> 172.16.0.25] => (item=172.16.0.25) => {"changed": true, "failed": false, "gid": 0, "group": "root", "item": "172.16.0.25", "mode": "0644", "msg": "option added", "owner": "root", "path": "/etc/ceph/ceph.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 1973, "state": "file", "uid": 0}

My guess is that the default rule ID is indeed added to the mons' running config, but adding it to ceph.conf somehow fails. When the mons are restarted, they fall back to osd_pool_default_crush_rule = -1.

Comment 3 Dimitri Savineau 2019-03-05 21:39:10 UTC

The issue is still present in ceph-ansible stable-3.2, but whether it occurs depends on the ceph cluster architecture.

As Greg reported, the osd_pool_default_crush_rule parameter appears to be applied correctly to the ceph.conf file (according to the ansible logs), but it is not present at the end of the run.

In fact, the parameter is set correctly during the mon configuration (crush_rules.yml tasks [1]), but the mgr configuration, which runs after the mons, erases it by reapplying the default ceph.conf template plus overrides [2] (via the ceph-config role).
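To make the ordering problem concrete, here is a minimal Python simulation using configparser. TEMPLATE, render_template and ini_file_set are hypothetical stand-ins, not ceph-ansible code: the mon play adds the option in place (ini_file style), then the mgr play on the same host re-renders the whole file from the template.

```python
import configparser
import io

# Hypothetical stand-in for the ceph.conf template; real contents differ.
TEMPLATE = "[global]\nfsid = example-fsid\n"


def render_template(overrides=None):
    """Rebuild ceph.conf from scratch, as a template task does."""
    text = TEMPLATE
    for key, value in (overrides or {}).items():
        text += f"{key} = {value}\n"  # appended under [global]
    return text


def ini_file_set(text, section, key, value):
    """Edit one option in place, in the style of Ansible's ini_file module."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    cfg[section][key] = value
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def get_option(text, key):
    """Return key from [global], or None when it is absent."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    return cfg["global"].get(key)


OPTION = "osd_pool_default_crush_rule"

# 1. mon play: ceph.conf is rendered, then crush_rules.yml adds the option.
conf = ini_file_set(render_template(), "global", OPTION, "1")
print(get_option(conf, OPTION))  # 1

# 2. mgr play on the same host re-renders the template: the option is lost.
conf = render_template()
print(get_option(conf, OPTION))  # None

# 3. If the value is part of the template input, a re-render keeps it.
conf = render_template({OPTION: "1"})
print(get_option(conf, OPTION))  # 1
```

Step 3 only illustrates the general idea of not relying on an in-place ini_file edit; it is not the change made in the linked pull requests.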

So when I test with the mon and mgr collocated on the same host, I hit the same issue as Greg.
I suppose Greg's environment is TripleO. Can you confirm this point?

When I test with the mon and mgr on dedicated hosts, this works as expected:

# ceph osd pool create get_schwifty 64
pool 'get_schwifty' created
# ceph osd pool get get_schwifty crush_rule
crush_rule: standard
# ceph osd crush rule ls
replicated_rule
standard
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon0.asok config get osd_pool_default_crush_rule
{
    "osd_pool_default_crush_rule": "1"
}
# grep osd_pool_default_crush_rule /etc/ceph/ceph.conf 
osd_pool_default_crush_rule = 1
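The manual comparison of the mon's running value with ceph.conf can be scripted. A minimal Python sketch, assuming the ceph.conf text and the `config get` JSON have already been collected (for example with the two commands shown); the function names are hypothetical:

```python
import configparser
import json

OPTION = "osd_pool_default_crush_rule"


def conf_value(conf_text, option=OPTION):
    """Return the option from ceph.conf's [global] section, or None.

    ceph.conf accepts spaces or dashes in place of underscores in option
    names, so keys are normalised before comparing.
    """
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read_string(conf_text)
    if not cfg.has_section("global"):
        return None
    for key, value in cfg["global"].items():
        if key.replace(" ", "_").replace("-", "_") == option:
            return value
    return None


def running_value(config_get_json, option=OPTION):
    """Return the option from `ceph daemon ... config get` JSON output."""
    return json.loads(config_get_json).get(option)


def crush_rule_in_sync(conf_text, config_get_json):
    """True when ceph.conf sets the option and it matches the running mon."""
    persisted = conf_value(conf_text)
    return persisted is not None and persisted == running_value(config_get_json)


# Values from the working setup above.
print(crush_rule_in_sync("[global]\nosd_pool_default_crush_rule = 1\n",
                         '{"osd_pool_default_crush_rule": "1"}'))   # True
# Values from the original report: option missing, mon falls back to -1.
print(crush_rule_in_sync("[global]\nfsid = example-fsid\n",
                         '{"osd_pool_default_crush_rule": "-1"}'))  # False
```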


[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-mon/tasks/crush_rules.yml
[2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/roles/ceph-config/tasks/main.yml#L173-L189

Comment 4 Gregory Charot 2019-03-06 10:16:55 UTC

Yes, it comes from a TripleO environment. We discussed this with Seb some time ago, and it does seem that the config gets erased when the manager config is written. IIRC, the outcome was to avoid using an "ini_file" task.

Comment 8 Vasishta 2019-04-24 09:15:43 UTC

Working fine with ceph-ansible-3.2.13-1.el7cp.noarch
Moving to VERIFIED state.

Comment 10 errata-xmlrpc 2019-04-30 15:56:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911
