Description  Sergii Mykhailushko
2022-03-10 09:38:12 UTC
Description of problem:
It looks like we're unable to disable PG autoscaling for specific pools via the YAML templates.
Here is one pool, for example, as it is described in the template:
~~~
- application: cephfs
min_size: 2
name: manila_metadata
pg_autoscale_mode: False <--
rule_name: replicated_hdd
size: 3
type: replicated
~~~
After deploying with that and checking the autoscale status, the pool's autoscale mode is set to "warn", while as per our understanding, "pg_autoscale_mode: False" in the template should have set it to "off".
~~~
# ceph osd pool autoscale-status
POOL             SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
manila_metadata  4152M                3.0   448.2T        0.0000                                 4.0   64      16          warn
...
~~~
Disabling autoscaling manually with "ceph osd pool set <poolname> pg_autoscale_mode off" works as expected, so the problem seems to be with the way Ansible parses the template.
From the ceph-ansible log we can see that Ansible sets the value to "warn":
~~~
2020-06-25 17:08:42,052 p=395161 u=root | changed: [controller01 -> 1.2.3.4] => (item={'application': 'cephfs', 'name': 'manila_metadata', 'pg_num': '64', 'rule_name': 'replicated_rule'}) => changed=true
ansible_loop_var: item
cmd:
- podman
- run
- --rm
- --net=host
- -v
- /etc/ceph:/etc/ceph:z
- -v
- /var/lib/ceph/:/var/lib/ceph/:z
- -v
- /var/log/ceph/:/var/log/ceph/:z
- --entrypoint=ceph
...
- --cluster
- ceph <--
- osd <--
- pool <--
- set <--
- manila_metadata <--
- pg_autoscale_mode <--
- warn <--
delta: '0:00:00.839304'
end: '2020-06-25 17:08:42.028138'
item:
application: cephfs
name: manila_metadata
pg_num: '64'
rule_name: replicated_rule
rc: 0
start: '2020-06-25 17:08:41.188834'
stderr: set pool 6 pg_autoscale_mode to warn
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
~~~
Checking the ceph-ansible code, we see that "pg_autoscale_mode" defaults to False and that the value is passed through a ternary of "on" and "warn"; a boolean False therefore maps to "warn", and no value of the variable ever results in "off":
https://github.com/ceph/ceph-ansible/blob/27b10488dbc018f0873b8487862b6fdf1210e6bc/roles/ceph-client/tasks/create_users_keys.yml#L117
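For illustration, here is a minimal standalone sketch (not the actual ceph-ansible task, just the same ternary applied to a boolean) showing that such a filter can only ever produce "on" or "warn", never "off":
~~~
# Minimal playbook sketch, only to illustrate the ternary behaviour;
# the task name is made up and not taken from ceph-ansible.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: show what a boolean pg_autoscale_mode turns into
      debug:
        msg: "{{ item | ternary('on', 'warn') }}"
      loop:
        - true    # prints "on"
        - false   # prints "warn" -- this is what 'pg_autoscale_mode: False' becomes
~~~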
Since the YAML specification defines the possible boolean values as:
~~~
y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF
~~~
setting "pg_autoscale_mode: off" in the template has the same effect (no effect actually), and after the deploy we still get the "warn" value.
From the above, it looks like there is currently no (documented) way to disable PG autoscaling via the templates.
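As an interim workaround, the manual command above could be wrapped in a small post-deployment play. This is only a sketch (the host pattern and pool list are placeholders, not part of ceph-ansible) and assumes the ceph CLI is reachable on the target host; in a containerized deployment the command would have to run through the container, as the log above shows:
~~~
# Hypothetical post-deploy play: force autoscaling off for the pools that the
# templates could not configure.
- hosts: mons[0]            # placeholder: any host that can run the ceph CLI
  gather_facts: false
  tasks:
    - name: disable PG autoscaling for selected pools
      command: "ceph osd pool set {{ item }} pg_autoscale_mode off"
      loop:
        - manila_metadata   # add further pools here as needed
      changed_when: true
~~~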
Version-Release number of selected component (if applicable):
ceph-ansible-4.0.62.8-1.el8cp.noarch (currently the latest available for RHCS 4)
How reproducible:
Always.
Additional info:
This was reproduced in an OSP environment when deploying via director, but it should also be reproducible in a standalone Ceph installation, since we are most probably dealing with a ceph-ansible issue here.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 4.3 Bug Fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:6684