Description  Sergii Mykhailushko
2022-03-10 09:38:12 UTC
Description of problem:
It looks like we're unable to disable PG autoscaling for specific pools via the YAML templates.
Here is one pool, for example, as it is described in the template:
~~~
- application: cephfs
min_size: 2
name: manila_metadata
pg_autoscale_mode: False <--
rule_name: replicated_hdd
size: 3
type: replicated
~~~
After deploying with that and checking the autoscale status, the pool's autoscale mode is set to "warn", while as per our understanding, "pg_autoscale_mode: False" in the template should have set it to "off".
~~~
# ceph osd pool autoscale-status
POOL             SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
manila_metadata  4152M                3.0   448.2T        0.0000                                 4.0   64      16          warn
...
~~~
Disabling autoscaling manually with "ceph osd pool set <poolname> pg_autoscale_mode off" works as expected, so the problem seems to be with the way Ansible parses the template.
From the ceph-ansible log we can see that Ansible sets the value to "warn":
~~~
2020-06-25 17:08:42,052 p=395161 u=root | changed: [controller01 -> 1.2.3.4] => (item={'application': 'cephfs', 'name': 'manila_metadata', 'pg_num': '64', 'rule_name': 'replicated_rule'}) => changed=true
ansible_loop_var: item
cmd:
- podman
- run
- --rm
- --net=host
- -v
- /etc/ceph:/etc/ceph:z
- -v
- /var/lib/ceph/:/var/lib/ceph/:z
- -v
- /var/log/ceph/:/var/log/ceph/:z
- --entrypoint=ceph
...
- --cluster
- ceph <--
- osd <--
- pool <--
- set <--
- manila_metadata <--
- pg_autoscale_mode <--
- warn <--
delta: '0:00:00.839304'
end: '2020-06-25 17:08:42.028138'
item:
application: cephfs
name: manila_metadata
pg_num: '64'
rule_name: replicated_rule
rc: 0
start: '2020-06-25 17:08:41.188834'
stderr: set pool 6 pg_autoscale_mode to warn
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
~~~
Checking the ceph-ansible code, we see that "pg_autoscale_mode" defaults to False and that the value is passed through a ternary of "on" and "warn"; a boolean False therefore maps to "warn", and no value of the variable ever results in "off":
https://github.com/ceph/ceph-ansible/blob/27b10488dbc018f0873b8487862b6fdf1210e6bc/roles/ceph-client/tasks/create_users_keys.yml#L117
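For illustration, here is a minimal standalone sketch (not the actual ceph-ansible task, just the same ternary applied to a boolean) showing that such a filter can only ever produce "on" or "warn", never "off":
~~~
# Minimal playbook sketch, only to illustrate the ternary behaviour;
# the task name is made up and not taken from ceph-ansible.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: show what a boolean pg_autoscale_mode turns into
      debug:
        msg: "{{ item | ternary('on', 'warn') }}"
      loop:
        - true    # prints "on"
        - false   # prints "warn" -- this is what 'pg_autoscale_mode: False' becomes
~~~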
Since the YAML specification defines the possible boolean values as:
~~~
y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF
~~~
setting "pg_autoscale_mode: off" in the template has the same effect (no effect actually), and after the deploy we still get the "warn" value.
From the above, it looks like there is currently no (documented) way to disable PG autoscaling via the templates.
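As an interim workaround, the manual command above could be wrapped in a small post-deployment play. This is only a sketch (the host pattern and pool list are placeholders, not part of ceph-ansible) and assumes the ceph CLI is reachable on the target host; in a containerized deployment the command would have to run through the container, as the log above shows:
~~~
# Hypothetical post-deploy play: force autoscaling off for the pools that the
# templates could not configure.
- hosts: mons[0]            # placeholder: any host that can run the ceph CLI
  gather_facts: false
  tasks:
    - name: disable PG autoscaling for selected pools
      command: "ceph osd pool set {{ item }} pg_autoscale_mode off"
      loop:
        - manila_metadata   # add further pools here as needed
      changed_when: true
~~~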
Version-Release number of selected component (if applicable):
ceph-ansible-4.0.62.8-1.el8cp.noarch (currently the latest available for RHCS 4)
How reproducible:
Always.
Additional info:
This was reproduced in an OSP environment when deploying via director, but it should also be reproducible in a standalone Ceph installation, since we are most probably dealing with a ceph-ansible issue here.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 4.3 Bug Fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:6684