+++ This bug was initially created as a clone of Bug #2213766 +++

Description of problem:
Unable to get a Ceph configuration value via a cephadm-ansible playbook using the ceph_config module.

Version-Release number of selected component (if applicable):
RHCS 5.3z3 (16.2.10-172.el8cp)

How reproducible:
Always, for a specific few Ceph configuration parameters.

Steps to Reproduce:
1. Write a cephadm-ansible playbook using a ceph_config task
2. Try to get the value of a Ceph configuration option (for example, mgr/dashboard/<NODE_NAME>/server_addr)
3. The task fails to get the value, although the same value can be retrieved via `ceph config dump`

Actual results:
The task fails to fetch the value.
~~~
TASK [get the mgr/dashboard/<NODE_NAME>/server_addr configuration] ******************************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5-admin] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/libexec/platform-python
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.256076'
  end: '2023-06-07 18:02:25.640664'
  rc: 0
  start: '2023-06-07 18:02:23.384588'
  stderr: No value found for who=mgr option=mgr/dashboard/<NODE_NAME>/server_addr   <=== Unable to find the config parameter
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~

Expected results:
It should be able to fetch the value.

Additional info:
It is not happening for all Ceph configuration parameters; I have noticed it for the parameter mgr/dashboard/<NODE_NAME>/server_addr.

--- Additional comment from Tridibesh Chakraborty on 2023-06-09 09:17:01 UTC ---

I reproduced this issue on my test cluster and got the same result.
~~~
TASK [get the mgr/dashboard/ceph1/server_addr configuration] ******************************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5-admin] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/libexec/platform-python
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.256076'
  end: '2023-06-07 18:02:25.640664'
  rc: 0
  start: '2023-06-07 18:02:23.384588'
  stderr: No value found for who=mgr option=mgr/dashboard/ceph1/server_addr   <=== Unable to find the config parameter
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~
From the output I can see that the task executes the command below to find the value.
~~~
$ cephadm shell ceph config dump --format json
~~~
When I ran `ceph config dump` directly, I could get the value of the parameter mgr/dashboard/ceph1/server_addr.
~~~
[root@rhcs5-admin ~]# cephadm shell ceph config dump | grep addr
Inferring fsid a395b05a-fb08-11ed-a3b4-001a4a00046e
Using recent ceph image registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:13c53f3e7d4801365b083f14c5a9606b373b3f3249c155240775dd268c220fcf
mgr  advanced  mgr/dashboard/ceph1/server_addr  10.74.250.188  *
[root@rhcs5-admin ~]#
~~~
But if I run the same command as the task, I don't see the parameter there.
~~~
[root@rhcs5-admin ~]# ceph config dump --format json|grep ceph1
[root@rhcs5-admin ~]#
~~~
The problem is that the JSON-format output reports the option as mgr/dashboard/server_addr instead of mgr/dashboard/ceph1/server_addr. That is why the playbook fails to get the value of the configuration.
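The mismatch can be reproduced offline. The sketch below is illustrative only (`find_option` is a hypothetical helper, not the actual ceph_config module code); it shows why a lookup over the parsed `ceph config dump --format json` output can never match the localized option name:

```python
import json

# Trimmed-down sample of what `ceph config dump --format json` returns.
# Note the name is the normalized "mgr/dashboard/server_addr", not the
# localized "mgr/dashboard/ceph1/server_addr" printed by the plain dump.
DUMP_JSON = '''[
  {"section": "mgr", "name": "mgr/dashboard/server_addr",
   "value": "10.74.250.188", "level": "advanced",
   "can_update_at_runtime": false, "mask": ""}
]'''

def find_option(dump, who, option):
    """Look up (who, option) in the parsed dump; None means 'No value found'."""
    for entry in json.loads(dump):
        if entry["section"] == who and entry["name"] == option:
            return entry["value"]
    return None

# The localized name the playbook asks for is never present in the JSON...
assert find_option(DUMP_JSON, "mgr", "mgr/dashboard/ceph1/server_addr") is None
# ...while the normalized name is.
assert find_option(DUMP_JSON, "mgr", "mgr/dashboard/server_addr") == "10.74.250.188"
```

Any consumer that keys on the localized name, as the playbook does, therefore falls through to the "No value found" branch.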
~~~
[root@rhcs5-admin ~]# ceph config dump --format json-pretty | grep -A 5 -B 3 addr
    },
    {
        "section": "mgr",
        "name": "mgr/dashboard/server_addr",
        "value": "10.74.250.188",
        "level": "advanced",
        "can_update_at_runtime": false,   <== can't be updated at runtime
        "mask": ""
    },
~~~
I noticed that this parameter can't be updated at runtime, so that could be why it does not show up in the JSON-format output after I added the configuration. If that is not the reason, then it might be a bug.
~~~
[root@rhcs5-admin ~]# ceph config help mgr/dashboard/ceph1/server_addr
mgr/dashboard/server_addr - (str, advanced)
  Default: ::
  Can update at runtime: false
~~~

--- Additional comment from Tridibesh Chakraborty on 2023-06-14 06:06:51 UTC ---

Hello Guillaume,

Do you think the customer is making a mistake here, or are they genuinely hitting a bug?

Thanks,
Tridibesh

--- Additional comment from Scott Ostapovicz on 2023-06-14 16:06:50 UTC ---

Missed the 5.3 z4 deadline. Moving from z4 to z5.

--- Additional comment from Tridibesh Chakraborty on 2023-06-16 06:36:21 UTC ---

I have checked whether we hit this same bug on RHCS 6 (17.2.5-75.el9cp). The issue is not there in RHCS 6, so it looks like we are hitting a bug on RHCS 5.3z3.
Example:
~~~
[admin@rhcs5node1 cephadm-ansible]$ ansible-playbook -i hosts test.yml -vv
ansible-playbook [core 2.13.3]
  config file = /usr/share/cephadm-ansible/ansible.cfg
  configured module search path = ['/usr/share/cephadm-ansible/library']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/admin/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.9.14 (main, Jan 9 2023, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True
Using /usr/share/cephadm-ansible/ansible.cfg as config file
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.profile_tasks to ansible.posix.profile_tasks
[WARNING]: Skipping callback plugin 'profile_tasks', unable to load
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: test.yml ************************************************************************************************************************************************************************************************
1 plays in test.yml

PLAY [get mgr/dashboard/server_addr] ******************************************************************************************************************************************************************************
META: ran handlers

TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration] ***********************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5node1] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/bin/python3
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:03.022180'
  end: '2023-06-16 12:03:03.498535'
  rc: 0
  start: '2023-06-16 12:03:00.476355'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr value=10.74.250.90 already set. Skipping.
  stdout_lines: <omitted>

TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:02.753226'
  end: '2023-06-16 12:03:06.721271'
  rc: 0
  start: '2023-06-16 12:03:03.968045'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: 10.74.250.90
  stdout_lines: <omitted>

TASK [print current mgr/dashboard/rhcs5node1.example.com/server_addr setting] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:22
ok: [rhcs5node1] =>
  msg: the value of 'mgr/dashboard/rhcs5node1.example.com/server_addr' is 10.74.250.90
META: ran handlers
META: ran handlers

PLAY RECAP ********************************************************************************************************************************************************************************************************
rhcs5node1                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
~~~

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-06-16 10:24:10 UTC ---

Hi,

I have just updated to the latest RHCS 6 (17.2.6-70.el9cp) and can see the same issue there, carried over from RHCS 5.3z3. It was not present in RHCS 6.0.
~~~
[admin@rhcs5node1 cephadm-ansible]$ ansible-playbook -i hosts test.yml -vv
ansible-playbook [core 2.13.3]
  config file = /usr/share/cephadm-ansible/ansible.cfg
  configured module search path = ['/usr/share/cephadm-ansible/library']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/admin/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.9.14 (main, Jan 9 2023, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True
Using /usr/share/cephadm-ansible/ansible.cfg as config file
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.profile_tasks to ansible.posix.profile_tasks
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: test.yml ************************************************************************************************************************************************************************************************
1 plays in test.yml

PLAY [get mgr/dashboard/server_addr] ******************************************************************************************************************************************************************************
META: ran handlers

TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration] ***********************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
Friday 16 June 2023  14:28:39 +0530 (0:00:00.042)       0:00:00.042 ***********
changed: [rhcs5node1] => changed=true
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  - 10.74.250.90
  delta: '0:00:04.624535'
  end: '2023-06-16 14:28:44.847336'
  rc: 0
  start: '2023-06-16 14:28:40.222801'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '5b153ed12055' and tag 'latest' created on 2023-06-02 13:33:37 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:01b39bf32df3c124a91115c9a8bcf1bceb3eb12c6c5068a6e99cf50908137bdb
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
Friday 16 June 2023  14:28:44 +0530 (0:00:05.598)       0:00:05.640 ***********
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.190302'
  end: '2023-06-16 14:28:47.470659'
  rc: 0
  start: '2023-06-16 14:28:45.280357'
  stderr: No value found for who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

TASK [print current mgr/dashboard/rhcs5node1.example.com/server_addr setting] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:22
Friday 16 June 2023  14:28:47 +0530 (0:00:02.638)       0:00:08.279 ***********
ok: [rhcs5node1] =>
  msg: 'the value of ''mgr/dashboard/rhcs5node1.example.com/server_addr'' is '
META: ran handlers
META: ran handlers

PLAY RECAP ********************************************************************************************************************************************************************************************************
rhcs5node1                 : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Friday 16 June 2023  14:28:47 +0530 (0:00:00.088)       0:00:08.367 ***********
===============================================================================
set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration ----------------------------------------------------------------------------------------------------------------------------------- 5.60s
/usr/share/cephadm-ansible/test.yml:7 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration ------------------------------------------------------------------------------------------------------------------------------------- 2.64s
/usr/share/cephadm-ansible/test.yml:15 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
print current mgr/dashboard/rhcs5node1.example.com/server_addr setting ------------------------------------------------------------------------------------------------------------------------------------- 0.09s
/usr/share/cephadm-ansible/test.yml:22 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[admin@rhcs5node1 cephadm-ansible]$
~~~
It is unable to get the value because it uses the `ceph config dump --format json` command, which does not return this configuration value.

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-07-10 04:06:50 UTC ---

Hello Guillaume,

Good day!! Can you please share your comments on this BZ? The customer is looking for an update, as they have been waiting for the next release for a long time.

Thanks,
Tridibesh

--- Additional comment from Guillaume Abrioux on 2023-07-24 12:23:09 UTC ---

Hi Tridibesh,

As you have noticed, the issue is that `ceph config dump --format json` doesn't report `mgr/dashboard/<node-name>/server_addr`.
Another confusing detail is that setting `mgr/dashboard/<node-name>/server_addr` seems to update `mgr/dashboard/server_addr`.

--- Additional comment from Tridibesh Chakraborty on 2023-07-25 10:28:14 UTC ---

Moving the BZ to RADOS, as `ceph config dump` and `ceph config dump --format json` give different output. As a result, cephadm-ansible picks up the wrong output, so the `ceph config dump --format json` output needs to be fixed. In the plain output the configuration shows as `mgr/dashboard/rhcs5-admin/server_addr`, but the same option shows in the JSON-format output as `mgr/dashboard/server_addr` instead of `mgr/dashboard/<NODE>/server_addr`.
ceph config dump output:
~~~
[root@rhcs5-admin ~]# ceph config dump | grep server
mgr  advanced  mgr/dashboard/ceph1/server_addr        10.74.250.188  *
mgr  advanced  mgr/dashboard/rhcs5-admin/server_addr  10.74.252.167  *
mgr  advanced  mgr/dashboard/ssl_server_port          8443           *
[root@rhcs5-admin ~]#
~~~
ceph config dump --format json output:
~~~
[root@rhcs5-admin ~]# ceph config dump --format json
[{"section":"global","name":"container_image","value":"registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:09fc3e5baf198614d70669a106eb87dbebee16d4e91484375778d4adbccadacd","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"mon","name":"auth_allow_insecure_global_id_reclaim","value":"false","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"log_file","value":"/var/log/ceph/$cluster-$name.log","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"mon","name":"log_to_file","value":"false","level":"basic","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"mon_data_avail_warn","value":"10","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"public_network","value":"10.74.248.0/21","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"debug_mgr","value":"5/5","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mgr","name":"mgr/cephadm/container_init","value":"True","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/cephadm/migration_current","value":"5","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/ALERTMANAGER_API_HOST","value":"http://rhcs5-admin.example.com:9093","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/GRAFANA_API_SSL_VERIFY","value":"false","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/GRAFANA_API_URL","value":"https://rhcs5-admin.example.com:3000","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/PROMETHEUS_API_HOST","value":"http://rhcs5-admin.example.com:9095","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/RGW_API_ACCESS_KEY","value":"4U0QL7NR5OIS7RO5GBBT","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/RGW_API_SECRET_KEY","value":"eD8bRPz94UbaSjCmEzklex139OjCKIBm6qSQ186k","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/server_addr","value":"10.74.250.188","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/server_addr","value":"10.74.252.167","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/ssl_server_port","value":"8443","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/orchestrator/orchestrator","value":"cephadm","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"osd","name":"osd_memory_target_autotune","value":"true","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"osd","name":"osd_scrub_min_interval","value":"172800.000000","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mds","name":"mds_export_ephemeral_random_max","value":"0.100000","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mds.cephfs","name":"mds_join_fs","value":"cephfs","level":"basic","can_update_at_runtime":true,"mask":""},{"section":"client.rgw.myrgw.ceph1.jjzziy","name":"rgw_frontends","value":"beast port=80","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"client.rgw.myrgw.ceph2.woatne","name":"rgw_frontends","value":"beast port=80","level":"basic","can_update_at_runtime":false,"mask":""}][root@rhcs5-admin ~]#
[root@rhcs5-admin ~]#
~~~
If it doesn't come under RADOS, please move it to the concerned team.

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-07-25 10:29:05 UTC ---

Please note, this behavior is the same on RHCS 5.3z3, RHCS 5.3z4 and RHCS 6.1.

--- Additional comment from Guillaume Abrioux on 2023-07-25 11:17:50 UTC ---

Not sure whether this is relevant, but if mgr/dashboard/<hostname>/server_addr and mgr/dashboard/server_addr really are two different parameters, note that updating mgr/dashboard/<hostname>/server_addr updates mgr/dashboard/server_addr, see below:
~~~
[ceph: root@ceph-node0 /]# ceph config dump | grep server_addr
mgr  advanced  mgr/dashboard/ceph-node0/server_addr  192.168.9.12  *
[ceph: root@ceph-node0 /]# ceph config dump --format json | jq | grep -A 5 -B 3 server_addr
  },
  {
    "section": "mgr",
    "name": "mgr/dashboard/server_addr",
    "value": "192.168.9.12",
    "level": "advanced",
    "can_update_at_runtime": false,
    "mask": ""
  },
[ceph: root@ceph-node0 /]# ceph config set mgr mgr/dashboard/ceph-node0/server_addr 192.168.9.122
[ceph: root@ceph-node0 /]# ceph config dump --format json | jq | grep -A 5 -B 3 server_addr
  },
  {
    "section": "mgr",
    "name": "mgr/dashboard/server_addr",
    "value": "192.168.9.122",
    "level": "advanced",
    "can_update_at_runtime": false,
    "mask": ""
  },
[ceph: root@ceph-node0 /]# ceph config dump | grep server_addr
mgr  advanced  mgr/dashboard/ceph-node0/server_addr  192.168.9.122  *
[ceph: root@ceph-node0 /]#
~~~

--- Additional comment from Tridibesh Chakraborty on 2023-07-28 08:00:55 UTC ---

Hello Team,

Can someone from the RADOS team please confirm whether this is intended or whether we are hitting a possible bug here? If it doesn't fall under RADOS, will it be possible for you to guide me to the proper team who handles this?

Thanks,
Tridibesh

--- Additional comment from Radoslaw Zarzynski on 2023-07-28 12:56:48 UTC ---

Yeah, this might be an actual bug. Sridhar, would you mind taking a look?
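A side effect worth noting in the JSON dump above: because the instance name is dropped, the two localized options (for ceph1 and rhcs5-admin) collapse to the same normalized name, so a consumer of the JSON cannot even tell the two values apart. A minimal illustration (sample data reduced from the dump above, not the module's code):

```python
import collections
import json

# The plain dump shows two distinct localized options:
#   mgr/dashboard/ceph1/server_addr        10.74.250.188
#   mgr/dashboard/rhcs5-admin/server_addr  10.74.252.167
# but in the JSON output both appear under the same normalized name.
entries = json.loads('''[
  {"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.250.188"},
  {"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.252.167"}
]''')

counts = collections.Counter((e["section"], e["name"]) for e in entries)
# The same (section, name) pair occurs twice: which host each value belongs
# to is no longer recoverable from the JSON alone.
assert counts[("mgr", "mgr/dashboard/server_addr")] == 2
```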
--- Additional comment from Tridibesh Chakraborty on 2023-08-03 10:06:22 UTC ---

Hello Sridhar,

Did you get a chance to look into this, and can you please confirm whether we are hitting a bug here or not? Also, if it is a bug, will it be possible to get a fix into RHCS 5.3z5?

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-08-07 12:25:03 UTC ---

Hi Tridibesh,

I took a look into this, and it appears that this is the way the "config dump" command has been working all along. There is therefore a difference between the outputs of the "config dump" and "config dump --format json" commands: the former prints the "localized" name, which includes the name of the instance (in this case 'x'), and the latter prints the "normalized" name (i.e. without the instance name).

Also, curiously, the ansible script uses "config get" in the cases where it succeeds in getting the value for the option. For the cases that fail, "config dump" is used. Can you check why this switch was made from "config get" to "config dump"? I think that if the ansible script reverts to using "config get", it will succeed.
Success Case:
-------------
~~~
TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:02.753226'
  end: '2023-06-16 12:03:06.721271'
  rc: 0
  start: '2023-06-16 12:03:03.968045'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: 10.74.250.90
  stdout_lines: <omitted>
~~~

Failure Case:
-------------
~~~
TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
Friday 16 June 2023  14:28:44 +0530 (0:00:05.598)       0:00:05.640 ***********
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.190302'
  end: '2023-06-16 14:28:47.470659'
  rc: 0
  start: '2023-06-16 14:28:45.280357'
  stderr: No value found for who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~

Also, in comment#4, the task "TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration]" uses "config get" instead of "config set" to set the server_addr. Can you please check this?
But the fact remains that there's a difference in the "config dump" output which probably needs to be fixed for consistency. I am trying to identify a fix for this. But in the meantime, can you check whether the ansible script succeeds in all cases across releases if "config get" is used? This could very well fix the issue for the moment, until I can identify a fix for the "config dump" output.

Thanks,
-Sridhar

--- Additional comment from Tridibesh Chakraborty on 2023-08-08 04:03:43 UTC ---

Hello Sridhar,

Thanks for your time and for analyzing the issue.

>Also, curiously the ansible script uses "config get" in cases where it succeeds in getting the value for the option. For cases that fail, "config dump" is used. Can you check why this switch was made from "config get" to "config dump"? I think if the ansible script reverts to using "config get", the script will succeed.

I think this is because of BZ# 2188319, where we were unable to get/set global configuration values. When I tried to find the cause, it appeared to me that the `ceph config get` command was not working with global configuration values. Since setting a value via the ansible module ceph_config initially checked the current value via `ceph config get`, which was not returning any value, the module was unable to set it. You can refer to the example below from a 16.2.10-160.el8cp cluster.
~~~
[root@rhcs5-admin cephadm-ansible]# ceph config set global osd_pool_default_size 2
[root@rhcs5-admin cephadm-ansible]# ceph config dump |grep osd_pool_default_size
global  advanced  osd_pool_default_size  2
[root@rhcs5-admin cephadm-ansible]#
[root@rhcs5-admin cephadm-ansible]# ceph config get global osd_pool_default_size
Error EINVAL: unrecognized entity 'global'
[root@rhcs5-admin cephadm-ansible]# ceph config get osd osd_pool_default_size
2
[root@rhcs5-admin cephadm-ansible]#
~~~
Guillaume can confirm more on this.
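In other words, `ceph config get` wants a daemon entity rather than the pseudo-section "global", which is what pushed the module toward `config dump`. A hedged sketch of that constraint (`build_get_cmd` and `VALID_ENTITY_TYPES` are hypothetical names for illustration, not the module's actual code or an exhaustive entity list):

```python
# Daemon entity types `ceph config get` is shown accepting in this thread;
# "global" is a config section, not an entity, hence the
# "Error EINVAL: unrecognized entity 'global'" above.
VALID_ENTITY_TYPES = {"mon", "mgr", "osd", "mds", "client"}

def build_get_cmd(who, option):
    """Return a command list to read `option` for `who` (illustrative only)."""
    entity_type = who.split(".", 1)[0]
    if entity_type in VALID_ENTITY_TYPES:
        return ["ceph", "config", "get", who, option]
    # "global" cannot be queried directly, so fall back to dumping the whole
    # configuration database -- which is what ceph_config ended up doing.
    return ["ceph", "config", "dump", "--format", "json"]

assert build_get_cmd("osd", "osd_pool_default_size") == \
    ["ceph", "config", "get", "osd", "osd_pool_default_size"]
assert build_get_cmd("global", "osd_pool_default_size") == \
    ["ceph", "config", "dump", "--format", "json"]
```

The fallback trades the EINVAL away but inherits the normalized-name problem described earlier in the thread.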
>Also, in comment#4, the task "TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration]" uses "config get" instead of "config set" to set the server_addr. Can you please check this?

If I am not wrong, RHCS 6.0 uses the old method, where setting a configuration parameter first fetches the current value and then sets it only if the provided value differs from the current one. This is the same logic I described in my response above.

>But in the meantime, can you check if the ansible script succeeds in all cases across releases if "config get" is used? This can very well fix the issue for the moment until I can identify a fix for the "config dump" output.

In this scenario it will succeed, but then we will again have the same problem with global configuration values which was fixed in BZ#2188319.

Guillaume can confirm whether my explanation on all these points is correct.

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-08-08 06:13:56 UTC ---

Hi Tridibesh,

Looking into the history of BZ# 2188319, I think the expectation from "config get global ..." is incorrect. When a config setting is applied "globally" using "config set global ...", it implies that the value is being changed across all the daemons (osd, mon, mgr & mds) and clients. See https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#confsec-global

However, there are certain rules mentioned further down in the above link that talk about precedence, and these need to be taken into account in the implementation:

-- Start
"Any given daemon will draw its settings from the global section, the daemon- or client-type section, and the section sharing its name. Settings in the most-specific section take precedence: for example, if the same option is specified in both global, mon, and mon.foo on the same source (that is, in the same configuration file), the mon.foo setting will be used.
If multiple values of the same configuration option are specified in the same section, the last value specified takes precedence.

Note that values from the local configuration file always take precedence over values from the monitor configuration database, regardless of the section in which they appear."
-- End

Considering the above rules, the script MUST ensure that no <daemon> or <daemon>.<id> section has the config setting that it wants to override globally. Therefore, to verify the config change, the script can simply get the changed value from any of the active daemon configuration sections. For example, in this case the following command may be used:

$ ceph config get osd osd_pool_default_size

The above is a valid method to verify the changed value, as you have already shown in comment#15: "ceph config get osd osd_pool_default_size" reported the changed value of 2 correctly. Therefore, I think the script can revert from "config dump" and use "config get" against any of the relevant daemons. To be more accurate, the script just needs to keep track of the daemon a config value belongs to and then use that in the "config get" command, provided the above rules are adhered to.

-Sridhar

--- Additional comment from Guillaume Abrioux on 2023-08-08 07:25:03 UTC ---

(In reply to Sridhar Seshasayee from comment #16)

> -- Start
> "Any given daemon will draw its settings from the global section, the daemon- or client-type
> section, and the section sharing its name. Settings in the most-specific section take precedence:
> for example, if the same option is specified in both global, mon, and mon.foo
> on the same source (that is, in the same configuration file), the mon.foo setting will be used.
>
> If multiple values of the same configuration option are specified in the same section, the last
> value specified takes precedence.
>
> Note that values from the local configuration file always take precedence over values from the
> monitor configuration database, regardless of the section in which they appear."
> -- End
>
> Considering the above rule, the script MUST ensure that any <daemon> or <daemon>.<id> section
> doesn't have the config setting that it wants to override globally.

This is why it is definitely easier to rely on the `ceph config dump` output.

> Therefore, to verify the config change, the script can simply get the changed value from any of
> the active daemon configuration sections. For e.g., in this case the following command may be used,
>
> $ ceph config get osd osd_pool_default_size
>
> The above is a valid method to verify the changed value as you have already shown in comment#15.
> The "ceph config get osd osd_pool_default_size" reported the changed value of 2 correctly.
> Therefore, I think the script can revert "config dump" and use "config get" from any of the
> relevant daemons. To be more accurate, the script just needs to keep a track of the daemon the
> config value belongs to and then use that in the "config get" command provided the above rules
> are adhered to.

In the end, that doesn't change the fact that `ceph config dump --format json` should be fixed.

If I understand correctly, you are suggesting we implement more complexity in the Ansible module to work around a valid bug in `ceph config dump`: `--format json` not being honoured.

--- Additional comment from Sridhar Seshasayee on 2023-08-08 09:37:27 UTC ---

(In reply to Guillaume Abrioux from comment #17)

> This is why this is definitely easier to rely on `ceph config dump` output.

"config dump" merely provides a way to ascertain the global setting. Considering the rules I posted in my previous comment, it still doesn't change the fact that any <daemon>.<id> settings (if they exist) will not be overridden by the global setting.
Therefore, in any case, the existing daemon-specific setting of the same config option must be checked and removed before applying the global setting.

> At the end, that doesn't change the fact that `ceph config dump --format json` should be fixed.
>
> If I understand it correctly, you are suggesting we implement more complexity in that Ansible
> module to work around a valid bug in `ceph config dump`: `--format json` not honoured.

Sure, the "ceph config dump --format json" output should be consistent, and it's something I am investigating to fix. Right now the json format of the output prints the normalized name of the config option (i.e. without the instance name, as mentioned in comment#14). This can still be used to check daemon-specific settings.

That said, I don't think implementing the complexity of checking daemon-specific config settings is a workaround. It is needed if an option is to be set globally. Otherwise, there's a possibility that a subset of the daemons will use the global setting and another subset will use the daemon-specific setting, leading to unpredictable behavior. Therefore, I am suggesting a couple of things:

1. Ensure the global setting is truly global by removing any existing daemon setting for the same config option. This can be done by using "ceph config dump --format json" and checking for the normalized config option (if applicable) set for the daemon. The existing daemon setting may be removed using "ceph config rm <daemon>.<id> <option>" or "ceph config rm <daemon> <option>". The global setting may then be applied using "ceph config set global <option> <value>".

2. The global setting can then be verified as before using the appropriate daemon name (not "global") and the localized name (i.e. with the instance name), like below:

   ceph config get <daemon> <localized_option_name>

   This means reverting to the original way of verifying an option, using the daemon name and option name.
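To illustrate the mismatch being discussed: `ceph config dump --format json` prints only the normalized option name (without the instance segment), so a caller that holds the localized name has to normalize it before matching. A minimal Python sketch under that assumption — `normalize_option` and `lookup` are hypothetical helpers, not cephadm-ansible code, and the sample JSON mirrors the `mgr/dashboard/ceph1/server_addr` output quoted earlier in this bug:

```python
import json

def normalize_option(name):
    """Strip the instance segment from a localized option name, e.g.
    'mgr/dashboard/ceph1/server_addr' -> 'mgr/dashboard/server_addr'.

    Hypothetical helper: it assumes the instance name is the third of
    four slash-separated segments, as in the dashboard options above.
    """
    parts = name.split("/")
    if len(parts) == 4:
        del parts[2]
    return "/".join(parts)

def lookup(dump_entries, who, option):
    """Find a (possibly localized) option in parsed
    `ceph config dump --format json` output, which currently
    prints only the normalized name."""
    wanted = normalize_option(option)
    for entry in dump_entries:
        if entry["section"] == who and entry["name"] == wanted:
            return entry["value"]
    return None

# Sample mirroring the cluster output quoted earlier in this bug:
dump_json = '[{"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.250.188"}]'
value = lookup(json.loads(dump_json), "mgr", "mgr/dashboard/ceph1/server_addr")
```

With this normalization step, the lookup that failed in the original playbook run would succeed against the same JSON output.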
Point 1 above must be done regardless of the way the config setting is verified. Point 2 above can eventually use "ceph config dump --format json" once the output is fixed. I am trying to ascertain whether the current output of "ceph config dump --format json" is intentional, since this is how it has been working so far.

I hope the above clarifies things.

--- Additional comment from Sridhar Seshasayee on 2023-08-09 14:29:50 UTC ---

Hi Guillaume, Tridibesh,

I have raised PR: https://github.com/ceph/ceph/pull/52906 that fixes the output of "ceph config dump --format json" to be consistent across all output formats.

-Sridhar

--- Additional comment from Pawan on 2023-08-16 09:24:38 UTC ---

Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?

--- Additional comment from Neha Ojha on 2023-08-16 16:16:59 UTC ---

(In reply to Pawan from comment #20)
> Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?

Hi Pawan, I don't see a reason for us to push for the 22nd deadline, do you? I am changing the target to 7.x because the fix will need to land there first. We'll need to clone this BZ for 6.x and 5.x (if we find another release vehicle).

--- Additional comment from Pawan on 2023-08-17 02:34:43 UTC ---

(In reply to Neha Ojha from comment #21)
> (In reply to Pawan from comment #20)
> > Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?
>
> Hi Pawan, I don't see a reason for us to push for the 22nd deadline, do you?
> I am changing the target to 7.x because the fix will need to land there first.
> We'll need to clone this BZ for 6.x and 5.x (if we find another release vehicle).

Sounds good, Neha. Noted! Thanks for the update.

--- Additional comment from Sridhar Seshasayee on 2023-10-06 16:09:54 UTC ---

Added upstream Ceph tracker info.
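Sridhar's two-step procedure (points 1 and 2 in comment #18) amounts to a small command-generation step over the parsed dump output. A hedged Python sketch — the function name and the returned command strings are illustrative only, mirroring the `ceph config rm` / `ceph config set global` calls suggested above:

```python
def global_override_cmds(dump_entries, option, value):
    """Build the `ceph config` command sequence for a truly global
    override: first remove any daemon-level entries for the option
    (point 1), then set it in the global section. Verification via
    `ceph config get <daemon> <option>` (point 2) is left to the
    caller. `dump_entries` is the parsed output of
    `ceph config dump --format json`; this helper is a sketch, not
    part of cephadm-ansible.
    """
    cmds = [
        f"ceph config rm {e['section']} {option}"
        for e in dump_entries
        if e["name"] == option and e["section"] != "global"
    ]
    cmds.append(f"ceph config set global {option} {value}")
    return cmds

# Example: an osd-section entry would shadow the intended global value,
# so it must be removed before the global setting is applied.
dump = [
    {"section": "osd", "name": "osd_pool_default_size", "value": "3"},
    {"section": "global", "name": "osd_pool_default_size", "value": "3"},
]
cmds = global_override_cmds(dump, "osd_pool_default_size", "2")
```

Without the removal step, a subset of daemons would keep using the section-level value, which is exactly the unpredictable split behaviour described above.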
--- Additional comment from Tridibesh Chakraborty on 2023-10-09 05:28:53 UTC ---

Hello Sridhar/Neha,

Are we planning to add this to RHCS 5.3z6? From the release calendar I can see RHCS 5.3z6 is planned for Q1 next year.

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-10-10 07:46:36 UTC ---

Hello Tridibesh,

Yes, RHCS 5.3z6 is definitely possible, but I will let Neha take the final decision on this. Neha is on PTO this week, so you can expect a response from her sometime next week.

-Sridhar

--- Additional comment from Neha Ojha on 2023-10-26 19:52:41 UTC ---

(In reply to Sridhar Seshasayee from comment #25)
> Hello Tridibesh,
>
> Yes, RHCS 5.3z6 is definitely possible. But I will let Neha take the final decision on this.
> Neha is on PTO this week, and so you can expect a response from her sometime next week.
>
> -Sridhar

Sure, let's plan the fix for 5.3z6, which seems to be planned for March 2024 - we'll need to clone this BZ for 5.x and 6.x.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:0745