+++ This bug was initially created as a clone of Bug #2213766 +++

Description of problem:
Unable to get a Ceph configuration value via a cephadm-ansible playbook using the ceph_config module.

Version-Release number of selected component (if applicable):
RHCS 5.3z3 (16.2.10-172.el8cp)

How reproducible:
Always, for a specific few Ceph configuration parameters.

Steps to Reproduce:
1. Write a cephadm-ansible playbook using a ceph_config task
2. Try to get the value of a Ceph configuration option (for example, mgr/dashboard/<NODE_NAME>/server_addr)
3. The task fails to get the value, although the same value can be retrieved via `ceph config dump`

Actual results:
The task fails to fetch the value.
~~~
TASK [get the mgr/dashboard/<NODE_NAME>/server_addr configuration] ******************************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5-admin] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/libexec/platform-python
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.256076'
  end: '2023-06-07 18:02:25.640664'
  rc: 0
  start: '2023-06-07 18:02:23.384588'
  stderr: No value found for who=mgr option=mgr/dashboard/<NODE_NAME>/server_addr   <=== Unable to find the config parameter
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~

Expected results:
It should be able to fetch the value.

Additional info:
It is not happening for all Ceph configuration parameters; I have noticed it for the parameter mgr/dashboard/<NODE_NAME>/server_addr.

--- Additional comment from Tridibesh Chakraborty on 2023-06-09 09:17:01 UTC ---

I reproduced this issue on my test cluster and got the same result.
~~~
TASK [get the mgr/dashboard/ceph1/server_addr configuration] ******************************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5-admin] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/libexec/platform-python
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.256076'
  end: '2023-06-07 18:02:25.640664'
  rc: 0
  start: '2023-06-07 18:02:23.384588'
  stderr: No value found for who=mgr option=mgr/dashboard/ceph1/server_addr   <=== Unable to find the config parameter
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~
From the output I can see that the task executes the command below to find the value.
~~~
$ cephadm shell ceph config dump --format json
~~~
When I ran `ceph config dump` directly, I could get the value of the parameter mgr/dashboard/ceph1/server_addr.
~~~
[root@rhcs5-admin ~]# cephadm shell ceph config dump | grep addr
Inferring fsid a395b05a-fb08-11ed-a3b4-001a4a00046e
Using recent ceph image registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:13c53f3e7d4801365b083f14c5a9606b373b3f3249c155240775dd268c220fcf
mgr  advanced  mgr/dashboard/ceph1/server_addr  10.74.250.188  *
[root@rhcs5-admin ~]#
~~~
But if I run the same command as the task, I don't see the parameter there.
~~~
[root@rhcs5-admin ~]# ceph config dump --format json|grep ceph1
[root@rhcs5-admin ~]#
~~~
The problem is that the JSON-format output reports the option as mgr/dashboard/server_addr instead of mgr/dashboard/ceph1/server_addr. That is why the playbook fails to get the value of the configuration.
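The mismatch can be reproduced offline. The sketch below is illustrative only (`find_option` is a hypothetical helper, not the actual ceph_config module code); it shows why a lookup over the parsed `ceph config dump --format json` output can never match the localized option name:

```python
import json

# Trimmed-down sample of what `ceph config dump --format json` returns.
# Note the name is the normalized "mgr/dashboard/server_addr", not the
# localized "mgr/dashboard/ceph1/server_addr" printed by the plain dump.
DUMP_JSON = '''[
  {"section": "mgr", "name": "mgr/dashboard/server_addr",
   "value": "10.74.250.188", "level": "advanced",
   "can_update_at_runtime": false, "mask": ""}
]'''

def find_option(dump, who, option):
    """Look up (who, option) in the parsed dump; None means 'No value found'."""
    for entry in json.loads(dump):
        if entry["section"] == who and entry["name"] == option:
            return entry["value"]
    return None

# The localized name the playbook asks for is never present in the JSON...
assert find_option(DUMP_JSON, "mgr", "mgr/dashboard/ceph1/server_addr") is None
# ...while the normalized name is.
assert find_option(DUMP_JSON, "mgr", "mgr/dashboard/server_addr") == "10.74.250.188"
```

Any consumer that keys on the localized name, as the playbook does, therefore falls through to the "No value found" branch.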
~~~
[root@rhcs5-admin ~]# ceph config dump --format json-pretty | grep -A 5 -B 3 addr
    },
    {
        "section": "mgr",
        "name": "mgr/dashboard/server_addr",
        "value": "10.74.250.188",
        "level": "advanced",
        "can_update_at_runtime": false,   <== can't be updated at runtime
        "mask": ""
    },
~~~
I noticed that this parameter can't be updated at runtime, so that could be why it does not show up in the JSON-format output after I added the configuration. If that is not the reason, then it might be a bug.
~~~
[root@rhcs5-admin ~]# ceph config help mgr/dashboard/ceph1/server_addr
mgr/dashboard/server_addr - (str, advanced)
  Default: ::
  Can update at runtime: false
~~~

--- Additional comment from Tridibesh Chakraborty on 2023-06-14 06:06:51 UTC ---

Hello Guillaume,

Do you think the customer is making a mistake here, or are they genuinely hitting a bug?

Thanks,
Tridibesh

--- Additional comment from Scott Ostapovicz on 2023-06-14 16:06:50 UTC ---

Missed the 5.3 z4 deadline. Moving from z4 to z5.

--- Additional comment from Tridibesh Chakraborty on 2023-06-16 06:36:21 UTC ---

I have checked whether we hit this same bug on RHCS 6 (17.2.5-75.el9cp). The issue is not there in RHCS 6, so it looks like we are hitting a bug on RHCS 5.3z3.
Example:
~~~
[admin@rhcs5node1 cephadm-ansible]$ ansible-playbook -i hosts test.yml -vv
ansible-playbook [core 2.13.3]
  config file = /usr/share/cephadm-ansible/ansible.cfg
  configured module search path = ['/usr/share/cephadm-ansible/library']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/admin/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.9.14 (main, Jan 9 2023, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True
Using /usr/share/cephadm-ansible/ansible.cfg as config file
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.profile_tasks to ansible.posix.profile_tasks
[WARNING]: Skipping callback plugin 'profile_tasks', unable to load
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: test.yml ************************************************************************************************************************************************************************************************
1 plays in test.yml

PLAY [get mgr/dashboard/server_addr] ******************************************************************************************************************************************************************************
META: ran handlers

TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration] ***********************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
ok: [rhcs5node1] => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/bin/python3
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:03.022180'
  end: '2023-06-16 12:03:03.498535'
  rc: 0
  start: '2023-06-16 12:03:00.476355'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr value=10.74.250.90 already set. Skipping.
  stdout_lines: <omitted>

TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:02.753226'
  end: '2023-06-16 12:03:06.721271'
  rc: 0
  start: '2023-06-16 12:03:03.968045'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: 10.74.250.90
  stdout_lines: <omitted>

TASK [print current mgr/dashboard/rhcs5node1.example.com/server_addr setting] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:22
ok: [rhcs5node1] =>
  msg: the value of 'mgr/dashboard/rhcs5node1.example.com/server_addr' is 10.74.250.90
META: ran handlers
META: ran handlers

PLAY RECAP ********************************************************************************************************************************************************************************************************
rhcs5node1                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
~~~

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-06-16 10:24:10 UTC ---

Hi,

I have just updated to the latest RHCS 6 (17.2.6-70.el9cp) and can see the same issue there, carried over from RHCS 5.3z3. It was not present in RHCS 6.0.
~~~
[admin@rhcs5node1 cephadm-ansible]$ ansible-playbook -i hosts test.yml -vv
ansible-playbook [core 2.13.3]
  config file = /usr/share/cephadm-ansible/ansible.cfg
  configured module search path = ['/usr/share/cephadm-ansible/library']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/admin/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible-playbook
  python version = 3.9.14 (main, Jan 9 2023, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True
Using /usr/share/cephadm-ansible/ansible.cfg as config file
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.yaml to community.general.yaml
redirecting (type: callback) ansible.builtin.profile_tasks to ansible.posix.profile_tasks
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: test.yml ************************************************************************************************************************************************************************************************
1 plays in test.yml

PLAY [get mgr/dashboard/server_addr] ******************************************************************************************************************************************************************************
META: ran handlers

TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration] ***********************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:7
Friday 16 June 2023  14:28:39 +0530 (0:00:00.042)       0:00:00.042 ***********
changed: [rhcs5node1] => changed=true
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  - 10.74.250.90
  delta: '0:00:04.624535'
  end: '2023-06-16 14:28:44.847336'
  rc: 0
  start: '2023-06-16 14:28:40.222801'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '5b153ed12055' and tag 'latest' created on 2023-06-02 13:33:37 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:01b39bf32df3c124a91115c9a8bcf1bceb3eb12c6c5068a6e99cf50908137bdb
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
Friday 16 June 2023  14:28:44 +0530 (0:00:05.598)       0:00:05.640 ***********
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.190302'
  end: '2023-06-16 14:28:47.470659'
  rc: 0
  start: '2023-06-16 14:28:45.280357'
  stderr: No value found for who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

TASK [print current mgr/dashboard/rhcs5node1.example.com/server_addr setting] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:22
Friday 16 June 2023  14:28:47 +0530 (0:00:02.638)       0:00:08.279 ***********
ok: [rhcs5node1] =>
  msg: 'the value of ''mgr/dashboard/rhcs5node1.example.com/server_addr'' is '
META: ran handlers
META: ran handlers

PLAY RECAP ********************************************************************************************************************************************************************************************************
rhcs5node1                 : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Friday 16 June 2023  14:28:47 +0530 (0:00:00.088)       0:00:08.367 ***********
===============================================================================
set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration ----------------------------------------------------------------------------------------------------------------------------------- 5.60s
/usr/share/cephadm-ansible/test.yml:7 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration ------------------------------------------------------------------------------------------------------------------------------------- 2.64s
/usr/share/cephadm-ansible/test.yml:15 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
print current mgr/dashboard/rhcs5node1.example.com/server_addr setting ------------------------------------------------------------------------------------------------------------------------------------- 0.09s
/usr/share/cephadm-ansible/test.yml:22 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[admin@rhcs5node1 cephadm-ansible]$
~~~
It is unable to get the value because it uses the `ceph config dump --format json` command, which does not return this configuration value.

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-07-10 04:06:50 UTC ---

Hello Guillaume,

Good day!! Can you please share your comments on this BZ? The customer is looking for an update, as they have been waiting for the next release for a long time.

Thanks,
Tridibesh

--- Additional comment from Guillaume Abrioux on 2023-07-24 12:23:09 UTC ---

Hi Tridibesh,

As you have noticed, the issue is that `ceph config dump --format json` doesn't report `mgr/dashboard/<node-name>/server_addr`.
Another confusing detail is that setting `mgr/dashboard/<node-name>/server_addr` seems to update `mgr/dashboard/server_addr`.

--- Additional comment from Tridibesh Chakraborty on 2023-07-25 10:28:14 UTC ---

Moving the BZ to RADOS, as `ceph config dump` and `ceph config dump --format json` give different output. As a result, cephadm-ansible picks up the wrong output, so the `ceph config dump --format json` output needs to be fixed. In the plain output the configuration shows as `mgr/dashboard/rhcs5-admin/server_addr`, but the same option shows in the JSON-format output as `mgr/dashboard/server_addr` instead of `mgr/dashboard/<NODE>/server_addr`.
ceph config dump output:
~~~
[root@rhcs5-admin ~]# ceph config dump | grep server
mgr  advanced  mgr/dashboard/ceph1/server_addr        10.74.250.188  *
mgr  advanced  mgr/dashboard/rhcs5-admin/server_addr  10.74.252.167  *
mgr  advanced  mgr/dashboard/ssl_server_port          8443           *
[root@rhcs5-admin ~]#
~~~
ceph config dump --format json output:
~~~
[root@rhcs5-admin ~]# ceph config dump --format json
[{"section":"global","name":"container_image","value":"registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:09fc3e5baf198614d70669a106eb87dbebee16d4e91484375778d4adbccadacd","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"mon","name":"auth_allow_insecure_global_id_reclaim","value":"false","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"log_file","value":"/var/log/ceph/$cluster-$name.log","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"mon","name":"log_to_file","value":"false","level":"basic","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"mon_data_avail_warn","value":"10","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mon","name":"public_network","value":"10.74.248.0/21","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"debug_mgr","value":"5/5","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mgr","name":"mgr/cephadm/container_init","value":"True","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/cephadm/migration_current","value":"5","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/ALERTMANAGER_API_HOST","value":"http://rhcs5-admin.example.com:9093","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/GRAFANA_API_SSL_VERIFY","value":"false","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/GRAFANA_API_URL","value":"https://rhcs5-admin.example.com:3000","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/PROMETHEUS_API_HOST","value":"http://rhcs5-admin.example.com:9095","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/RGW_API_ACCESS_KEY","value":"4U0QL7NR5OIS7RO5GBBT","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/RGW_API_SECRET_KEY","value":"eD8bRPz94UbaSjCmEzklex139OjCKIBm6qSQ186k","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/server_addr","value":"10.74.250.188","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/server_addr","value":"10.74.252.167","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/dashboard/ssl_server_port","value":"8443","level":"advanced","can_update_at_runtime":false,"mask":""},{"section":"mgr","name":"mgr/orchestrator/orchestrator","value":"cephadm","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"osd","name":"osd_memory_target_autotune","value":"true","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"osd","name":"osd_scrub_min_interval","value":"172800.000000","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mds","name":"mds_export_ephemeral_random_max","value":"0.100000","level":"advanced","can_update_at_runtime":true,"mask":""},{"section":"mds.cephfs","name":"mds_join_fs","value":"cephfs","level":"basic","can_update_at_runtime":true,"mask":""},{"section":"client.rgw.myrgw.ceph1.jjzziy","name":"rgw_frontends","value":"beast port=80","level":"basic","can_update_at_runtime":false,"mask":""},{"section":"client.rgw.myrgw.ceph2.woatne","name":"rgw_frontends","value":"beast port=80","level":"basic","can_update_at_runtime":false,"mask":""}][root@rhcs5-admin ~]#
[root@rhcs5-admin ~]#
~~~
If it doesn't come under RADOS, please move it to the concerned team.

Thanks,
Tridibesh

--- Additional comment from Tridibesh Chakraborty on 2023-07-25 10:29:05 UTC ---

Please note, this behavior is the same on RHCS 5.3z3, RHCS 5.3z4 and RHCS 6.1.

--- Additional comment from Guillaume Abrioux on 2023-07-25 11:17:50 UTC ---

Not sure whether this is relevant, but if mgr/dashboard/<hostname>/server_addr and mgr/dashboard/server_addr really are two different parameters, note that updating mgr/dashboard/<hostname>/server_addr updates mgr/dashboard/server_addr, see below:
~~~
[ceph: root@ceph-node0 /]# ceph config dump | grep server_addr
mgr  advanced  mgr/dashboard/ceph-node0/server_addr  192.168.9.12  *
[ceph: root@ceph-node0 /]# ceph config dump --format json | jq | grep -A 5 -B 3 server_addr
  },
  {
    "section": "mgr",
    "name": "mgr/dashboard/server_addr",
    "value": "192.168.9.12",
    "level": "advanced",
    "can_update_at_runtime": false,
    "mask": ""
  },
[ceph: root@ceph-node0 /]# ceph config set mgr mgr/dashboard/ceph-node0/server_addr 192.168.9.122
[ceph: root@ceph-node0 /]# ceph config dump --format json | jq | grep -A 5 -B 3 server_addr
  },
  {
    "section": "mgr",
    "name": "mgr/dashboard/server_addr",
    "value": "192.168.9.122",
    "level": "advanced",
    "can_update_at_runtime": false,
    "mask": ""
  },
[ceph: root@ceph-node0 /]# ceph config dump | grep server_addr
mgr  advanced  mgr/dashboard/ceph-node0/server_addr  192.168.9.122  *
[ceph: root@ceph-node0 /]#
~~~

--- Additional comment from Tridibesh Chakraborty on 2023-07-28 08:00:55 UTC ---

Hello Team,

Can someone from the RADOS team please confirm whether this is intended or whether we are hitting a possible bug here? If it doesn't fall under RADOS, will it be possible for you to guide me to the proper team who handles this?

Thanks,
Tridibesh

--- Additional comment from Radoslaw Zarzynski on 2023-07-28 12:56:48 UTC ---

Yeah, this might be an actual bug. Sridhar, would you mind taking a look?
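A side effect worth noting in the JSON dump above: because the instance name is dropped, the two localized options (for ceph1 and rhcs5-admin) collapse to the same normalized name, so a consumer of the JSON cannot even tell the two values apart. A minimal illustration (sample data reduced from the dump above, not the module's code):

```python
import collections
import json

# The plain dump shows two distinct localized options:
#   mgr/dashboard/ceph1/server_addr        10.74.250.188
#   mgr/dashboard/rhcs5-admin/server_addr  10.74.252.167
# but in the JSON output both appear under the same normalized name.
entries = json.loads('''[
  {"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.250.188"},
  {"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.252.167"}
]''')

counts = collections.Counter((e["section"], e["name"]) for e in entries)
# The same (section, name) pair occurs twice: which host each value belongs
# to is no longer recoverable from the JSON alone.
assert counts[("mgr", "mgr/dashboard/server_addr")] == 2
```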
--- Additional comment from Tridibesh Chakraborty on 2023-08-03 10:06:22 UTC ---

Hello Sridhar,

Did you get a chance to look into this, and can you please confirm whether we are hitting a bug here or not? Also, if it is a bug, will it be possible to get a fix into RHCS 5.3z5?

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-08-07 12:25:03 UTC ---

Hi Tridibesh,

I took a look into this, and it appears that this is the way the "config dump" command has been working all along. There is therefore a difference between the outputs of the "config dump" and "config dump --format json" commands: the former prints the "localized" name, which includes the name of the instance (in this case 'x'), and the latter prints the "normalized" name (i.e. without the instance name).

Also, curiously, the ansible script uses "config get" in the cases where it succeeds in getting the value for the option. For the cases that fail, "config dump" is used. Can you check why this switch was made from "config get" to "config dump"? I think that if the ansible script reverts to using "config get", it will succeed.
Success Case:
-------------
~~~
TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - get
  - mgr
  - mgr/dashboard/rhcs5node1.example.com/server_addr
  delta: '0:00:02.753226'
  end: '2023-06-16 12:03:06.721271'
  rc: 0
  start: '2023-06-16 12:03:03.968045'
  stderr: |-
    Inferring fsid 7ba9be50-cd8c-11ed-9c6f-001a4a0004de
    Inferring config /var/lib/ceph/7ba9be50-cd8c-11ed-9c6f-001a4a0004de/mon.rhcs5node1.example.com/config
    Using ceph image with id '35949bb370c9' and tag 'latest' created on 2023-03-13 13:39:34 +0000 UTC registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:086ef365ba781876aaf144155f68237e6113dabfe9f1cc5bb0b553a64590a3c3
  stderr_lines: <omitted>
  stdout: 10.74.250.90
  stdout_lines: <omitted>
~~~

Failure Case:
-------------
~~~
TASK [get the mgr/dashboard/rhcs5node1.example.com/server_addr configuration] *************************************************************************************************************************************
task path: /usr/share/cephadm-ansible/test.yml:15
Friday 16 June 2023  14:28:44 +0530 (0:00:05.598)       0:00:05.640 ***********
ok: [rhcs5node1] => changed=false
  cmd:
  - cephadm
  - shell
  - ceph
  - config
  - dump
  - --format
  - json
  delta: '0:00:02.190302'
  end: '2023-06-16 14:28:47.470659'
  rc: 0
  start: '2023-06-16 14:28:45.280357'
  stderr: No value found for who=mgr option=mgr/dashboard/rhcs5node1.example.com/server_addr
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
~~~

Also, in comment#4, the task "TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration]" uses "config get" instead of "config set" to set the server_addr. Can you please check this?
But the fact remains that there's a difference in the "config dump" output which probably needs to be fixed for consistency. I am trying to identify a fix for this. But in the meantime, can you check whether the ansible script succeeds in all cases across releases if "config get" is used? This could very well fix the issue for the moment, until I can identify a fix for the "config dump" output.

Thanks,
-Sridhar

--- Additional comment from Tridibesh Chakraborty on 2023-08-08 04:03:43 UTC ---

Hello Sridhar,

Thanks for your time and for analyzing the issue.

>Also, curiously the ansible script uses "config get" in cases where it succeeds in getting the value for the option. For cases that fail, "config dump" is used. Can you check why this switch was made from "config get" to "config dump"? I think if the ansible script reverts to using "config get", the script will succeed.

I think this is because of BZ# 2188319, where we were unable to get/set global configuration values. When I tried to find the cause, it appeared to me that the `ceph config get` command was not working with global configuration values. Since setting a value via the ansible module ceph_config initially checked the current value via `ceph config get`, which was not returning any value, the module was unable to set it. You can refer to the example below from a 16.2.10-160.el8cp cluster.
~~~
[root@rhcs5-admin cephadm-ansible]# ceph config set global osd_pool_default_size 2
[root@rhcs5-admin cephadm-ansible]# ceph config dump |grep osd_pool_default_size
global  advanced  osd_pool_default_size  2
[root@rhcs5-admin cephadm-ansible]#
[root@rhcs5-admin cephadm-ansible]# ceph config get global osd_pool_default_size
Error EINVAL: unrecognized entity 'global'
[root@rhcs5-admin cephadm-ansible]# ceph config get osd osd_pool_default_size
2
[root@rhcs5-admin cephadm-ansible]#
~~~
Guillaume can confirm more on this.
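In other words, `ceph config get` wants a daemon entity rather than the pseudo-section "global", which is what pushed the module toward `config dump`. A hedged sketch of that constraint (`build_get_cmd` and `VALID_ENTITY_TYPES` are hypothetical names for illustration, not the module's actual code or an exhaustive entity list):

```python
# Daemon entity types `ceph config get` is shown accepting in this thread;
# "global" is a config section, not an entity, hence the
# "Error EINVAL: unrecognized entity 'global'" above.
VALID_ENTITY_TYPES = {"mon", "mgr", "osd", "mds", "client"}

def build_get_cmd(who, option):
    """Return a command list to read `option` for `who` (illustrative only)."""
    entity_type = who.split(".", 1)[0]
    if entity_type in VALID_ENTITY_TYPES:
        return ["ceph", "config", "get", who, option]
    # "global" cannot be queried directly, so fall back to dumping the whole
    # configuration database -- which is what ceph_config ended up doing.
    return ["ceph", "config", "dump", "--format", "json"]

assert build_get_cmd("osd", "osd_pool_default_size") == \
    ["ceph", "config", "get", "osd", "osd_pool_default_size"]
assert build_get_cmd("global", "osd_pool_default_size") == \
    ["ceph", "config", "dump", "--format", "json"]
```

The fallback trades the EINVAL away but inherits the normalized-name problem described earlier in the thread.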
>Also, in comment#4, the task "TASK [set the 'mgr/dashboard/rhcs5node1.example.com/server_addr' configuration]" uses "config get" instead of "config set" to set the server_addr. Can you please check this?

If I am not wrong, RHCS 6.0 uses the old method, where setting a configuration parameter first fetches the current value and then sets it only if the provided value differs from the current one. This is the same logic I described in my response above.

>But in the meantime, can you check if the ansible script succeeds in all cases across releases if "config get" is used? This can very well fix the issue for the moment until I can identify a fix for the "config dump" output.

In this scenario it will succeed, but then we will again have the same problem with global configuration values which was fixed in BZ#2188319.

Guillaume can confirm whether my explanation on all these points is correct.

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-08-08 06:13:56 UTC ---

Hi Tridibesh,

Looking into the history of BZ# 2188319, I think the expectation from "config get global ..." is incorrect. When a config setting is applied "globally" using "config set global ...", it implies that the value is being changed across all the daemons (osd, mon, mgr & mds) and clients. See https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#confsec-global

However, there are certain rules mentioned further down in the above link that talk about precedence, and these need to be taken into account in the implementation:

-- Start
"Any given daemon will draw its settings from the global section, the daemon- or client-type section, and the section sharing its name. Settings in the most-specific section take precedence: for example, if the same option is specified in both global, mon, and mon.foo on the same source (that is, in the same configuration file), the mon.foo setting will be used.
If multiple values of the same configuration option are specified in the same section, the last value specified takes precedence.

Note that values from the local configuration file always take precedence over values from the monitor configuration database, regardless of the section in which they appear."
-- End

Considering the above rules, the script MUST ensure that no <daemon> or <daemon>.<id> section has the config setting that it wants to override globally. Therefore, to verify the config change, the script can simply get the changed value from any of the active daemon configuration sections. For example, in this case the following command may be used:

$ ceph config get osd osd_pool_default_size

The above is a valid method to verify the changed value, as you have already shown in comment#15: "ceph config get osd osd_pool_default_size" reported the changed value of 2 correctly. Therefore, I think the script can revert from "config dump" and use "config get" against any of the relevant daemons. To be more accurate, the script just needs to keep track of the daemon a config value belongs to and then use that in the "config get" command, provided the above rules are adhered to.

-Sridhar

--- Additional comment from Guillaume Abrioux on 2023-08-08 07:25:03 UTC ---

(In reply to Sridhar Seshasayee from comment #16)

> -- Start
> "Any given daemon will draw its settings from the global section, the daemon- or client-type
> section, and the section sharing its name. Settings in the most-specific section take precedence:
> for example, if the same option is specified in both global, mon, and mon.foo
> on the same source (that is, in the same configuration file), the mon.foo setting will be used.
>
> If multiple values of the same configuration option are specified in the same section, the last
> value specified takes precedence.
>
> Note that values from the local configuration file always take precedence over values from the
> monitor configuration database, regardless of the section in which they appear."
> -- End
>
> Considering the above rule, the script MUST ensure that any <daemon> or <daemon>.<id> section
> doesn't have the config setting that it wants to override globally.

This is why it is definitely easier to rely on the `ceph config dump` output.

> Therefore, to verify the config change, the script can simply get the changed value from any of
> the active daemon configuration sections. For e.g., in this case the following command may be used,
>
> $ ceph config get osd osd_pool_default_size
>
> The above is a valid method to verify the changed value as you have already shown in comment#15.
> The "ceph config get osd osd_pool_default_size" reported the changed value of 2 correctly.
> Therefore, I think the script can revert "config dump" and use "config get" from any of the
> relevant daemons. To be more accurate, the script just needs to keep a track of the daemon the
> config value belongs to and then use that in the "config get" command provided the above rules
> are adhered to.

In the end, that doesn't change the fact that `ceph config dump --format json` should be fixed.

If I understand correctly, you are suggesting we implement more complexity in the Ansible module to work around a valid bug in `ceph config dump`: `--format json` not being honoured.

--- Additional comment from Sridhar Seshasayee on 2023-08-08 09:37:27 UTC ---

(In reply to Guillaume Abrioux from comment #17)

> This is why this is definitely easier to rely on `ceph config dump` output.

"config dump" merely provides a way to ascertain the global setting. Considering the rules I posted in my previous comment, it still doesn't change the fact that any <daemon>.<id> settings (if they exist) will not be overridden by the global setting.
Therefore, in any case, the existing daemon-specific setting of the same config option must be checked and removed before applying the global setting.

> At the end, that doesn't change the fact that `ceph config dump --format json` should be fixed.
>
> If I understand it correctly, you are suggesting we implement more complexity in that Ansible
> module to work around a valid bug in `ceph config dump`: `--format json` not honoured.

Sure, the "ceph config dump --format json" output should be consistent, and it's something I am investigating to fix. Right now the json format of the output prints the normalized name of the config option (i.e. without the instance name, as mentioned in comment#14). This can still be used to check daemon-specific settings.

That said, I don't think implementing the complexity of checking daemon-specific config settings is a workaround. It is needed if an option is to be set globally. Otherwise, there's a possibility that a subset of the daemons will use the global setting and another subset will use the daemon-specific setting, leading to unpredictable behavior. Therefore, I am suggesting a couple of things:

1. Ensure the global setting is truly global by removing any existing daemon setting for the same config option. This can be done by using "ceph config dump --format json" and checking for the normalized config option (if applicable) set for the daemon. The existing daemon setting may be removed using "ceph config rm <daemon>.<id> <option>" or "ceph config rm <daemon> <option>". The global setting may then be applied using "ceph config set global <option> <value>".

2. The global setting can then be verified as before using the appropriate daemon name (not "global") and the localized name (i.e. with the instance name), like below:

   ceph config get <daemon> <localized_option_name>

   This means reverting to the original way of verifying an option, using the daemon name and option name.
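To illustrate the mismatch being discussed: `ceph config dump --format json` prints only the normalized option name (without the instance segment), so a caller that holds the localized name has to normalize it before matching. A minimal Python sketch under that assumption — `normalize_option` and `lookup` are hypothetical helpers, not cephadm-ansible code, and the sample JSON mirrors the `mgr/dashboard/ceph1/server_addr` output quoted earlier in this bug:

```python
import json

def normalize_option(name):
    """Strip the instance segment from a localized option name, e.g.
    'mgr/dashboard/ceph1/server_addr' -> 'mgr/dashboard/server_addr'.

    Hypothetical helper: it assumes the instance name is the third of
    four slash-separated segments, as in the dashboard options above.
    """
    parts = name.split("/")
    if len(parts) == 4:
        del parts[2]
    return "/".join(parts)

def lookup(dump_entries, who, option):
    """Find a (possibly localized) option in parsed
    `ceph config dump --format json` output, which currently
    prints only the normalized name."""
    wanted = normalize_option(option)
    for entry in dump_entries:
        if entry["section"] == who and entry["name"] == wanted:
            return entry["value"]
    return None

# Sample mirroring the cluster output quoted earlier in this bug:
dump_json = '[{"section": "mgr", "name": "mgr/dashboard/server_addr", "value": "10.74.250.188"}]'
value = lookup(json.loads(dump_json), "mgr", "mgr/dashboard/ceph1/server_addr")
```

With this normalization step, the lookup that failed in the original playbook run would succeed against the same JSON output.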
Point 1 above must be done regardless of the way the config setting is verified. Point 2 above can eventually use "ceph config dump --format json" once the output is fixed. I am trying to ascertain whether the current output of "ceph config dump --format json" is intentional, since this is how it has been working so far.

I hope the above clarifies things.

--- Additional comment from Sridhar Seshasayee on 2023-08-09 14:29:50 UTC ---

Hi Guillaume, Tridibesh,

I have raised PR: https://github.com/ceph/ceph/pull/52906 that fixes the output of "ceph config dump --format json" to be consistent across all output formats.

-Sridhar

--- Additional comment from Pawan on 2023-08-16 09:24:38 UTC ---

Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?

--- Additional comment from Neha Ojha on 2023-08-16 16:16:59 UTC ---

(In reply to Pawan from comment #20)
> Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?

Hi Pawan, I don't see a reason for us to push for the 22nd deadline, do you? I am changing the target to 7.x because the fix will need to land there first. We'll need to clone this BZ for 6.x and 5.x (if we find another release vehicle).

--- Additional comment from Pawan on 2023-08-17 02:34:43 UTC ---

(In reply to Neha Ojha from comment #21)
> (In reply to Pawan from comment #20)
> > Since the GA for 5.3z5 is on the 22nd, any ETA on when the fix would land downstream?
>
> Hi Pawan, I don't see a reason for us to push for the 22nd deadline, do you?
> I am changing the target to 7.x because the fix will need to land there first.
> We'll need to clone this BZ for 6.x and 5.x (if we find another release vehicle).

Sounds good, Neha. Noted! Thanks for the update.

--- Additional comment from Sridhar Seshasayee on 2023-10-06 16:09:54 UTC ---

Added upstream Ceph tracker info.
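Sridhar's two-step procedure (points 1 and 2 in comment #18) amounts to a small command-generation step over the parsed dump output. A hedged Python sketch — the function name and the returned command strings are illustrative only, mirroring the `ceph config rm` / `ceph config set global` calls suggested above:

```python
def global_override_cmds(dump_entries, option, value):
    """Build the `ceph config` command sequence for a truly global
    override: first remove any daemon-level entries for the option
    (point 1), then set it in the global section. Verification via
    `ceph config get <daemon> <option>` (point 2) is left to the
    caller. `dump_entries` is the parsed output of
    `ceph config dump --format json`; this helper is a sketch, not
    part of cephadm-ansible.
    """
    cmds = [
        f"ceph config rm {e['section']} {option}"
        for e in dump_entries
        if e["name"] == option and e["section"] != "global"
    ]
    cmds.append(f"ceph config set global {option} {value}")
    return cmds

# Example: an osd-section entry would shadow the intended global value,
# so it must be removed before the global setting is applied.
dump = [
    {"section": "osd", "name": "osd_pool_default_size", "value": "3"},
    {"section": "global", "name": "osd_pool_default_size", "value": "3"},
]
cmds = global_override_cmds(dump, "osd_pool_default_size", "2")
```

Without the removal step, a subset of daemons would keep using the section-level value, which is exactly the unpredictable split behaviour described above.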
--- Additional comment from Tridibesh Chakraborty on 2023-10-09 05:28:53 UTC ---

Hello Sridhar/Neha,

Are we planning to add this to RHCS 5.3z6? From the release calendar I can see RHCS 5.3z6 is planned for Q1 next year.

Thanks,
Tridibesh

--- Additional comment from Sridhar Seshasayee on 2023-10-10 07:46:36 UTC ---

Hello Tridibesh,

Yes, RHCS 5.3z6 is definitely possible, but I will let Neha take the final decision on this. Neha is on PTO this week, so you can expect a response from her sometime next week.

-Sridhar

--- Additional comment from Neha Ojha on 2023-10-26 19:52:41 UTC ---

(In reply to Sridhar Seshasayee from comment #25)
> Hello Tridibesh,
>
> Yes, RHCS 5.3z6 is definitely possible. But I will let Neha take the final decision on this.
> Neha is on PTO this week, and so you can expect a response from her sometime next week.
>
> -Sridhar

Sure, let's plan the fix for 5.3z6, which seems to be planned for March 2024 - we'll need to clone this BZ for 5.x and 6.x.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:0745