Description of problem:

Moving from RHCS 3.3 to RHCS 4.1; once on RHCS 4.1 we want to start running 2 RGWs per node.

We have multisite configured in ceph-ansible:

  [root@xxx]# cat group_vars/all.yml | grep rgw_multisite
  rgw_multisite: True

Because we have multiple realms, the configuration of each RGW node is specified in host_vars/nodeX.

To configure 2 RGWs per node, we add the following to the all.yml file:

  $ cat group_vars/all.yml | grep radosgw_num
  radosgw_num_instances: 2

We also have a per-node RGW entry in ceph_conf_overrides, for example:

  client.rgw.cepha.rgw0:
    host: cepha
    keyring: /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw0/keyring
    log file: /var/log/ceph/ceph-rgw-0-cepha.log
    log_to_file: true
    rgw frontends: beast endpoint=x.x.x.x:8080
    rgw_dynamic_resharding: false
    rgw_enable_apis: s3,admin
    rgw_zone: ieec1
    rgw_zonegroup: produccion
    rgw_realm: xxxx
  client.rgw.cepha.rgw1:
    host: cepha
    keyring: /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw1/keyring
    log file: /var/log/ceph/ceph-rgw-1-cepha.log
    log_to_file: true
    rgw frontends: beast endpoint=x.x.x.x:8081
    rgw_dynamic_resharding: false
    rgw_enable_apis: s3,admin
    rgw_zone: ieec1
    rgw_zonegroup: produccion
    rgw_realm: xxxx

With this config we then run site-container.yml limited to the RGW hosts:

  # ansible-playbook -vv -i inventory site-container.yml --limit rgws

The play finishes OK, but no changes are made: we still have the same number of RGW services. This is because the following task in roles/ceph-facts/tasks/set_radosgw_address.yml never matches the conditional '- rgw_instances is undefined' (rgw_instances is already defined by the time the task runs), so the task is always skipped (a minimal standalone reproduction follows the log below):

  - name: set_fact rgw_instances with rgw multisite
    set_fact:
      rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
    with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
    when:
      - inventory_hostname in groups.get(rgw_group_name, [])
      - rgw_instances is undefined
      - rgw_multisite | bool

Ansible log of the task being skipped:

  TASK [ceph-facts : set_fact rgw_instances with rgw multisite] ******************
  task path: /root/danip/ceph-ansible-dc1/roles/ceph-facts/tasks/set_radosgw_address.yml:53
  Monday 20 July 2020 07:16:00 -0400 (0:00:00.221)  0:12:57.310 ***********
  skipping: [cepha] => (item=0) => changed=false
    ansible_loop_var: item
    item: '0'
    skip_reason: Conditional result was False
  skipping: [cepha] => (item=1) => changed=false
    ansible_loop_var: item
    item: '1'
    skip_reason: Conditional result was False
  skipping: [cephb] => (item=0) => changed=false
    ansible_loop_var: item
    item: '0'
    skip_reason: Conditional result was False
  skipping: [cephc] => (item=0) => changed=false
    ansible_loop_var: item
    item: '0'
    skip_reason: Conditional result was False
  skipping: [cephb] => (item=1) => changed=false
    ansible_loop_var: item
    item: '1'
    skip_reason: Conditional result was False
  skipping: [cephc] => (item=1) => changed=false
    ansible_loop_var: item
    item: '1'
    skip_reason: Conditional result was False
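The guard's behaviour can be reproduced outside ceph-ansible. Below is a minimal standalone sketch (hypothetical file name, localhost only, none of the real ceph-ansible variables): if rgw_instances is already defined before the task runs (for example via host_vars or --extra-vars), every loop item is skipped; and even when it starts out undefined, set_fact takes effect immediately, so the guard turns false after item 0 and the remaining instances are never added.

  # repro_guard.yml (hypothetical name) -- sketch of the 'is undefined'
  # guard interacting with an accumulating set_fact loop
  - hosts: localhost
    gather_facts: false
    tasks:
      - name: accumulate one entry per instance, guarded by 'is undefined'
        set_fact:
          rgw_instances: "{{ rgw_instances | default([]) | union([{'instance_name': 'rgw' + item | string}]) }}"
        with_sequence: start=0 end=1
        when: rgw_instances is undefined

      # Expect only rgw0 when starting undefined; expect an unchanged list
      # when the variable is pre-defined, e.g. run with:
      #   ansible-playbook repro_guard.yml -e '{"rgw_instances": []}'
      - debug:
          var: rgw_instances

Either way, the instance list never grows to radosgw_num_instances entries.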
If we remove the conditional check '- rgw_instances is undefined' and run the task like this:

  - name: set_fact rgw_instances with rgw multisite
    set_fact:
      rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
    with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
    when:
      - inventory_hostname in groups.get(rgw_group_name, [])
      - rgw_multisite | bool

then 2 radosgw instances per node do get configured, but the systemd units fail to start because both radosgw instances try to run on the same port: the EnvironmentFile of each systemd unit sets the same port for every RGW.

  [root@cepha ~]$ cat /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw?/EnvironmentFile
  INST_NAME=rgw0
  INST_PORT=8080
  INST_NAME=rgw1
  INST_PORT=8080

To work around this issue we had to modify the rgw_instances fact in /root/danip/ceph-ansible-dc1/roles/ceph-facts/tasks/set_radosgw_address.yml so that each instance's port is offset by its index (see the standalone sketch after the note below):

  'radosgw_frontend_port': radosgw_frontend_port | int + item|int,

Here is the full task, original and modified:

  ## ORIGINAL
  - name: set_fact rgw_instances with rgw multisite
    set_fact:
      rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
    with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
    when:
      - inventory_hostname in groups.get(rgw_group_name, [])
      - rgw_instances is undefined
      - rgw_multisite | bool

  ## MODIFIED TO WORK
  - name: set_fact rgw_instances with rgw multisite
    set_fact:
      rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int + item|int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
    with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
    when:
      - inventory_hostname in groups.get(rgw_group_name, [])
      - rgw_multisite | bool

Note: All credit to Daniel Parkes.
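To see the port arithmetic in isolation, here is a minimal standalone sketch (hypothetical file name, localhost only, made-up values; not the actual ceph-ansible task) of the offset used in the workaround:

  # port_offset.yml (hypothetical name) -- sketch of the per-instance port offset
  - hosts: localhost
    gather_facts: false
    vars:
      radosgw_frontend_port: 8080   # same base port as in our all.yml
      radosgw_num_instances: 2
    tasks:
      - name: build one entry per instance, offsetting the port by the loop index
        set_fact:
          rgw_instances: "{{ rgw_instances | default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_frontend_port': radosgw_frontend_port | int + item | int}]) }}"
        with_sequence: start=0 end={{ radosgw_num_instances | int - 1 }}

      # Expect rgw0 -> 8080 and rgw1 -> 8081, matching the beast endpoints
      # set in ceph_conf_overrides above.
      - debug:
          var: rgw_instances

With this offset in place, the EnvironmentFile for rgw1 would be expected to carry INST_PORT=8081 instead of 8080, so both units can bind their frontends.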
There is a pull request with a fix: https://github.com/ceph/ceph-ansible/pull/5583
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0081