.Adding a new Ceph Object Gateway instance when upgrading fails
The `radosgw_frontend_port` option did not account for more than one Ceph Object Gateway instance and configured port `8080` for all instances. With this release, the `radosgw_frontend_port` option is incremented for each additional Ceph Object Gateway instance, allowing you to run more than one Ceph Object Gateway instance per node.
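The effect of the fix can be sketched in a few lines (Python used here purely for illustration; the function and variable names mirror ceph-ansible's variables but are not the actual implementation):

```python
# Minimal sketch of the per-instance port assignment introduced by the fix.
# `radosgw_frontend_port` and `radosgw_num_instances` mirror the
# ceph-ansible variable names; this is an illustration, not real code.
def assign_ports(radosgw_frontend_port, radosgw_num_instances):
    """Return one (instance_name, port) pair per RGW instance,
    incrementing the base frontend port for each instance."""
    return [
        ("rgw%d" % i, radosgw_frontend_port + i)
        for i in range(radosgw_num_instances)
    ]

print(assign_ports(8080, 2))  # [('rgw0', 8080), ('rgw1', 8081)]
```

With two instances per node, the instances listen on 8080 and 8081 instead of colliding on 8080.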
Description of problem:
Moving from RHCS 3.3 to RHCS 4.1; once on RHCS 4.1 we want to start running 2 RGW instances per node.
We have multisite configured in ceph-ansible:
[root@xxx]# cat group_vars/all.yml | grep rgw_multisite
rgw_multisite: True
Because we have multiple realms, the configuration of each RGW node is specified in host_vars/nodeX.
To configure 2 RGW instances per node, we add the following to the all.yml file:
$ cat group_vars/all.yml | grep radosgw_num
radosgw_num_instances: 2
We also have a per-node RGW entry in ceph_conf_overrides, for example:
client.rgw.cepha.rgw0:
  host: cepha
  keyring: /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw0/keyring
  log file: /var/log/ceph/ceph-rgw-0-cepha.log
  log_to_file: true
  rgw frontends: beast endpoint=x.x.x.x:8080
  rgw_dynamic_resharding: false
  rgw_enable_apis: s3,admin
  rgw_zone: ieec1
  rgw_zonegroup: produccion
  rgw_realm: xxxx
client.rgw.cepha.rgw1:
  host: cepha
  keyring: /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw1/keyring
  log file: /var/log/ceph/ceph-rgw-1-cepha.log
  log_to_file: true
  rgw frontends: beast endpoint=x.x.x.x:8081
  rgw_dynamic_resharding: false
  rgw_enable_apis: s3,admin
  rgw_zone: ieec1
  rgw_zonegroup: produccion
  rgw_realm: xxxx
With this config we then run site-container.yml limited to the rgws group:
# ansible-playbook -vv -i inventory site-container.yml --limit rgws
The play finishes OK, but no changes are made; we still have the same number of RGW services. This is because the task in roles/ceph-facts/tasks/set_radosgw_address.yml never matches the conditional '- rgw_instances is undefined', so the task is always skipped.
- name: set_fact rgw_instances with rgw multisite
  set_fact:
    rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
  with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
  when:
    - inventory_hostname in groups.get(rgw_group_name, [])
    - rgw_instances is undefined
    - rgw_multisite | bool
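The skip can be reproduced with a minimal Python analogue of the task's when clause (an illustration only; it assumes the multi-realm host_vars setup described above already defines rgw_instances, and the function name is hypothetical):

```python
# Minimal analogue of the task's `when` clause. In the multi-realm
# setup, rgw_instances is already defined via host_vars, so the
# `rgw_instances is undefined` condition is False and the task is
# skipped on every host.
def task_runs(host_vars, host_in_rgw_group=True):
    return (
        host_in_rgw_group
        and "rgw_instances" not in host_vars   # rgw_instances is undefined
        and bool(host_vars.get("rgw_multisite"))
    )

# rgw_instances defined in host_vars -> the when clause is False
print(task_runs({"rgw_instances": [], "rgw_multisite": True}))  # False
```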
Ansible log of the task that gets skipped:
TASK [ceph-facts : set_fact rgw_instances with rgw multisite] *****************************************************************************************************************************************************************************************************************************************************************
task path: /root/danip/ceph-ansible-dc1/roles/ceph-facts/tasks/set_radosgw_address.yml:53
Monday 20 July 2020 07:16:00 -0400 (0:00:00.221) 0:12:57.310 ***********
skipping: [cepha] => (item=0) => changed=false
ansible_loop_var: item
item: '0'
skip_reason: Conditional result was False
skipping: [cepha] => (item=1) => changed=false
ansible_loop_var: item
item: '1'
skip_reason: Conditional result was False
skipping: [cephb] => (item=0) => changed=false
ansible_loop_var: item
item: '0'
skip_reason: Conditional result was False
skipping: [cephc] => (item=0) => changed=false
ansible_loop_var: item
item: '0'
skip_reason: Conditional result was False
skipping: [cephb] => (item=1) => changed=false
ansible_loop_var: item
item: '1'
skip_reason: Conditional result was False
skipping: [cephc] => (item=1) => changed=false
ansible_loop_var: item
item: '1'
skip_reason: Conditional result was False
If we remove the conditional check '- rgw_instances is undefined' and run it like this:
- name: set_fact rgw_instances with rgw multisite
  set_fact:
    rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
  with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
  when:
    - inventory_hostname in groups.get(rgw_group_name, [])
    - rgw_multisite | bool
The number of radosgw instances per node now gets configured to 2, but the systemd units fail to start because both RGW instances are bound to the same port; the environment file of each systemd unit sets the same port for every RGW.
[root@cepha ~]$ cat /var/lib/ceph/radosgw/ceph-rgw.cepha.rgw?/EnvironmentFile
INST_NAME=rgw0
INST_PORT=8080
INST_NAME=rgw1
INST_PORT=8080
To work around this issue, we had to modify the rgw_instances fact in the /root/danip/ceph-ansible-dc1/roles/ceph-facts/tasks/set_radosgw_address.yml file so that the port number is increased by the instance index:
'radosgw_frontend_port': radosgw_frontend_port | int + item|int,
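A minimal Python analogue of that expression (illustrative only, not the real Jinja evaluation) shows why both casts matter: with_sequence yields each item as a string (note the quoted item: '0' in the log above), so it must be converted before being added to the base port:

```python
# Analogue of the workaround expression:
#   radosgw_frontend_port | int + item|int
# with_sequence produces items as strings ('0', '1', ...), so both
# operands are cast to int before the addition.
def frontend_port(radosgw_frontend_port, item):
    return int(radosgw_frontend_port) + int(item)

ports = [frontend_port("8080", item) for item in ["0", "1"]]
print(ports)  # [8080, 8081]
```

Each instance thus gets a distinct port, and the systemd units no longer collide.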
This is the full task with the modifications:
## ORIGINAL
- name: set_fact rgw_instances with rgw multisite
  set_fact:
    rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
  with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
  when:
    - inventory_hostname in groups.get(rgw_group_name, [])
    - rgw_instances is undefined
    - rgw_multisite | bool
## MODIFIED TO WORK
- name: set_fact rgw_instances with rgw multisite
  set_fact:
    rgw_instances: "{{ rgw_instances|default([]) | union([{'instance_name': 'rgw' + item | string, 'radosgw_address': _radosgw_address, 'radosgw_frontend_port': radosgw_frontend_port | int + item|int, 'rgw_realm': rgw_realm | string, 'rgw_zonegroup': rgw_zonegroup | string, 'rgw_zone': rgw_zone | string, 'system_access_key': system_access_key, 'system_secret_key': system_secret_key, 'rgw_zone_user': rgw_zone_user, 'rgw_zone_user_display_name': rgw_zone_user_display_name, 'endpoint': (rgw_pull_proto + '://' + rgw_pullhost + ':' + rgw_pull_port | string) if not rgw_zonemaster | bool and rgw_zonesecondary | bool else omit }]) }}"
  with_sequence: start=0 end={{ radosgw_num_instances|int - 1 }}
  when:
    - inventory_hostname in groups.get(rgw_group_name, [])
    - rgw_multisite | bool
Note: all credit to Daniel Parkes.
Comment 1, RHEL Program Management, 2020-07-23 08:38:45 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:0081