.The keepalive daemons communicate and enter the main/primary state
Previously, keepalive configurations were populated with IPs that matched the host IP reported by the `ceph orch host ls` command. As a result, if the VIP was configured on a different subnet than the listed host IP, the keepalive daemons could not communicate with each other, and every keepalive daemon entered the Primary state.
With this fix, the IPs of the keepalive peers in the keepalive configuration are chosen to match the subnet of the VIP. The keepalive daemons can now communicate even if the VIP is in a different subnet than the host IP reported by the `ceph orch host ls` command. In this case, only one keepalive daemon enters the Primary state.
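The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the code shipped in the fix: the function name `ip_in_vip_subnet`, the sample addresses, and the `host -> subnet -> interface -> [IPs]` layout of `networks` are all assumptions made for the example.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumed layout (hypothetical): subnet in CIDR form -> interface name -> IPs.
HostNetworks = Dict[str, Dict[str, List[str]]]


def ip_in_vip_subnet(host_networks: HostNetworks, vip: str) -> Optional[str]:
    """Return the first IP of a host that lies in the same subnet as the VIP."""
    # ip_interface() accepts both "10.0.1.100/24" and "10.0.1.100".
    vip_addr = ipaddress.ip_interface(vip).ip
    for subnet, ifaces in host_networks.items():
        if vip_addr in ipaddress.ip_network(subnet):
            for ips in ifaces.values():
                if ips:
                    return ips[0]
    return None  # the host has no address in the VIP's subnet


# Made-up inventory: the 192.168.0.x addresses play the role of the host IPs
# listed by `ceph orch host ls`; the VIP lives on a second subnet.
networks = {
    "host1": {"192.168.0.0/24": {"eth0": ["192.168.0.11"]},
              "10.0.1.0/24": {"eth1": ["10.0.1.11"]}},
    "host2": {"192.168.0.0/24": {"eth0": ["192.168.0.12"]},
              "10.0.1.0/24": {"eth1": ["10.0.1.12"]}},
}
peers = {host: ip_in_vip_subnet(nets, "10.0.1.100/24")
         for host, nets in networks.items()}
print(peers)
```

With these sample values, each host contributes its address on the VIP's subnet rather than its inventory address, which is what allows the peers to reach each other.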
Description of problem:
When setting up an RGW HA ingress service, `keepalived` VRRP communication only works when the virtual IP of the service is assigned to the interface that also has the inventory IP assigned.
Version-Release number of selected component (if applicable):
RHCS 5.3
How reproducible:
Configure the ingress service with a VIP from an IP range that does not match the range of the inventory IP of the target node
Steps to Reproduce:
1. Define an ingress service template with the constraints described above.
2. Deploy the ingress service.
Actual results:
Because the VRRP instances cannot communicate, every configured `keepalived` instance assigns the VIP and switches to MASTER:
Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE
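One way to confirm this symptom across hosts is to scan the collected `keepalived` logs for the last state transition of each VRRP instance. The sketch below is a hypothetical helper, not part of cephadm. It assumes that the other state transitions are logged in the same `Entering <STATE> STATE` form as the line above, and the `host2` log lines are invented for the example.

```python
import re
from typing import Dict, List, Set

# Matches lines such as "(VI_0) Entering MASTER STATE".
STATE_RE = re.compile(r"\((?P<instance>[^)]+)\) Entering (?P<state>\w+) STATE")


def current_masters(logs_by_host: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each VRRP instance to the hosts whose last logged state is MASTER."""
    masters: Dict[str, Set[str]] = {}
    for host, lines in logs_by_host.items():
        last_state: Dict[str, str] = {}
        for line in lines:
            match = STATE_RE.search(line)
            if match:
                # Later lines overwrite earlier ones, so the final value wins.
                last_state[match["instance"]] = match["state"]
        for instance, state in last_state.items():
            if state == "MASTER":
                masters.setdefault(instance, set()).add(host)
    return masters


logs = {
    "host1": ["Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE"],
    # Invented lines for a second host, for illustration only.
    "host2": ["Mon Jul 10 10:36:40 2023: (VI_0) Entering BACKUP STATE",
              "Mon Jul 10 10:36:43 2023: (VI_0) Entering MASTER STATE"],
}
masters = current_masters(logs)
split_brain = sorted(i for i, hosts in masters.items() if len(hosts) > 1)
print(split_brain)
```

Any instance reported in `split_brain` has more than one host claiming MASTER at the same time.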
Expected results:
`keepalived` should be configured with unicast addresses that are assigned to the auto-configured interface.
Additional info:
The template is filled via the `host_ip` and `other_ips` based on the inventory:
~~~
[...]
other_ips = [utils.resolve_ip(self.mgr.inventory.get_addr(h)) for h in hosts]
[...]
'host_ip': utils.resolve_ip(self.mgr.inventory.get_addr(host)),
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/services/ingress.py
~~~
[...]
unicast_src_ip {{ host_ip }}
unicast_peer {
{% for ip in other_ips %}
{{ ip }}
{% endfor %}
}
[...]
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/ingress/keepalived.conf.j2
But the `interface` setting is derived from the VIP, by matching the IP of the VIP within `self.mgr.cache.networks.get(host, {}).items()`.
This can lead to a `keepalived.conf` with unicast addresses configured which are not assigned to the interface set via the `interface` directive, preventing VRRP communication and leading to "split brain".
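The mismatch can be illustrated with a small standalone check. This is a sketch under assumptions, not the cephadm implementation: `interface_for_vip` and `unicast_src_on_interface` are hypothetical names, and the per-host `subnet -> interface -> [IPs]` layout only approximates what the items of `self.mgr.cache.networks.get(host, {})` might look like.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumed per-host layout (hypothetical): subnet -> interface -> IPs.
HostNetworks = Dict[str, Dict[str, List[str]]]


def interface_for_vip(host_networks: HostNetworks, vip: str) -> Optional[str]:
    """Pick the interface whose subnet contains the VIP."""
    vip_addr = ipaddress.ip_interface(vip).ip
    for subnet, ifaces in host_networks.items():
        if ifaces and vip_addr in ipaddress.ip_network(subnet):
            return next(iter(ifaces))
    return None


def unicast_src_on_interface(host_networks: HostNetworks, vip: str,
                             host_ip: str) -> bool:
    """Check whether host_ip is assigned to the interface chosen for the VIP."""
    iface = interface_for_vip(host_networks, vip)
    if iface is None:
        return False
    return any(host_ip in ifaces.get(iface, [])
               for ifaces in host_networks.values())


# Made-up host: inventory IP on eth0, VIP subnet on eth1.
host_networks = {
    "192.168.0.0/24": {"eth0": ["192.168.0.11"]},
    "10.0.1.0/24": {"eth1": ["10.0.1.11"]},
}
print(interface_for_vip(host_networks, "10.0.1.100/24"))
print(unicast_src_on_interface(host_networks, "10.0.1.100/24", "192.168.0.11"))
```

In this example the VIP selects `eth1`, while the inventory IP `192.168.0.11` belongs to `eth0`, so the check fails. That is the condition under which `unicast_src_ip` and `interface` disagree in the rendered `keepalived.conf`.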
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:7780