Bug 2222010 - [GSS] RGW HA ingress service activates virtual IP on all instances at the same time
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.0
Assignee: Adam King
QA Contact: Mohit Bisht
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-11 13:46 UTC by Harald Klein
Modified: 2023-08-02 15:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-6997 0 None None None 2023-07-12 12:17:49 UTC

Description Harald Klein 2023-07-11 13:46:37 UTC
Description of problem:

When setting up an RGW HA ingress service, `keepalived` VRRP communication only works when the virtual IP (VIP) of the service is assigned to the same interface that holds the inventory IP.

Version-Release number of selected component (if applicable):

RHCS 5.3

How reproducible:

Configure the ingress service with a VIP from an IP range that does not match the range of the inventory IP of the target node

Steps to Reproduce:
1. Define the template with the constraints described above.
2. Deploy the ingress service.

Actual results:

Because the VRRP instances cannot communicate, every configured `keepalived` instance switches to MASTER and assigns the VIP:

Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE

Expected results:

`keepalived` should be configured with unicast addresses that are assigned to the auto-configured interface.

Additional info:

The template is filled in via `host_ip` and `other_ips`, both of which are taken from the inventory:
~~~
[...]
        other_ips = [utils.resolve_ip(self.mgr.inventory.get_addr(h)) for h in hosts]
[...]
                'host_ip': utils.resolve_ip(self.mgr.inventory.get_addr(host)),
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/services/ingress.py

~~~
[...]  
  unicast_src_ip {{ host_ip }}
  unicast_peer {
    {% for ip in other_ips %}
    {{ ip }}
    {% endfor %}
  }
[...]
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/ingress/keepalived.conf.j2

However, the `interface` setting is derived from the VIP, by matching the VIP against the networks in `self.mgr.cache.networks.get(host, {}).items()`.
This can produce a `keepalived.conf` whose unicast addresses are not assigned to the interface named in the `interface` directive, which prevents VRRP communication and leads to "split brain".
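
As an illustration of the expected behaviour only (not the actual cephadm code or fix), the sketch below picks the interface and the unicast source address from the same network that contains the VIP. The function name `pick_interface_and_source_ip` is hypothetical, and the shape of the per-host network map (`{subnet: {interface: [ips]}}`) is an assumption about what `self.mgr.cache.networks.get(host, {})` returns.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Assumed shape of the per-host network map: {subnet_cidr: {interface: [ip, ...]}}
HostNetworks = Dict[str, Dict[str, List[str]]]


def pick_interface_and_source_ip(
    vip: str, host_networks: HostNetworks
) -> Optional[Tuple[str, str]]:
    """Return (interface, ip) for the first interface whose subnet contains the VIP.

    Hypothetical helper: the returned ip would be used as unicast_src_ip
    instead of the inventory address, so it always lives on `interface`.
    """
    # Accept the VIP with or without a prefix length ("10.0.0.100/24" or "10.0.0.100").
    vip_addr = ipaddress.ip_interface(vip).ip
    for subnet, ifaces in host_networks.items():
        network = ipaddress.ip_network(subnet, strict=False)
        if vip_addr not in network:
            continue
        for iface, ips in ifaces.items():
            if ips:
                # Assumption: the first address on the interface is acceptable.
                return iface, ips[0]
    return None


if __name__ == "__main__":
    networks = {
        "192.168.1.0/24": {"eth0": ["192.168.1.11"]},  # inventory network
        "10.0.0.0/24": {"eth1": ["10.0.0.11"]},        # VIP network
    }
    print(pick_interface_and_source_ip("10.0.0.100/24", networks))
```

Applying the same lookup to every peer host would give `other_ips` values that are reachable over the interface chosen for the VIP.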

Comment 1 Scott Ostapovicz 2023-07-12 12:15:58 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.

