Bug 2222010

Summary: [GSS] RGW HA ingress service activates virtual IP on all instances at the same time
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Harald Klein <hklein>
Component: Cephadm Assignee: Adam King <adking>
Status: ASSIGNED --- QA Contact: Mohit Bisht <mobisht>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.3 CC: adking, cephqe-warriors, jmulligan, nravinas, sostapov
Target Milestone: ---   
Target Release: 7.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Harald Klein 2023-07-11 13:46:37 UTC
Description of problem:

When setting up an RGW HA ingress service, `keepalived` VRRP communication only works when the virtual IP of the service is assigned to the interface that also has the inventory IP assigned.

Version-Release number of selected component (if applicable):

RHCS 5.3

How reproducible:

Configure the ingress service with a VIP from an IP range that does not match the range of the inventory IP of the target node

Steps to Reproduce:
1. Define the template with the constraints described above.
2. Deploy the ingress service.

Actual results:

As the VRRP instances cannot communicate, every configured `keepalived` instance assigns the VIP and switches to MASTER:

Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE
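
One way to confirm this condition on an affected node is to compare the `unicast_src_ip` in the rendered `keepalived.conf` with the addresses on the configured `interface`. The following is a minimal sketch, not part of cephadm: the function name `unicast_src_on_interface`, the sample config text and the interface-to-address mapping are all made up for illustration and would have to be collected from the node by other means.

```python
import re
from typing import Dict, List


def unicast_src_on_interface(conf_text: str, iface_addrs: Dict[str, List[str]]) -> bool:
    """Return True if unicast_src_ip is assigned to the configured interface."""
    iface = re.search(r"^\s*interface\s+(\S+)", conf_text, re.MULTILINE)
    src = re.search(r"^\s*unicast_src_ip\s+(\S+)", conf_text, re.MULTILINE)
    if iface is None or src is None:
        raise ValueError("config lacks an interface or unicast_src_ip line")
    return src.group(1) in iface_addrs.get(iface.group(1), [])


# Hypothetical sample: the VIP network lives on eth1, but the unicast
# source address is the inventory IP, which is assigned to eth0.
sample_conf = """
vrrp_instance VI_0 {
  interface eth1
  unicast_src_ip 192.168.0.11
  unicast_peer {
    192.168.0.12
  }
}
"""
sample_addrs = {"eth0": ["192.168.0.11"], "eth1": ["10.0.1.11"]}

mismatch = not unicast_src_on_interface(sample_conf, sample_addrs)
print("unicast source not on VRRP interface:", mismatch)
```

A `True` result here corresponds to the situation described in this report: the unicast source address is not assigned to the interface named in the `interface` directive.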

Expected results:

`keepalived` should be configured with unicast addresses that are assigned to the auto-configured interface.

Additional info:

The template is filled in via `host_ip` and `other_ips`, both of which are derived from the inventory:
~~~
[...]
        other_ips = [utils.resolve_ip(self.mgr.inventory.get_addr(h)) for h in hosts]
[...]
                'host_ip': utils.resolve_ip(self.mgr.inventory.get_addr(host)),
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/services/ingress.py

~~~
[...]  
  unicast_src_ip {{ host_ip }}
  unicast_peer {
    {% for ip in other_ips %}
    {{ ip }}
    {% endfor %}
  }
[...]
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/ingress/keepalived.conf.j2

However, the `interface` setting is derived from the VIP, by matching the VIP's address against the networks in `self.mgr.cache.networks.get(host, {}).items()`.
This can lead to a `keepalived.conf` whose unicast addresses are not assigned to the interface set via the `interface` directive, which prevents VRRP communication and leads to "split brain".
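
One possible direction, sketched here for discussion only and not taken from the cephadm code, is to derive `host_ip` and `other_ips` from the same subnet match that selects `interface`. The sketch assumes each host's entry in the network cache maps a subnet (CIDR string) to interface names and their IPs. That shape, the helper names `vip_interface_and_ip` and `unicast_settings`, and all sample addresses are assumptions.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Assumed shape of one host's network cache entry:
# subnet (CIDR string) -> interface name -> list of IPs on that interface.
HostNetworks = Dict[str, Dict[str, List[str]]]


def vip_interface_and_ip(vip: str, host_networks: HostNetworks) -> Optional[Tuple[str, str]]:
    """Find the interface on the VIP's subnet and a non-VIP address assigned to it."""
    vip_addr = ipaddress.ip_interface(vip).ip  # accepts "10.0.1.100" or "10.0.1.100/24"
    for subnet, ifaces in host_networks.items():
        if vip_addr not in ipaddress.ip_network(subnet, strict=False):
            continue
        for iface, ips in ifaces.items():
            for ip in ips:
                # Skip the VIP itself in case it is currently assigned here.
                if ipaddress.ip_address(ip) != vip_addr:
                    return iface, ip
    return None


def unicast_settings(vip: str, host: str, networks: Dict[str, HostNetworks]) -> Dict[str, object]:
    """Build interface, host_ip and other_ips from the VIP's subnet only."""
    local = vip_interface_and_ip(vip, networks.get(host, {}))
    if local is None:
        raise ValueError(f"{host} has no interface on the subnet of {vip}")
    iface, host_ip = local
    other_ips = []
    for peer, peer_networks in networks.items():
        if peer == host:
            continue
        match = vip_interface_and_ip(vip, peer_networks)
        if match is not None:
            other_ips.append(match[1])
    return {"interface": iface, "host_ip": host_ip, "other_ips": other_ips}


# Hypothetical two-node layout: inventory IPs on eth0, VIP network on eth1.
networks = {
    "node1": {"192.168.0.0/24": {"eth0": ["192.168.0.11"]},
              "10.0.1.0/24": {"eth1": ["10.0.1.11"]}},
    "node2": {"192.168.0.0/24": {"eth0": ["192.168.0.12"]},
              "10.0.1.0/24": {"eth1": ["10.0.1.12"]}},
}
settings = unicast_settings("10.0.1.100/24", "node1", networks)
print(settings)
```

With this layout, the unicast source and peer addresses all come from the `10.0.1.0/24` network on `eth1`, so they match the interface chosen for the VIP rather than the inventory addresses on `eth0`.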

Comment 1 Scott Ostapovicz 2023-07-12 12:15:58 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.