Bug 2222010 - [GSS] RGW HA ingress service activates virtual IP on all instances at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.0
Assignee: Adam King
QA Contact: Sayalee
URL:
Whiteboard:
Depends On:
Blocks: 2237662
Reported: 2023-07-11 13:46 UTC by Harald Klein
Modified: 2023-12-13 15:20 UTC
CC List: 9 users

Fixed In Version: ceph-18.2.0-5.el9cp
Doc Type: Bug Fix
Doc Text:
.The keepalived daemons communicate and enter the main/primary state
Previously, keepalived configurations were populated with IPs that matched the host IP reported by the `ceph orch host ls` command. As a result, if the VIP was configured on a different subnet than the listed host IP, the keepalived daemons could not communicate, and every keepalived daemon entered the primary state. With this fix, the IPs of the keepalived peers in the keepalived configuration are chosen to match the subnet of the VIP. The keepalived daemons can now communicate even if the VIP is in a different subnet than the host IP from the `ceph orch host ls` command, and only one keepalived daemon enters the primary state.
Clone Of:
Environment:
Last Closed: 2023-12-13 15:20:50 UTC
Embargoed:




Links
System                 | ID             | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker  | RHCEPH-6997    | 0       | None     | None   | None    | 2023-07-12 12:17:49 UTC
Red Hat Product Errata | RHBA-2023:7780 | 0       | None     | None   | None    | 2023-12-13 15:20:55 UTC

Description Harald Klein 2023-07-11 13:46:37 UTC
Description of problem:

When setting up an RGW HA ingress service, `keepalived` VRRP communication only works when the virtual IP (VIP) of the service is assigned to the same interface that carries the host's inventory IP.

Version-Release number of selected component (if applicable):

RHCS 5.3

How reproducible:

Configure the ingress service with a VIP from a subnet that differs from the subnet of the target node's inventory IP.

Steps to Reproduce:
1. Define the ingress service template with the constraints described above.
2. Deploy the ingress service.

Actual results:

Because the VRRP instances cannot communicate, every configured `keepalived` instance switches to MASTER state and assigns the VIP:

~~~
Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE
~~~
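The split-brain symptom can be spotted by comparing the last VRRP state each host logged. The sketch below assumes the `keepalived` log lines have already been collected per host (how they are gathered is left out), and it treats all lines as belonging to a single VRRP instance; `last_states` and `is_split_brain` are hypothetical helpers, not part of cephadm.

```python
import re
from typing import Dict, Iterable

# Matches keepalived state transitions such as
# "Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE".
STATE_RE = re.compile(r"\((?P<instance>[^)]+)\) Entering (?P<state>[A-Z]+) STATE")


def last_states(logs: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each host to the last VRRP state found in its keepalived log lines."""
    states: Dict[str, str] = {}
    for host, lines in logs.items():
        for line in lines:
            match = STATE_RE.search(line)
            if match:
                # Later lines overwrite earlier ones, so the last transition wins.
                states[host] = match.group("state")
    return states


def is_split_brain(logs: Dict[str, Iterable[str]]) -> bool:
    """True if more than one host currently reports MASTER (single VRRP instance assumed)."""
    masters = [host for host, state in last_states(logs).items() if state == "MASTER"]
    return len(masters) > 1


if __name__ == "__main__":
    line = "Mon Jul 10 10:36:42 2023: (VI_0) Entering MASTER STATE"
    print(is_split_brain({"host1": [line], "host2": [line]}))  # True
```

With working VRRP communication, one host should end in MASTER and the others in BACKUP, so `is_split_brain` returns False.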

Expected results:

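`keepalived` should be configured with unicast addresses that are assigned to the interface auto-selected for the VIP, so that the instances can exchange VRRP messages and only one of them enters MASTER state.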
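The expected behavior can be sketched as choosing, for every host, the address that lies in the VIP's subnet instead of the inventory address. This is a minimal illustration under stated assumptions, not the cephadm implementation: the per-host `networks` mapping (CIDR -> interface name -> list of IPs) is an assumed shape for cephadm's network cache, and `pick_keepalived_ip` and `keepalived_ips` are hypothetical helpers.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Assumed shape of one host's network cache: CIDR -> interface name -> IPs on it.
HostNetworks = Dict[str, Dict[str, List[str]]]


def pick_keepalived_ip(networks: HostNetworks, vip: str) -> Optional[str]:
    """Return the first host IP found in the VIP's subnet, or None.

    `vip` may include a prefix length (e.g. "192.168.50.100/24"); only the
    address part is matched against the host's subnets.
    """
    vip_addr = ipaddress.ip_interface(vip).ip
    for subnet, ifaces in networks.items():
        if vip_addr in ipaddress.ip_network(subnet, strict=False):
            for ips in ifaces.values():
                if ips:
                    return ips[0]
    return None


def keepalived_ips(
    all_networks: Dict[str, HostNetworks], host: str, vip: str
) -> Tuple[Optional[str], List[str]]:
    """Compute unicast_src_ip and the unicast_peer list for `host`."""
    src = pick_keepalived_ip(all_networks[host], vip)
    peers: List[str] = []
    for other, nets in all_networks.items():
        if other == host:
            continue
        ip = pick_keepalived_ip(nets, vip)
        if ip is not None:  # this sketch simply skips peers with no IP in the VIP subnet
            peers.append(ip)
    return src, peers


if __name__ == "__main__":
    # Hypothetical cluster: inventory IPs on 10.0.0.0/24, VIP on 192.168.50.0/24.
    networks = {
        "host1": {"10.0.0.0/24": {"eth0": ["10.0.0.11"]},
                  "192.168.50.0/24": {"eth1": ["192.168.50.11"]}},
        "host2": {"10.0.0.0/24": {"eth0": ["10.0.0.12"]},
                  "192.168.50.0/24": {"eth1": ["192.168.50.12"]}},
    }
    print(keepalived_ips(networks, "host1", "192.168.50.100/24"))
    # ('192.168.50.11', ['192.168.50.12'])
```

Both the source and peer addresses then sit on `eth1`, the interface that carries the VIP's subnet, rather than on the inventory interface `eth0`.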

Additional info:

The template's `host_ip` and `other_ips` values are filled from the inventory addresses:
~~~
[...]
        other_ips = [utils.resolve_ip(self.mgr.inventory.get_addr(h)) for h in hosts]
[...]
                'host_ip': utils.resolve_ip(self.mgr.inventory.get_addr(host)),
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/services/ingress.py

~~~
[...]  
  unicast_src_ip {{ host_ip }}
  unicast_peer {
    {% for ip in other_ips %}
    {{ ip }}
    {% endfor %}
  }
[...]
~~~
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/ingress/keepalived.conf.j2

However, the `interface` setting is derived from the VIP, by looking up the subnet that contains the VIP in `self.mgr.cache.networks.get(host, {}).items()`.
This can produce a `keepalived.conf` whose unicast addresses are not assigned to the interface named by the `interface` directive, which prevents VRRP communication and leads to a "split brain" in which every instance holds the VIP.
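The mismatch described above can be checked with a small sketch. It assumes that `self.mgr.cache.networks.get(host, {})` maps each CIDR to interface names and their IPs, and that the `interface` directive takes the first interface listed for the subnet containing the VIP; both are assumptions, and `vip_interface` and `unicast_mismatch` are hypothetical helpers.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumed shape of one host's network cache: CIDR -> interface name -> IPs.
HostNetworks = Dict[str, Dict[str, List[str]]]


def vip_interface(networks: HostNetworks, vip: str) -> Optional[str]:
    """Return the interface whose subnet contains the VIP (first listed one)."""
    vip_addr = ipaddress.ip_interface(vip).ip
    for subnet, ifaces in networks.items():
        if ifaces and vip_addr in ipaddress.ip_network(subnet, strict=False):
            return next(iter(ifaces))
    return None


def unicast_mismatch(networks: HostNetworks, vip: str, unicast_src_ip: str) -> bool:
    """True if unicast_src_ip is not assigned to the interface picked for the VIP."""
    iface = vip_interface(networks, vip)
    if iface is None:
        return True  # no interface carries the VIP subnet at all
    # Collect every address configured on that interface, across all subnets.
    on_iface = {ip for ifaces in networks.values() for ip in ifaces.get(iface, [])}
    return unicast_src_ip not in on_iface
```

For a hypothetical host with its inventory IP `10.0.0.11` on `eth0` and `192.168.50.11` on `eth1`, a VIP of `192.168.50.100/24` selects `eth1`, so `unicast_mismatch(..., "10.0.0.11")` returns True: the reported configuration.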

Comment 1 Scott Ostapovicz 2023-07-12 12:15:58 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.

Comment 23 errata-xmlrpc 2023-12-13 15:20:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

