Description of problem:
The load balancer does not respond to requests from the pool member subnet when the load balancer and the pool members are in different subnets; this worked in OpenStack 16.2.5. We suspect the issue is related to code removed by [1].

Version-Release number of selected component (if applicable):
OpenStack 17.1.3
octavia-amphora-17.1-20240516.1.x86_64.qcow2

How reproducible:
Always

Steps to Reproduce:
1. In a running stack, create a load balancer in a private subnet (e.g. private-lb-subnet).
2. Add a pool member to the load balancer that is in another private subnet (e.g. private-pool-subnet).
3. Create a router routing between both subnets.

Actual results:
- The load balancer handles requests from private-lb-subnet (expected behaviour).
- The load balancer does not handle requests from private-pool-subnet.

Expected results:
- The load balancer handles requests from both private-pool-subnet and private-lb-subnet.

Additional info:
[1] https://review.opendev.org/c/openstack/octavia/+/807310

Workaround:
[root@amphora ~]# ip netns exec amphora-haproxy ip route add default via 10.10.10.1 dev eth1 onlink table 1
Some notes on this issue:

A default route is missing from table 1 in ACTIVE_STANDBY:

```
[cloud-user@amphora-46e9b6ca-82a2-4317-a54d-a0a7de2769d2 ~]$ sudo ip netns exec amphora-haproxy ip route
default via 10.0.2.1 dev eth1 proto static onlink
10.0.1.0/24 dev eth1 proto kernel scope link src 10.0.1.89
10.0.2.0/24 dev eth1 proto kernel scope link src 10.0.2.55
[cloud-user@amphora-46e9b6ca-82a2-4317-a54d-a0a7de2769d2 ~]$ sudo ip netns exec amphora-haproxy ip route show table 1
10.0.2.0/24 dev eth1 proto keepalived scope link src 10.0.2.87
```

Before Wallaby, the route was added by the ifcfg scripts:
https://github.com/openstack/octavia/blob/unmaintained/victoria/octavia/amphorae/backends/agent/api_server/templates/rh_route_ethX.conf.j2#L25

In Wallaby/Xena/Yoga, it is added only when not in ACTIVE_STANDBY:
https://github.com/openstack/octavia/blob/unmaintained/wallaby/octavia/amphorae/backends/utils/interface_file.py#L132-L139

Since Zed, it is handled by keepalived:
https://github.com/openstack/octavia/blob/master/octavia/amphorae/drivers/keepalived/jinja/templates/keepalived_base.template#L60
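As a quick check on an affected amphora, the `ip route show table 1` output can be inspected for a default route. A minimal sketch (the helper and its name are my own illustration, not part of Octavia):

```python
# Hypothetical helper: detect whether a routing-table dump contains a
# default route. Feed it the output of:
#   sudo ip netns exec amphora-haproxy ip route show table 1
def has_default_route(ip_route_output: str) -> bool:
    return any(line.split()[:1] == ["default"]
               for line in ip_route_output.splitlines() if line.strip())

# Output captured from the affected ACTIVE_STANDBY amphora above:
table_1 = "10.0.2.0/24 dev eth1 proto keepalived scope link src 10.0.2.87"
print(has_default_route(table_1))  # False: return traffic via table 1 has no gateway
```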
The bug occurs when:
- the VIP is on subnet1 (e.g. 10.1.0.12 with VRRP IP 10.1.0.100)
- a member is on subnet2, and subnet2 is plugged into the amphora (e.g. 10.2.0.91 in the amphora)
- subnet1 and subnet2 are plugged into a router
- a client on subnet2 (10.2.0.80) tries to reach the VIP

In the amphora-haproxy namespace, the interfaces are configured:

```
$ sudo ip -n amphora-haproxy -br a
eth1 UP 10.1.0.100/24 10.1.0.12/32
eth2 UP 10.2.0.91/24
```

There's an explicit rule for the VIP:

```
$ sudo ip -n amphora-haproxy -br rule
[..]
100: from 10.1.0.12 lookup 1 proto keepalived
```

Default routing table:

```
$ sudo ip -n amphora-haproxy -br route
default via 10.1.0.1 dev eth1 proto static onlink
10.1.0.0/24 dev eth1 proto kernel scope link src 10.1.0.100
10.2.0.0/24 dev eth2 proto kernel scope link src 10.2.0.91
```

Table 1:

```
$ sudo ip -n amphora-haproxy -br route show table 1
10.1.0.0/24 dev eth1 proto keepalived scope link src 10.1.0.12
```

A TCP SYN packet from the client (10.2.0.80) to the VIP (10.1.0.12) is received on eth1. The kernel must send a SYN-ACK back to the client on the same interface to acknowledge the connection. Because the SYN-ACK is emitted from 10.1.0.12, the `from 10.1.0.12` rule directs the lookup to routing table 1; that table has no route to 10.2.0.0/24 and no default route, so the packet is dropped. A default route in table 1 via `10.1.0.1 dev eth1` would fix this issue.
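The lookup sequence above can be modeled with a small sketch. This is a simplified model of Linux policy routing (source-based rule selects a table, then longest-prefix match within the table), not kernel code; the table contents mirror the `ip route` output shown above and all names are my own:

```python
import ipaddress

# Rule 100: from 10.1.0.12 lookup 1 (installed by keepalived)
RULES = [("10.1.0.12", 1)]
TABLES = {
    "main": ["0.0.0.0/0", "10.1.0.0/24", "10.2.0.0/24"],
    1: ["10.1.0.0/24"],  # table 1: no default route, no 10.2.0.0/24
}

def route_lookup(src: str, dst: str):
    """Pick the table from the source address, then longest-prefix match."""
    table = next((t for ip, t in RULES if ip == src), "main")
    dst_ip = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(r) for r in TABLES[table]
               if dst_ip in ipaddress.ip_network(r)]
    if not matches:
        return None  # no route in the selected table: packet is dropped
    return str(max(matches, key=lambda n: n.prefixlen))

# SYN-ACK sourced from the VIP toward the client on subnet2 finds no route:
print(route_lookup("10.1.0.12", "10.2.0.80"))   # None -> dropped
# Traffic sourced from the VRRP address uses the main table and succeeds:
print(route_lookup("10.1.0.100", "10.2.0.80"))  # 10.2.0.0/24
```

Adding `0.0.0.0/0` to table 1 (the workaround above) makes the first lookup succeed via the gateway.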
Hello Gregory,

The customer tried a hotfix on a testing infrastructure. They applied the procedure detailed in https://access.redhat.com/solutions/6214931 to update the default amphora image with a slight code modification. They are asking whether this hotfix can be deployed on the production infrastructure without side effects.

```
diff interface_file.py interface_file.py.ori
132,140c132,139
<         # OASIS CASE 03908793
<         # if topology != consts.TOPOLOGY_ACTIVE_STANDBY:
<         self.routes.append({
<             consts.DST: (
<                 "::/0" if ip_version == 6 else "0.0.0.0/0"),
<             consts.GATEWAY: gateway,
<             consts.FLAGS: [consts.ONLINK],
<             consts.TABLE: 1,
<         })
---
>         if topology != consts.TOPOLOGY_ACTIVE_STANDBY:
>             self.routes.append({
>                 consts.DST: (
>                     "::/0" if ip_version == 6 else "0.0.0.0/0"),
>                 consts.GATEWAY: gateway,
>                 consts.FLAGS: [consts.ONLINK],
>                 consts.TABLE: 1,
>             })
```

Regards,
Conrado
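For reference, the effect of the customer's change can be sketched as a standalone model. This is a simplified stand-in for the Wallaby-era logic in `octavia/amphorae/backends/utils/interface_file.py`, not the real module: the function, the `patched` flag, and the plain-string keys are my own, replacing the `consts.*` constants used upstream.

```python
TOPOLOGY_ACTIVE_STANDBY = "ACTIVE_STANDBY"

def build_table1_routes(topology, gateway, ip_version, patched):
    """Return the default route(s) added to table 1 for the VIP interface."""
    routes = []
    # Upstream (unpatched): the table-1 default route is skipped in
    # ACTIVE_STANDBY, on the assumption that keepalived installs it.
    # Customer hotfix (patched): always add it, regardless of topology.
    if patched or topology != TOPOLOGY_ACTIVE_STANDBY:
        routes.append({
            "dst": "::/0" if ip_version == 6 else "0.0.0.0/0",
            "gateway": gateway,
            "flags": ["onlink"],
            "table": 1,
        })
    return routes

# Unpatched ACTIVE_STANDBY: no default route in table 1 (the bug).
print(build_table1_routes("ACTIVE_STANDBY", "10.1.0.1", 4, patched=False))  # []
# Patched: the route is always present.
print(build_table1_routes("ACTIVE_STANDBY", "10.1.0.1", 4, patched=True))
```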
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days