Bug 1920025 - [OSP 16.1][neutron][ovn] - FIP to FIP communication broken when multiple subnets exist on the floating IP network and floating IP's subnet is different than router's subnet
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Terry Wilson
QA Contact: Fiorella Yanac
URL:
Whiteboard:
Depends On:
Blocks: 1929901
Reported: 2021-01-25 15:14 UTC by Matt Flusche
Modified: 2023-10-30 15:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1929901
Environment:
Last Closed: 2023-10-30 15:35:47 UTC
Target Upstream Version:
Embargoed:
fyanac: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-1408 | 0 | None | None | None | 2022-03-24 14:30:25 UTC

Description Matt Flusche 2021-01-25 15:14:17 UTC
Description of problem:
OSP 16.1 OVN non-DVR deployment

- Confusing situation; I'll do my best to explain.
- This environment has a single external network with multiple external IPv4 subnets.
- When neutron routers are created, their external interface is assigned an address at random from one of the external subnets.
- When floating IPs (FIPs) are created, they are also assigned at random from one of the external subnets.
- This works fine for most communication. General external access via the FIP works without issues.
- However, FIP <-> FIP communication is broken when the FIP falls on a different subnet than the router's external interface address.
- If both FIPs and the routers' external addresses are on the same subnet, FIP <-> FIP communication works fine.
- When this issue occurs, the communication path seems to involve the external (physical) router and may depend on its configuration. For example, in some situations packets appear to be sent via that router even for traffic within the same L3 network.
- I'll provide more details in private comments.
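
To make the failing condition above concrete, here is a minimal sketch that checks whether a floating IP and a router's external gateway address fall in the same external subnet. The subnets and addresses are hypothetical documentation-range values, not taken from the affected environment, and the helper name same_external_subnet is an illustration only, not part of neutron or OVN.

```python
import ipaddress

def same_external_subnet(fip, router_gw_ip, subnets):
    """Return the external subnet containing both addresses, or None."""
    fip_addr = ipaddress.ip_address(fip)
    gw_addr = ipaddress.ip_address(router_gw_ip)
    for cidr in subnets:
        net = ipaddress.ip_network(cidr)
        if fip_addr in net and gw_addr in net:
            return net
    return None

# Hypothetical values (RFC 5737 documentation ranges), for illustration only.
EXTERNAL_SUBNETS = ["192.0.2.0/24", "198.51.100.0/24"]
ROUTER_GW_IP = "192.0.2.10"
FIPS = ["192.0.2.50", "198.51.100.7"]

for fip in FIPS:
    shared = same_external_subnet(fip, ROUTER_GW_IP, EXTERNAL_SUBNETS)
    if shared:
        print(f"{fip}: same subnet as router gateway ({shared})")
    else:
        print(f"{fip}: different subnet than router gateway (the case reported as broken)")
```

In practice the subnet CIDRs, router gateway addresses, and floating IPs would come from the deployment itself; a FIP for which the helper returns None matches the combination described above as failing.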

Version-Release number of selected component (if applicable):
OSP 16.1

How reproducible:
100% in this specific environment.

Steps to Reproduce:
See details above.

Comment 53 isnuryusuf-gls 2021-08-04 13:29:36 UTC
I am facing the same issue. Is there an estimated timeline for releasing the bug fix?

thanks

Comment 60 Jakub Libosvar 2022-11-14 21:56:53 UTC
The issue is fixed in ovn2.13-20.12.0-149.el8fdp. We have 20.12.0-196 which contains the fix. Moving to ON_QA to validate this works with OSP.

Comment 66 John Apple II 2022-11-30 00:18:46 UTC
@twilson - As far as our system goes, the last set of patches we applied above with CEE support seems to have brought us back to functionality. The only issue I have now in the cluster is that when a VM comes online, it takes about 7 minutes for the FIP to become reachable. However, this may be due to the IBM Cloud Router config (which is opaque to us, and IBM will not share it, so we cannot attempt to sync). It's likely due to the way gratuitous ARP (GARP) notifications are sent out, but since we aren't allowed to see any of the traffic crossing IBM's network, we are unsure of the actual issue: it could be on OVN's side, or it might be on the IBM network.

Comment 67 Mark Michelson 2022-12-01 21:32:56 UTC
@twilson - The internal IPs of VMs are not necessarily going to appear as MAC_Bindings because OVN is aware of the MAC<->IP binding for VIF ports. OVN is able to create logical flows in northd that switch packets destined to vm3's IP address without the need of receiving an ARP from vm3.

