Bug 1920025

Summary: [OSP 16.1][neutron][ovn] - FIP to FIP communication broken when multiple subnets exist on the floating IP network and the floating IP's subnet is different from the router's subnet
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: openstack-neutron
Assignee: Terry Wilson <twilson>
Status: CLOSED CURRENTRELEASE
QA Contact: Fiorella Yanac <fyanac>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: afariasa, apevec, asimonel, broskos, chrisw, dalvarez, egarciar, ekuris, eolivare, ffernand, fyanac, jappleii, jlibosva, john.apple, mamorim, mmichels, nsatsia, ralonsoh, rshah, scohen, sputhenp, twilson, yusuf, yusufhadiwinata
Target Milestone: z9
Keywords: TestOnly, Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Flags: fyanac: needinfo-
Hardware: x86_64
OS: Linux
Clones: 1929901 (view as bug list)
Last Closed: 2023-10-30 15:35:47 UTC
Type: Bug
Bug Blocks: 1929901

Description Matt Flusche 2021-01-25 15:14:17 UTC
Description of problem:
OSP 16.1 OVN non-DVR deployment

- Confusing situation; I'll do my best to explain.
- This environment has a single external network with multiple external IPv4 subnets.
- When neutron routers are created, their external interface is assigned an address randomly from one of the external subnets.
- When floating IPs are created they are randomly assigned from one of the external subnets.
- This works fine for most communication. General external access via the floating IP works without issues.
- However, FIP <-> FIP communication is broken when the floating IP falls on a different subnet than the router's external interface address.
- If both floating IPs and the routers live on the same subnet, FIP <-> FIP communication works fine.
- When this issue occurs, the communication path seems to involve the external router and may depend on its configuration; for example, in some situations packets appear to be sent via the external router even for traffic on the same L3 network.
- I'll provide more details in private comments.

Version-Release number of selected component (if applicable):
OSP 16.1

How reproducible:
100% in this specific environment.

Steps to Reproduce:
See details above.
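
The bug report itself gives no commands, so the following is only a minimal sketch of how one might script the problematic topology with openstacksdk; the cloud name, network and subnet names, and CIDRs are placeholders, not values from the affected environment.

```python
# Hypothetical reproduction sketch using openstacksdk.
# Assumptions: an existing external network named 'external', placeholder CIDRs,
# and a cloud entry named 'overcloud' in clouds.yaml.
import openstack

conn = openstack.connect(cloud='overcloud')

# External network with two IPv4 subnets (the key precondition for the bug).
ext_net = conn.network.find_network('external')
subnet_a = conn.network.create_subnet(
    network_id=ext_net.id, name='ext-subnet-a',
    ip_version=4, cidr='203.0.113.0/24', gateway_ip='203.0.113.1')
subnet_b = conn.network.create_subnet(
    network_id=ext_net.id, name='ext-subnet-b',
    ip_version=4, cidr='198.51.100.0/24', gateway_ip='198.51.100.1')

# Router whose external (gateway) port is pinned to subnet A.
router = conn.network.create_router(
    name='r1',
    external_gateway_info={
        'network_id': ext_net.id,
        'external_fixed_ips': [{'subnet_id': subnet_a.id}],
    })

# Floating IP deliberately allocated from subnet B, i.e. a different
# external subnet than the router's gateway port -- the broken combination.
fip = conn.network.create_ip(
    floating_network_id=ext_net.id, subnet_id=subnet_b.id)

# Associate this FIP with a VM port behind r1, then test FIP <-> FIP traffic
# against a second VM whose FIP lives on subnet A (association omitted here).
```

With this layout, traffic between the subnet-B floating IP and a floating IP on subnet A reproduces the reported breakage, while keeping both FIPs on the router's gateway subnet does not.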

Comment 53 isnuryusuf-gls 2021-08-04 13:29:36 UTC
I am facing the same issue. Is there an estimated timeline for releasing the bug fix?

thanks

Comment 60 Jakub Libosvar 2022-11-14 21:56:53 UTC
The issue is fixed in ovn2.13-20.12.0-149.el8fdp, and we have 20.12.0-196, which contains the fix. Moving to ON_QA to validate that this works with OSP.

Comment 66 John Apple II 2022-11-30 00:18:46 UTC
@twilson - As far as our system goes, the last set of patches we applied above with CEE support seems to have restored functionality. The only remaining issue in the cluster is that when a VM comes online it takes about 7 minutes for the FIP to become reachable. This may be due to the IBM Cloud Router configuration, which is opaque to us and which IBM will not share, so we cannot attempt to sync with it. It is likely related to how gratuitous ARP notifications are sent out, but since we are not allowed to see any of the traffic crossing IBM's network, we are unsure of the actual cause - it could be on OVN's side, or it might be on the IBM network.

Comment 67 Mark Michelson 2022-12-01 21:32:56 UTC
@twilson - The internal IPs of VMs will not necessarily appear as MAC_Binding entries, because OVN is already aware of the MAC<->IP binding for VIF ports. OVN can create logical flows in northd that switch packets destined to vm3's IP address without needing to receive an ARP from vm3.
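
As a rough way to see this behaviour, the sketch below (assuming it is run on an OVN central node with ovn-sbctl available; the VM address is a placeholder) checks that a VM's internal IP shows up in northd-generated ARP-resolve logical flows rather than in a learned MAC_Binding entry.

```python
# Hypothetical verification sketch: compare learned MAC_Bindings with the
# logical flows northd derives from the logical switch port's addresses.
import subprocess

VM3_IP = '10.0.0.23'  # placeholder internal address of vm3

# Learned ARP entries: vm3's IP is not expected to appear here, because OVN
# already knows the MAC<->IP pairing of the VIF port from the northbound DB.
mac_bindings = subprocess.run(
    ['ovn-sbctl', 'list', 'MAC_Binding'],
    capture_output=True, text=True, check=True).stdout
print('vm3 present in MAC_Binding:', VM3_IP in mac_bindings)

# Static resolution instead shows up in the router's ARP-resolve stage of the
# logical flow table, with no ARP exchange involving vm3 required.
lflows = subprocess.run(
    ['ovn-sbctl', 'lflow-list'],
    capture_output=True, text=True, check=True).stdout
for line in lflows.splitlines():
    if 'arp_resolve' in line and VM3_IP in line:
        print(line.strip())
```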