Bug 2144492 - Restarting OVS with DVR creates a network loop
Summary: Restarting OVS with DVR creates a network loop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.1
Assignee: Jakub Libosvar
QA Contact: Roman Safronov
URL:
Whiteboard:
Duplicates: 2225666
Depends On:
Blocks: 1823324
Reported: 2022-11-21 13:15 UTC by Roman Safronov
Modified: 2024-01-19 04:25 UTC
CC List: 18 users

Fixed In Version: openstack-neutron-18.6.1-1.20230518200972.el9ost
Doc Type: Known Issue
Doc Text:
If you migrate a RHOSP 17.1.0 ML2/OVS deployment with distributed virtual routing (DVR) to ML2/OVN, the floating IP (FIP) downtime that occurs during ML2/OVN migration can exceed 60 seconds.
Clone Of:
: 2225666
Environment:
Last Closed: 2023-09-20 00:29:44 UTC
Target Upstream Version:
Embargoed:
gurpsing: needinfo-



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 445886 | 0 | None | MERGED | Dynamic log level support | 2023-08-14 12:28:32 UTC
OpenStack gerrit | 889752 | 0 | None | MERGED | dvr: Avoid installing non-dvr openflow rule on startup | 2023-08-14 11:58:40 UTC
Red Hat Issue Tracker | OSP-20340 | 0 | None | None | None | 2022-11-21 13:22:36 UTC
Red Hat Product Errata | RHBA-2023:5138 | 0 | None | None | None | 2023-09-20 00:30:21 UTC

Comment 27 Jakub Libosvar 2023-07-25 15:50:21 UTC
I understand it now.
  - This is reproducible with live traffic (ICMP every 0.1 seconds is enough) by restarting both the OVS agent on the compute node hosting a FIP and the OVS agent on the network node hosting the gateway for the SNAT traffic.
  - The OVS agent with DVR creates a local loop between the tunnel network and the external network. When both agents are restarted at the same time, there is a very small window, about 0.5 seconds, in which both agents have this loop, creating a full network loop. With live traffic, the reply traffic is flooded to the external network, reaches the network node, and goes through the loop into the tunnel. The tunnel carries it back to the compute node, where the NORMAL action on br-int learns its source MAC address, which in this case is the GW port MAC address (fa:16:3e:3c:e6:41 from comment 20).
  - OVS records in its FDB that the GW port MAC is behind the patch port to br-tun, since a frame with that source MAC was observed arriving from the tunnel.
  - All reply traffic is addressed to the GW port first. Because the NORMAL action now knows that MAC, it no longer floods the traffic; it sends it to the patch port to br-tun instead, where it is dropped because it is not expected there.
  - Since no traffic carries the GW port MAC as its source, the FDB entry is never refreshed and eventually expires.
  - After the expiration, traffic flows again.
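The MAC-learning sequence above can be sketched as a toy model. This is a minimal Python simulation under stated assumptions, not OVS code: `LearningBridge`, `simulate`, and the port name `patch-tun` are hypothetical, and the model assumes the NORMAL action forgets an FDB entry once it has not been refreshed for `aging_ms`. It shows why a single looped frame is enough to cause an outage: the frame binds the GW port MAC to the patch port to br-tun, and because the replies are addressed to that MAC rather than sourced from it, nothing refreshes or corrects the entry until it ages out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

FLOOD = "FLOOD"
GW_MAC = "fa:16:3e:3c:e6:41"  # GW port MAC quoted in comment 27
PATCH_TUN = "patch-tun"       # hypothetical name of br-int's patch port to br-tun


@dataclass
class LearningBridge:
    """Toy model (assumption, not OVS code) of NORMAL-action MAC learning with aging.

    Times are integer milliseconds to avoid floating-point drift.
    """
    aging_ms: int
    # MAC -> (port the MAC was last seen on, time it was last seen)
    fdb: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def learn(self, src_mac: str, in_port: str, now_ms: int) -> None:
        # Every received frame (re)binds its source MAC to its ingress port.
        self.fdb[src_mac] = (in_port, now_ms)

    def forward(self, dst_mac: str, now_ms: int) -> str:
        entry = self.fdb.get(dst_mac)
        if entry is None:
            return FLOOD              # unknown destination: flood
        port, seen_ms = entry
        if now_ms - seen_ms > self.aging_ms:
            del self.fdb[dst_mac]     # entry aged out: forget it and flood
            return FLOOD
        return port                   # known destination: unicast to that port


def simulate(aging_ms: int, loop_at_ms: int, interval_ms: int = 100,
             horizon_ms: int = 600_000) -> Tuple[Optional[int], Optional[int]]:
    """Send a reply to GW_MAC every interval_ms. At loop_at_ms (a multiple of
    interval_ms), one looped frame with source GW_MAC arrives from the tunnel.
    Return (time of first dropped reply, time traffic is flooded again)."""
    bridge = LearningBridge(aging_ms)
    first_drop: Optional[int] = None
    for now in range(0, horizon_ms, interval_ms):
        if now == loop_at_ms:
            # The brief loop window: GW MAC is learned on the tunnel side.
            bridge.learn(GW_MAC, PATCH_TUN, now)
        # Replies are addressed TO GW_MAC, so they never refresh the entry.
        if bridge.forward(GW_MAC, now) == PATCH_TUN:
            # Sent to br-tun, where the reply is not expected and is dropped.
            if first_drop is None:
                first_drop = now
        elif first_drop is not None:
            return first_drop, now    # flooding again: traffic recovers
    return first_drop, None
```

Assuming 300 s aging (to my understanding, the Open vSwitch default for a bridge's `other_config:mac-aging-time`), `simulate(aging_ms=300_000, loop_at_ms=1_000)` returns `(1000, 301100)`: in this model the replies are blackholed for the whole aging period. On a live node, the entries learned on br-int can be inspected with `ovs-appctl fdb/show br-int`.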

This is a bug in the OVS DVR code, and it is an open question whether it is worth fixing the loop itself or only the use of OVS restarts in the migration procedure. I'll treat this BZ as the latter and open a new BZ against the OVS agent to kick off the discussion, though I'd be in favor of not fixing it, given that it's a deprecated driver and the bug has likely been present since DVR was introduced.

Comment 37 Jakub Libosvar 2023-08-03 15:58:38 UTC
*** Bug 2225666 has been marked as a duplicate of this bug. ***

Comment 53 errata-xmlrpc 2023-09-20 00:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5138

Comment 54 Red Hat Bugzilla 2024-01-19 04:25:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
