Bug 2053026

Summary: [ovn] Stale ports in OVN database
Product: Red Hat OpenStack
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: python-networking-ovn
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA
QA Contact: Maor <mblue>
Severity: high
Docs Contact:
Priority: urgent
Version: 16.1 (Train)
CC: apevec, bcafarel, chrisw, jlibosva, jpretori, lhh, majopela, pgodwin, scohen
Target Milestone: z9
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-networking-ovn-7.3.1-1.20220212033901.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2122791
Environment:
Last Closed: 2022-12-07 20:25:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2122791
Bug Blocks:

Description Daniel Alvarez Sanchez 2022-02-10 13:38:15 UTC
This BZ is to track the backport of: https://review.opendev.org/c/openstack/neutron/+/827834


Under heavy control plane activity, OVN ports can become stale and will not be cleaned up unless the neutron-ovn-db-sync tool is run manually.

A possible scenario for this is:

a) Port creation
  a.1) Port created in Neutron DB
  a.2) Port created in OVN Northbound (NB) database
  a.3) NB ovsdb-server notifies all connected workers of the port creation
  a.4) Each worker eventually processes this event and updates its in-memory copy of the NB database

The port is then deleted via the API immediately afterwards, before every worker has completed step a.4). The deletion request lands on one of the workers that has not yet updated its in-memory copy of the OVN NB database with the newly created port.

b) Port deletion
  b.1) Port deleted from Neutron DB
  b.2) Deletion of the port from OVN NB is attempted, but the lookup fails and the port's revision number is deleted [0]

At this point, the port remains stale in the OVN database forever. This causes other issues that we have mitigated (e.g. [1]), but ultimately the number of OVN resources may grow to the point where it severely degrades overall cluster stability and performance.
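
The scenario above can be reproduced with a toy model. The following is a minimal sketch, not networking-ovn code: the `Worker` class, the `delete_port` helper, and the plain sets and dict standing in for the Neutron DB, the OVN NB database, and the revision numbers table are all hypothetical names invented for illustration.

```python
"""Toy model of the create/delete race; all names here are hypothetical."""


class Worker:
    """An API worker holding its own in-memory copy of the OVN NB database."""

    def __init__(self, name):
        self.name = name
        self.nb_cache = set()  # port IDs this worker has learned about so far

    def process_create_event(self, port_id):
        # Final creation step: apply the ovsdb-server notification locally.
        self.nb_cache.add(port_id)


def delete_port(worker, port_id, neutron_db, ovn_nb, revisions):
    """Delete a port through the given worker, mimicking steps b.1 and b.2."""
    neutron_db.discard(port_id)      # b.1: removed from the Neutron DB
    if port_id in worker.nb_cache:   # b.2: lookup in the worker's NB copy
        ovn_nb.discard(port_id)
    # In this model the revision number is dropped even if the lookup failed.
    revisions.pop(port_id, None)


def simulate():
    neutron_db, ovn_nb, revisions = set(), set(), {}
    fast, slow = Worker("fast"), Worker("slow")

    # Port creation: written to both databases, event sent to the workers.
    neutron_db.add("port-1")
    ovn_nb.add("port-1")
    revisions["port-1"] = 1
    fast.process_create_event("port-1")  # only one worker has caught up

    # The delete request lands on the worker that has not seen the event.
    delete_port(slow, "port-1", neutron_db, ovn_nb, revisions)
    return neutron_db, ovn_nb, revisions


if __name__ == "__main__":
    neutron_db, ovn_nb, revisions = simulate()
    print("Neutron DB:", neutron_db)
    print("OVN NB:", ovn_nb)
    print("Revisions:", revisions)
```

In this model the port ends up absent from the Neutron DB and from the revision table but still present in OVN NB, so nothing is left to signal that the OVN entry needs cleaning up.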

A potential workaround is to run the neutron-ovn-db-sync tool periodically to remove the stale ports, but running it while the API is operational is not recommended.
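
For illustration only, the check that identifies stale ports amounts to a set difference between the two databases. The sketch below assumes the caller has already collected the Neutron port IDs and the OVN NB logical switch port names by some other means; `find_stale_ovn_ports` is a hypothetical helper, not part of the sync tool, and it ignores any OVN ports that legitimately have no Neutron counterpart.

```python
def find_stale_ovn_ports(neutron_port_ids, ovn_port_names):
    """Return OVN port names that have no matching Neutron port ID.

    Both arguments are iterables of strings gathered by the caller;
    this helper only compares them and does not touch either database.
    """
    return sorted(set(ovn_port_names) - set(neutron_port_ids))
```

Called with Neutron IDs `["p1", "p2"]` and OVN names `["p1", "p2", "p3"]`, the helper reports `p3` as the only candidate for cleanup.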

[0] https://github.com/openstack/neutron/blob/f5030b0bc25216d80b09f7ac3938c9a902b655e3/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L698
[1] https://bugs.launchpad.net/neutron/+bug/1874733

Comment 2 Jakub Libosvar 2022-02-16 14:16:39 UTC
*** Bug 2053585 has been marked as a duplicate of this bug. ***

Comment 4 Jakub Libosvar 2022-06-30 14:05:07 UTC
*** Bug 2102636 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2022-12-07 20:25:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795