Bug 2195898 - High CPU usage observed for OVS and BFD state is flapping
Summary: High CPU usage observed for OVS and BFD state is flapping
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Ihar Hrachyshka
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks: 2209090 2209092 2209100 2218465
Reported: 2023-05-06 09:51 UTC by Sukhendu Kar
Modified: 2023-06-29 09:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2209090 2209100
Environment:
Last Closed: 2023-05-22 13:02:28 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker NFV-2847 0 None None None 2023-05-06 10:22:49 UTC
Red Hat Issue Tracker OSP-24814 0 None None None 2023-05-06 09:52:27 UTC

Comment 7 Ihar Hrachyshka 2023-05-08 20:01:45 UTC
Switching component to the OVN driver for neutron because the root cause is in how it handles the case where a router port is attached to an empty AZ. (It then schedules the port on all chassis.)

Comment 10 Ihar Hrachyshka 2023-05-08 20:26:40 UTC
AFAIU, the workaround that can be tried in the environment without code changes is to make sure that if a router is assigned to an AZ, the AZ has at least one chassis. The issue in the env is triggered by ports whose AZ has no chassis. If you avoid setting an AZ for such ports, neutron should fall back to assigning chassis that are explicitly marked with ovn-cms-options=enable-chassis-as-gw, which are the controller nodes in the cluster. This will stop neutron from scheduling the ports on compute nodes.
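
A minimal pre-check for the workaround above could look like the sketch below. This is not neutron code. It assumes the chassis ovn-cms-options strings and the router AZ hints have already been collected by hand into plain dicts, and that AZs are advertised in the "availability-zones=az1:az2" form of ovn-cms-options. The function names and the sample host and router names are hypothetical.

```python
def parse_cms_options(raw):
    """Split an ovn-cms-options string into (is_gateway, set_of_azs).

    Assumes a comma-separated option list in which AZs appear as
    "availability-zones=az1:az2" (colon-separated).
    """
    is_gw = False
    azs = set()
    for opt in raw.split(","):
        opt = opt.strip()
        if opt == "enable-chassis-as-gw":
            is_gw = True
        elif opt.startswith("availability-zones="):
            azs.update(z for z in opt.split("=", 1)[1].split(":") if z)
    return is_gw, azs


def routers_with_empty_az(chassis_options, router_az_hints):
    """Return {router: [hinted AZs that no chassis advertises]}.

    Routers without AZ hints are never reported: per the comment above,
    they are expected to fall back to the gateway-marked chassis.
    """
    populated = set()
    for raw in chassis_options.values():
        populated |= parse_cms_options(raw)[1]
    problems = {}
    for router, hints in router_az_hints.items():
        missing = sorted(set(hints) - populated)
        if missing:
            problems[router] = missing
    return problems


def gateway_chassis(chassis_options):
    """Return the sorted names of chassis marked enable-chassis-as-gw."""
    return sorted(
        name for name, raw in chassis_options.items()
        if parse_cms_options(raw)[0]
    )


# Hypothetical inventory: one controller marked as gateway, one compute node.
chassis_options = {
    "controller-0": "enable-chassis-as-gw,availability-zones=az1",
    "compute-0": "",
}
# Hypothetical routers: "r2" points at an AZ that has no chassis.
router_az_hints = {"r1": ["az1"], "r2": ["az2"], "r3": []}

suspect_routers = routers_with_empty_az(chassis_options, router_az_hints)
fallback_chassis = gateway_chassis(chassis_options)
```

Any router listed in suspect_routers is a candidate for either clearing its AZ hint or adding a chassis to that AZ. fallback_chassis lists the nodes the ports are expected to end up on once the hint is removed.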

Comment 27 Ihar Hrachyshka 2023-05-22 13:00:00 UTC
I don't think we need more logs collected at this point. The issue is clear, and it's not in the OVN / OVS BFD implementation. It's an upstream switch misconfiguration / bond issue of some sort.

Comment 28 Ihar Hrachyshka 2023-05-22 13:02:28 UTC
The original issue reported here, BFD flapping, was fixed by migrating gw ports back to where they belong: the controller nodes. The remaining issue turned out to be an upstream switch misconfiguration of some sort. I am closing the bug. If there are other issues to follow up on, a new bz should be created.

Comment 29 Ihar Hrachyshka 2023-05-22 15:18:28 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=2195898#c9 to follow up on neutron AZ scheduler behavior as mentioned in comment 6.

Comment 30 Ihar Hrachyshka 2023-05-22 15:51:30 UTC
Created a documentation bz to follow up on the recommendation not to locate gw ports on compute nodes: https://bugzilla.redhat.com/show_bug.cgi?id=2209100

