Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2195898

Summary: High CPU usage observed for OVS and BFD state is flapping
Product: Red Hat OpenStack
Reporter: Sukhendu Kar <sukar>
Component: python-networking-ovn
Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Priority: urgent
Version: 16.2 (Train)
CC: aandrade, apevec, camorris, chrisw, fesilva, ftaylor, gkadam, hakhande, ihrachys, ldenny, lhh, lseki, majopela, rycaputo, scohen, spapa, vkhitrin
Clones: 2209090 2209100
Last Closed: 2023-05-22 13:02:28 UTC
Type: Bug
Bug Blocks: 2209090, 2209092, 2209100, 2218465    

Comment 7 Ihar Hrachyshka 2023-05-08 20:01:45 UTC
Switching the component to the OVN driver for neutron, because the root cause is in how the driver handles a router port assigned to an empty AZ (an availability zone with no chassis): it then schedules the port on all chassis.

Comment 10 Ihar Hrachyshka 2023-05-08 20:26:40 UTC
AFAIU, a workaround that can be tried in the environment without any code changes is to make sure that, if a router is assigned to an AZ, that AZ has at least one chassis. The issue in this env is triggered by ports whose AZ has no chassis. If you avoid setting an AZ for such ports, neutron should fall back to scheduling them on chassis explicitly marked with ovn-cms-options=enable-chassis-as-gw, which are the controller nodes in the cluster. This will stop neutron from landing the ports on compute nodes.
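The candidate-chassis behavior described in comments 7 and 10 can be modeled roughly as follows. This is a hypothetical, simplified Python sketch, not neutron's actual scheduler code: the `Chassis` type, its `azs` and `gw_enabled` fields (the latter standing in for ovn-cms-options=enable-chassis-as-gw), and the functions `candidate_gw_chassis` and `empty_azs` are illustrative names invented here. It shows why a router AZ with no chassis pulls ports onto compute nodes, and how one might check for the condition the workaround avoids.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Chassis:
    """Hypothetical, simplified view of an OVN chassis."""
    name: str
    azs: FrozenSet[str] = frozenset()  # availability zones this chassis belongs to
    gw_enabled: bool = False           # stands in for ovn-cms-options=enable-chassis-as-gw


def candidate_gw_chassis(chassis: Sequence[Chassis],
                         router_azs: FrozenSet[str]) -> List[Chassis]:
    """Model of the selection described in comments 7 and 10 (not neutron code)."""
    if router_azs:
        in_az = [c for c in chassis if c.azs & router_azs]
        if in_az:
            return in_az
        # Behavior reported in this bug: the AZ has no chassis, so the
        # port ends up on every chassis, compute nodes included.
        return list(chassis)
    # No AZ set: fall back to chassis explicitly marked as gateways
    # (the controller nodes in this environment).
    return [c for c in chassis if c.gw_enabled]


def empty_azs(chassis: Sequence[Chassis], router_azs: FrozenSet[str]) -> Set[str]:
    """Return the router AZs that contain no chassis (what the workaround must avoid)."""
    return {az for az in router_azs if not any(az in c.azs for c in chassis)}


if __name__ == "__main__":
    controller = Chassis("controller-0", frozenset({"az1"}), gw_enabled=True)
    compute = Chassis("compute-0")
    nodes = [controller, compute]
    print([c.name for c in candidate_gw_chassis(nodes, frozenset({"az-empty"}))])
    # ['controller-0', 'compute-0']
    print([c.name for c in candidate_gw_chassis(nodes, frozenset())])
    # ['controller-0']
    print(empty_azs(nodes, frozenset({"az1", "az-empty"})))
    # {'az-empty'}
```

Under this model, the workaround amounts to keeping `empty_azs(...)` empty for every router that has an AZ, or leaving the AZ unset so the `gw_enabled` fallback applies.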

Comment 27 Ihar Hrachyshka 2023-05-22 13:00:00 UTC
I don't think we need to collect more logs at this point: the issue is clear, and it's not in the OVN / OVS BFD implementation. It's an upstream switch misconfiguration or bond issue of some sort.

Comment 28 Ihar Hrachyshka 2023-05-22 13:02:28 UTC
The original issue reported here, BFD flapping, was fixed by migrating the gw ports back to where they belong: the controller nodes. The remaining issue turned out to be an upstream switch misconfiguration of some sort. I am closing the bug. If there are other issues to follow up on, a new bz should be created.

Comment 29 Ihar Hrachyshka 2023-05-22 15:18:28 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=2195898#c9 to follow up on the neutron AZ scheduler behavior mentioned in comment 6.

Comment 30 Ihar Hrachyshka 2023-05-22 15:51:30 UTC
Created a documentation bz to follow up on the recommendation not to place gw ports on compute nodes: https://bugzilla.redhat.com/show_bug.cgi?id=2209100