Bug 2195898
| Summary: | High CPU usage observed for OVS and BFD state is flapping | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sukhendu Kar <sukar> | |
| Component: | python-networking-ovn | Assignee: | Ihar Hrachyshka <ihrachys> | |
| Status: | CLOSED NOTABUG | QA Contact: | Eran Kuris <ekuris> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 16.2 (Train) | CC: | aandrade, apevec, camorris, chrisw, fesilva, ftaylor, gkadam, hakhande, ihrachys, ldenny, lhh, lseki, majopela, rycaputo, scohen, spapa, vkhitrin | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 2209090, 2209100 | Environment: | ||
| Last Closed: | 2023-05-22 13:02:28 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2209090, 2209092, 2209100, 2218465 | |||
Comment 7
Ihar Hrachyshka
2023-05-08 20:01:45 UTC
AFAIU, the workaround that can be tried in the environment without any code changes is to make sure that, if a router is assigned to an AZ, that AZ has at least one chassis. The issue in the environment is triggered by ports whose AZ has no chassis. If you avoid setting an AZ for such ports, neutron should fall back to assigning chassis that are explicitly marked with ovn-cms-options=enable-chassis-as-gw, which in this cluster are the controller nodes. This will stop neutron from scheduling the ports on compute nodes.

I don't think we need to collect more logs at this point: the issue is clear, and it is not in the OVN / OVS BFD implementation. It's an upstream switch misconfiguration / bond issue of some sort.

The original issue reported here, BFD flapping, was fixed by migrating the gw ports back to where they belong: the controller nodes. The remaining issue turned out to be an upstream switch misconfiguration of some sort. I am closing the bug. If there are other issues to follow up on, a new bz should be created.

Created https://bugzilla.redhat.com/show_bug.cgi?id=2195898#c9 to follow up on the neutron AZ scheduler behavior, as mentioned in comment 6. Created a documentation bz to follow up on the recommendation not to place gw ports on compute nodes: https://bugzilla.redhat.com/show_bug.cgi?id=2209100
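The workaround in comment 7 amounts to a configuration check: every AZ a router is pinned to should be advertised by at least one chassis, and with no AZ set the candidates are the chassis carrying `enable-chassis-as-gw`. Below is a minimal Python sketch of that pre-flight check. It assumes the chassis' ovn-cms-options strings have already been collected into a dict (fetching them from the OVN Southbound database is out of scope here) and that AZs are listed as `availability-zones=az-0:az-1`. The helper names (`parse_cms_options`, `unserved_azs`, `gw_fallback_chassis`) and the chassis names are hypothetical. This is not neutron's scheduler code.

```python
"""Hypothetical pre-flight check for the AZ / gateway-chassis workaround."""

GW_FLAG = "enable-chassis-as-gw"
# Assumption: AZs appear in ovn-cms-options as availability-zones=az-0:az-1
AZ_PREFIX = "availability-zones="


def parse_cms_options(raw: str) -> tuple[bool, set[str]]:
    """Split an ovn-cms-options string into (is_gateway, availability_zones)."""
    is_gw = False
    azs: set[str] = set()
    for item in (part.strip() for part in raw.split(",")):
        if item == GW_FLAG:
            is_gw = True
        elif item.startswith(AZ_PREFIX):
            # Colon-separated AZ list; empty entries are ignored.
            azs = {az for az in item[len(AZ_PREFIX):].split(":") if az}
    return is_gw, azs


def unserved_azs(router_az_hints: list[str], chassis_options: dict[str, str]) -> list[str]:
    """Return the router's AZ hints that no chassis advertises (the trigger in this bug)."""
    advertised: set[str] = set()
    for raw in chassis_options.values():
        advertised |= parse_cms_options(raw)[1]
    return sorted(set(router_az_hints) - advertised)


def gw_fallback_chassis(chassis_options: dict[str, str]) -> list[str]:
    """Chassis explicitly marked as gateways (the controllers in this cluster)."""
    return sorted(name for name, raw in chassis_options.items() if parse_cms_options(raw)[0])


if __name__ == "__main__":
    # Hypothetical chassis inventory: two gateway controllers in az-0, one compute.
    chassis = {
        "controller-0": "enable-chassis-as-gw,availability-zones=az-0",
        "controller-1": "enable-chassis-as-gw,availability-zones=az-0",
        "compute-0": "",
    }
    print(unserved_azs(["az-0", "az-1"], chassis))  # ['az-1']
    print(gw_fallback_chassis(chassis))  # ['controller-0', 'controller-1']
```

Any AZ returned by `unserved_azs` is one the router should not be pinned to until some chassis advertises it. The second list is the set the comment expects neutron to fall back to when no AZ is set.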