Bug 1974898
| Summary: | tug-of-war between ovn-controllers for external gateway port causes havoc for ml2-ovn | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | ffernand <ffernand> | |
| Component: | ovn22.03 | Assignee: | OVN Team <ovnteam> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jianlin Shi <jishi> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | FDP 20.H | CC: | ctrautma, ihrachys, jamsmith, jiji, lmartins, mmichels, ralongi, twilson, ykarel | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | ovn22.03-22.03.0-95.el8fdp ovn22.03-22.03.0-95.el9fdp | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2189267 2196286 (view as bug list) | Environment: | ||
| Last Closed: | 2023-03-13 07:13:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1728282, 1994427, 2081631, 2189267, 2196286 | |||
|
Description
ffernand
2021-06-22 18:06:54 UTC
We are also seeing this with ovn-2021 in the RHOSP-16.2 release: https://bugzilla.redhat.com/show_bug.cgi?id=2081631

A discussion on IRC suggested that one way of addressing the churn is to introduce a "delay" / backoff mechanism for re-claiming a port after a "recent" claim of the port by the same controller. (The terms "recent" and "delay" would be subject to discussion, or maybe even configuration.) This should be doable by making each lport structure carry a timestamp of the latest successful claim.

Bumping the severity/priority due to 2081631, and also because one of the most frequent causes of issues we have in ml2/ovn ends up being related to southbound ovsdb performance. When ovsdb-server has to process 100s of transactions/sec while ports are fought over, that extra load can cause issues. In addition, since neutron subscribes to the SB Port_Binding table, it has to process all of those events as well. So fixing this could be a pretty big deal, performance-wise.

Posted the fix upstream here: https://patchwork.ozlabs.org/project/ovn/list/?series=308808

@Terry this is fixed in master, and I backported it up to 22.03. Do we really need to see this in 2.13, or can it be left for FDP improvements?

Mark, please build a new FDP release for 22.03+ / 22.06+. This bug was fixed in:
22.06 https://github.com/ovn-org/ovn/commit/8f1d63bbf6f67ab2bc4eb3d59ba1de43a4f6548f
22.03 https://github.com/ovn-org/ovn/commit/2c98163e024f0543d84df44f9c0840ce0347e2bc

Thank you.
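For readers following the backoff idea from the IRC discussion (each lport carries a timestamp of its latest successful claim, and a re-claim inside a "recent" window is skipped), here is a minimal sketch in Python. It is only an illustration of the idea, not the code from the upstream fix: the names `LportClaimState`, `ClaimBackoff`, `try_claim`, and the 0.5 second window are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LportClaimState:
    """Hypothetical per-lport record; not OVN's actual lport structure."""
    # Monotonic timestamp of the latest successful claim, None if never claimed.
    last_claim: Optional[float] = None


@dataclass
class ClaimBackoff:
    # Assumed length of the "recent" window, in seconds (illustrative value).
    backoff_s: float = 0.5
    # Injectable clock so the behaviour can be checked without sleeping.
    clock: Callable[[], float] = time.monotonic
    lports: Dict[str, LportClaimState] = field(default_factory=dict)

    def may_claim(self, lport: str) -> bool:
        """Return True if this controller has not claimed the lport recently."""
        state = self.lports.get(lport)
        if state is None or state.last_claim is None:
            return True
        return self.clock() - state.last_claim >= self.backoff_s

    def try_claim(self, lport: str) -> bool:
        """Claim the lport unless a claim by this controller is still recent."""
        if not self.may_claim(lport):
            return False  # back off instead of rewriting the binding again
        self.lports.setdefault(lport, LportClaimState()).last_claim = self.clock()
        return True
```

With a window like this, two controllers fighting over the same port would each write to the southbound database at most once per window, rather than as fast as they can react to each other's updates. Whether the window should be fixed or configurable is the open question raised in the comment.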