Bug 2137618

Summary: ovn-controller is running at ~ 100% load across the cluster
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn-2021
Version: FDP 22.E
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: high
Reporter: Terry Wilson <twilson>
Assignee: OVN Team <ovnteam>
QA Contact: Jianlin Shi <jishi>
CC: astupnik, bcafarel, ctrautma, dalvarez, dhill, jiji, jveiraca, ltamagno, mlavalle, mmichels, nusiddiq, shtiwari, twilson, ushkalim
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Clone Of: 2131295
Bug Depends On: 2131295
Last Closed: 2023-04-12 19:30:00 UTC

Comment 2 Mark Michelson 2023-01-24 16:42:08 UTC
Hi all,

A fix has been added to OVN since this issue was filed that is intended to help with this situation.

Ales added a delay for multicast ARP packets. This ensures that full recomputes are not required due to race conditions that occur when multiple controllers receive a GARP simultaneously. This has already been backported to 21.12 and is first present in ovn-2021-21.12.0-94.
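
The patch itself is not quoted in this bug, so the following is only a toy model of why staggering the controllers can help. It assumes that the race comes from several controllers all trying to record the same binding before any of them has seen another's write; the function name, the delay range, and the propagation time are hypothetical and are not taken from OVN.

```python
import random


def count_writers(num_controllers, max_delay_ms, propagation_ms, seed=0):
    """Toy model, not OVN code: count controllers that attempt a write.

    Every controller sees the same GARP at time 0 and waits a random
    delay in [0, max_delay_ms]. A controller skips its write if the
    first write happened at least propagation_ms before it wakes up,
    i.e. it has already learned about the binding.
    """
    rng = random.Random(seed)
    wake_times = sorted(
        rng.uniform(0, max_delay_ms) for _ in range(num_controllers)
    )

    first_write = None
    writers = 0
    for wake in wake_times:
        if first_write is not None and wake >= first_write + propagation_ms:
            continue  # binding already visible, nothing to write
        if first_write is None:
            first_write = wake
        writers += 1
    return writers


if __name__ == "__main__":
    # No delay: every controller races to write the same binding.
    print(count_writers(5, max_delay_ms=0, propagation_ms=5))
    # Random delay: later controllers tend to see the first write and skip.
    print(count_writers(5, max_delay_ms=50, propagation_ms=5))
```

In this model, a delay of zero makes every controller write, while a delay that is large compared with the propagation time leaves only the earliest controllers writing.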

Updating to this version *should* prevent the constant 100% CPU issue.

Also, there is a secondary patch that is present in ovn22.09+ that adds MAC_Binding aging. This ensures that MAC_Bindings are eventually deleted after a certain time. This was alluded to in the copied comments above. This has not been backported to ovn-2021 because it's a new feature, not a bug fix. While it is likely to result in smaller SB database sizes, it's not expected to contribute directly to fixing the 100% CPU issue.
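
As a rough illustration of what aging means here, the sketch below drops entries that have not been refreshed within a threshold. It is not the OVN implementation: the `MacBinding` class, its `last_seen` field, and the threshold value are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MacBinding:
    """Hypothetical stand-in for a MAC_Binding row."""
    ip: str
    mac: str
    last_seen: float  # seconds; assumed timestamp of the last refresh


def expire_bindings(bindings, now, max_age_s):
    """Return only the bindings refreshed within the last max_age_s seconds."""
    return [b for b in bindings if now - b.last_seen <= max_age_s]


if __name__ == "__main__":
    table = [
        MacBinding("10.0.0.1", "00:00:00:00:00:01", last_seen=100.0),
        MacBinding("10.0.0.2", "00:00:00:00:00:02", last_seen=950.0),
    ]
    # With a 300 s threshold at t=1000, only the second entry survives.
    for binding in expire_bindings(table, now=1000.0, max_age_s=300.0):
        print(binding.ip, binding.mac)
```

Without such a sweep, stale entries accumulate indefinitely, which is why aging is expected to shrink the SB database rather than to address the CPU load directly.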

It would be good to know if an update to ovn-2021 to -94 or newer fixes the problem. The original issue was opened on 30 September 2022, and the fix in -94 was committed 20 October 2022. So it's reasonable to assume that an update could alleviate the problem. Please let us know if this helps.
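
To check whether a given build is "-94 or newer", the release number has to be compared as an integer rather than as a string. The helper below is a minimal sketch that assumes the build string has the form `ovn-2021-21.12.0-<release>` with an optional dist suffix; the function name and the `.el8fdp` suffix in the examples are assumptions made for illustration.

```python
import re

# From the comment above: the delay fix is first present in ovn-2021-21.12.0-94.
FIRST_FIXED_RELEASE = 94

# Assumed build-string shape; the trailing ".<dist>" part is optional.
_BUILD_RE = re.compile(r"ovn-2021-21\.12\.0-(\d+)(?:\..*)?")


def has_garp_delay_fix(build):
    """Return True if an ovn-2021 21.12.0 build is release 94 or newer."""
    match = _BUILD_RE.fullmatch(build)
    if match is None:
        raise ValueError(f"unrecognized build string: {build!r}")
    # Compare numerically so that release 100 sorts after release 94.
    return int(match.group(1)) >= FIRST_FIXED_RELEASE


if __name__ == "__main__":
    for build in ("ovn-2021-21.12.0-82.el8fdp", "ovn-2021-21.12.0-94.el8fdp"):
        print(build, has_garp_delay_fix(build))
```

On a real system the build string would come from the package manager; that step is left out here because the exact package naming on the affected hosts is not stated in this bug.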

Comment 4 Red Hat Bugzilla 2023-10-19 04:25:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.