Bug 1992172

Summary: [ovn][metrics/alerts] Expose longer poll intervals on OVN components to determine load
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Surya Seetharaman <surya>
Component: OVN
Assignee: OVN Team <ovnteam>
Status: NEW
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Priority: medium
Version: FDP 21.C
CC: ctrautma, jiji, mmichels
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Surya Seetharaman 2021-08-10 17:18:27 UTC
Description of problem:

We see lots of “WARN|Unreasonably long 13636ms poll interval” logs in the scaled-up clusters [OCP-OVN-K]. These can be attributed to the sbdb, ovn-controller, or ovn-northd doing work that keeps it busy, so they could indicate load on the cluster.
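As a rough illustration of where such a metric could come from, the durations can be pulled straight out of the existing log lines. This is a minimal sketch, assuming the warning text stays as quoted above; the function name `long_poll_intervals_ms` and the sample log lines are hypothetical, not part of OVN.

```python
import re
from typing import Iterable, List

# Matches the warning text quoted in this bug; any trailing detail after
# "poll interval" (for example a user/system time breakdown) is ignored.
LONG_POLL_RE = re.compile(r"WARN\|Unreasonably long (\d+)ms poll interval")


def long_poll_intervals_ms(lines: Iterable[str]) -> List[int]:
    """Return the poll-interval durations, in ms, found in long-poll warnings."""
    durations = []
    for line in lines:
        match = LONG_POLL_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


if __name__ == "__main__":
    # Hypothetical log lines; only the first and last are long-poll warnings.
    sample = [
        "2021-08-10T17:18:27.001Z|00101|timeval|WARN|Unreasonably long 13636ms poll interval",
        "2021-08-10T17:18:27.002Z|00102|jsonrpc|INFO|some unrelated message",
        "2021-08-10T17:18:30.500Z|00103|timeval|WARN|Unreasonably long 1200ms poll interval",
    ]
    print(long_poll_intervals_ms(sample))  # [13636, 1200]
```

Running this separately over each component's log (sbdb, ovn-controller, ovn-northd) would keep the per-component attribution.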

It would be good to expose this as a metric or alert, configure a sensible threshold based on the size of the cluster, and document it, so that there is an easier way to tell whether OVN is busy.
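One possible shape for this is a Prometheus-style histogram of the long poll-interval durations reported in these warnings, plus an alert threshold that scales with node count. The sketch below is hypothetical throughout: the metric name, the bucket bounds, the linear `base_ms + per_node_ms * node_count` threshold, and `max_exceed` are placeholder assumptions, not values agreed on for OVN or OVN-K. Prometheus conventions would normally use seconds as the base unit; milliseconds are kept here only to match the log text.

```python
from typing import Dict, Sequence

# Hypothetical bucket upper bounds in milliseconds; real values would need tuning.
BUCKETS_MS = (1000, 5000, 10000, 30000)


def cumulative_buckets(durations_ms: Sequence[int],
                       bounds: Sequence[int] = BUCKETS_MS) -> Dict[str, int]:
    """Prometheus-style cumulative histogram: each bucket counts samples <= its bound."""
    counts = {str(b): sum(1 for d in durations_ms if d <= b) for b in bounds}
    counts["+Inf"] = len(durations_ms)
    return counts


def render_histogram(name: str, durations_ms: Sequence[int]) -> str:
    """Render the histogram in the Prometheus text exposition format."""
    lines = [f"# TYPE {name} histogram"]
    for le, count in cumulative_buckets(durations_ms).items():
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {sum(durations_ms)}")
    lines.append(f"{name}_count {len(durations_ms)}")
    return "\n".join(lines)


def alert_threshold_ms(node_count: int, base_ms: int = 2000, per_node_ms: int = 10) -> int:
    """Hypothetical threshold that grows linearly with cluster size."""
    return base_ms + per_node_ms * node_count


def should_alert(durations_ms: Sequence[int], node_count: int, max_exceed: int = 2) -> bool:
    """Fire when more than `max_exceed` intervals exceed the size-scaled threshold."""
    threshold = alert_threshold_ms(node_count)
    return sum(1 for d in durations_ms if d > threshold) > max_exceed


if __name__ == "__main__":
    observed = [13636, 1200, 4500, 800, 25000]  # made-up sample durations
    print(render_histogram("ovn_long_poll_interval_milliseconds", observed))
    print(should_alert(observed, node_count=100))  # True: 3 intervals exceed 3000ms
```

In a real deployment the alerting half would more likely live in a Prometheus alerting rule than in the exporter; it is inlined here only to keep the sketch self-contained.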

It's more of a feature request/RFE than a bug. This was discussed briefly during the OVN/OVN-K sync-up on August 10th.