Bug 2009379

Summary: Very high variance in OVN performance metrics when running make perf-test
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Anton Ivanov <anivanov>
Component: OVN
Assignee: OVN Team <ovnteam>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: low
Docs Contact:
Priority: low
Version: FDP 21.G
CC: ctrautma, jiji, mmichels
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-02-14 21:13:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Anton Ivanov 2021-09-30 13:54:18 UTC
Very high variance in OVN performance metrics when running make perf-test in the OVN test suite.

Standard deviation is ~10%.

Highest/Lowest values are +/- 20% from average.

This makes the test non-informative. At the same time, people have started using it.

We should either improve it and make the results reproducible or disable it.

Comment 1 Mark Gray 2021-10-04 17:19:11 UTC
For which test (or test number) and which test metric do you see the variance? I don't think I see the same variance as you, but I would like to compare over multiple test iterations.

Also, are you running from master using the single-threaded northd implementation?

Comment 2 Anton Ivanov 2021-10-13 08:39:00 UTC
I am looking at the Average. 

The variance can be seen in both single and multi-threaded tests.

If you run the test in a loop grepping for Average you can see it.

Comment 3 Mark Gray 2021-10-13 17:15:14 UTC
These are the results I see:

********* Test Case 1 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="1" | grep "Average (northd"; done
  Average (northd-loop in msec): 127.422379
  Average (northd-loop in msec): 119.656266
  Average (northd-loop in msec): 59.249756
  Average (northd-loop in msec): 111.437500
  Average (northd-loop in msec): 116.752015
  Average (northd-loop in msec): 110.089783
  Average (northd-loop in msec): 128.250003
  Average (northd-loop in msec): 114.156252
  Average (northd-loop in msec): 137.252017
  Average (northd-loop in msec): 119.343783
`

********* Test Case 5 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="5" | grep "Average (northd"; done
  Average (northd-loop in msec): 728.265642
  Average (northd-loop in msec): 753.437517
  Average (northd-loop in msec): 721.379891
  Average (northd-loop in msec): 743.377024
  Average (northd-loop in msec): 709.804704
  Average (northd-loop in msec): 731.797876
  Average (northd-loop in msec): 747.079173
  Average (northd-loop in msec): 715.250008
  Average (northd-loop in msec): 711.859383
  Average (northd-loop in msec): 711.281258
`

I think the variance is not relative to the overall Average but is absolute. To me it looks like +/- 25 ms rather than +/- 20%. In "Test Case 1", the average loop time is much shorter than in "Test Case 5", but the absolute variance is about the same. IMO, that is probably due to general operating system noise (scheduler, paging, etc.), and I wouldn't expect us to do much better than that without tuning the operating system. As our current loop time in some real deployments is ~10 seconds, maybe we could modify the tests to model something like that or, alternatively, we could remove the tests if they do not add value.
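
One way to check the relative-versus-absolute reading is to compute the spread of the two runs above directly. The following is a minimal sketch, not part of the OVN test suite: the helper names `parse_averages` and `summarize` are hypothetical, and the sample lists are copied from the Test Case 1 and Test Case 5 output in this comment. It reports the sample standard deviation in msec and as a percentage of the mean (coefficient of variation).

```python
import re
import statistics

# Matches the "Average (northd-loop in msec): <value>" lines printed by the
# loop above. The pattern is an assumption based on that output only.
AVERAGE_RE = re.compile(r"Average \(northd-loop in msec\):\s*([0-9.]+)")


def parse_averages(output):
    """Extract the per-run averages (msec) from captured grep output."""
    return [float(value) for value in AVERAGE_RE.findall(output)]


def summarize(samples):
    """Return mean, sample standard deviation, and spread for one test."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    return {
        "mean": mean,
        "stdev": stdev,                      # absolute spread, msec
        "cv_percent": 100.0 * stdev / mean,  # relative spread, % of mean
        "min": min(samples),
        "max": max(samples),
    }


# Values copied from the two runs quoted in this comment.
TEST_CASE_1 = [
    127.422379, 119.656266, 59.249756, 111.437500, 116.752015,
    110.089783, 128.250003, 114.156252, 137.252017, 119.343783,
]
TEST_CASE_5 = [
    728.265642, 753.437517, 721.379891, 743.377024, 709.804704,
    731.797876, 747.079173, 715.250008, 711.859383, 711.281258,
]

if __name__ == "__main__":
    for name, samples in (("Test Case 1", TEST_CASE_1),
                          ("Test Case 5", TEST_CASE_5)):
        s = summarize(samples)
        print(f"{name}: mean={s['mean']:.1f} ms, stdev={s['stdev']:.1f} ms, "
              f"cv={s['cv_percent']:.1f}%, "
              f"range={s['min']:.1f}..{s['max']:.1f} ms")
```

On these ten samples per test, the standard deviation comes out at roughly 21 ms for Test Case 1 and roughly 16 ms for Test Case 5, while the relative spread differs by about an order of magnitude (about 18% versus about 2%). The Test Case 1 figure is dominated by the single 59.25 ms run, so a larger sample would be needed before drawing firm conclusions.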

Comment 5 OVN Bot 2024-02-14 21:13:48 UTC
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.