Bug 2009379

Summary: Very high variance in OVN performance metrics when running make perf-test
Product: Red Hat Enterprise Linux Fast Datapath
Component: OVN
Version: FDP 21.G
Reporter: Anton Ivanov <anivanov>
Assignee: OVN Team <ovnteam>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, jiji, mmichels
Status: NEW
Severity: low
Priority: low
Hardware: All
OS: Linux
Type: Bug

Description Anton Ivanov 2021-09-30 13:54:18 UTC
Very high variance in OVN performance metrics when running make perf-test in the OVN test suite.

Standard deviation is ~ 10% 

Highest/Lowest values are +/- 20% from average.

This makes the test non-informative. At the same time, people have started using it.

We should either improve it and make the results reproducible or disable it.
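
For reference, one way to quantify the spread (a sketch, assuming the suite is driven via make check-perf as in the comments below and that each run prints an "Average (northd-loop in msec)" line) is to collect the per-run averages and summarize them with awk:

`
# run the first perf test 10 times, keep the per-run averages, and print
# mean, population stddev (absolute and as % of mean), min and max
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="1"; done |
    grep "Average (northd" |
    awk '{ n++; s += $NF; ss += $NF * $NF;
           if (n == 1 || $NF < min) min = $NF;
           if (n == 1 || $NF > max) max = $NF }
         END { m = s / n; sd = sqrt(ss / n - m * m);
               printf "n=%d mean=%.2f ms stddev=%.2f ms (%.1f%%) min=%.2f max=%.2f\n",
                      n, m, sd, 100 * sd / m, min, max }'
`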

Comment 1 Mark Gray 2021-10-04 17:19:11 UTC
For which test (or test number) and which test metric do you see the variance? I don't think I see the same variance as you, but I would like to compare over multiple test iterations.

Also, are you running from master using the single-threaded northd implementation?

Comment 2 Anton Ivanov 2021-10-13 08:39:00 UTC
I am looking at the Average. 

The variance can be seen in both single and multi-threaded tests.

If you run the test in a loop, grepping for Average, you can see it.

Comment 3 Mark Gray 2021-10-13 17:15:14 UTC
These are the results I see:

********* Test Case 1 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="1" | grep "Average (northd"; done
  Average (northd-loop in msec): 127.422379
  Average (northd-loop in msec): 119.656266
  Average (northd-loop in msec): 59.249756
  Average (northd-loop in msec): 111.437500
  Average (northd-loop in msec): 116.752015
  Average (northd-loop in msec): 110.089783
  Average (northd-loop in msec): 128.250003
  Average (northd-loop in msec): 114.156252
  Average (northd-loop in msec): 137.252017
  Average (northd-loop in msec): 119.343783
`

********* Test Case 5 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="5" | grep "Average (northd"; done
  Average (northd-loop in msec): 728.265642
  Average (northd-loop in msec): 753.437517
  Average (northd-loop in msec): 721.379891
  Average (northd-loop in msec): 743.377024
  Average (northd-loop in msec): 709.804704
  Average (northd-loop in msec): 731.797876
  Average (northd-loop in msec): 747.079173
  Average (northd-loop in msec): 715.250008
  Average (northd-loop in msec): 711.859383
  Average (northd-loop in msec): 711.281258
`

I think the variance is not relative to the overall Average but is an absolute variance. To me it looks like +/- 25ms rather than +/- 20%. In "Test Case 1", the average loop time is a lot shorter than in "Test Case 5", but the absolute variance is about the same. IMO, that is probably due to general operating system noise (scheduler, paging, etc.), and I wouldn't expect us to be able to get much better than that without tuning the operating system. As the current loop time in some real deployments is ~10 seconds, maybe we could modify the tests to model something like that or, alternatively, remove the tests if they do not add value.
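
If the goal is to keep the test and make it reproducible rather than remove it, part of that operating system noise can usually be reduced by fixing the CPU frequency and pinning the run to otherwise idle cores. A possible sketch (assumes cpupower is available and that cores 2-3 are free; the core choice is arbitrary):

`
# keep the CPUs at a fixed frequency so results do not depend on scaling/turbo
$ sudo cpupower frequency-set -g performance

# pin the whole run to two otherwise idle cores; make and all child
# processes inherit the affinity mask, which cuts down scheduler migration noise
$ taskset -c 2,3 make check-perf TESTSUITEFLAGS="5" | grep "Average (northd"
`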