Bug 2009379 - Very high variance in OVN performance metrics when running make perf-test
Summary: Very high variance in OVN performance metrics when running make perf-test
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 21.G
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: OVN Team
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-30 13:54 UTC by Anton Ivanov
Modified: 2023-07-13 07:25 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker FD-1583 (last updated 2021-10-01 14:00:16 UTC)

Description Anton Ivanov 2021-09-30 13:54:18 UTC
Very high variance in OVN performance metrics when running make perf-test in the OVN test suite.

The standard deviation is ~10% of the average.

The highest/lowest values are +/- 20% from the average.

This makes the test non-informative; at the same time, people have started using it.

We should either improve it and make the results reproducible, or disable it.

Comment 1 Mark Gray 2021-10-04 17:19:11 UTC
For which test (or test number) and which metric do you see the variance? I don't think I see the same variance as you, but I would like to compare over multiple test iterations.

Also, are you running from master using the single-threaded northd implementation?

Comment 2 Anton Ivanov 2021-10-13 08:39:00 UTC
I am looking at the Average.

The variance shows up in both the single- and multi-threaded tests.

If you run the test in a loop and grep for Average, you can see it; a sketch follows.
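
For example (a minimal sketch; it assumes the make check-perf invocation and the "Average (northd-loop in msec)" output format shown in the next comment, and samples.txt is just a scratch file):

`
# Run test case 1 ten times and collect the per-run northd-loop average.
for i in {1..10}; do
    make check-perf TESTSUITEFLAGS="1" | grep "Average (northd"
done | sed 's/.*: //' > samples.txt

# Population mean, stddev, min and max of the collected samples.
awk '{ n++; s += $1; ss += $1 * $1;
       if (min == "" || $1 < min) min = $1; if ($1 > max) max = $1 }
     END { m = s / n;
           printf "mean=%.2f stddev=%.2f min=%.2f max=%.2f\n",
                  m, sqrt(ss / n - m * m), min, max }' samples.txt
`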

Comment 3 Mark Gray 2021-10-13 17:15:14 UTC
These are the results I see:

********* Test Case 1 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="1" | grep "Average (northd"; done
  Average (northd-loop in msec): 127.422379
  Average (northd-loop in msec): 119.656266
  Average (northd-loop in msec): 59.249756
  Average (northd-loop in msec): 111.437500
  Average (northd-loop in msec): 116.752015
  Average (northd-loop in msec): 110.089783
  Average (northd-loop in msec): 128.250003
  Average (northd-loop in msec): 114.156252
  Average (northd-loop in msec): 137.252017
  Average (northd-loop in msec): 119.343783
`

********* Test Case 5 *********
`
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="5" | grep "Average (northd"; done
  Average (northd-loop in msec): 728.265642
  Average (northd-loop in msec): 753.437517
  Average (northd-loop in msec): 721.379891
  Average (northd-loop in msec): 743.377024
  Average (northd-loop in msec): 709.804704
  Average (northd-loop in msec): 731.797876
  Average (northd-loop in msec): 747.079173
  Average (northd-loop in msec): 715.250008
  Average (northd-loop in msec): 711.859383
  Average (northd-loop in msec): 711.281258
`

I think the variance is not relative to the overall average but absolute: it looks like +/- 25 ms rather than +/- 20%. In "Test Case 1" the average loop time is much shorter than in "Test Case 5", but the absolute variance is about the same. IMO, that is probably due to general operating-system noise (scheduler, paging, etc.), and I wouldn't expect us to get much better than that without tuning the operating system. As the current loop time in some real deployments is ~10 seconds, maybe we could modify the tests to model something like that or, alternatively, remove the tests if they do not add value.
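
To illustrate, feeding the raw values from the two runs above into a quick shell helper supports this (a sketch; tc1.txt and tc5.txt are hypothetical files holding just the numbers from each run, and stddev here is the population standard deviation):

`
$ stats() { awk '{ n++; s += $1; ss += $1 * $1 }
                 END { m = s / n; sd = sqrt(ss / n - m * m);
                       printf "mean=%.1f ms  stddev=%.1f ms  (%.1f%% of mean)\n",
                              m, sd, 100 * sd / m }' "$1"; }
$ stats tc1.txt
mean=114.4 ms  stddev=20.0 ms  (17.5% of mean)
$ stats tc5.txt
mean=727.4 ms  stddev=15.3 ms  (2.1% of mean)
`

The absolute spread is on the same order (~15-20 ms) in both cases, while the relative spread shrinks as the loop time grows.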

