There is very high variance in the OVN performance metrics when running make perf-test in the OVN test suite. The standard deviation is ~10%, and the highest/lowest values are +/-20% from the average. This makes the test non-informative. At the same time, people have started using it. We should either improve it so that the results are reproducible, or disable it.
For which test (or test number) and which test metric do you see the variance? I don't think I see the same variance as you, but I would like to compare over multiple test iterations. Also, are you running from master using the single-threaded northd implementation?
I am looking at the Average. The variance can be seen in both the single- and multi-threaded tests. If you run the test in a loop, grepping for "Average", you can see it.
These are the results I see:

********* Test Case 1 *********

```
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="1" | grep "Average (northd"; done
Average (northd-loop in msec): 127.422379
Average (northd-loop in msec): 119.656266
Average (northd-loop in msec): 59.249756
Average (northd-loop in msec): 111.437500
Average (northd-loop in msec): 116.752015
Average (northd-loop in msec): 110.089783
Average (northd-loop in msec): 128.250003
Average (northd-loop in msec): 114.156252
Average (northd-loop in msec): 137.252017
Average (northd-loop in msec): 119.343783
```

********* Test Case 5 *********

```
$ for i in {1..10}; do make check-perf TESTSUITEFLAGS="5" | grep "Average (northd"; done
Average (northd-loop in msec): 728.265642
Average (northd-loop in msec): 753.437517
Average (northd-loop in msec): 721.379891
Average (northd-loop in msec): 743.377024
Average (northd-loop in msec): 709.804704
Average (northd-loop in msec): 731.797876
Average (northd-loop in msec): 747.079173
Average (northd-loop in msec): 715.250008
Average (northd-loop in msec): 711.859383
Average (northd-loop in msec): 711.281258
```

I think the variance is not relative to the overall average but absolute: to me it looks like +/-25 ms rather than +/-20%. In "Test Case 1" the average loop time is much shorter than in "Test Case 5", but the absolute variance is about the same. IMO that is probably due to general operating-system noise (scheduler, paging, etc.), and I wouldn't expect us to be able to get much better than that without tuning the operating system. As the current loop time in some real deployments is ~10 seconds, maybe we could modify the tests to model something like that, or alternatively remove the tests if they do not add value.
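To make the absolute-vs.-relative question concrete, here is a minimal sketch (not part of the test suite) that computes the mean, sample standard deviation, and maximum deviation for the two sets of samples above. It assumes Python 3 and uses only the standard statistics module; the numbers are copied verbatim from the runs shown above.

```python
#!/usr/bin/env python3
# Quantify the spread of the "Average (northd-loop in msec)" samples above,
# to compare absolute deviation (ms) against relative deviation (%).
import statistics

samples = {
    "Test Case 1": [127.422379, 119.656266, 59.249756, 111.437500,
                    116.752015, 110.089783, 128.250003, 114.156252,
                    137.252017, 119.343783],
    "Test Case 5": [728.265642, 753.437517, 721.379891, 743.377024,
                    709.804704, 731.797876, 747.079173, 715.250008,
                    711.859383, 711.281258],
}

for name, values in samples.items():
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)               # sample standard deviation
    max_dev = max(abs(v - mean) for v in values)   # worst single-run deviation
    print(f"{name}: mean={mean:.1f} ms  "
          f"stdev={stdev:.1f} ms ({100 * stdev / mean:.1f}%)  "
          f"max deviation={max_dev:.1f} ms ({100 * max_dev / mean:.1f}%)")
```

Running something like this against fresh results on the machine where the variance was observed would show whether the spread scales with the loop time (relative) or stays roughly constant in milliseconds (absolute, i.e. OS noise).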