Bug 1875965

Summary: [RFE] Improve ovs statistics accuracy
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Flavio Leitner <fleitner>
Component: openvswitch2.16
Assignee: Kevin Traynor <ktraynor>
Status: ON_QA
QA Contact: liting <tli>
Severity: medium
Docs Contact:
Priority: medium
Version: FDP 20.E
CC: ctrautma, dmarchan, echaudro, jhsiao, ralongi, tli, tredaelli
Target Milestone: ---
Flags: ktraynor: needinfo? (tli)
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: FDP 21.H openvswitch2.16-2.16.0-8.el8fdp.x86_64.rpm
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Flavio Leitner 2020-09-04 18:15:46 UTC
Description of problem:

When you dump some statistics from OVS, the percentages sometimes do not sum to 100%, because the values are not captured at the same time.

This bug is a request to review the statistics in OVS to make sure they are accurate and captured at a single point in time.
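To illustrate the effect described above: if percentages are computed against a total that was read at a different moment than the per-item counters, traffic processed in between skews them. This minimal sketch uses plain dicts of hypothetical per-rxq cycle counters (not OVS data structures or code) and compares a torn read with a read taken from a single snapshot.

```python
# Minimal sketch with hypothetical per-rxq cycle counters (not OVS code).


def shares(per_rxq: dict, total: int) -> dict:
    """Percentage of `total` attributed to each rxq."""
    return {rxq: 100.0 * cycles / total for rxq, cycles in per_rxq.items()}


counters = {"rxq0": 600, "rxq1": 400}

# Torn read: the total is sampled first ...
stale_total = sum(counters.values())
# ... the datapath keeps counting while the stats are being collected ...
counters["rxq0"] += 100
counters["rxq1"] += 100
# ... and the per-rxq counters are then read against the stale total.
torn = shares(dict(counters), stale_total)

# Consistent read: copy all counters once, derive total and shares from that copy.
snapshot = dict(counters)
consistent = shares(snapshot, sum(snapshot.values()))

print(round(sum(torn.values()), 1))        # 120.0
print(round(sum(consistent.values()), 1))  # 100.0
```

Deriving every value from one snapshot guarantees the shares are mutually consistent, which is the "captured at one point in time" property requested here.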


Version-Release number of selected component (if applicable):
openvswitch2.13

Comment 2 Eelco Chaudron 2022-04-15 14:11:03 UTC
Adding some more background information to this BZ, as it is related to the queue rebalance statistics:

- Rxq % stats improvement:
"Because of the 1 min history tail, it takes 1 min for the rxq stats to show the settled load of an rxq. This is done so that the stats show what is used for rebalancing.

When getting stats from customers/QE or from a sos report, it may not be clear whether the stats have settled or are transitory.

It is also annoying to have to wait 1 min to see the results while debugging/observing.
"

- Detailed Rxq stats:
"The PMD's avg processing cycles per pkt is a useful stat for comparing performance and identifying issues on a traffic path.

However, it is an average over all the packets processed by that PMD, from all the Rxqs it polls.

This means that if there are different ports and different traffic paths, the cycles/pkt may not give a good indication for a particular traffic path (e.g. NIC->VM), so it can be more difficult to identify a bottleneck on a specific traffic path."
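The "Rxq % stats improvement" point can be modelled as a rolling window: the reported % is taken over the last N intervals, so after a load change it only settles once the whole window has been refilled. A minimal sketch, assuming a hypothetical 6-interval window and that usage is the rxq's share of PMD cycles summed over that window; `RxqHistory` and its methods are illustrative names, not OVS code.

```python
from collections import deque

# Hypothetical window: 6 intervals (e.g. 10 s each, giving 1 min of history).
# The interval count and length are illustrative assumptions, not OVS constants.
INTERVALS = 6


class RxqHistory:
    """Rolling per-interval cycle history for one rxq (simplified model)."""

    def __init__(self, intervals: int = INTERVALS) -> None:
        self.rxq = deque(maxlen=intervals)  # cycles spent on this rxq
        self.pmd = deque(maxlen=intervals)  # total cycles of its PMD

    def record(self, rxq_cycles: int, pmd_cycles: int) -> None:
        # Once the deque is full, appending drops the oldest interval.
        self.rxq.append(rxq_cycles)
        self.pmd.append(pmd_cycles)

    def usage_pct(self) -> float:
        """Rxq share of PMD cycles over the whole window."""
        return 100.0 * sum(self.rxq) / sum(self.pmd)


hist = RxqHistory()
for _ in range(INTERVALS):  # settled at 20% load
    hist.record(20, 100)

readings = []
for _ in range(INTERVALS):  # load steps up to 80%
    hist.record(80, 100)
    readings.append(round(hist.usage_pct()))

print(readings)  # [30, 40, 50, 60, 70, 80]
```

The intermediate readings (30% to 70%) are the "transitory" values mentioned above. A shorter window would likely settle faster but give rebalancing a noisier input.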
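A small worked example of the "Detailed Rxq stats" point, with made-up numbers: one PMD polls a NIC rxq that is cheap per packet and a vhost rxq that is expensive per packet. The PMD-wide average hides the expensive path, while a per-rxq figure exposes it. `RxqStats`, `pmd_cycles_per_pkt` and the rxq names are hypothetical, used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RxqStats:
    """Hypothetical per-rxq counters for rxqs polled by one PMD."""
    name: str
    cycles: int   # processing cycles attributed to this rxq
    packets: int  # packets received on this rxq

    def cycles_per_pkt(self) -> float:
        return self.cycles / self.packets


def pmd_cycles_per_pkt(rxqs: list) -> float:
    """PMD-wide average: all cycles over all packets, across every rxq."""
    return sum(r.cycles for r in rxqs) / sum(r.packets for r in rxqs)


rxqs = [
    RxqStats("nic-rxq0", cycles=900_000, packets=9_000),      # NIC->VM path
    RxqStats("vhost-rxq0", cycles=1_000_000, packets=1_000),  # VM->NIC path
]

print(pmd_cycles_per_pkt(rxqs))  # 190.0
print({r.name: r.cycles_per_pkt() for r in rxqs})
# {'nic-rxq0': 100.0, 'vhost-rxq0': 1000.0}
```

The PMD average (190 cycles/pkt) is dominated by the high-rate NIC rxq, so the 10x more expensive VM->NIC path is not visible without a per-rxq breakdown.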


I'll assign it to Kevin for now, as he is working on the queue rebalance project. Please update or close this BZ if you already have one for this work.

Comment 3 Kevin Traynor 2022-04-28 10:09:02 UTC
IIRC, this BZ came from a report by Franck that the per-Rxq % values in the pmd-rxq-show stats did not sum to 100%, while pmd-stats-show reported the PMD at 100%.

The reason for this was that the PMD's overhead % was not shown in the pmd-rxq-show output. That is now resolved.
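This can be illustrated with a simplified model, assuming the PMD's busy cycles split into cycles attributed to rxqs plus unattributed overhead: the per-rxq percentages alone stop short of 100% on a fully busy PMD, and adding the overhead closes the gap. `rxq_show` is a hypothetical helper, not the OVS implementation.

```python
def rxq_show(rxq_cycles: dict, busy: int, idle: int) -> dict:
    """Per-rxq usage plus PMD overhead, each as a % of all PMD cycles.

    Simplified model (an assumption, not OVS code): busy cycles that are
    not attributed to any rxq are reported as overhead.
    """
    total = busy + idle
    report = {rxq: 100.0 * c / total for rxq, c in rxq_cycles.items()}
    report["overhead"] = 100.0 * (busy - sum(rxq_cycles.values())) / total
    return report


# PMD fully busy (no idle cycles), as in the reported case.
report = rxq_show({"rxq0": 450, "rxq1": 350}, busy=1000, idle=0)

rxq_only = sum(v for k, v in report.items() if k != "overhead")
print(rxq_only)              # 80.0
print(sum(report.values()))  # 100.0
```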

The other stats improvements listed in comment 2 are separate items that can get their own RFEs. I will close this BZ as resolved and open new RFEs for the other items.

Comment 4 Kevin Traynor 2022-04-28 10:18:36 UTC
The reported issue was resolved by David Marchand in OVS 2.16 with commit
https://github.com/openvswitch/ovs/commit/3222a89d9aa2bc3ef317611d0a252f372a690d4a