Bug 1066570 (Cumulative_RX_TX_Statistics_VDSM)

Summary: [RFE] Report actual rx_byte instead of a false rxRate
Product: [oVirt] vdsm
Reporter: Dan Kenigsberg <danken>
Component: RFEs
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: Meni Yakove <myakove>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: ---
CC: bazulay, bugs, danken, gklein, iheim, mburman, mgoldboi, pdwyer, rbalakri, s.kieske, sradco, yeylon, ylavi
Target Milestone: ovirt-3.6.0-rc
Keywords: FutureFeature
Target Release: 4.17.8
Flags: rule-engine: ovirt-3.6.0+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-27 07:49:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1080494
Bug Blocks: 1063343

Description Dan Kenigsberg 2014-02-18 16:19:26 UTC
Description of problem:
Vdsm (as well as the layers above it) reports the rxRate for each network device. This number is calculated by dividing the bits transferred over the device per second by the "speed" of the device. For NICs, this speed is estimated based on the device driver. For vlan devices, or for NICs that do not report a speed (virtio, ib), it is faked as 1000 Mbps.
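A minimal Python sketch of the calculation described above. It assumes the rate is expressed as a percentage of the link speed (consistent with the "above 100%" remark below) and that speed is given in Mbps; the names `FAKE_SPEED_MBPS`, `link_speed_mbps` and `rx_rate_percent` are hypothetical and are not Vdsm's actual code.

```python
# Hypothetical sketch of the legacy rxRate calculation; not Vdsm's actual code.

FAKE_SPEED_MBPS = 1000  # assumed speed for vlans and NICs that report none


def link_speed_mbps(reported_speed_mbps):
    """Return the driver-reported speed, or the faked 1000 Mbps if none is known."""
    if reported_speed_mbps is None or reported_speed_mbps <= 0:
        return FAKE_SPEED_MBPS
    return reported_speed_mbps


def rx_rate_percent(rx_bytes_delta, interval_sec, reported_speed_mbps):
    """Bits received per second, as a percentage of the (possibly fake) speed."""
    bits_per_sec = rx_bytes_delta * 8 / interval_sec
    speed_bits_per_sec = link_speed_mbps(reported_speed_mbps) * 10**6
    return 100.0 * bits_per_sec / speed_bits_per_sec
```

For example, a virtio device with no reported speed that receives 750,000,000 bytes in 2 seconds (3 Gbit/s) yields `rx_rate_percent(750_000_000, 2, None) == 300.0`, i.e. 300% of its fake "speed".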

The speed is fake in the sense that, on a host overloaded with multiple VMs, no vNIC truly gets 1000 Mbps. On the other hand, inter-VM communication can easily exceed 100% of the virtio "speed".

In several cases (e.g. when calculating total VM traffic), Engine multiplies the rate by the speed to reproduce the actual traffic. Introducing meaningless numbers into the computation only to remove them later is bound to cause trouble (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=996678#c7).
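To illustrate why round-tripping through a derived rate is fragile, here is a hypothetical Python sketch of a producer reporting a percentage and a consumer multiplying it back by the speed. The rounding to two decimals is purely an assumption for illustration, as are the function names; this is not how Vdsm or Engine are known to behave.

```python
# Hypothetical illustration: the producer reports a rounded percentage of speed,
# the consumer multiplies it back by the speed. Rounding to 2 decimals is an
# assumption chosen only to show how information can be lost on the way.


def reported_rate(bits_per_sec, speed_mbps, decimals=2):
    """What a rate-only API would expose: a rounded percentage of the speed."""
    return round(100.0 * bits_per_sec / (speed_mbps * 10**6), decimals)


def reconstructed_bits_per_sec(rate_percent, speed_mbps):
    """What a consumer recovers by multiplying the rate back by the speed."""
    return rate_percent / 100.0 * speed_mbps * 10**6
```

Under these assumptions, 40,000 bit/s of real traffic on a device with the faked 1000 Mbps speed is reported as `0.0` and reconstructed as `0.0`, so it silently disappears from any accumulated total. Reporting the raw byte counters avoids the detour entirely.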

Vdsm has reported "speed" and rxRate/txRate since forever, and it will continue to do so in oVirt-3.y.z. However, this is a piece of the API we should get straight.

Comment 1 Dan Kenigsberg 2014-02-27 11:50:08 UTC
Actually, we had better report the actual rx_byte and the sample time, not the calculated rate. The actual rx_byte/tx_byte counters are important for accounting. The only problems are that Linux byte counters may wrap around (reset) once they pass their 64-bit maximum, and that they are harder to maintain during migration.
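A minimal Python sketch of how a consumer might accumulate traffic from raw cumulative counters while tolerating the 64-bit wraparound mentioned above. It assumes each counter is an unsigned 64-bit value that wraps to zero on overflow and that at most one wrap occurs between samples; `counter_delta` and `total_bytes` are hypothetical helpers.

```python
# Hypothetical sketch: consumer-side accounting over cumulative byte counters.
# Assumes an unsigned 64-bit counter that wraps to zero on overflow, with at
# most one wrap between consecutive samples.

COUNTER_MODULUS = 2**64


def counter_delta(previous, current):
    """Bytes transferred between two samples, tolerating one 64-bit wraparound."""
    return (current - previous) % COUNTER_MODULUS


def total_bytes(samples):
    """Accumulate bytes transferred over a list of raw counter samples."""
    total = 0
    for previous, current in zip(samples, samples[1:]):
        total += counter_delta(previous, current)
    return total
```

For example, `counter_delta(2**64 - 10, 5)` returns `15`. Note that a genuine reset (e.g. the counter starting again from zero on another host after migration) is indistinguishable from a wrap in this sketch and would be misread as nearly 2^64 bytes; telling the two apart needs additional information, which is part of why such counters are harder to maintain during migration.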

Comment 2 Lior Vernia 2014-12-17 10:12:16 UTC
Dan, please note that the current patch doesn't include sampling times. We'd also want those reported, as you mentioned in Comment 1, so that the engine can properly compute the rate on its own; in the future, the reported rates could then be dropped (speed would still be required, though).
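A minimal Python sketch of computing the rate from two timestamped samples of the cumulative rx_byte counter, and of why speed is still needed to express utilization. The `Sample` type, its field names and the helper functions are hypothetical, assuming timestamps in seconds and a 64-bit counter.

```python
# Hypothetical sketch: computing the rate from two timestamped samples of a
# cumulative rx_byte counter. Sample and its fields are illustrative only.

from dataclasses import dataclass

COUNTER_MODULUS = 2**64  # assumes an unsigned 64-bit counter


@dataclass(frozen=True)
class Sample:
    rx_bytes: int       # cumulative counter value reported by the host
    sample_time: float  # time at which the counter was read, in seconds


def rx_bits_per_sec(older, newer):
    """Average receive rate between two samples, in bits per second."""
    elapsed = newer.sample_time - older.sample_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = (newer.rx_bytes - older.rx_bytes) % COUNTER_MODULUS
    return delta * 8 / elapsed


def utilization_percent(bits_per_sec, speed_mbps):
    """Speed is still needed to express the rate as a share of link capacity."""
    return 100.0 * bits_per_sec / (speed_mbps * 10**6)
```

With the sampling time reported alongside the counter, the rate no longer depends on the host's sampling interval, and speed is used only where a percentage of capacity is actually wanted.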

Comment 3 Yaniv Lavi 2015-04-05 13:24:59 UTC
Please check how this change affects DWH collection.

Comment 4 Shirly Radco 2015-04-13 12:04:23 UTC
This is part of the Cumulative_RX_TX_Statistics feature. Will discuss the issue with Lior.

Comment 5 Red Hat Bugzilla Rules Engine 2015-10-18 08:34:52 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 6 Michael Burman 2015-11-05 08:06:02 UTC
Verified on 3.6.0.3-0.1.el6 and vdsm-4.17.10.1-0.el7ev.noarch

Comment 7 Sandro Bonazzola 2015-11-27 07:49:14 UTC
Since oVirt 3.6.0 has been released, moving from verified to closed current release.