1007860 – Bond "speed" does not reflect the correct speed

Summary: Bond "speed" does not reflect the correct speed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.2.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.4.0
Assignee:	Amador Pahim
QA Contact:	GenadiC
Docs Contact:
URL:
Whiteboard:	network
Depends On:
Blocks:	996678 1037557 rhev3.4beta 1142926

Reported:	2013-09-13 13:06 UTC by Amador Pahim
Modified:	2018-12-06 15:17 UTC
CC List:	11 users
Fixed In Version:	ovirt-3.4.0-alpha1
Doc Type:	Bug Fix
Doc Text:	Bond speeds are now reported accurately.
Clone Of:
Clones:	1037557
Environment:
Last Closed:	2014-06-09 13:25:14 UTC
oVirt Team:	Network
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:0504	0	normal	SHIPPED_LIVE	vdsm 3.4.0 bug fix and enhancement update	2014-06-09 17:21:35 UTC
oVirt gerrit	19297	0	None	None	None	Never

Description Amador Pahim 2013-09-13 13:06:13 UTC

Regardless of the enslaved physical devices, a bond always reports 1000 as its speed:

vdsClient -s 0 getVdsStats | grep network
    network = {'bond4': {'macAddr': '', 'name': 'bond4', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'down', 'speed': '1000', 'rxDropped': '0'} ...

Checking the upstream code:

vdsm/sampling.py, line 439:
...
            ifrate = ifrate or 1000
...

According to this code, bond interfaces are always assumed to have a speed of 1 Gb/s (1000 Mb/s).
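
A minimal illustration of why this fallback pins bonds at 1000, assuming (as the output above suggests) that no speed is sampled for a bond device, so `ifrate` arrives as 0. Only the `ifrate = ifrate or 1000` line is quoted from the report; the `effective_rate` wrapper is hypothetical.

```python
def effective_rate(ifrate):
    """Hypothetical wrapper around the line quoted from vdsm/sampling.py.

    `ifrate` is the sampled device speed in Mb/s; 0 or None means that
    no speed was available for the device.
    """
    # Any falsy value (0, None) falls back to 1000 Mb/s, i.e. 1 Gb/s.
    ifrate = ifrate or 1000
    return ifrate


# A physical 10 Gb/s NIC keeps its real speed...
print(effective_rate(10000))  # 10000
# ...but a device with no sampled speed, such as a bond, is reported
# as 1 Gb/s regardless of how many 10 Gb/s NICs are enslaved to it.
print(effective_rate(0))      # 1000
```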

Depending on the host's network arrangement, the txRate/rxRate of the traffic is attributed to the bond interface. This results in inaccurate network usage in the Admin Portal graph. For example, 1 Gb/s of txRate on a bond in active-backup mode with two 10 Gb/s NICs is shown as 100% network usage in the Admin Portal graph, while the correct figure is 10%. If the bond mode is link aggregation, the correct figure would be 5%.
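
The percentages above follow from dividing the traffic rate by the speed reported for the bond. A small sketch of that arithmetic, assuming the graph shows rate / speed × 100 with both values in Mb/s; the `usage_percent` helper is hypothetical, not Engine code.

```python
def usage_percent(rate_mbps, speed_mbps):
    """Link utilisation as a percentage of the reported interface speed."""
    return 100.0 * rate_mbps / speed_mbps


TX_RATE = 1000  # 1 Gb/s of transmitted traffic, in Mb/s

# Current behaviour: the bond is always reported at 1000 Mb/s.
print(usage_percent(TX_RATE, 1000))       # 100.0
# active-backup over two 10 Gb/s NICs: one link carries the traffic.
print(usage_percent(TX_RATE, 10000))      # 10.0
# Link aggregation over the same two NICs: both links carry traffic.
print(usage_percent(TX_RATE, 2 * 10000))  # 5.0
```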

Please report the bond speed as a function of the enslaved NICs and the bond mode.
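
One way the requested behaviour could look, as a sketch only. It reads the bond's mode and slave list from the Linux bonding sysfs files (`/sys/class/net/<bond>/bonding/mode` and `.../bonding/slaves`) and each slave's `/sys/class/net/<nic>/speed` (in Mb/s). The mode policy is an assumption extrapolated from this report's two examples: active-backup (and, by assumption, broadcast) counts a single link, while every other mode sums its slaves. `bond_speed`, `nic_speed` and `build_fake_sysfs` are hypothetical names, not the vdsm change in gerrit 19297.

```python
import os
import tempfile

# Hypothetical policy: in these modes traffic is carried at one link's
# speed; every other (load-balancing / aggregation) mode gets the sum.
SINGLE_LINK_MODES = {"active-backup", "broadcast"}


def _read(path):
    with open(path) as f:
        return f.read().strip()


def nic_speed(nic, sysfs="/sys/class/net"):
    """Speed of a NIC in Mb/s, or 0 when it cannot be read."""
    try:
        speed = int(_read(os.path.join(sysfs, nic, "speed")))
    except (OSError, ValueError):
        # A down link may fail to read (EINVAL) instead of giving a value.
        return 0
    # Some drivers report -1 for an unknown speed.
    return speed if speed > 0 else 0


def bond_speed(bond, sysfs="/sys/class/net"):
    """Bond speed in Mb/s as a function of its slaves and its mode."""
    bonding = os.path.join(sysfs, bond, "bonding")
    # The mode file reads e.g. "active-backup 1"; keep the name only.
    mode = _read(os.path.join(bonding, "mode")).split()[0]
    slaves = _read(os.path.join(bonding, "slaves")).split()
    speeds = [nic_speed(nic, sysfs) for nic in slaves]
    if not speeds:
        return 0
    if mode in SINGLE_LINK_MODES:
        # Simplification: assume the fastest slave is the one in use.
        return max(speeds)
    return sum(speeds)


def build_fake_sysfs(slave_speeds, mode, bond="bond4"):
    """Demo scaffold: a temporary directory laid out like /sys/class/net."""
    root = tempfile.mkdtemp()
    for nic, speed in slave_speeds.items():
        os.makedirs(os.path.join(root, nic))
        with open(os.path.join(root, nic, "speed"), "w") as f:
            f.write("%d\n" % speed)
    os.makedirs(os.path.join(root, bond, "bonding"))
    with open(os.path.join(root, bond, "bonding", "slaves"), "w") as f:
        f.write(" ".join(slave_speeds) + "\n")
    with open(os.path.join(root, bond, "bonding", "mode"), "w") as f:
        f.write(mode + "\n")
    return root


if __name__ == "__main__":
    # The report's scenario: two 10 Gb/s NICs under one bond.
    nics = {"eth0": 10000, "eth1": 10000}
    for mode in ("active-backup 1", "802.3ad 4"):
        root = build_fake_sysfs(nics, mode)
        print(mode, bond_speed("bond4", sysfs=root))
```

Taking the `sysfs` root as a parameter keeps the logic testable without real bonds; on a host it defaults to the live `/sys/class/net`.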

Comment 6 GenadiC 2014-01-29 12:36:24 UTC

Verified in ovirt-engine-3.4.0-0.5.beta1.el6.noarch

Comment 9 Dan Kenigsberg 2014-02-20 15:17:27 UTC

We now report smarter speeds on Vdsm, but we should make sure that Engine makes proper use of them. Genady, would you verify that a bonded host with VMs consuming more than one nic's bandwidth is no longer reported as choked?

Based on what Moti said in today's team meeting, I am a bit worried that the updated bond speed is not used by Engine. If it does not, please clone the bug to Engine, to have the work continued there.

Comment 10 Moti Asayag 2014-02-24 08:46:17 UTC

(In reply to Dan Kenigsberg from comment #9)
> We now report smarter speeds on Vdsm, but we should make sure that Engine
> makes proper use of them. Genady, would you verify that a bonded host with
> VMs consuming more than one nic's bandwidth is no longer reported as choked?
> 
> Based on what Moti said in today's team meeting, I am a bit worried that the
> updated bond speed is not used by Engine. If it does not, please clone the
> bug to Engine, to have the work continued there.

There is already an open Bug 980363 for the engine to reflect the actual speed as reported by vdsm for the non-physical devices.

This bug should be closed, as it was already verified on the vdsm side.
It seems that the customer's info should be moved to Bug 980363.

Comment 11 errata-xmlrpc 2014-06-09 13:25:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html