Bug 1322529

Summary: nested virt: speed on host network interfaces is not reported in Webadmin
Product: [oVirt] ovirt-engine Reporter: jniederm
Component: BLL.NetworkAssignee: Marcin Mirecki <mmirecki>
Status: CLOSED WONTFIX QA Contact: Meni Yakove <myakove>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.5CC: bugs, danken, jniederm, michal.skrivanek, ylavi
Target Milestone: ---Flags: sbonazzo: ovirt-4.2-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-16 15:20:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1252426    
Attachments:
Description Flags
network interfaces subtab
none
caps
none
stats
none
sys_class_net_star_speeds
none
vdsm.log none

Description jniederm 2016-03-30 16:23:47 UTC
Created attachment 1141838 [details]
network interfaces subtab

Description of problem:
All NIC appears to have speed 0 Mbps.

Version-Release number of selected component (if applicable):
3.6 master, commit d35c04d1686850dd03708377

How reproducible:
100%

Steps to Reproduce:
1. Add a host and select it
2. Select Network Interfaces subtab
3. Check "Speed" column

Actual results:
zeros

Expected results:
real link speeds, eg 100Mpbs, 1000Mbps

Comment 1 Dan Kenigsberg 2016-03-30 18:56:13 UTC
Please share `vdsClient -s 0 getVdsCaps` and `vdsClient -s 0 getVdsStats` from that host, as well as the output of `more /sys/class/net/*/speed |cat`

Comment 2 Dan Kenigsberg 2016-03-30 18:56:40 UTC
oh, vdsm.log might be of interest, too.

Comment 3 jniederm 2016-03-31 11:38:22 UTC
Created attachment 1142168 [details]
caps

Comment 4 jniederm 2016-03-31 11:38:58 UTC
Created attachment 1142182 [details]
stats

Comment 5 jniederm 2016-03-31 11:40:12 UTC
Created attachment 1142184 [details]
sys_class_net_star_speeds

It did not work

Comment 6 jniederm 2016-03-31 11:42:50 UTC
Created attachment 1142185 [details]
vdsm.log

Comment 7 Dan Kenigsberg 2016-03-31 12:06:19 UTC
Are the speeds updated when you press the "refresh capabilities" button?

Comment 8 jniederm 2016-03-31 12:10:17 UTC
no

Comment 9 jniederm 2016-03-31 12:12:46 UTC
The problem is in engine. I use following patch fixing it: https://gerrit.ovirt.org/#/c/53670. The patch itself was rejected, see gerrit comments for details.

Comment 10 Dan Kenigsberg 2016-04-01 20:26:29 UTC
The patch was rejected because refreshing capabilities was thought to pull fresh speeds from the host. in comment 8 you say that this is not the case so we need a deeper look

Comment 11 jniederm 2016-04-26 11:22:04 UTC
Properly reported speed is required for computation of migration bandwidth. Rising severity.

Comment 12 Michal Skrivanek 2016-04-28 07:32:24 UTC
rising once more as this is a crucial component for one of the high profile 4.0 features, which is already under test

Comment 13 Marcin Mirecki 2016-04-29 08:49:28 UTC
The problem only occurs on nested hosts which use "virtio" nics.
For these, the zero speed is the "expected" behavior. This is because there is no "set speed" that a virtio-net interface can obtain. The nic is a software implementation of a network interface and obtains the fastest possible speed it can. That speed will vary depending on the network/CPU/memory load on the virtualisation host, as well as the destination of any particular traffic.
For example, when moving packets between 2 VMs on the same host, the packets can be sent as fast as the host's cpu can move them from one process to another.

The question is whether we should report this as 0, or some other message (like 'varying')?

Since this is only a problem for nested hosts, I suppose this is not a critical issue, as such a setup is probably not very often used in production environments.

Comment 14 jniederm 2016-04-29 13:05:45 UTC
How is then possible that getVdsStats is able to report the speed (1000) whereas getVdsCaps behaves probably as you described (it reports 0)?
See attachment 1142182 [details] and attachment 1142168 [details].

Comment 15 Marcin Mirecki 2016-04-29 14:11:40 UTC
The way we calculate speed, is we look at: /sys/class/net/%s/speed
and in case we fail we return 0.
This is what we return in caps.
In stats we also use the same value, but it later goes through this line of code:
        ifrate = last_sample.interfaces[ifid].speed or 1000
which is the cause of the different values.

This is definitely a bug. The two values should be the same.
The question is which value would you like to have? 
0, 1000, -1 (to hint an undetermined value)?
Let me know what you would expect.

Comment 16 Dan Kenigsberg 2016-05-01 05:57:05 UTC
Marcin, can you fix this problem on Engine side, by inventing a silly default (e.g. 1000mbps) when speed is unknown? Than, we can drop the lying on Vdsm's getVdsStats.

Comment 17 Sandro Bonazzola 2016-05-02 10:08:17 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 18 jniederm 2016-05-02 11:54:48 UTC
I'd rather go with reporting -1 and change UI to report "n/a". Silly value could be confusing for both engine devs and users.

E.g. this bug was reported to be able link speed for estimation of migration bandwidth. Till comment 13 I was sure that link speed of virtio NICs actually is 1000 Mbps.

Comment 19 Marcin Mirecki 2016-05-05 16:50:33 UTC
Submitted 2 patches (1 vdsm, 1 engine) done to Jakub's suggestions.
Vdsm now reports unknown speed as -1. This will be the case when network speed can not be determined (error on read). This is done for both vdsStats and vdsCaps. 

If the nic is down, the speed will be reported as 0 in vdsCaps.For vdsStats, it will still be 1000. I suppose this is also an error, but want to ask fromani (the author) if there is no secret purpose of this.

Engine will display this -1 as N/A. This will affect the speed, rxrate and txrate (since speed is used to calculate it).

Let me know if this suits you.

Comment 20 jniederm 2016-05-06 18:49:53 UTC
Looks good to me. If Francesco allows it, it would be great to have the speed reporting unified.

Comment 21 Dan Kenigsberg 2016-05-08 09:50:14 UTC
I'm afraid that I own the original lie regarding unknown->1000. If Engine (including Engine-3.6) can handle that, make sure that speeds are reported uniformly by all verbs.

Comment 22 Yaniv Lavi 2016-05-23 13:22:52 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 23 Yaniv Lavi 2016-05-23 13:25:18 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 24 Dan Kenigsberg 2017-07-16 15:20:57 UTC
I'd love to clean this piece of ugly behavior, but I am afraid that nested virt is not supported in RHV, and the effect is not very important to fix.