Created attachment 1141838 [details] network interfaces subtab
Description of problem: All NICs appear to have a speed of 0 Mbps.
Version-Release number of selected component (if applicable): 3.6 master, commit d35c04d1686850dd03708377
How reproducible: 100%
Steps to Reproduce:
1. Add a host and select it
2. Select the Network Interfaces subtab
3. Check the "Speed" column
Actual results: zeros
Expected results: real link speeds, e.g. 100 Mbps, 1000 Mbps
Please share `vdsClient -s 0 getVdsCaps` and `vdsClient -s 0 getVdsStats` from that host, as well as the output of `more /sys/class/net/*/speed |cat`
oh, vdsm.log might be of interest, too.
Created attachment 1142168 [details] caps
Created attachment 1142182 [details] stats
Created attachment 1142184 [details] sys_class_net_star_speeds It did not work
Created attachment 1142185 [details] vdsm.log
Are the speeds updated when you press the "refresh capabilities" button?
no
The problem is in Engine. I use the following patch to fix it: https://gerrit.ovirt.org/#/c/53670. The patch itself was rejected; see the Gerrit comments for details.
The patch was rejected because refreshing capabilities was thought to pull fresh speeds from the host. In comment 8 you say that this is not the case, so we need a deeper look.
Properly reported speed is required for computation of migration bandwidth. Raising severity.
Raising once more, as this is a crucial component for one of the high-profile 4.0 features, which is already under test.
The problem only occurs on nested hosts which use "virtio" nics. For these, the zero speed is the "expected" behavior. This is because there is no "set speed" that a virtio-net interface can obtain. The nic is a software implementation of a network interface and obtains the fastest possible speed it can. That speed will vary depending on the network/CPU/memory load on the virtualisation host, as well as the destination of any particular traffic. For example, when moving packets between 2 VMs on the same host, the packets can be sent as fast as the host's cpu can move them from one process to another. The question is whether we should report this as 0, or some other message (like 'varying')? Since this is only a problem for nested hosts, I suppose this is not a critical issue, as such a setup is probably not very often used in production environments.
How is it then possible that getVdsStats is able to report the speed (1000), whereas getVdsCaps probably behaves as you described (it reports 0)? See attachment 1142182 [details] and attachment 1142168 [details].
The way we calculate speed is to read /sys/class/net/%s/speed; if that read fails, we return 0. This is what we return in caps. In stats we use the same value, but it later goes through this line of code: `ifrate = last_sample.interfaces[ifid].speed or 1000`, which is the cause of the different values. This is definitely a bug. The two values should be the same. The question is which value you would like to have: 0, 1000, or -1 (to hint at an undetermined value)? Let me know what you would expect.
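To make the mismatch concrete, here is a minimal sketch of the two code paths as described in this comment. It is not the actual Vdsm code: the function names `read_speed`, `caps_speed` and `stats_speed` and the `sysfs_root` parameter are hypothetical, and only the "read fails, so return 0" path and the `or 1000` fallback come from the description above.

```python
import os


def read_speed(sysfs_root, dev):
    """Read <sysfs_root>/<dev>/speed in Mbps; return 0 if the read fails."""
    path = os.path.join(sysfs_root, dev, 'speed')
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing file, read error, or non-numeric content.
        return 0


def caps_speed(sysfs_root, dev):
    # Caps path: the value is reported as read, so a failure shows up as 0.
    return read_speed(sysfs_root, dev)


def stats_speed(sysfs_root, dev):
    # Stats path: mirrors "speed or 1000", so 0 silently becomes 1000.
    return read_speed(sysfs_root, dev) or 1000
```

With `sysfs_root` set to `/sys/class/net`, a NIC whose speed cannot be read would get 0 from `caps_speed` and 1000 from `stats_speed`, which is the discrepancy seen in the two attachments.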
Marcin, can you fix this problem on the Engine side by inventing a silly default (e.g. 1000 Mbps) when the speed is unknown? Then we can drop the lying in Vdsm's getVdsStats.
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
I'd rather go with reporting -1 and changing the UI to report "n/a". A silly value could be confusing for both Engine devs and users. E.g. this bug was reported in order to be able to use link speed for estimating migration bandwidth. Until comment 13 I was sure that the link speed of virtio NICs actually is 1000 Mbps.
Submitted 2 patches (1 vdsm, 1 engine) following Jakub's suggestions. Vdsm now reports unknown speed as -1. This will be the case when the network speed cannot be determined (error on read). This is done for both vdsStats and vdsCaps. If the nic is down, the speed will be reported as 0 in vdsCaps. For vdsStats, it will still be 1000. I suppose this is also an error, but I want to ask fromani (the author) whether there is some secret purpose behind it. Engine will display this -1 as N/A. This will affect the speed, rxrate and txrate (since speed is used to calculate them). Let me know if this suits you.
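As a rough illustration of the proposed behaviour (not the submitted patches themselves), the sketch below returns -1 when the speed cannot be read and 0 when the nic is down, and renders -1 as N/A. The names `UNKNOWN_SPEED`, `report_speed`, `format_speed` and the `is_up` flag are hypothetical, and the down-nic case models only what is described for vdsCaps.

```python
import os

UNKNOWN_SPEED = -1  # hypothetical name for the "could not determine" sentinel


def report_speed(sysfs_root, dev, is_up=True):
    """Return speed in Mbps, 0 for a down nic, -1 if the read fails."""
    if not is_up:
        return 0
    try:
        with open(os.path.join(sysfs_root, dev, 'speed')) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return UNKNOWN_SPEED


def format_speed(speed):
    """Display helper: negative (unknown) speeds are shown as N/A."""
    if speed < 0:
        return 'N/A'
    return '%d Mbps' % speed
```

Using a negative sentinel keeps "unknown" distinct from both a real speed and a down link, so a consumer never has to guess whether 0 or 1000 is genuine.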
Looks good to me. If Francesco allows it, it would be great to have the speed reporting unified.
I'm afraid that I own the original lie regarding unknown->1000. If Engine (including Engine-3.6) can handle that, make sure that speeds are reported uniformly by all verbs.
oVirt 4.0 beta has been released, moving to RC milestone.
I'd love to clean up this piece of ugly behavior, but I am afraid that nested virt is not supported in RHV, so fixing its effect is not very important.