Bug 1643476 - Wrong 'maxBandwidth' sent to vdsm on migration
Summary: Wrong 'maxBandwidth' sent to vdsm on migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.6
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ovirt-4.3.0
Assignee: Miguel Martin
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1643486
 
Reported: 2018-10-26 10:43 UTC by Miguel Martin
Modified: 2019-02-13 07:44 UTC (History)
7 users

Fixed In Version: ovirt-engine-4.3.0_alpha
Clone Of:
: 1643486 (view as bug list)
Environment:
Last Closed: 2019-02-13 07:44:59 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.3+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 95149 0 'None' MERGED core: Fix maxBandwidth units to MiB/s 2020-06-17 17:13:46 UTC

Description Miguel Martin 2018-10-26 10:43:08 UTC
Description of problem:

According to the 'Table 5.4. Bandwidth Explained' documentation [1]:
~~~
Defined by user (in Mbps). This value is divided by the number of concurrent migrations (default is 2, to account for ingoing and outgoing migration). Therefore, the user-defined bandwidth must be large enough to accommodate all concurrent migrations.

For example, if the Custom bandwidth is defined as 600 Mbps, a virtual machine migration’s maximum bandwidth is actually 300 Mbps. 
~~~

According to the 'lib/vdsm/api/vdsm-api.yml':
~~~
- defaultvalue: needs updating
  description: maximal bandwidth used by the migration (MiB/s)
  name: maxBandwidth
  type: int
  added: '4.0'
~~~

The engine is sending the migration bandwidth limit in Mbps, but vdsm interprets it as MiB/s.

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/administration_guide/sect-cluster_tasks#Cluster_Migration_Policy_Settings_Explained

Version-Release number of selected component (if applicable):
4.2.6.4

How reproducible:
Always

Steps to Reproduce:
1. Configure the cluster's migration policy and set the migration bandwidth limit to 10 Mbps

Actual results:
The actual bandwidth consumed is 40 Mbps

Expected results:
According to the documentation, and with max concurrent migrations equal to 2, the actual bandwidth consumed should be 5 Mbps.
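To make the mismatch concrete, here is a minimal Python sketch (hypothetical function name, not actual engine code) of the conversion the documentation implies, assuming the default of 2 concurrent migrations and the integer division vdsm's arithmetic effectively performs:

```python
def engine_value_to_mib_per_s(bandwidth_mbps, concurrent_migrations=2):
    """Hypothetical illustration of the intended conversion: the
    user-facing cluster value (Mbps) is split across the concurrent
    migrations, then divided by 8 to move from megabits to mebibytes
    (the rough Mb -> MiB factor used in the discussion below).
    """
    return bandwidth_mbps // concurrent_migrations // 8

# The bug: the engine forwarded the Mbps value unconverted, so vdsm
# applied it directly as a MiB/s limit, i.e. a far higher cap than
# the user intended.
print(engine_value_to_mib_per_s(600))  # per-migration limit of 37 MiB/s
```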

Comment 1 meital avital 2019-01-03 13:43:43 UTC
Reassigned:
ovirt-engine-4.3.0-0.4.master.20181230173049.gitef04cb4.el7.noarch
qemu-kvm-ev-2.12.0-18.el7_6.1.1.x86_64
vdsm-4.30.4-81.gitad6147e.el7.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64

When setting the migration bandwidth limit to 10 Mbps, the actual value reported by virsh domjobinfo on the source host is: Memory bandwidth: 52.016 MiB/s

Observing the REST API at /ovirt-engine/api/clusters/ shows that the 10 Mbps value was set successfully:
      <bandwidth>
        <assignment_method>custom</assignment_method>
        <custom_value>10</custom_value>
      </bandwidth>

Also, a UI fix is needed: in the Webadmin edit cluster dialog, migration policy => migration bandwidth limit, the value is in Mbps instead of MiB/s.

Comment 2 Red Hat Bugzilla Rules Engine 2019-01-03 13:43:47 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in a modified state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 3 Miguel Martin 2019-01-03 16:25:23 UTC
Maybe I am wrong, but 10 Mbps divided by 2 (the number of concurrent migrations) and then by 8 (to convert the units to MiB/s) equals 0 under integer division.

And according to './lib/vdsm/virt/migration.py', in that case the actual '_maxBandwidth' gets the default value of 'migration_max_bandwidth':
~~~
self._maxBandwidth = int(
    kwargs.get('maxBandwidth') or
    config.getint('vars', 'migration_max_bandwidth')
)
~~~

Which in fact is '52' according to './lib/vdsm/common/config.py.in':
~~~
('migration_max_bandwidth', '52',
    'Maximum bandwidth for migration, in MiBps, 0 means libvirt\'s '
    'default, since 0.10.x default in libvirt is unlimited'),
~~~

So I would say this behavior is expected and we are getting the correct bandwidth: 52 MiB/s.

To test this you need to select a maxBandwidth >= 16 Mbps, or you will get the value of the 'migration_max_bandwidth' configuration variable.
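The fallback described here can be sketched in a few lines of Python (a simplified model of the quoted migration.py logic, not the actual vdsm code; the 52 MiB/s default comes from the quoted config.py.in entry):

```python
MIGRATION_MAX_BANDWIDTH_DEFAULT = 52  # MiB/s, vdsm's configured default

def effective_max_bandwidth(engine_mbps, concurrent_migrations=2):
    """Model of vdsm's behavior: use the converted value, or fall back
    to the configured default when the conversion truncates to zero."""
    converted = engine_mbps // concurrent_migrations // 8  # MiB/s
    # int 0 is falsy, so `or` falls through to the default, just like
    # `kwargs.get('maxBandwidth') or config.getint(...)` in migration.py.
    return converted or MIGRATION_MAX_BANDWIDTH_DEFAULT

print(effective_max_bandwidth(10))  # truncates to 0 -> falls back to 52
print(effective_max_bandwidth(16))  # 16 // 2 // 8 == 1 -> limit of 1 MiB/s
```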

Comment 4 Milan Zamazal 2019-01-08 17:45:41 UTC
I can confirm what Miguel writes in the last sentence above. When I set migration bandwidth to 30, the bandwidth reported by virsh is around 1 MiB/s, which is the expected limit (30 // 2 // 8 == 1). If I change the migration bandwidth to 33, the actual bandwidth is around 2 MiB/s (33 // 2 // 8 == 2).

The values reported in the UI and the REST API are the same and they should be in Mbps according to the documentation cited by Miguel at the beginning of the bug description.

Together with Miguel's explanation in Comment 0 and Comment 3, migration bandwidth behavior looks correct to me. Meital, could you please clarify what's wrong exactly and why?

Comment 5 meital avital 2019-01-17 12:27:42 UTC
Hi Milan and Miguel,

After your explanation I now understand the behavior.
As a customer, I would like to see an explanation of the value and the scale units (MiB/s vs. Mbps),
so my suggestion is to add this explanation to the tooltip (i) or on mouseover of the text box.

Miguel, I didn't understand how you saw 40 when you set 10 Mbps - in this case it should be 52, right?

Milan, why doesn't the engine send MiB/s instead of Mbps? (Meaning, why don't the engine and VDSM use the same scale units?)

Comment 6 Milan Zamazal 2019-01-17 13:21:34 UTC
(In reply to meital avital from comment #5)

> Milan, why doesn't the engine send MiB/s instead of Mbps? (Meaning, why
> don't the engine and VDSM use the same scale units?)

I don't know; perhaps the units were changed in Engine at some point without changing the API (nobody wants to change the API, so as not to break cross-version compatibility).

Since the original bug is apparently fixed, I'm moving the bug back to MODIFIED. If UI improvements are needed, let's have a separate bug for them.

Comment 7 Miguel Martin 2019-01-17 14:12:52 UTC
> Hi Milan and Miguel,
> 
> After your explanation I now understand the behavior.
> As a customer, I would like to see an explanation of the value and the
> scale units (MiB/s vs. Mbps),
> so my suggestion is to add this explanation to the tooltip (i) or on
> mouseover of the text box.

I am not sure the customer needs to know about the unit change performed by the engine.
Regarding the value, I guess we could explain the lower limit, but I wouldn't do that either because nobody is going to use such low bandwidth.
Keep in mind that hypervisors have at least 1000 Mbps interfaces (I think 10000 Mbps is the most common nowadays), and I would say that VMs with 8 or 16 GB of RAM are quite common too.
Migrating VMs with that much RAM over such low bandwidth will make the migration fail or, in the best case, take ages.

> 
> Miguel, I didn't understand how you saw 40 when you set 10 Mbps - in this
> case it should be 52, right?

Actually, I didn't use 10 Mbps in my local environment but 400 Mbps. Obviously, I wasn't aware of the lower limit at that time, or I wouldn't have written 10 Mbps in the bug report.
 
> 
> Milan, why doesn't the engine send MiB/s instead of Mbps? (Meaning, why
> don't the engine and VDSM use the same scale units?)

I thought the same when I found the problem, and that is certainly another possibility. But if we did that, we would need to change not only the message in the dialog but also all the references in the documentation, which IIRC are numerous.

Comment 8 meital avital 2019-01-22 09:56:48 UTC
Verified on:
ovirt-engine-4.3.0-0.4.master.20181230173049.gitef04cb4.el7.noarch
qemu-kvm-ev-2.12.0-18.el7_6.1.1.x86_64
vdsm-4.30.4-81.gitad6147e.el7.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64

Comment 9 Sandro Bonazzola 2019-02-13 07:44:59 UTC
This bugzilla is included in oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

