Bug 1302020 - [Host QoS] - Set maximum link share('ls') value for all classes on the default class
Product: vdsm
Classification: oVirt
Component: Core
Hardware: x86_64 Linux
Priority: high Severity: medium
Target Milestone: ovirt-4.1.1
Target Release: 4.19.6
Assigned To: Edward Haas
QA Contact: Michael Burman
Depends On: 1359484
Reported: 2016-01-26 09:26 EST by Michael Burman
Modified: 2017-04-21 05:45 EDT (History)

See Also:
Fixed In Version: v4.18.999-67
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-04-21 05:45:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.1+
rule-engine: ovirt‑4.2+
rule-engine: planning_ack+
rule-engine: devel_ack+
myakove: testing_ack+


External Trackers
Tracker       ID     Priority   Status  Summary                                                 Last Updated
oVirt gerrit  53623  master     MERGED  net hostQos: maintain default class ls value            2016-06-10 00:25 EDT
oVirt gerrit  71676  master     MERGED  net: QoS - preserve explicit default class link share   2017-02-10 15:12 EST
oVirt gerrit  72108  ovirt-4.1  MERGED  net test: Add net qos test to the new func tests.       2017-02-14 05:13 EST
oVirt gerrit  72109  ovirt-4.1  MERGED  net: QoS - preserve explicit default class link share   2017-02-14 05:13 EST

Description Michael Burman 2016-01-26 09:26:49 EST
Description of problem:
[Host QoS] - The link share ('ls') value is applied to a network without hostQos when another network with hostQos is attached to the same NIC using vdscli (no engine involved).

If a network is attached to a NIC on the host without any hostQos defined, for example:
from vdsm import vdscli
s = vdscli.connect()  # connect to the local vdsm instance
s.setupNetworks({'m3': {'nic': 'eno2', 'ipaddr': '', 'netmask': '', 'bridged': True}}, {}, {'connectivityCheck': False})

then vdsClient -s 0 getVdsCaps reports the 'm3' network without any hostQos.

If a second network with hostQos is then attached to the same NIC, the 'ls' link share is applied to the first network as well. For example:
from vdsm import vdscli
s = vdscli.connect()  # connect to the local vdsm instance
s.setupNetworks({'m2': {'nic': 'eno2', 'vlan': '163', 'ipaddr': '', 'netmask': '', 'hostQos': {'out': {'rt': {'m2': '100000000'}, 'ul': {'m2': '200000000'}, 'ls': {'m2': 10}}}, 'bridged': True}}, {}, {'connectivityCheck': False})

Now vdsClient -s 0 getVdsCaps reports the 'm3' network with 'ls': 10:

networks = {'m2': {'addr': '',
                   'bridged': True,
                   'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10},
                                       'rt': {'d': 0, 'm1': 0, 'm2': 100000000},
                                       'ul': {'d': 0, 'm1': 0, 'm2': 200000000}}},

            'm3': {'addr': '',
                   'bridged': True,
                   'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10}}},

Steps to Reproduce:
1. Manually attach a network to a NIC on the host without hostQos
2. Manually attach a second network (vlan) with hostQos defined to the same NIC on the host
3. Run vdsClient -s 0 getVdsCaps

Actual results:
vdsCaps reports that the first network has a link share (the same as the second one), although it was created without any hostQos.

Expected results:
Not sure if it's the expected behavior of the kernel/vdsm or it's a bug.
Comment 1 Edward Haas 2016-02-11 03:50:14 EST
Currently, vdsm implements the following logic in regards to the default class:

- Non-vlan traffic is directed to the default class.
- The default class is created lazily: it is created when the first network that requires QoS is set up.
- The default class is set with the first network 'ls' value. 
  If the first network has no 'ls' value, it will raise an exception.
- A non-vlan network will override the default class and will set its own values.

The arbitrary 'ls' value of the default class is a bug.
Having an 'ls' value on the default class can be justified as follows: traffic that does not match any of the networks with QoS defined should be handled fairly.
We should find another justification for setting it; otherwise I recommend removing it. The user controls the behaviour; apart from warning them that a network has no QoS defined, we should not interfere.
Comment 2 Edward Haas 2016-02-11 04:32:36 EST
Correction to the previous comment:
The default class (or any other class) under HFSC qdisc must have one of the behaviour settings (rt, ls or sc).
Open question: What should be set?
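
The constraint above can be illustrated with a minimal sketch. The helper name `has_service_curve` and the shape of the input dict (the 'out' part of a hostQos entry, as used in the description) are assumptions made for illustration; this is not vdsm code.

```python
# Hypothetical helper, not part of vdsm: checks the HFSC constraint that a
# class must define at least one of the rt, ls or sc settings.
HFSC_CURVES = ('rt', 'ls', 'sc')


def has_service_curve(qos_out):
    """Return True if the outbound QoS dict sets at least one of rt/ls/sc."""
    return any(qos_out.get(curve) for curve in HFSC_CURVES)


print(has_service_curve({'ls': {'m2': 10}}))         # True
print(has_service_curve({'ul': {'m2': 200000000}}))  # False: 'ul' alone is not enough
```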
Comment 3 Edward Haas 2016-02-11 08:17:30 EST
We will proceed with the following logic:
- The default class represents traffic that is for a network without a vlan or for networks without QoS.
- Any class under the hfsc qdisc must set one of the ls/rt/sc values.
  Therefore, we will keep the current logic of setting the default class upon creation with the ls value of the first QoS network.
- The default class ls value will get updated with the maximum ls value from all other classes defined.

Per this logic, the original bug opened here can be closed with 'as expected' resolution.

We will post a patch for keeping the maximum ls value for all classes on the default class.
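
A minimal sketch of the "maximum ls" rule described in this comment, assuming each network's outbound QoS is available as a dict keyed by network name. The function name `default_class_ls` and the input layout are illustrative assumptions, not the actual patch.

```python
# Hypothetical sketch, not the merged patch: the default class takes the
# maximum 'ls' m2 value found among all other classes on the device.
def default_class_ls(network_qos):
    """network_qos maps a network name to its outbound hostQos dict."""
    ls_values = [qos['ls']['m2'] for qos in network_qos.values() if 'ls' in qos]
    # With no 'ls' defined anywhere there is nothing to derive a value from.
    return max(ls_values) if ls_values else None


qos_by_network = {
    'm2': {'ls': {'m2': 10}, 'rt': {'m2': 100000000}},
    'net2': {'ls': {'m2': 70}},
}
print(default_class_ls(qos_by_network))  # 70
```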
Comment 4 Mike McCune 2016-03-28 18:32:18 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see with any questions
Comment 5 Sandro Bonazzola 2016-05-02 06:03:44 EDT
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
Comment 6 Yaniv Lavi (Dary) 2016-05-23 09:18:31 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 7 Yaniv Lavi (Dary) 2016-05-23 09:22:35 EDT
oVirt 4.0 beta has been released, moving to RC milestone.
Comment 8 Simone Tiraboschi 2016-06-13 10:14:25 EDT
Does it need to be backported?
Comment 9 Dan Kenigsberg 2016-06-15 08:47:07 EDT
It's not an RC blocker. It can wait for 4.0.1.
Comment 10 Eyal Edri 2016-06-23 10:15:53 EDT
moving back to POST since I don't see a 4.0 backport
Comment 11 Michael Burman 2016-07-20 08:54:08 EDT
Hi Dan,

This bug now reproduces with the engine involved, and I just want to be clear about the behavior and understand whether this is the same bug.

Steps:
1) Attach network 'net1' without host QoS to a NIC via setup networks
2) Attach vlan network 'net2' with host QoS ls=70, rt=200, ul=200 via setup networks to the same NIC

Results:
1) Caps now reports that 'net1' has ls=70
'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 70}}}

2) The engine reports that 'net1' is out-of-sync with the host, because there is a difference between the DC and the host.

It seems to be the same bug, but this is new behavior when doing it via the engine.
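
The out-of-sync condition described here can be sketched as a comparison between the QoS defined for each network and the QoS the host reports. Everything in this block (function names, input layout, dropping zero-valued curve parameters before comparing) is an assumption for illustration; it is not how the engine actually computes sync state.

```python
# Hypothetical sketch: flag networks whose reported hostQos differs from the
# defined one. The data below is simplified from the scenario in this comment.
def normalize(host_qos):
    """Keep only the non-zero curve parameters so both sides are comparable."""
    out = (host_qos or {}).get('out', {})
    return {curve: {k: v for k, v in params.items() if v}
            for curve, params in out.items()}


def out_of_sync_networks(defined, reported):
    return sorted(
        name for name in defined
        if normalize(defined[name].get('hostQos'))
        != normalize(reported.get(name, {}).get('hostQos'))
    )


defined = {
    'net1': {},  # no host QoS requested
    'net2': {'hostQos': {'out': {'ls': {'m2': 70}}}},
}
reported = {
    'net1': {'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 70}}}},
    'net2': {'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 70}}}},
}
print(out_of_sync_networks(defined, reported))  # ['net1']
```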
Comment 12 Dan Kenigsberg 2016-07-22 07:14:43 EDT
Ah yes, we should have updated the summary line. We still (have to) add an "ls" value to the base nic. The only change is that the value is not arbitrary: it is maximum of all other "ls"s.

QoS being out-of-sync can (and should) be fixed by setting an explicit QoS on the base nic.
Comment 13 Edward Haas 2016-07-22 13:27:19 EDT
Hi Michael,
The patch corresponds to the logic described in comment 3.
In general, if one of the networks on a specific nic/bond has QoS defined, all other networks on it should have their QoS set as well.
We no longer enforce this explicitly in Engine, but do the minimum on the host to make it work.

We may consider introducing a similar logic on Engine that fills up defaults.
Comment 14 Michael Burman 2016-12-05 04:51:04 EST
Even when the non-vlan network is explicitly set with an 'ls' value, it is overridden by the maximum 'ls' value of the vlan network.

The scenario is:

1) Attach network 'n2' with 'ls'=95 via setup networks
2) Attach vlan network 'm1' with 'ls=100' via setup networks to the same interface

The 'n2' network got overridden with 'ls=100' and is now reported as out-of-sync.

There is an inconsistency between what is reported in the running config and in caps.

'n2': {'addr': '',
                           'bridged': True,
                           'dhcpv4': False,
                           'dhcpv6': False,
                           'gateway': '',
                           'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 100}}},

[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/n2 
    "ipv6autoconf": false, 
    "bridged": true, 
    "nameservers": [], 
    "nic": "ens1f0", 
    "mtu": 1500, 
    "switch": "legacy", 
    "dhcpv6": false, 
    "stp": false, 
    "hostQos": {
        "out": {
            "ls": {
                "m2": 95
    "defaultRoute": false

[root@orchid-vds2 ~]# tc class show dev ens1f0
class hfsc 1389: root 
class hfsc 1389:5 parent 1389: leaf 5: rt m1 0bit d 0us m2 500000Kbit ls m1 0bit d 0us m2 800bit ul m1 0bit d 0us m2 500000Kbit 
class hfsc 1389:1388 parent 1389: leaf 1388: ls m1 0bit d 0us m2 800bit
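
To compare the 'ls' values across classes, the `tc class show` output above can be parsed with a short script. This is a sketch that assumes the output format shown in this comment; the function name `parse_ls` is hypothetical.

```python
import re

# Matches the class id and the 'ls' curve in `tc class show dev <dev>` lines
# formatted like the output above.
CLASS_RE = re.compile(r'^class hfsc (\S+) ')
LS_RE = re.compile(r'\bls m1 (\S+) d (\S+) m2 (\S+)')


def parse_ls(tc_output):
    """Return {class id: ls m2 value} for every class that has an ls curve."""
    result = {}
    for line in tc_output.splitlines():
        cls = CLASS_RE.match(line)
        ls = LS_RE.search(line)
        if cls and ls:
            result[cls.group(1)] = ls.group(3)
    return result


tc_output = """\
class hfsc 1389: root
class hfsc 1389:5 parent 1389: leaf 5: rt m1 0bit d 0us m2 500000Kbit ls m1 0bit d 0us m2 800bit ul m1 0bit d 0us m2 500000Kbit
class hfsc 1389:1388 parent 1389: leaf 1388: ls m1 0bit d 0us m2 800bit
"""
print(parse_ls(tc_output))  # {'1389:5': '800bit', '1389:1388': '800bit'}
```

Both classes carry the same 'ls' value, which matches the override described in this comment.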

This report is not properly fixed. 
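
The behavior expected in this scenario, where an explicitly configured 'ls' on the non-vlan network is kept and the maximum is used only as a fallback, might be sketched as follows. The function name and arguments are hypothetical and this is not the vdsm implementation.

```python
# Hypothetical sketch: an explicit 'ls' on the default class wins; otherwise
# fall back to the maximum 'ls' among the other classes.
def resolve_default_ls(explicit_ls, other_ls_values):
    if explicit_ls is not None:
        return explicit_ls
    return max(other_ls_values) if other_ls_values else None


print(resolve_default_ls(95, [100]))    # 95: the explicit value is preserved
print(resolve_default_ls(None, [100]))  # 100: falls back to the maximum
```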

Note that we are currently blocking, once again, the attachment of networks with and without QoS to the same interface.
Comment 15 Dan Kenigsberg 2017-02-06 08:51:14 EST
posted fix seems simple and harmless.
Comment 16 Michael Burman 2017-02-19 08:23:22 EST
Please note that this fix is only for the vdsm side. The engine is still blocking the attachment of non-QoS and QoS networks to the same interface. See BZ 1359484.

Verified on vdsm-4.19.6-1.el7ev.x86_64
