Bug 1302020

Summary: [Host QoS] - Set maximum link share('ls') value for all classes on the default class
Product: [oVirt] vdsm Reporter: Michael Burman <mburman>
Component: Core Assignee: Edward Haas <edwardh>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: medium Docs Contact:
Priority: high    
Version: 4.17.18 CC: bugs, danken, edwardh, eedri, stirabos, ylavi
Target Milestone: ovirt-4.1.1 Flags: rule-engine: ovirt-4.1+
rule-engine: ovirt-4.2+
rule-engine: planning_ack+
rule-engine: devel_ack+
myakove: testing_ack+
Target Release: 4.19.6   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: v4.18.999-67 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-21 09:45:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1359484    
Bug Blocks:    

Description Michael Burman 2016-01-26 14:26:49 UTC
Description of problem:
[Host QoS] - link share ('ls') is applied to a network without hostQos when another network with hostQos is attached to the same NIC using vdscli (no engine involved).

Attach a network to a NIC on the host without any hostQos defined, for example:
from vdsm import vdscli
s=vdscli.connect()
s.setupNetworks({'m3': {'nic': 'eno2', 'ipaddr': '7.7.7.7', 'netmask': '255.255.255.0', 'bridged': True}}, {}, {'connectivityCheck': False})

vdsClient -s 0 getVdsCaps --> reports the 'm3' network without any hostQos.

When a second network with hostQos is attached to the same NIC, the 'ls' link share is applied to the first network as well. For example:
from vdsm import vdscli
s=vdscli.connect()
s.setupNetworks({'m2': {'nic': 'eno2', 'vlan': '163', 'ipaddr': '6.6.6.7', 'netmask': '255.255.255.0', 'hostQos': {'out': {'rt': {'m2': '100000000'}, 'ul': {'m2': '200000000'}, 'ls': {'m2': 10}}}, 'bridged': True}}, {}, {'connectivityCheck': False})

Now vdsClient -s 0 getVdsCaps --> reports the 'm3' network with 'ls': 10

networks = {'m2': {'addr': '6.6.6.7',
                   'bridged': True,
                   'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10},
                                       'rt': {'d': 0, 'm1': 0, 'm2': 100000000},
                                       'ul': {'d': 0, 'm1': 0, 'm2': 200000000}}},

            'm3': {'addr': '7.7.7.7',
                   'bridged': True,
                   'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10}}},


Steps to Reproduce:
1. Manually attach a network to a NIC on the host without hostQos
2. Manually attach a second network (vlan) with a hostQos defined to the same NIC on the host
3. run vdsClient -s 0 getVdsCaps

Actual results:
vdsCaps reports that the first network has a link share (the same as the second one), although it was created without any hostQos.

Expected results:
Not sure whether this is the expected behavior of the kernel/vdsm or a bug.

Comment 1 Edward Haas 2016-02-11 08:50:14 UTC
Currently, vdsm implements the following logic with regard to the default class:

- Non-vlan traffic is directed to the default class.
- The default class is created lazily: it is created when the first network that requires QoS is set up.
- The default class is set with the first network 'ls' value. 
  If the first network has no 'ls' value, it will raise an exception.
- A non-vlan network will override the default class and will set its own values.

The arbitrary 'ls' value of the default class is a bug.
Having an 'ls' value on the default class can be reasoned as follows: traffic that does not fit any of the networks which defined QoS should be handled with fairness.
We had better find another reason for setting it; otherwise I recommend removing it. The user controls the behaviour. Apart from warning the user that there is a network with no QoS defined, we should not interfere.
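
For illustration only, the logic listed above can be modelled in a few lines of Python. This is a toy sketch, not the vdsm code: DeviceQos, configure_outbound and the dict layout are hypothetical names chosen to mirror the hostQos structure shown in the description.

```python
class DeviceQos:
    """Toy model of the HFSC classes on one NIC (hypothetical, not vdsm code)."""

    def __init__(self):
        self.default_class = None  # curves of the default class, created lazily
        self.vlan_classes = {}     # vlan tag -> curves of that network's class


def _copy_curves(qos):
    # Copy the per-curve dicts so later changes do not leak between classes.
    return {name: dict(curve) for name, curve in qos.items()}


def configure_outbound(device, qos, vlan_tag=None):
    """Apply one network's outbound QoS following the logic listed above."""
    if device.default_class is None:
        # Lazy creation: the default class is seeded with the 'ls' value
        # of the first network that requires QoS.
        if 'ls' not in qos:
            raise ValueError("the first QoS network must define 'ls'")
        device.default_class = {'ls': dict(qos['ls'])}
    if vlan_tag is None:
        # A non-vlan network overrides the default class with its own values.
        device.default_class = _copy_curves(qos)
    else:
        device.vlan_classes[vlan_tag] = _copy_curves(qos)
```

With this model, configuring a vlan network with 'ls' m2=10 as the first QoS network leaves the default class with that same 'ls', which matches what getVdsCaps reported for 'm3' in the description.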

Comment 2 Edward Haas 2016-02-11 09:32:36 UTC
Fix on the previous comment:
The default class (or any other class) under HFSC qdisc must have one of the behaviour settings (rt, ls or sc).
Open question: What should be set?

Comment 3 Edward Haas 2016-02-11 13:17:30 UTC
We will proceed with the following logic:
- The default class represents traffic that is for a network without a vlan or for networks without QoS.
- Any class under the hfsc qdisc must set one of the ls/rt/sc values.
  Therefore, we will continue the current logic of setting the default class upon creation, with the first qos network ls value.
- The default class ls value will get updated with the maximum ls value from all other classes defined.

Per this logic, the original bug opened here can be closed with an 'as expected' resolution.

We will post a patch for keeping the maximum ls value for all classes on the default class.
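
A minimal sketch of the rule described above, assuming each class is a dict of curves as in the hostQos 'out' structure and that "maximum ls value" means the curve with the largest 'm2'. The function names are hypothetical and this is not the posted patch.

```python
CURVE_KINDS = ('ls', 'rt', 'sc')


def has_required_curve(qos):
    """A class under the hfsc qdisc must set one of the ls/rt/sc values."""
    return any(kind in qos for kind in CURVE_KINDS)


def default_class_link_share(class_qos_list):
    """Pick the 'ls' curve for the default class.

    Assumption: curves are compared by their 'm2' value, and a missing
    'm2' counts as 0. Returns None when no class defines 'ls'.
    """
    ls_curves = [qos['ls'] for qos in class_qos_list if 'ls' in qos]
    if not ls_curves:
        return None
    return dict(max(ls_curves, key=lambda curve: curve.get('m2', 0)))
```

For example, given two classes with 'ls' m2 values of 10 and 70, the default class would be updated to 'ls' m2=70.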

Comment 4 Mike McCune 2016-03-28 22:32:18 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 5 Sandro Bonazzola 2016-05-02 10:03:44 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 6 Yaniv Lavi 2016-05-23 13:18:31 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 7 Yaniv Lavi 2016-05-23 13:22:35 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 8 Simone Tiraboschi 2016-06-13 14:14:25 UTC
Does it need to be backported?

Comment 9 Dan Kenigsberg 2016-06-15 12:47:07 UTC
It's not an RC blocker. It can wait for 4.0.1.

Comment 10 Eyal Edri 2016-06-23 14:15:53 UTC
moving back to POST since I don't see a 4.0 backport

Comment 11 Michael Burman 2016-07-20 12:54:08 UTC
Hi Dan,

This bug now reproduces with the engine involved, and I just want to be clear about the behavior and to understand whether this is the same bug.

Steps:
1) Attach network 'net1' without host QoS to NIC via setup networks
2) Attach vlan network 'net2' with some host QoS ls=70, rt=200, ul=200 via setup networks to the same NIC

Result:
1) caps now reports that 'net1' has ls=70
'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 70}}}

2) Engine reports that 'net1' is out-of-sync with the host, because there is a difference between the DC and the host.

It seems to be the same bug, but it is new behavior when done via the engine.

Comment 12 Dan Kenigsberg 2016-07-22 11:14:43 UTC
Ah yes, we should have updated the summary line. We still (have to) add an "ls" value to the base nic. The only change is that the value is not arbitrary: it is the maximum of all other "ls" values.

QoS being out-of-sync can (and should) be fixed by setting an explicit QoS on the base nic.

Comment 13 Edward Haas 2016-07-22 17:27:19 UTC
Hi Michael,
The patch corresponds to the logic described in comment 3.
In general, if one of the networks on a specific nic/bond has QoS defined, all other networks on it had better set their QoS as well.
We no longer enforce it explicitly in Engine, but do the minimum at the host to make it work.

We may consider introducing similar logic in Engine that fills in defaults.

Comment 14 Michael Burman 2016-12-05 09:51:04 UTC
Even when the non-vlan network is explicitly set with an 'ls' value, it is overridden by the maximum 'ls' value of the vlan network.

The scenario is:

1) Attach network 'n2' with 'ls'=95 via setup networks
2) Attach vlan network 'm1' with 'ls=100' via setup networks to the same interface

The 'n2' network gets overridden with 'ls=100' and is now reported as out-of-sync.

There is an inconsistency between what is reported in the current running config and in caps.

'n2': {'addr': '',
       'bridged': True,
       'dhcpv4': False,
       'dhcpv6': False,
       'gateway': '',
       'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 100}}},


[root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/n2 
{
    "ipv6autoconf": false, 
    "bridged": true, 
    "nameservers": [], 
    "nic": "ens1f0", 
    "mtu": 1500, 
    "switch": "legacy", 
    "dhcpv6": false, 
    "stp": false, 
    "hostQos": {
        "out": {
            "ls": {
                "m2": 95
            }
        }
    }, 
    "defaultRoute": false


[root@orchid-vds2 ~]# tc class show dev ens1f0
class hfsc 1389: root 
class hfsc 1389:5 parent 1389: leaf 5: rt m1 0bit d 0us m2 500000Kbit ls m1 0bit d 0us m2 800bit ul m1 0bit d 0us m2 500000Kbit 
class hfsc 1389:1388 parent 1389: leaf 1388: ls m1 0bit d 0us m2 800bit
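
To compare the link share of each class without reading the raw lines, the tc output above can be parsed with a small helper. This is a rough sketch: the regular expression is an assumption based only on the three sample lines pasted here, and link_share_by_class is a hypothetical name, not part of vdsm.

```python
import re

# Captures the class id and the m2 of the 'ls' curve from one
# "class hfsc <id> ..." line. Assumes the layout of the sample output.
LS_RE = re.compile(r'^class hfsc (\S+) .*?\bls m1 \S+ d \S+ m2 (\S+)')

SAMPLE = """\
class hfsc 1389: root
class hfsc 1389:5 parent 1389: leaf 5: rt m1 0bit d 0us m2 500000Kbit ls m1 0bit d 0us m2 800bit ul m1 0bit d 0us m2 500000Kbit
class hfsc 1389:1388 parent 1389: leaf 1388: ls m1 0bit d 0us m2 800bit
"""


def link_share_by_class(tc_output):
    """Map each hfsc class id to the m2 value of its 'ls' curve."""
    shares = {}
    for line in tc_output.splitlines():
        match = LS_RE.match(line.strip())
        if match:
            class_id, m2 = match.groups()
            shares[class_id] = m2
    return shares


print(link_share_by_class(SAMPLE))
```

Classes without an 'ls' curve (the root line here) are skipped, so the result lists only the classes whose link share can be compared.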

This report is not properly fixed. 

Note that we are currently, once again, blocking the attachment of networks with and without QoS to the same interface.

Comment 15 Dan Kenigsberg 2017-02-06 13:51:14 UTC
The posted fix seems simple and harmless.

Comment 16 Michael Burman 2017-02-19 13:23:22 UTC
Please note that this fix is only for the vdsm side. The engine still blocks attaching non-QoS + QoS networks to the same interface. See BZ 1359484.

Verified on vdsm-4.19.6-1.el7ev.x86_64