Description of problem:
[Host QoS] - link share ('ls') is applied to a network without hostQos when another network with hostQos is attached to the same NIC using vdscli (no engine involved).

Attaching a network to a NIC on the host without any hostQos defined, for example:

    from vdsm import vdscli
    s = vdscli.connect()
    s.setupNetworks({'m3': {'nic': 'eno2', 'ipaddr': '7.7.7.7',
                            'netmask': '255.255.255.0', 'bridged': True}},
                    {}, {'connectivityCheck': False})

    vdsClient -s 0 getVdsCaps

reports the 'm3' network without any hostQos.

After attaching a second network with hostQos to the same NIC, the 'ls' link share is applied to the first network as well. For example:

    from vdsm import vdscli
    s = vdscli.connect()
    s.setupNetworks({'m2': {'nic': 'eno2', 'vlan': '163', 'ipaddr': '6.6.6.7',
                            'netmask': '255.255.255.0',
                            'hostQos': {'out': {'rt': {'m2': '100000000'},
                                                'ul': {'m2': '200000000'},
                                                'ls': {'m2': 10}}},
                            'bridged': True}},
                    {}, {'connectivityCheck': False})

Now vdsClient -s 0 getVdsCaps reports the 'm3' network with 'ls': 10:

    networks = {'m2': {'addr': '6.6.6.7',
                       'bridged': True,
                       'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10},
                                           'rt': {'d': 0, 'm1': 0, 'm2': 100000000},
                                           'ul': {'d': 0, 'm1': 0, 'm2': 200000000}}},
                       ...},
                'm3': {'addr': '7.7.7.7',
                       'bridged': True,
                       'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 10}}},
                       ...}}

Steps to Reproduce:
1. Manually attach a network without hostQos to a NIC on the host.
2. Manually attach a second (vlan) network with hostQos defined to the same NIC.
3. Run vdsClient -s 0 getVdsCaps.

Actual results:
getVdsCaps reports that the first network has a link share (the same as the second one), although it was created without any hostQos.

Expected results:
Not sure whether this is the expected behavior of the kernel/vdsm or a bug.
Currently, vdsm implements the following logic with regard to the default class:
- Non-vlan traffic is directed to the default class.
- The default class is created lazily: it is created for the first network that requires QoS.
- The default class is set with the first network's 'ls' value. If the first network has no 'ls' value, an exception is raised.
- A non-vlan network overrides the default class and sets its own values.

The arbitrary 'ls' value of the default class is a bug. Having an 'ls' value on the default class can be reasoned as follows: traffic that does not fit any of the networks with defined QoS should be handled with fairness. Unless we find other reasoning for setting it, I recommend removing it. The user controls the behaviour; beyond warning him that there is a network with no QoS defined, we should not interfere.
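The lazy default-class behaviour described above can be sketched roughly as follows. This is an illustrative model only, assuming the described semantics; the class and method names are hypothetical and not vdsm's actual API:

```python
# Sketch of the current (buggy) lazy default-class logic described above.
# Names are illustrative; this is not vdsm code.

class HfscRootSketch:
    def __init__(self):
        self.default_class = None  # created lazily, on first QoS network

    def add_network_class(self, name, qos):
        out = qos.get('out', {})
        if self.default_class is None:
            if 'ls' not in out:
                # current behaviour: the first QoS network must define 'ls',
                # otherwise an exception is raised
                raise ValueError("first QoS network must define an 'ls' value")
            # the default class copies the first network's 'ls' value,
            # which is the arbitrary value this bug is about
            self.default_class = {'ls': out['ls']}
        # ... a real implementation would also install the per-network class
```

For example, adding a first network whose QoS defines only 'rt' would raise, while one defining 'ls' would seed the default class with that value.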
Correction to the previous comment: the default class (or any other class) under the HFSC qdisc must have at least one of the curve settings (rt, ls or sc) set. Open question: which should be set?
We will proceed with the following logic:
- The default class represents traffic of a network without a vlan, or of networks without QoS.
- Any class under the hfsc qdisc must set one of the ls/rt/sc values. Therefore, we keep the current logic of setting the default class, upon creation, with the first QoS network's 'ls' value.
- The default class 'ls' value is updated to the maximum 'ls' value across all other defined classes.

Per this logic, the original bug opened here can be closed with a 'works as expected' resolution. We will post a patch that keeps the maximum 'ls' value of all classes on the default class.
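The "maximum 'ls' of all classes" rule above can be sketched as a small helper. This is a hedged illustration of the described logic, not vdsm's implementation; the function name and the input shape (network name mapped to its hostQos dict) are assumptions:

```python
# Illustrative sketch: compute the 'ls' (m2) value the default class should
# carry, per the logic above -- the maximum over all networks that define an
# 'ls' curve. Not vdsm code; names and structure are hypothetical.

def default_class_ls(qos_by_net):
    """Return the maximum 'ls' m2 value among all networks' hostQos dicts,
    or None if no network defines an 'ls' curve."""
    ls_values = [
        qos['out']['ls']['m2']
        for qos in qos_by_net.values()
        if 'ls' in qos.get('out', {})
    ]
    return max(ls_values) if ls_values else None

qos_by_net = {
    'm2': {'out': {'rt': {'m2': 100000000},
                   'ul': {'m2': 200000000},
                   'ls': {'m2': 10}}},
    'm4': {'out': {'ls': {'m2': 25}}},  # hypothetical second QoS network
}
print(default_class_ls(qos_by_net))  # -> 25
```

Under this rule, attaching or detaching any QoS network may change the default class, which is why caps can report an 'ls' value on networks that never defined one.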
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
Moving from 4.0 alpha to 4.0 beta, since 4.0 alpha has already been released and the bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Does it need to be backported?
It's not an RC blocker; it can wait for 4.0.1.
Moving back to POST, since I don't see a 4.0 backport.
Hi Dan,

This bug has now been reproduced involving the engine, and I just want to be clear about the behavior and to understand whether this is the same bug.

Steps:
1) Attach network 'net1' without host QoS to a NIC via setup networks.
2) Attach vlan network 'net2' with host QoS ls=70, rt=200, ul=200 to the same NIC via setup networks.

Result:
1) Caps now reports that 'net1' has ls=70:

    'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 70}}}

2) The engine reports that 'net1' is out-of-sync with the host, because there is a difference between the DC and the host.

It seems to be the same bug, but it is new behavior when doing it via the engine.
Ah yes, we should have updated the summary line. We still (have to) add an 'ls' value to the base nic. The only change is that the value is not arbitrary: it is the maximum of all other 'ls' values. QoS being out-of-sync can (and should) be fixed by setting an explicit QoS on the base nic.
Hi Michael,

The patch corresponds to the logic described in comment 3. In general, if one of the networks on a specific nic/bond has QoS defined, all other networks had better set their QoS as well. We no longer enforce this explicitly in Engine, but do the minimum at the host to make it work. We may consider introducing similar logic in Engine that fills in defaults.
Even when the non-vlan network is explicitly set with an 'ls' value, it is overridden by the maximum 'ls' value of the vlan network.

The scenario is:
1) Attach network 'n2' with 'ls'=95 via setup networks.
2) Attach vlan network 'm1' with 'ls'=100 to the same interface via setup networks.

The 'n2' network gets overridden with 'ls'=100 and is now reported as out-of-sync. There is an inconsistency between what is reported in the current run and in caps:

    'n2': {'addr': '',
           'bridged': True,
           'dhcpv4': False,
           'dhcpv6': False,
           'gateway': '',
           'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 100}}},

    [root@orchid-vds2 ~]# cat /var/run/vdsm/netconf/nets/n2
    {
        "ipv6autoconf": false,
        "bridged": true,
        "nameservers": [],
        "nic": "ens1f0",
        "mtu": 1500,
        "switch": "legacy",
        "dhcpv6": false,
        "stp": false,
        "hostQos": {
            "out": {
                "ls": {
                    "m2": 95
                }
            }
        },
        "defaultRoute": false
    }

    [root@orchid-vds2 ~]# tc class show dev ens1f0
    class hfsc 1389: root
    class hfsc 1389:5 parent 1389: leaf 5: rt m1 0bit d 0us m2 500000Kbit ls m1 0bit d 0us m2 800bit ul m1 0bit d 0us m2 500000Kbit
    class hfsc 1389:1388 parent 1389: leaf 1388: ls m1 0bit d 0us m2 800bit

This report is not properly fixed. Note that we are currently again blocking attaching a network with QoS and a network without QoS to the same interface.
posted fix seems simple and harmless.
Please note that this fix is only on the vdsm side. The engine still blocks attaching non-QoS + QoS networks to the same interface; see BZ 1359484.

Verified on vdsm-4.19.6-1.el7ev.x86_64.