Bug 1316568 - backport 64bit ratelimit
Summary: backport 64bit ratelimit
Keywords:
Status: CLOSED DUPLICATE of bug 1515877
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.1.0
Hardware: x86_64
OS: Linux
low
low
Target Milestone: ovirt-4.3.0
: ---
Assignee: Nobody
QA Contact: Meni Yakove
URL:
Whiteboard:
: 1464734 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-10 13:50 UTC by Michael Burman
Modified: 2018-05-15 07:11 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-21 16:25:22 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.3?
rule-engine: planning_ack?
mburman: devel_ack?
rule-engine: testing_ack?


Attachments (Terms of Use)
Logs (1.39 MB, application/x-gzip)
2016-03-10 13:50 UTC, Michael Burman
no flags Details
support u64 parameters in hfsc (5.58 KB, patch)
2016-04-27 21:58 UTC, Hannes Frederic Sowa
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 90799 0 master MERGED core: QoS - support larger bit rates 2018-05-29 09:02:17 UTC
oVirt gerrit 90813 0 master MERGED core: extend QoS max bit rate to 34Gbit/sec 2018-05-29 09:03:26 UTC
oVirt gerrit 91223 0 master MERGED webadmin: fix VM network QoS displayed units 2018-05-29 09:03:28 UTC
oVirt gerrit 91245 0 master MERGED api-model: Network QoS - explain bit rate units 2018-06-04 12:11:18 UTC

Description Michael Burman 2016-03-10 13:50:59 UTC
Created attachment 1134894 [details]
Logs

Description of problem:
[Host QoS] - Configuring 'Rate Limit' and Committed Rate' values higher then the default(2048) on the management network will end up with host loosing connectivity with the engine. 
Host will loose ip and only rebooting the server will recover him(restarting the network service or giving dhclient manually on the 'ovirtmgmt' won't help).
"Error while executing action HostSetupNetworks: Network error during communication with the Host."


- On other networks, that are not the management one, such operation will end up with -->  
'Error while executing action HostSetupNetworks: Unexpected exception'

The default values for 'Rate Limit' and 'Committed Rate' in the engine-config are 2048 :
MaxAverageNetworkQoSValue: 2048 version: general
MaxPeakNetworkQoSValue: 2048 version: general

If we have a wide bandwidth nic(10Gb) and we would like to set some QoS capping higher then 2048, we need to configure this values via the engine-config and restart the engine.

Once entering values higher then 2048 in the UI(rate limit, committed rate) on the management network we will loose connectivity with the host and the ip is gone. 

From supervdsm.log it seems like the values that are sent are negative -->

aise TrafficControlException(retcode, err, command)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', u'enp7s0f0', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', u'rt', 'm2', '-1294967296bit', u'ul', 'm2', '-1294967296bit', u'ls', 'm2', '800bit'])


Version-Release number of selected component (if applicable):
vdsm-4.17.23-0.el7ev.noarch
rhevm-3.6.3.4-0.1.el6.noarch

How reproducible:
100

Steps to Reproduce:
1. Increase the default values in the engine-config to:
MaxAverageNetworkQoSValue=10000
MaxPeakNetworkQoSValue=10000 ans restart engine
2. Via the Setup Networks dialog, Override QoS on the management network 'ovirtmgmt' with:
Link Share=100
Rate Limit=3000
Committed Rate=3000 and try to approve operation

Actual results:
Setup Networks operation hangs up for a long time and then,
Host lost connectivity with the engine and it's has no ip. Rebooting the server will bring him up.

Expected results:
Configuring QoS values higher then default should be possible on the management network and on other networks. 
We shouldn't loose connectivity and ip with such operation.

Additional info:
On regular networks we get an error in vdsm.log:
jsonrpc.Executor/6::ERROR::2016-03-10 15:18:47,700::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 279, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1498, in setupNetworks
    except hooks.HookError as e:
AttributeError: 'module' object has no attribute 'HookError'

supervdsm.log :
raise TrafficControlException(retcode, err, command)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', 'eno2', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', 'rt', 'm2', '-1294967296bit', 'ul', 'm2', '-1294967296bit', 'ls', 'm2', '800bit'])

engine -
'Error while executing action HostSetupNetworks: Unexpected exception'

Attaching logs.

Comment 1 Dan Kenigsberg 2016-03-21 12:17:12 UTC
Jiri, do you know if tc and hfsc are expected to support bitrates higher than 2Gbps?

Comment 2 Dan Kenigsberg 2016-04-27 14:41:07 UTC
I'm told by Hannes Sowa that upstream kernel and tc accept 64bit rate limit. If would be nice if this is easily backportable to el7.

Comment 3 Phil Sutter 2016-04-27 18:16:44 UTC
Hi,

(In reply to Michael Burman from comment #0)
> From supervdsm.log it seems like the values that are sent are negative -->
> 
> aise TrafficControlException(retcode, err, command)
> TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n',
> ['/usr/sbin/tc', 'class', 'add', 'dev', u'enp7s0f0', 'parent', '1389:',
> 'classid', '1389:1388', 'hfsc', u'rt', 'm2', '-1294967296bit', u'ul', 'm2',
> '-1294967296bit', u'ls', 'm2', '800bit'])

This really looks like tc is being called with negative values in the first place, so the error message it prints is expected behaviour. Not sure if I am stating the obvious here though, that's not clear from reading the comments above.

(In reply to Dan Kenigsberg from comment #2)
> I'm told by Hannes Sowa that upstream kernel and tc accept 64bit rate limit.
> If would be nice if this is easily backportable to el7.

Is this a separate request? If so, how is it related to this ticket? Or are you just highjacking the ticket? :)

Looking at tc/q_hfsc.c in upstream iproute2 I don't see any differences to RHEL7 in matters of parameter parsing. It just uses get_rate(), which interprets the values as unsigned integers. Though this should already allow for values up to 4gb.

Comment 4 Hannes Frederic Sowa 2016-04-27 19:48:07 UTC
Hi,

for hfsc I was wrong, it seems we don't yet support 64 bit numbers. But that shouldn't be a hard problem to fix upstream. We simply have to add a new netlink attribute and a new definition of tc_service_curve plus enhance some helpers.

In tc we need to switch over to get_rate64 at some places then and send compatibility values in the other attribute still.

Sorry, missed that this is hfsc, but we could still do this before 7.3.

Sorry,
Hannes

Comment 5 Hannes Frederic Sowa 2016-04-27 21:58:50 UTC
Created attachment 1151633 [details]
support u64 parameters in hfsc

Preliminary patch for 64 bit parameters to hfsc kernel support.

Comment 6 Dan Kenigsberg 2016-04-28 11:53:00 UTC
Yes, Phil, this was a (failed) hijack attempt. For a minute I thought that there is no need for any change on the Vdsm application level, but we'd need changes in libnl3, kernel and vdsm, besides of iproute.

Comment 7 Phil Sutter 2016-04-29 11:48:34 UTC
Hi,

(In reply to Dan Kenigsberg from comment #6)
> Yes, Phil, this was a (failed) hijack attempt. For a minute I thought that
> there is no need for any change on the Vdsm application level, but we'd need
> changes in libnl3, kernel and vdsm, besides of iproute.

Thanks for being honest with us. ;)

So is Michael's problem already resolved? If not, can we keep this ticket about the integer overflow issue in vdsm and have a dedicated one for 64bit support in HFSC? Otherwise we might lose track of that vdsm issue, no?

Cheers, Phil

Comment 8 Phil Sutter 2016-06-21 21:35:05 UTC
Michael,

Does your problem with calling tc from vdsm still exist? Can you validate if my assumption is correct and a negative value is passed to tc? If so, which component to move this ticket to?

Thanks, Phil

Comment 9 Michael Burman 2016-06-22 05:30:49 UTC
Hi Phil, yes it is still exist -->

jsonrpc.Executor/2::ERROR::2016-06-22 08:24:47,284::__init__::543::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 538, in _serveRequest
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 195, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1459, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', 'enp4s0f1', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', 'rt', 'm2', '-1294967296bit', 'ul', '
m2', '-1294967296bit', 'ls', 'm2', '800bit'])

yes, it is seems like negative values passed to tc indeed^^
I really don't know to which component this report should be moved.

Comment 10 Phil Sutter 2016-06-22 09:43:16 UTC
Hi Michael,

(In reply to Michael Burman from comment #9)
> yes, it is seems like negative values passed to tc indeed^^
> I really don't know to which component this report should be moved.

OK, thanks. I'll just move it back to where it came from then. :)

Dan, if you're still interested in those 64bit counters for HFSC, please clone this ticket, set the component back to iproute and adjust the description accordingly.

Thanks, Phil

Comment 11 Meni Yakove 2016-07-04 08:35:54 UTC
Indeed, there's an integer overflow on Engine side: vdsm receives negative values.

jsonrpc.Executor/7::DEBUG::2016-07-04 11:27:31,177::__init__::522::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'net1': {'
ipv6autoconf': True, 'hostQos': {'out': {'rt': {'m2': -1294967296}, 'ul': {'m2': -1294967296}, 'ls': {'m2': 100}}}, 'nic': 'enp1s0f1', 'mtu': 1500, 'dhcpv6': False, 'STP': 'no', 'bridged': '
true'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}

Comment 12 Yevgeny Zaspitsky 2016-07-04 10:27:06 UTC
Looks like the overflow is on the engine side. It holds values as Integers, which are signed 32-bit in Java.
The ul.m2 and rt.m2 values are multiplied by 8 before sending to VDSM, so that could be an additional limitation.

Comment 13 Yaniv Lavi 2017-07-05 07:40:47 UTC
*** Bug 1464734 has been marked as a duplicate of this bug. ***

Comment 14 Dan Kenigsberg 2017-11-21 16:25:22 UTC

*** This bug has been marked as a duplicate of bug 1515877 ***


Note You need to log in before you can comment on or make changes to this bug.