Bug 1316568 - backport 64bit ratelimit
backport 64bit ratelimit
Status: NEW
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network (Show other bugs)
4.1.0
x86_64 Linux
low Severity low (vote)
: ovirt-4.3.0
: ---
Assigned To: nobody nobody
Meni Yakove
:
: 1464734 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-03-10 08:50 EST by Michael Burman
Modified: 2017-07-16 11:30 EDT (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.3?
rule-engine: planning_ack?
mburman: devel_ack?
rule-engine: testing_ack?


Attachments (Terms of Use)
Logs (1.39 MB, application/x-gzip)
2016-03-10 08:50 EST, Michael Burman
no flags Details
support u64 parameters in hfsc (5.58 KB, patch)
2016-04-27 17:58 EDT, Hannes Frederic Sowa
no flags Details | Diff

  None (edit)
Description Michael Burman 2016-03-10 08:50:59 EST
Created attachment 1134894 [details]
Logs

Description of problem:
[Host QoS] - Configuring 'Rate Limit' and Committed Rate' values higher then the default(2048) on the management network will end up with host loosing connectivity with the engine. 
Host will loose ip and only rebooting the server will recover him(restarting the network service or giving dhclient manually on the 'ovirtmgmt' won't help).
"Error while executing action HostSetupNetworks: Network error during communication with the Host."


- On other networks, that are not the management one, such operation will end up with -->  
'Error while executing action HostSetupNetworks: Unexpected exception'

The default values for 'Rate Limit' and 'Committed Rate' in the engine-config are 2048 :
MaxAverageNetworkQoSValue: 2048 version: general
MaxPeakNetworkQoSValue: 2048 version: general

If we have a wide bandwidth nic(10Gb) and we would like to set some QoS capping higher then 2048, we need to configure this values via the engine-config and restart the engine.

Once entering values higher then 2048 in the UI(rate limit, committed rate) on the management network we will loose connectivity with the host and the ip is gone. 

From supervdsm.log it seems like the values that are sent are negative -->

aise TrafficControlException(retcode, err, command)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', u'enp7s0f0', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', u'rt', 'm2', '-1294967296bit', u'ul', 'm2', '-1294967296bit', u'ls', 'm2', '800bit'])


Version-Release number of selected component (if applicable):
vdsm-4.17.23-0.el7ev.noarch
rhevm-3.6.3.4-0.1.el6.noarch

How reproducible:
100

Steps to Reproduce:
1. Increase the default values in the engine-config to:
MaxAverageNetworkQoSValue=10000
MaxPeakNetworkQoSValue=10000 ans restart engine
2. Via the Setup Networks dialog, Override QoS on the management network 'ovirtmgmt' with:
Link Share=100
Rate Limit=3000
Committed Rate=3000 and try to approve operation

Actual results:
Setup Networks operation hangs up for a long time and then,
Host lost connectivity with the engine and it's has no ip. Rebooting the server will bring him up.

Expected results:
Configuring QoS values higher then default should be possible on the management network and on other networks. 
We shouldn't loose connectivity and ip with such operation.

Additional info:
On regular networks we get an error in vdsm.log:
jsonrpc.Executor/6::ERROR::2016-03-10 15:18:47,700::__init__::526::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 521, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 279, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1498, in setupNetworks
    except hooks.HookError as e:
AttributeError: 'module' object has no attribute 'HookError'

supervdsm.log :
raise TrafficControlException(retcode, err, command)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', 'eno2', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', 'rt', 'm2', '-1294967296bit', 'ul', 'm2', '-1294967296bit', 'ls', 'm2', '800bit'])

engine -
'Error while executing action HostSetupNetworks: Unexpected exception'

Attaching logs.
Comment 1 Dan Kenigsberg 2016-03-21 08:17:12 EDT
Jiri, do you know if tc and hfsc are expected to support bitrates higher than 2Gbps?
Comment 2 Dan Kenigsberg 2016-04-27 10:41:07 EDT
I'm told by Hannes Sowa that upstream kernel and tc accept 64bit rate limit. If would be nice if this is easily backportable to el7.
Comment 3 Phil Sutter 2016-04-27 14:16:44 EDT
Hi,

(In reply to Michael Burman from comment #0)
> From supervdsm.log it seems like the values that are sent are negative -->
> 
> aise TrafficControlException(retcode, err, command)
> TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n',
> ['/usr/sbin/tc', 'class', 'add', 'dev', u'enp7s0f0', 'parent', '1389:',
> 'classid', '1389:1388', 'hfsc', u'rt', 'm2', '-1294967296bit', u'ul', 'm2',
> '-1294967296bit', u'ls', 'm2', '800bit'])

This really looks like tc is being called with negative values in the first place, so the error message it prints is expected behaviour. Not sure if I am stating the obvious here though, that's not clear from reading the comments above.

(In reply to Dan Kenigsberg from comment #2)
> I'm told by Hannes Sowa that upstream kernel and tc accept 64bit rate limit.
> If would be nice if this is easily backportable to el7.

Is this a separate request? If so, how is it related to this ticket? Or are you just highjacking the ticket? :)

Looking at tc/q_hfsc.c in upstream iproute2 I don't see any differences to RHEL7 in matters of parameter parsing. It just uses get_rate(), which interprets the values as unsigned integers. Though this should already allow for values up to 4gb.
Comment 4 Hannes Frederic Sowa 2016-04-27 15:48:07 EDT
Hi,

for hfsc I was wrong, it seems we don't yet support 64 bit numbers. But that shouldn't be a hard problem to fix upstream. We simply have to add a new netlink attribute and a new definition of tc_service_curve plus enhance some helpers.

In tc we need to switch over to get_rate64 at some places then and send compatibility values in the other attribute still.

Sorry, missed that this is hfsc, but we could still do this before 7.3.

Sorry,
Hannes
Comment 5 Hannes Frederic Sowa 2016-04-27 17:58 EDT
Created attachment 1151633 [details]
support u64 parameters in hfsc

Preliminary patch for 64 bit parameters to hfsc kernel support.
Comment 6 Dan Kenigsberg 2016-04-28 07:53:00 EDT
Yes, Phil, this was a (failed) hijack attempt. For a minute I thought that there is no need for any change on the Vdsm application level, but we'd need changes in libnl3, kernel and vdsm, besides of iproute.
Comment 7 Phil Sutter 2016-04-29 07:48:34 EDT
Hi,

(In reply to Dan Kenigsberg from comment #6)
> Yes, Phil, this was a (failed) hijack attempt. For a minute I thought that
> there is no need for any change on the Vdsm application level, but we'd need
> changes in libnl3, kernel and vdsm, besides of iproute.

Thanks for being honest with us. ;)

So is Michael's problem already resolved? If not, can we keep this ticket about the integer overflow issue in vdsm and have a dedicated one for 64bit support in HFSC? Otherwise we might lose track of that vdsm issue, no?

Cheers, Phil
Comment 8 Phil Sutter 2016-06-21 17:35:05 EDT
Michael,

Does your problem with calling tc from vdsm still exist? Can you validate if my assumption is correct and a negative value is passed to tc? If so, which component to move this ticket to?

Thanks, Phil
Comment 9 Michael Burman 2016-06-22 01:30:49 EDT
Hi Phil, yes it is still exist -->

jsonrpc.Executor/2::ERROR::2016-06-22 08:24:47,284::__init__::543::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 538, in _serveRequest
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 195, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1459, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
TrafficControlException: (1, 'HFSC: Illegal "m2"\nHFSC: Illegal "rt"\n', ['/usr/sbin/tc', 'class', 'add', 'dev', 'enp4s0f1', 'parent', '1389:', 'classid', '1389:1388', 'hfsc', 'rt', 'm2', '-1294967296bit', 'ul', '
m2', '-1294967296bit', 'ls', 'm2', '800bit'])

yes, it is seems like negative values passed to tc indeed^^
I really don't know to which component this report should be moved.
Comment 10 Phil Sutter 2016-06-22 05:43:16 EDT
Hi Michael,

(In reply to Michael Burman from comment #9)
> yes, it is seems like negative values passed to tc indeed^^
> I really don't know to which component this report should be moved.

OK, thanks. I'll just move it back to where it came from then. :)

Dan, if you're still interested in those 64bit counters for HFSC, please clone this ticket, set the component back to iproute and adjust the description accordingly.

Thanks, Phil
Comment 11 Meni Yakove 2016-07-04 04:35:54 EDT
Indeed, there's an integer overflow on Engine side: vdsm receives negative values.

jsonrpc.Executor/7::DEBUG::2016-07-04 11:27:31,177::__init__::522::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'net1': {'
ipv6autoconf': True, 'hostQos': {'out': {'rt': {'m2': -1294967296}, 'ul': {'m2': -1294967296}, 'ls': {'m2': 100}}}, 'nic': 'enp1s0f1', 'mtu': 1500, 'dhcpv6': False, 'STP': 'no', 'bridged': '
true'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}
Comment 12 Yevgeny Zaspitsky 2016-07-04 06:27:06 EDT
Looks like the overflow is on the engine side. It holds values as Integers, which are signed 32-bit in Java.
The ul.m2 and rt.m2 values are multiplied by 8 before sending to VDSM, so that could be an additional limitation.
Comment 13 Yaniv Lavi (Dary) 2017-07-05 03:40:47 EDT
*** Bug 1464734 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.