Bug 1464055

Summary: bond has no speed set in active_migration_network_interfaces
Product: [oVirt] ovirt-engine Reporter: Tomas Jelinek <tjelinek>
Component: Backend.CoreAssignee: Edward Haas <edwardh>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.0CC: bugs, danken, klaas, mburman, mmucha, myakove, tjelinek, ylavi
Target Milestone: ovirt-4.2.0Flags: rule-engine: ovirt-4.2+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: engine does not read bond speed from refreshcapabilities call. refreshcapabilities does not contain speed for bonds. Allegedly refreshcapabilities should not even contain that, but there's no simple way for engine to read it from stats call. Consequence: bond does not report speed. Fix: Engine code was fixed to read speed for bonds from refreshcapabilities, so bonds speed is read from it just like any other iface. Result: I don't know how vdsm should and will behave. If vdsm is modified so that refreshcapabilities reports bonds speed, this reported speed will propagate through engine to the user.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-20 11:26:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
supervdsm.log none

Description Tomas Jelinek 2017-06-22 10:54:45 UTC
If the VM is being migrated, the max bandwidth of the migration is calculated based on the getActiveMigrationNetworkInterfaceForHost. If this interface happens to be a bond, the speed of this bond is set to 0 and the migration max bandwidth than falls back to the default value configured in vdsm.conf.

Expected results:
The "speed" property will be populated based on the bond type and its nics so the max bandwidth will be than populated by a more realistic value.

Comment 1 Michael Burman 2017-07-05 07:43:04 UTC
vdsm already report speed for bond - 

"bond0": {
            "rxErrors": "0", 
            "name": "bond0", 
            "tx": "2566194", 
            "txDropped": "0", 
            "sampleTime": 1499240322.044425, 
            "rx": "3900344926", 
            "txErrors": "0", 
            "state": "up", 
            "speed": "2000", 
            "rxDropped": "17807"

We should add this for the engine as well.

Comment 2 Martin Mucha 2017-07-11 18:10:35 UTC
(In reply to Tomas Jelinek from comment #0)
> If the VM is being migrated, the max bandwidth of the migration is
> calculated based on the getActiveMigrationNetworkInterfaceForHost. If this
> interface happens to be a bond, the speed of this bond is set to 0 and the
> migration max bandwidth than falls back to the default value configured in
> vdsm.conf.
> 
> Expected results:
> The "speed" property will be populated based on the bond type and its nics
> so the max bandwidth will be than populated by a more realistic value.

I never saw this code, but so far I did not see any calculation of speed done in engine code, value reported by vdsm seem to just flow through several layers...

If bond speed is reported by vdsm (it's not when bond is created from dummies for example), then it's stored in vds_interface table. But we're not reading this values from vds_interface, but from active_migration_network_interfaces which does some filtering and if record is not found, null is used. Also, this view is queried for both source and target host; if any of them reports null, speed will be null ~ default one.


So to simplify, iiuc, speed will be reported as null (default speed will be used) when:
a) vdsm does not report it (bond from dummies)
b) source or target host does not have 'active migration network interface', as obtained via:

SELECT *
    FROM active_migration_network_interfaces
    WHERE vds_id = v_host_id;


Tomas, can you confirm, that a) or b) is your case? If not, can you describe your bond?

Comment 4 Meni Yakove 2017-09-12 06:41:01 UTC
This is what we get from DB.
engine=# select * from active_migration_network_interfaces where vds_id='f697e2d1-8535-4b61-8b76-81bb7faad549';
-[ RECORD 1 ]--------+-------------------------------------
rx_rate              | 0.0003
tx_rate              | 0.0000
rx_drop              | 0.0000
tx_drop              | 0.0000
rx_total             | 63992
tx_total             | 496
rx_offset            | -1838162
tx_offset            | -19980
iface_status         | 1
sample_time          | 1505198353.29212
type                 | 0
gateway              | 
ipv4_default_route   | f
ipv6_gateway         | ::
subnet               | 255.255.255.0
ipv6_prefix          | 
addr                 | 1.2.3.4
ipv6_address         | 
speed                | 
base_interface       | 
vlan_id              | 
bond_type            | 
bond_name            | 
is_bond              | t
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
mac_addr             | 00:e0:ed:33:c0:92
network_name         | migration-network
name                 | bond0
vds_id               | f697e2d1-8535-4b61-8b76-81bb7faad549
vds_name             | host_mixed_1
id                   | 352ba7e2-c5c2-4a1e-be76-86c1967be2da
boot_protocol        | 2
ipv6_boot_protocol   | 0
mtu                  | 1500
bridged              | t
reported_switch_type | legacy
is_vds               | 1
qos_overridden       | f
labels               | 
cluster_id           | 7be9dc31-32be-4fde-a908-7cb10a4994a5
ad_partner_mac       | 00:0a:b8:38:6a:80
ad_aggregator_id     | 3
bond_active_slave    | 

Should 'speed' be the BOND speed?

ovirt-engine-4.2.0-0.0.master.20170911195522.git5ddba50.el7.centos.noarch

Comment 5 Martin Mucha 2017-09-12 07:34:39 UTC
Yes. This might be problem, but need not. Please check host itself, what it reports. This is example from my host:

ethtool bond0
Settings for bond0:
…
	Speed: Unknown!
	Duplex: Unknown! (255)
…

——
for such hosts speed won't be reported(because it can't be). But if host reports it, you called refresh caps to be sure, and then speed is not in vds_interface table or in active_migration_network_interfaces, then it is something which has to be further checked.

Comment 6 Michael Burman 2017-09-12 07:56:22 UTC
[root@camel-vdsa ~]# ethtool bond0
Settings for bond0:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 2000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Link detected: yes

But speed is not reported in active_migration_network_interfaces(see previous comment^^) and in the vds_interface table as well -

-[ RECORD 5 ]--------+-------------------------------------
id                   | 45e2c35a-d132-43f4-924b-061528bb2556
name                 | bond0
network_name         | ip6_migration_n
vds_id               | c005ae05-7015-4a1d-bcdf-ae10c0921aef
mac_addr             | 90:e2:ba:18:de:7c
is_bond              | t
bond_name            | 
bond_type            | 
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
vlan_id              | 
speed                | 
addr                 | 
subnet               | 
gateway              | 
boot_protocol        | 0
type                 | 0
_create_date         | 2017-08-31 11:12:21.197101+03
_update_date         | 2017-09-12 10:48:44.617346+03
mtu                  | 1500
bridged              | t
labels               | 
qos_overridden       | f
base_interface       | 
ipv6_boot_protocol   | 3
ipv6_address         | 2001::5
ipv6_prefix          | 64
ipv6_gateway         | ::
ad_partner_mac       | 18:ef:63:a1:6c:80
reported_switch_type | legacy
ad_aggregator_id     | 1
bond_active_slave    | 
ipv4_default_route   | f

Comment 7 Martin Mucha 2017-09-12 09:38:21 UTC
Sorry about wrong status, one of the patches wasn't even merged.

Comment 8 Martin Mucha 2017-09-12 10:48:07 UTC
Missing patch was merged to master, and I used burmans host to verify once I already have it — missing java code was added, but when running refresh capabilities, vdsm does not report speed on bonds, only on nics.

Is this error, or should be speed calculated for bonds on java side when processing refresh capabilities response?

Comment 9 Martin Mucha 2017-09-12 10:57:25 UTC
Created attachment 1324830 [details]
supervdsm.log

Comment 10 Michael Burman 2017-09-12 11:04:26 UTC
(In reply to Martin Mucha from comment #8)
> Missing patch was merged to master, and I used burmans host to verify once I
> already have it — missing java code was added, but when running refresh
> capabilities, vdsm does not report speed on bonds, only on nics.
> 
> Is this error, or should be speed calculated for bonds on java side when
> processing refresh capabilities response?

vdsm does report speed for bonds on getStats for a long time - 

 "bond0": {
            "rxErrors": "0", 
            "name": "bond0", 
            "tx": "2566194", 
            "txDropped": "0", 
            "sampleTime": 1499240322.044425, 
            "rx": "3900344926", 
            "txErrors": "0", 
            "state": "up", 
            "speed": "2000"

Comment 11 Martin Mucha 2017-09-12 13:31:35 UTC
I attached supervdsm.log, which I took from your host, where, if I'm looking correctly, there is no speed reported on bond0. And if I am, then it's no surprise, that speed on bond are not available further in processing.

From where does your log excerpt originate? I suppose it's not from supervdsm.log in attachments of this bug, right? It's possible, that 'somewhere' else it is reported, but on host you provided me, it seems from attached log, that vdsm does not report speed on bonds.

Comment 12 Michael Burman 2017-09-12 13:36:01 UTC
(In reply to Martin Mucha from comment #11)
> I attached supervdsm.log, which I took from your host, where, if I'm looking
> correctly, there is no speed reported on bond0. And if I am, then it's no
> surprise, that speed on bond are not available further in processing.
> 
> From where does your log excerpt originate? I suppose it's not from
> supervdsm.log in attachments of this bug, right? It's possible, that
> 'somewhere' else it is reported, but on host you provided me, it seems from
> attached log, that vdsm does not report speed on bonds.

I meant that vdsm-client Host getStats report the bond's speed. not from log.

Comment 13 Martin Mucha 2017-09-12 14:11:36 UTC
What do you think dan, based on attached log — is it a vdsm bug?

Comment 14 Dan Kenigsberg 2017-09-18 10:35:33 UTC
Correct, speed is not reported in getCaps; it a dynamic property that can change often and is thus reported via getVdsStats.

Comment 15 Martin Mucha 2017-09-18 13:26:44 UTC
uhm, ok. The thing is, IIUC, that VDSCommandType.GetStats is *only* used to update columns table of vds_interface_statistics, which are:

rx_rate
tx_rate
rx_drop
tx_drop
iface_status
_update_date
rx_total
rx_offset
tx_total
tx_offset
sample_time

and that's it. You said, that VDSCommandType#GetCapabilities reports speeds of all NICs but bonds for backward compatibility reasons, but that's not true. This is the only way how speed gets from vdsm to vds_interface table, from where it's used by engine.

GetStats are used for updating iterface statistics, and speed is not statistic, it's a property of nic, right? Anyways, GetStats is not designed to store information about interfaces into vds_interface table. We can hack it somewhere into this flow (I see some other things like VdsDynamic update are hacked-in there already), but it does not feel right.

I can be totally wrong, since this is the most spaghetti code hardest to read we have in product. But the fact, that everything but bonds is reported in getCaps and everything but bonds work is nice correlation somewhat implying, that what I'm saying is correct.

So if possible, please report bonds speed in getCaps as well, 'for backward compatibility reasons' ;) Otherwise we have to somehow update existing nics as a part of getStats call, which seems to be never expected (as far as I can understand the code).

Comment 16 Dan Kenigsberg 2017-09-18 20:01:45 UTC
You are probably right about Engine code. Historically, getVdsStats was used to report asynchronously-modifiable data - both statistics and VdsDynamic.

Bond speed belongs there, since it can change without notice (e.g. when one slave goes dark); yet repeated updates must be costly. We'd probably want a notification when speed changes.

Martin, please check if something like https://gerrit.ovirt.org/#/c/81948/ (which should report bond speed in caps) makes speeds available after Refresh Capabilities is manually requested.

Comment 17 Martin Mucha 2017-09-19 14:28:06 UTC
I just edited this file on host and hit refresh capabilities. Speed still does not seem to be reported on bonds...

Comment 18 Dan Kenigsberg 2017-10-09 18:13:26 UTC
(In reply to Martin Mucha from comment #17)
> I just edited this file on host and hit refresh capabilities. Speed still
> does not seem to be reported on bonds...

I've missed this needinfo for 3 weeks, sorry. I don't know why my patch failed, but Edy produced another one https://gerrit.ovirt.org/#/c/82504/.

We have not tested it all together in dev, but would QA agree to test it all for us?

Comment 19 Michael Burman 2017-10-15 05:46:42 UTC
Same result - no speed reported for the bond in DB on the active network interface and for sure not in the UI - 

engine=# select * from active_migration_network_interfaces where vds_id='931c94f3-b452-4597-9989-8d6b48e232cb';
-[ RECORD 1 ]--------+-------------------------------------
rx_rate              | 0.0004
tx_rate              | 0.0000
rx_drop              | 0.0000
tx_drop              | 0.0000
rx_total             | 1361306
tx_total             | 2920
rx_offset            | -13874
tx_offset            | -1408
iface_status         | 1
sample_time          | 1508046231.59942
type                 | 0
gateway              | 
ipv4_default_route   | f
ipv6_gateway         | ::
subnet               | 
ipv6_prefix          | 64
addr                 | 
ipv6_address         | 2001::5
speed                | 
base_interface       | 
vlan_id              | 
bond_type            | 
bond_name            | 
is_bond              | t
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
mac_addr             | 00:10:18:24:4a:fc
network_name         | ip6_migration_n
name                 | bond0
vds_id               | 931c94f3-b452-4597-9989-8d6b48e232cb
vds_name             | navy-vds1.qa.lab.tlv.redhat.com
id                   | 642646a2-95a1-49af-a203-b2eea1fa696c
boot_protocol        | 0
ipv6_boot_protocol   | 3
mtu                  | 1500
bridged              | t
reported_switch_type | legacy
is_vds               | 1
qos_overridden       | f
labels               | 
cluster_id           | a650acd4-f441-45bb-9a0a-13d81080896e
ad_partner_mac       | 18:ef:63:a1:75:00
ad_aggregator_id     | 3
bond_active_slave

Tested on - 4.2.0-0.0.master.20171013142622.git15e767c.el7.centos and vdsm-4.20.3-176.git6a1c9f4.el7.centos.x86_64

Comment 20 Red Hat Bugzilla Rules Engine 2017-10-15 05:46:48 UTC
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 21 Michael Burman 2017-10-31 06:38:41 UTC
Verified on - 4.2.0-0.0.master.20171030210714.gitef6bb9c.el7.centos

Bond has now speed on the active_migration_network_interface and the bond's speed shown in the UI under Host's Network Interfaces.

Comment 22 Sandro Bonazzola 2017-12-20 11:26:02 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 23 Klaas Demter 2018-03-16 06:45:18 UTC
is this getting a 4.1 backport?