If the VM is being migrated, the max bandwidth of the migration is calculated based on getActiveMigrationNetworkInterfaceForHost. If this interface happens to be a bond, the speed of this bond is set to 0 and the migration max bandwidth then falls back to the default value configured in vdsm.conf.

Expected results:
The "speed" property will be populated based on the bond type and its nics, so the max bandwidth will then be populated with a more realistic value.
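The fallback described above can be sketched as follows. This is a minimal illustration only; the constant name, the default value, and the per-migration policy are hypothetical, not the actual engine or vdsm code:

```python
# Hypothetical sketch of the fallback behavior described above.
# DEFAULT_MIGRATION_BANDWIDTH_MBPS stands in for the value read from
# vdsm.conf; the real engine/vdsm code and values differ.
DEFAULT_MIGRATION_BANDWIDTH_MBPS = 52

def migration_max_bandwidth(interface_speed_mbps):
    """Return the bandwidth cap for one migration, in Mbps."""
    if not interface_speed_mbps:
        # A bond reported with speed 0 (or no speed at all) lands here,
        # so the configured default is used instead of the link speed.
        return DEFAULT_MIGRATION_BANDWIDTH_MBPS
    # Illustrative policy: allow half of the link speed per migration.
    return interface_speed_mbps // 2

print(migration_max_bandwidth(0))     # bond speed missing -> default
print(migration_max_bandwidth(2000))  # bond speed reported -> 1000
```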
vdsm already reports the speed for bonds:

"bond0": {
    "rxErrors": "0",
    "name": "bond0",
    "tx": "2566194",
    "txDropped": "0",
    "sampleTime": 1499240322.044425,
    "rx": "3900344926",
    "txErrors": "0",
    "state": "up",
    "speed": "2000",
    "rxDropped": "17807"
}

We should add this to the engine as well.
(In reply to Tomas Jelinek from comment #0)
> If the VM is being migrated, the max bandwidth of the migration is
> calculated based on getActiveMigrationNetworkInterfaceForHost. If this
> interface happens to be a bond, the speed of this bond is set to 0 and the
> migration max bandwidth then falls back to the default value configured in
> vdsm.conf.
>
> Expected results:
> The "speed" property will be populated based on the bond type and its nics,
> so the max bandwidth will then be populated with a more realistic value.

I have never seen this code before, but so far I did not find any calculation of speed in the engine code; the value reported by vdsm seems to just flow through several layers...

If the bond speed is reported by vdsm (it is not when the bond is created from dummies, for example), then it is stored in the vds_interface table. But we are not reading these values from vds_interface; we read them from active_migration_network_interfaces, which does some filtering, and if no record is found, null is used. Also, this view is queried for both the source and the target host; if either of them reports null, the speed will be null and the default is used.

So to simplify, IIUC, the speed will be reported as null (and the default speed used) when:
a) vdsm does not report it (bond from dummies), or
b) the source or target host does not have an 'active migration network interface', as obtained via:

SELECT * FROM active_migration_network_interfaces WHERE vds_id = v_host_id;

Tomas, can you confirm that a) or b) is your case? If not, can you describe your bond?
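The null handling for the source/target pair described above can be sketched like this. The function and variable names are hypothetical, not the actual engine code:

```python
# Hypothetical sketch of the engine-side speed resolution described
# above; names are illustrative, not the real engine classes.
def effective_migration_speed(source_speed, target_speed):
    """Return the usable link speed in Mbps, or None for 'use default'.

    Either host reporting no speed (None) forces the default, mirroring
    the behavior of the active_migration_network_interfaces lookup.
    """
    if source_speed is None or target_speed is None:
        return None  # engine falls back to the vdsm.conf default
    # The migration can only go as fast as the slower endpoint.
    return min(source_speed, target_speed)

print(effective_migration_speed(2000, None))  # None -> default used
print(effective_migration_speed(2000, 1000))  # 1000
```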
This is what we get from the DB.

engine=# select * from active_migration_network_interfaces where vds_id='f697e2d1-8535-4b61-8b76-81bb7faad549';
-[ RECORD 1 ]--------+-------------------------------------
rx_rate              | 0.0003
tx_rate              | 0.0000
rx_drop              | 0.0000
tx_drop              | 0.0000
rx_total             | 63992
tx_total             | 496
rx_offset            | -1838162
tx_offset            | -19980
iface_status         | 1
sample_time          | 1505198353.29212
type                 | 0
gateway              |
ipv4_default_route   | f
ipv6_gateway         | ::
subnet               | 255.255.255.0
ipv6_prefix          |
addr                 | 1.2.3.4
ipv6_address         |
speed                |
base_interface       |
vlan_id              |
bond_type            |
bond_name            |
is_bond              | t
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
mac_addr             | 00:e0:ed:33:c0:92
network_name         | migration-network
name                 | bond0
vds_id               | f697e2d1-8535-4b61-8b76-81bb7faad549
vds_name             | host_mixed_1
id                   | 352ba7e2-c5c2-4a1e-be76-86c1967be2da
boot_protocol        | 2
ipv6_boot_protocol   | 0
mtu                  | 1500
bridged              | t
reported_switch_type | legacy
is_vds               | 1
qos_overridden       | f
labels               |
cluster_id           | 7be9dc31-32be-4fde-a908-7cb10a4994a5
ad_partner_mac       | 00:0a:b8:38:6a:80
ad_aggregator_id     | 3
bond_active_slave    |

Should 'speed' be the BOND speed?

ovirt-engine-4.2.0-0.0.master.20170911195522.git5ddba50.el7.centos.noarch
Yes. This might be a problem, but it need not be. Please check the host itself and what it reports. This is an example from my host:

ethtool bond0
Settings for bond0:
        …
        Speed: Unknown!
        Duplex: Unknown! (255)
        …

For such hosts the speed won't be reported (because it can't be). But if the host reports it, you have called refresh capabilities to be sure, and the speed is still not in the vds_interface table or in active_migration_network_interfaces, then it is something which has to be checked further.
[root@camel-vdsa ~]# ethtool bond0
Settings for bond0:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 2000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Link detected: yes

But the speed is not reported in active_migration_network_interfaces (see the previous comment ^^), nor in the vds_interface table:

-[ RECORD 5 ]--------+-------------------------------------
id                   | 45e2c35a-d132-43f4-924b-061528bb2556
name                 | bond0
network_name         | ip6_migration_n
vds_id               | c005ae05-7015-4a1d-bcdf-ae10c0921aef
mac_addr             | 90:e2:ba:18:de:7c
is_bond              | t
bond_name            |
bond_type            |
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
vlan_id              |
speed                |
addr                 |
subnet               |
gateway              |
boot_protocol        | 0
type                 | 0
_create_date         | 2017-08-31 11:12:21.197101+03
_update_date         | 2017-09-12 10:48:44.617346+03
mtu                  | 1500
bridged              | t
labels               |
qos_overridden       | f
base_interface       |
ipv6_boot_protocol   | 3
ipv6_address         | 2001::5
ipv6_prefix          | 64
ipv6_gateway         | ::
ad_partner_mac       | 18:ef:63:a1:6c:80
reported_switch_type | legacy
ad_aggregator_id     | 1
bond_active_slave    |
ipv4_default_route   | f
Sorry about the wrong status; one of the patches wasn't even merged.
The missing patch was merged to master, and I used Burman's host to verify once I had it. The missing Java code was added, but when running refresh capabilities, vdsm does not report speed on bonds, only on nics.

Is this an error, or should the speed be calculated for bonds on the Java side when processing the refresh capabilities response?
Created attachment 1324830 [details] supervdsm.log
(In reply to Martin Mucha from comment #8)
> The missing patch was merged to master, and I used Burman's host to verify
> once I had it. The missing Java code was added, but when running refresh
> capabilities, vdsm does not report speed on bonds, only on nics.
>
> Is this an error, or should the speed be calculated for bonds on the Java
> side when processing the refresh capabilities response?

vdsm has reported the speed for bonds in getStats for a long time:

"bond0": {
    "rxErrors": "0",
    "name": "bond0",
    "tx": "2566194",
    "txDropped": "0",
    "sampleTime": 1499240322.044425,
    "rx": "3900344926",
    "txErrors": "0",
    "state": "up",
    "speed": "2000"
I attached supervdsm.log, which I took from your host, where, if I'm looking correctly, there is no speed reported on bond0. And if I am, then it's no surprise that the speed of bonds is not available further in the processing.

From where does your log excerpt originate? I suppose it's not from the supervdsm.log in the attachments of this bug, right? It's possible that it is reported 'somewhere' else, but on the host you provided me, it seems from the attached log that vdsm does not report speed on bonds.
(In reply to Martin Mucha from comment #11)
> I attached supervdsm.log, which I took from your host, where, if I'm looking
> correctly, there is no speed reported on bond0. And if I am, then it's no
> surprise that the speed of bonds is not available further in the processing.
>
> From where does your log excerpt originate? I suppose it's not from the
> supervdsm.log in the attachments of this bug, right? It's possible that it
> is reported 'somewhere' else, but on the host you provided me, it seems from
> the attached log that vdsm does not report speed on bonds.

I meant that 'vdsm-client Host getStats' reports the bond's speed; it was not from the log.
What do you think, Dan? Based on the attached log, is it a vdsm bug?
Correct, speed is not reported in getCaps; it is a dynamic property that can change often and is thus reported via getVdsStats.
uhm, ok. The thing is, IIUC, that VDSCommandType.GetStats is *only* used to update the columns of the vds_interface_statistics table, which are:

rx_rate
tx_rate
rx_drop
tx_drop
iface_status
_update_date
rx_total
rx_offset
tx_total
tx_offset
sample_time

and that's it. You said that VDSCommandType#GetCapabilities reports the speeds of all NICs but bonds for backward compatibility reasons, but that's not true. getCaps is the only way the speed gets from vdsm to the vds_interface table, from where it is used by the engine. GetStats is used for updating interface statistics, and speed is not a statistic, it's a property of the nic, right? Anyway, GetStats is not designed to store information about interfaces in the vds_interface table. We could hack it into this flow somewhere (I see some other things, like the VdsDynamic update, are already hacked in there), but it does not feel right.

I may be totally wrong, since this is the most spaghetti-like, hardest-to-read code we have in the product. But the fact that everything but bonds is reported in getCaps, and everything but bonds works, is a nice correlation somewhat implying that what I'm saying is correct. So if possible, please report the bond speed in getCaps as well, 'for backward compatibility reasons' ;) Otherwise we have to somehow update existing nics as part of the getStats call, which seems to have never been expected (as far as I can understand the code).
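The two update paths described above can be sketched like this. All names here are illustrative stand-ins, not the actual engine classes or table access code; the point is only that a speed arriving via the stats path never reaches the table the engine reads it from:

```python
# Illustrative sketch of the two flows described above; names are
# hypothetical and do not match the real engine code.
vds_interface = {}             # static nic properties, includes 'speed'
vds_interface_statistics = {}  # dynamic counters only

def on_get_capabilities(report):
    """Caps flow: the only path that writes nic properties."""
    for nic, props in report.items():
        vds_interface[nic] = {"speed": props.get("speed")}

def on_get_stats(report):
    """Stats flow: updates counters only; 'speed' is dropped here."""
    for nic, props in report.items():
        vds_interface_statistics[nic] = {"rx_rate": props.get("rx_rate")}

# A bond whose speed appears only in getStats never reaches vds_interface:
on_get_capabilities({"bond0": {}})          # caps report no bond speed
on_get_stats({"bond0": {"speed": "2000"}})  # speed ignored on this path
print(vds_interface["bond0"]["speed"])      # None
```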
You are probably right about the engine code. Historically, getVdsStats was used to report asynchronously-modifiable data, both statistics and VdsDynamic. Bond speed belongs there, since it can change without notice (e.g. when one slave goes dark); yet repeated updates would be costly. We'd probably want a notification when the speed changes.

Martin, please check whether something like https://gerrit.ovirt.org/#/c/81948/ (which should report the bond speed in caps) makes the speed available after Refresh Capabilities is manually requested.
I just edited this file on host and hit refresh capabilities. Speed still does not seem to be reported on bonds...
(In reply to Martin Mucha from comment #17)
> I just edited this file on host and hit refresh capabilities. Speed still
> does not seem to be reported on bonds...

I've missed this needinfo for 3 weeks, sorry. I don't know why my patch failed, but Edy produced another one: https://gerrit.ovirt.org/#/c/82504/. We have not tested it all together in dev; would QA agree to test it all for us?
Same result: no speed reported for the bond in the DB on the active migration network interface, and certainly not in the UI.

engine=# select * from active_migration_network_interfaces where vds_id='931c94f3-b452-4597-9989-8d6b48e232cb';
-[ RECORD 1 ]--------+-------------------------------------
rx_rate              | 0.0004
tx_rate              | 0.0000
rx_drop              | 0.0000
tx_drop              | 0.0000
rx_total             | 1361306
tx_total             | 2920
rx_offset            | -13874
tx_offset            | -1408
iface_status         | 1
sample_time          | 1508046231.59942
type                 | 0
gateway              |
ipv4_default_route   | f
ipv6_gateway         | ::
subnet               |
ipv6_prefix          | 64
addr                 |
ipv6_address         | 2001::5
speed                |
base_interface       |
vlan_id              |
bond_type            |
bond_name            |
is_bond              | t
bond_opts            | mode=4 miimon=100 xmit_hash_policy=2
mac_addr             | 00:10:18:24:4a:fc
network_name         | ip6_migration_n
name                 | bond0
vds_id               | 931c94f3-b452-4597-9989-8d6b48e232cb
vds_name             | navy-vds1.qa.lab.tlv.redhat.com
id                   | 642646a2-95a1-49af-a203-b2eea1fa696c
boot_protocol        | 0
ipv6_boot_protocol   | 3
mtu                  | 1500
bridged              | t
reported_switch_type | legacy
is_vds               | 1
qos_overridden       | f
labels               |
cluster_id           | a650acd4-f441-45bb-9a0a-13d81080896e
ad_partner_mac       | 18:ef:63:a1:75:00
ad_aggregator_id     | 3
bond_active_slave    |

Tested on 4.2.0-0.0.master.20171013142622.git15e767c.el7.centos and vdsm-4.20.3-176.git6a1c9f4.el7.centos.x86_64.
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Verified on 4.2.0-0.0.master.20171030210714.gitef6bb9c.el7.centos.

The bond now has a speed on the active_migration_network_interfaces view, and the bond's speed is shown in the UI under the host's Network Interfaces.
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Is this getting a 4.1 backport?