Created attachment 1186664
Logs

Description of problem:
[OVS] - Not possible to sync the ovirtmgmt network after moving from a legacy-type cluster to an OVS-type cluster.

When a host is moved from a legacy-type cluster to an OVS-type cluster, the management network is out-of-sync because of the legacy/OVS difference. When we try to sync the management network (press the 'Sync All Networks' button), the switch type is synced successfully, but now another parameter is out-of-sync: the host QoS. The engine reports that the host no longer has ls=50 configured, although it did before the host was moved from the legacy type to OVS.

caps on legacy cluster:
networks = {'ovirtmgmt': {'addr': '10.35.128.23', 'bridged': True, 'cfg': {'BOOTPROTO': 'dhcp', 'DEFROUTE': 'yes', 'DELAY': '0', 'DEVICE': 'ovirtmgmt', 'IPV6INIT': 'no', 'MTU': '1500', 'NM_CONTROLLED': 'no', 'ONBOOT': 'yes', 'STP': 'off', 'TYPE': 'Bridge'}, 'dhcpv4': True, 'dhcpv6': False, 'gateway': '10.35.128.254', 'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}}, 'iface': 'ovirtmgmt', 'ipv4addrs': ['10.35.128.23/24'], 'ipv6addrs': [], 'ipv6autoconf': False, 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '255.255.255.0', 'ports': ['ens1f0'], 'stp': 'off', 'switch': 'legacy'}}

caps after moving to the OVS-type cluster and before syncing the switch type parameter:
networks = {'ovirtmgmt': {'addr': '10.35.128.23', 'bridged': True, 'cfg': {'BOOTPROTO': 'dhcp', 'DEFROUTE': 'yes', 'DELAY': '0', 'DEVICE': 'ovirtmgmt', 'IPV6INIT': 'no', 'MTU': '1500', 'NM_CONTROLLED': 'no', 'ONBOOT': 'yes', 'STP': 'off', 'TYPE': 'Bridge'}, 'dhcpv4': True, 'dhcpv6': False, 'gateway': '10.35.128.254', 'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}}, 'iface': 'ovirtmgmt', 'ipv4addrs': ['10.35.128.23/24'], 'ipv6addrs': [], 'ipv6autoconf': False, 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '255.255.255.0', 'ports': ['ens1f0'], 'stp': 'off', 'switch': 'legacy'}}

caps after syncing the switch type parameter:
networks = {'ovirtmgmt': {'addr': '10.35.128.23', 'bond': '', 'bridged': True, 'dhcpv4': True, 'dhcpv6': False, 'gateway': '10.35.128.254', 'iface': 'ovirtmgmt', 'ipv4addrs': ['10.35.128.23/24'], 'ipv6addrs': [], 'ipv6autoconf': False, 'ipv6gateway': '::', 'mtu': 1500, 'netmask': '255.255.255.0', 'nics': ['ens1f0'], 'ports': ['ens1f0'], 'stp': False, 'switch': 'ovs'}}

The host QoS parameter is no longer reported in caps, and this is why the network is considered out-of-sync.

Version-Release number of selected component (if applicable):
4.0.2.3-0.1.el7ev
vdsm-4.18.9-1.el7ev.x86_64
openvswitch-2.4.1-1.git20160727.el7_2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a host to a legacy-type cluster in rhev-m 4.0.2
2. Create an OVS-type cluster and move the host (in maintenance) to this cluster
3. Sync the network(s) on the host

Actual results:
The ovirtmgmt network is still reported as out-of-sync because of the host QoS (ls=50) parameter.

Expected results:
The management network should be synced.
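To illustrate why ovirtmgmt stays out-of-sync, here is a minimal Python sketch of an attribute-by-attribute comparison between what the engine expects for the network and what the host reports in caps. The function `out_of_sync_attrs` and the reduced dicts are hypothetical: the engine's real check is not shown in this report, and only the `switch` and `hostQos` values are taken from the caps above.

```python
def out_of_sync_attrs(expected, reported):
    """Return the sorted names of attributes whose reported value differs
    from the expected one. A key missing from `reported` compares as None."""
    return sorted(key for key, value in expected.items()
                  if reported.get(key) != value)


# Hypothetical engine-side expectation for ovirtmgmt in the OVS cluster.
expected = {
    "switch": "ovs",
    "hostQos": {"out": {"ls": {"d": 0, "m1": 0, "m2": 50}}},
}

# Reduced caps before the sync: still legacy, host QoS still reported.
before_sync = {
    "switch": "legacy",
    "hostQos": {"out": {"ls": {"d": 0, "m1": 0, "m2": 50}}},
}

# Reduced caps after the sync: switch is ovs, but the hostQos key is gone.
after_sync = {"switch": "ovs"}

print(out_of_sync_attrs(expected, before_sync))  # ['switch']
print(out_of_sync_attrs(expected, after_sync))   # ['hostQos']
```

Once the switch type is synced, the only remaining mismatch is `hostQos`, simply because the OVS caps no longer report that key at all.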
When moving the host back to a legacy-type cluster, we still can't sync the host QoS parameter back.
This seems to be caused by the lack of host QoS support in the native OVS feature.
Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags and only then set the target milestone.
Downstream clone: bz#1377912
*** Bug 1377912 has been marked as a duplicate of this bug. ***
Trying to sync the ovirtmgmt network after moving from legacy to OVS ends up with no management network and no IP on the host on every second attempt. It happens on 2 different servers.
- It looks like trying to sync the management network ends up with no connection to the host. The host loses its IP and can't be reached. Sometimes it works, and sometimes we lose connectivity during the sync attempt. This must be improved.

Error while executing action SyncAllHostNetworks: Network error during communication with the Host.

2016-12-05 15:28:54,085+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default task-4) [6d19735f] Error: VDSGenericException: VDSNetworkException: Vds timeout occured
2016-12-05 15:28:54,085+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default task-4) [6d19735f] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Vds timeout occured

2016-12-05 15:26:14,192 ERROR (jsonrpc/1) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1522, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
IOError: [Errno 2] No such file or directory: u'/sys/class/net/ovirtmgmt/mtu'
2016-12-05 15:26:14,193 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.setupNetworks failed (error -32603) in 20.60 seconds (__init__:515)

Moving back to assigned, as this can't be verified at this point with the current behavior.
Created attachment 1228307
Logs
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Verification failed because we did not make sure we were using an OVS build with the fix for bug 1397050. I hope we can retry in the future, but for now this bug is deferred.
Fresh logs, tested on the latest RHV 4.2:
4.2.0.2-0.1.el7
vdsm-4.20.9.3-1.el7ev
openvswitch-2.7.3-2.git20171010.el7fdp.x86_64
- Add the host to a legacy-type cluster
- Move the host to an OVS-type cluster
- ovirtmgmt is out of sync
- ovirtmgmt must be synced
- We do not send setupNetworks to sync the network when moving from legacy to OVS.

Trying to sync it manually fails on an SPM host and might succeed on a non-SPM host. In any case, when moving from legacy to OVS we must sync the network and make it an OVS bridge on the host.
Created attachment 1375147
fresh logs
Please add supervdsm.log.

I see in vdsm.log that ovirtmgmt has successfully moved to OVS. Can you point me to the failure, or attach logs from an SPM host?

2018-01-01 14:16:01,221+0200 .... {u'ovirtmgmt': {'iface': u'ovirtmgmt', 'ipv6autoconf': True, 'addr': '10.35.128.15', 'nics': [u'enp4s0'], 'dhcpv6': False, 'ipv6addrs': [], 'switch': 'ovs', 'bridged': True, 'mtu': 1500, 'dhcpv4': True, 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'stp': False, 'ipv4addrs': ['10.35.128.15/24'], 'bond': '', 'ipv6gateway': '::', 'gateway': '10.35.128.254', 'ports': [u'enp4s0']}}

Please note that we never expected a seamless upgrade to OVS: a manual sync was always part of the deal.
(In reply to Dan Kenigsberg from comment #16)
> please add supervdsm.log.
>
> I see in vdsm.log that ovirtmgmt has successfully moved to ovs. Can you
> point me to the failure, or attach logs from an SPM host?
>
> 2018-01-01 14:16:01,221+0200 .... {u'ovirtmgmt': {'iface': u'ovirtmgmt',
> 'ipv6autoconf': True, 'addr': '10.35.128.15', 'nics': [u'enp4s0'], 'dhcpv6':
> False, 'ipv6addrs': [], 'switch': 'ovs', 'bridged': True, 'mtu': 1500,
> 'dhcpv4': True, 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'stp':
> False, 'ipv4addrs': ['10.35.128.15/24'], 'bond': '', 'ipv6gateway': '::',
> 'gateway': '10.35.128.254', 'ports': [u'enp4s0']}}
>
> please note that we have never expected seamless upgrade to ovs: a manual
> sync was always part of the deal.

Error while executing action SyncAllHostNetworks: Network error during communication with the Host.

Trying to sync this on an SPM host makes the host lose its connection with the engine, 100% of the time. Even if the change is eventually applied on the host, the engine always loses the connection to the host, which becomes non-responsive for a minute or so. The engine reports that it can't sync the network.
- BTW, if you want to support the upgrade, why do I need to sync it manually? You should sync it for me.
Note that after a few minutes the host becomes non-operational, and ovirtmgmt is gone from the host.

MainProcess|jsonrpc/2::ERROR::2018-01-01 15:56:03,066::supervdsm_server::98::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 96, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 215, in setupNetworks
    _change_switch_type(networks, bondings, options, running_config)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 257, in _change_switch_type
    networks, bondings, options, in_rollback)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 153, in _rollback
    six.reraise(excType, value, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 136, in _rollback
    yield
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 257, in _change_switch_type
    networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 142, in setup
    _setup_ovs(ovs_nets, ovs_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 206, in _setup_ovs
    connectivity.check(options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netconfpersistence.py", line 239, in __exit__
    raise ne.RollbackIncomplete(config_diff, ex_type, ex_value)
ConfigNetworkError: (10, 'connectivity check failed')
Created attachment 1375213
supervdsm on spm host_host lost connection and stay non-operational

Trying to sync ovirtmgmt when moving from legacy to OVS ends with the host losing its connection and ovirtmgmt disappearing from the host.
To summarize: if you try to sync the SPM host from legacy to OVS, you will end up with a non-operational host for a few minutes, and then the rollback will bring you back to your starting point. The sync will fail and the host will remain legacy type. The result is the same in the reverse direction, from OVS to legacy. Bottom line: you can't sync an SPM host no matter what.
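The rollback outcome described above matches the shape of the supervdsm traceback: a `_rollback` context manager wraps the switch-type change, `connectivity.check` fails, and the error is re-raised. The minimal Python sketch below illustrates that pattern under stated assumptions. All names (`rollback_on_failure`, `change_switch_type`, `ConnectivityCheckFailed`, the dict-based running config, the `engine_reachable` callback) are hypothetical stand-ins, not vdsm's actual code.

```python
import copy
from contextlib import contextmanager


class ConnectivityCheckFailed(Exception):
    """Hypothetical stand-in for vdsm's ConfigNetworkError(10, ...)."""


@contextmanager
def rollback_on_failure(running_config):
    """Snapshot the running config; if the body raises, restore the
    snapshot in place and re-raise so the caller still sees the failure."""
    snapshot = copy.deepcopy(running_config)
    try:
        yield running_config
    except Exception:
        running_config.clear()
        running_config.update(snapshot)
        raise


def change_switch_type(running_config, new_switch, engine_reachable):
    """Hypothetical switch-type change followed by a connectivity check."""
    with rollback_on_failure(running_config) as cfg:
        cfg["ovirtmgmt"] = {"switch": new_switch}
        if not engine_reachable():
            raise ConnectivityCheckFailed("connectivity check failed")


config = {"ovirtmgmt": {"switch": "legacy"}}
try:
    change_switch_type(config, "ovs", engine_reachable=lambda: False)
except ConnectivityCheckFailed as exc:
    print("setupNetworks failed:", exc)
print(config)  # {'ovirtmgmt': {'switch': 'legacy'}}
```

Because the error is re-raised after the snapshot is restored, the caller sees a failed sync while the host ends up back on its original switch type.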
We will only support setting the switch type in cluster creation.
> We will only support setting the switch type in cluster creation.

That much is clear, but how do you migrate hosts into OVS-switched clusters?