Bug 1362393 - [OVS] SPM host: Not possible to sync the ovirtmgmt network once moving from legacy type cluster to ovs type cluster
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
medium
high with 1 vote
Target Milestone: ---
: ---
Assignee: Petr Horáček
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1346232 1397050
Blocks: OpenVswitch_Support
Reported: 2016-08-02 06:33 UTC by Michael Burman
Modified: 2022-06-27 08:01 UTC (History)

Fixed In Version:
Clone Of:
: 1377912 (view as bug list)
Environment:
Last Closed: 2018-01-03 11:27:05 UTC
oVirt Team: Network
Embargoed:
sbonazzo: ovirt-4.3-


Attachments
Logs (1.08 MB, application/x-gzip)
2016-08-02 06:33 UTC, Michael Burman
Logs (706.33 KB, application/x-gzip)
2016-12-06 07:57 UTC, Michael Burman
fresh logs (79.40 KB, application/x-gzip)
2018-01-01 12:29 UTC, Michael Burman
supervdsm on spm host_host lost connection and stay non-operational (57.77 KB, application/x-gzip)
2018-01-01 14:00 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-46603 0 None None None 2022-06-27 08:00:59 UTC

Description Michael Burman 2016-08-02 06:33:05 UTC
Created attachment 1186664 [details]
Logs

Description of problem:
[OVS] - Not possible to sync the ovirtmgmt network once moving from legacy type cluster to ovs type cluster.

When a host is moved from a legacy type cluster to an ovs type cluster, the management network becomes out-of-sync because of the legacy/ovs difference. When we try to sync the management network (by pressing the 'Sync All Networks' button), the switch type is synced successfully, but then another parameter, the host QoS, becomes out-of-sync.
The engine reports that the host no longer has ls=50 configured, although it did before the move from legacy type to ovs.

caps on legacy cluster:
networks = {'ovirtmgmt': {'addr': '10.35.128.23',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'dhcp',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': True,
                                  'dhcpv6': False,
                                  'gateway': '10.35.128.254',
                                  'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}},
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.128.23/24'],
                                  'ipv6addrs': [],
                                  'ipv6autoconf': False,
                                  'ipv6gateway': '::',
                                  'mtu': '1500',
                                  'netmask': '255.255.255.0',
                                  'ports': ['ens1f0'],
                                  'stp': 'off',
                                  'switch': 'legacy'}}

caps after moving to ovs type cluster and before syncing the switch type parameter:

networks = {'ovirtmgmt': {'addr': '10.35.128.23',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'dhcp',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': True,
                                  'dhcpv6': False,
                                  'gateway': '10.35.128.254',
                                  'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}},
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.128.23/24'],
                                  'ipv6addrs': [],
                                  'ipv6autoconf': False,
                                  'ipv6gateway': '::',
                                  'mtu': '1500',
                                  'netmask': '255.255.255.0',
                                  'ports': ['ens1f0'],
                                  'stp': 'off',
                                  'switch': 'legacy'}}

caps after syncing the switch type parameter:

networks = {'ovirtmgmt': {'addr': '10.35.128.23',
                                  'bond': '',
                                  'bridged': True,
                                  'dhcpv4': True,
                                  'dhcpv6': False,
                                  'gateway': '10.35.128.254',
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.128.23/24'],
                                  'ipv6addrs': [],
                                  'ipv6autoconf': False,
                                  'ipv6gateway': '::',
                                  'mtu': 1500,
                                  'netmask': '255.255.255.0',
                                  'nics': ['ens1f0'],
                                  'ports': ['ens1f0'],
                                  'stp': False,
                                  'switch': 'ovs'}}

The host QoS parameter is no longer reported in caps, and this is why the network is considered out-of-sync.
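
The difference between the reports can be made explicit by comparing the keys of the ovirtmgmt entry before and after the sync. This is only a minimal sketch, not engine or vdsm code: the two dictionaries are trimmed copies of the caps shown above, and `missing_keys` is a hypothetical helper introduced for illustration.

```python
# Trimmed copies of the 'ovirtmgmt' caps shown above (only a few keys kept).
legacy_caps = {
    'bridged': True,
    'cfg': {'DEVICE': 'ovirtmgmt', 'TYPE': 'Bridge'},
    'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}},
    'mtu': '1500',
    'ports': ['ens1f0'],
    'switch': 'legacy',
}
ovs_caps = {
    'bridged': True,
    'mtu': 1500,
    'ports': ['ens1f0'],
    'switch': 'ovs',
}


def missing_keys(before, after):
    """Return the keys reported before the sync but absent afterwards."""
    return sorted(set(before) - set(after))


print(missing_keys(legacy_caps, ovs_caps))  # ['cfg', 'hostQos']
```

Both `cfg` and `hostQos` disappear from the report; per the description above, `hostQos` is the one the engine flags as out-of-sync.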

Version-Release number of selected component (if applicable):
4.0.2.3-0.1.el7ev
vdsm-4.18.9-1.el7ev.x86_64
openvswitch-2.4.1-1.git20160727.el7_2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a host to a legacy type cluster in rhev-m 4.0.2
2. Create an ovs type cluster and move the host to this cluster (maintenance)
3. Sync the network(s) on the host

Actual results:
The ovirtmgmt network is still reported as out-of-sync because of the host QoS (ls=50) parameter.

Expected results:
management network should be synced

Comment 1 Michael Burman 2016-08-02 06:47:38 UTC
When moving the host back to a legacy type cluster, we still can't sync the host QoS parameter back.

Comment 2 Michael Burman 2016-08-02 11:14:19 UTC
This seems to be due to the lack of host QoS support in the native ovs feature.

Comment 3 Red Hat Bugzilla Rules Engine 2016-08-04 12:41:16 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 7 Marina Kalinin 2016-11-04 19:48:56 UTC
Downstream clone: bz#1377912

Comment 8 Yaniv Kaul 2016-11-21 07:13:15 UTC
*** Bug 1377912 has been marked as a duplicate of this bug. ***

Comment 9 Michael Burman 2016-12-06 07:50:31 UTC
Trying to sync the ovirtmgmt network after moving from legacy to ovs ends up, every second attempt, with no management network and no IP on the host.
This happened on 2 different servers.

- It looks like trying to sync the management network ends up with no connection to the host. The host loses its IP and can't be reached. Sometimes it works and sometimes we lose connectivity during the sync attempt.
This must be improved.

Error while executing action SyncAllHostNetworks: Network error during communication with the Host.

2016-12-05 15:28:54,085+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default task-4) [6d19735f] Error: VDSGenericException: VDSNetworkException: Vds timeout occured
2016-12-05 15:28:54,085+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (default task-4) [6d19735f] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Vds timeout occured


2016-12-05 15:26:14,192 ERROR (jsonrpc/1) [jsonrpc.JsonRpcServer] Internal server error (__init__:552)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1522, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
IOError: [Errno 2] No such file or directory: u'/sys/class/net/ovirtmgmt/mtu'
2016-12-05 15:26:14,193 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.setupNetworks failed (error -32603) in 20.60 seconds (__init__:515)

Moving back to assigned as this can't be verified at this point with current behavior.
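
The IOError in the traceback above comes from reading /sys/class/net/ovirtmgmt/mtu after the device has disappeared. The following is only a sketch of how a caller could tolerate that case, not vdsm's actual code: `read_mtu` and its `sysfs_root` parameter are hypothetical names, and returning None for a missing device is an assumption about what the caller would want.

```python
import errno
import os


def read_mtu(device, sysfs_root='/sys/class/net'):
    """Return the MTU of `device` as an int, or None if the device is gone.

    Hypothetical helper: it treats a missing sysfs entry (ENOENT) as
    "device not present" instead of letting the IOError propagate.
    """
    path = os.path.join(sysfs_root, device, 'mtu')
    try:
        with open(path) as f:
            return int(f.read().strip())
    except IOError as e:  # on Python 3, IOError is an alias of OSError
        if e.errno == errno.ENOENT:
            return None
        raise


# Prints an integer if the device exists on this machine, otherwise None.
print(read_mtu('ovirtmgmt'))
```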

Comment 10 Michael Burman 2016-12-06 07:57:29 UTC
Created attachment 1228307 [details]
Logs

Comment 11 Red Hat Bugzilla Rules Engine 2016-12-12 12:37:12 UTC
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 12 Dan Kenigsberg 2017-01-04 11:23:34 UTC
verification failed because we did not make sure we use ovs with a fix for bug 1397050.

I hope we can retry in the future, but currently this bug is deferred.

Comment 13 Michael Burman 2018-01-01 12:11:15 UTC
Fresh logs from a test on the latest rhv 4.2 (4.2.0.2-0.1.el7):
vdsm-4.20.9.3-1.el7ev
openvswitch-2.7.3-2.git20171010.el7fdp.x86_64

Comment 14 Michael Burman 2018-01-01 12:27:37 UTC
- Add host to legacy type cluster
- Move host to ovs type cluster
- ovirtmgmt is out of sync

- ovirtmgmt must be synced - we are not sending setup networks to sync the network when moving from legacy to ovs.

Trying to sync it manually results in failure on an SPM host and might succeed on a non-SPM host.

Anyhow, when moving from legacy to ovs type, we must sync the network and make it an ovs bridge on the host.

Comment 15 Michael Burman 2018-01-01 12:29:04 UTC
Created attachment 1375147 [details]
fresh logs

Comment 16 Dan Kenigsberg 2018-01-01 13:32:47 UTC
please add supervdsm.log.

I see in vdsm.log that ovirtmgmt has successfully moved to ovs. Can you point me to the failure, or attach logs from an SPM host?

2018-01-01 14:16:01,221+0200 .... {u'ovirtmgmt': {'iface': u'ovirtmgmt', 'ipv6autoconf': True, 'addr': '10.35.128.15', 'nics': [u'enp4s0'], 'dhcpv6': False, 'ipv6addrs': [], 'switch': 'ovs', 'bridged': True, 'mtu': 1500, 'dhcpv4': True, 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'stp': False, 'ipv4addrs': ['10.35.128.15/24'], 'bond': '', 'ipv6gateway': '::', 'gateway': '10.35.128.254', 'ports': [u'enp4s0']}}

please note that we have never expected seamless upgrade to ovs: a manual sync was always part of the deal.

Comment 17 Michael Burman 2018-01-01 13:56:50 UTC
(In reply to Dan Kenigsberg from comment #16)
> please add supervdsm.log.
> 
> I see in vdsm.log that ovirtmgmt has successfully moved to ovs. Can you
> point me to the failure, or attach logs from an SPM host?
> 
> 2018-01-01 14:16:01,221+0200 .... {u'ovirtmgmt': {'iface': u'ovirtmgmt',
> 'ipv6autoconf': True, 'addr': '10.35.128.15', 'nics': [u'enp4s0'], 'dhcpv6':
> False, 'ipv6addrs': [], 'switch': 'ovs', 'bridged': True, 'mtu': 1500,
> 'dhcpv4': True, 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'stp':
> False, 'ipv4addrs': ['10.35.128.15/24'], 'bond': '', 'ipv6gateway': '::',
> 'gateway': '10.35.128.254', 'ports': [u'enp4s0']}}
> 
> please note that we have never expected seamless upgrade to ovs: a manual
> sync was always part of the deal.

Error while executing action SyncAllHostNetworks: Network error during communication with the Host. Trying to sync this on an SPM host makes the host lose connection with the engine, 100% of the time.


Even if the change is eventually done on the host, in the engine the host always loses connection and becomes non-responsive for a minute or so. The engine reports that it can't sync the network.

- BTW, if you want to support upgrade, then why do I need to sync it manually? You should sync it for me.

Comment 18 Michael Burman 2018-01-01 13:58:38 UTC
Note that after a few minutes the host becomes non-operational and ovirtmgmt is gone from the host.

MainProcess|jsonrpc/2::ERROR::2018-01-01 15:56:03,066::supervdsm_server::98::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 96, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 215, in setupNetworks
    _change_switch_type(networks, bondings, options, running_config)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 257, in _change_switch_type
    networks, bondings, options, in_rollback)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 153, in _rollback
    six.reraise(excType, value, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 136, in _rollback
    yield
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 257, in _change_switch_type
    networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 142, in setup
    _setup_ovs(ovs_nets, ovs_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 206, in _setup_ovs
    connectivity.check(options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netconfpersistence.py", line 239, in __exit__
    raise ne.RollbackIncomplete(config_diff, ex_type, ex_value)
ConfigNetworkError: (10, 'connectivity check failed')
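
The traceback above passes through a rollback context manager (`_rollback` in vdsm/network/api.py) that re-raises once the connectivity check fails. The following is a simplified sketch of that general pattern only, not vdsm's implementation: `rollback`, `change_switch_type`, `ConnectivityError` and the `connectivity_ok` callback are hypothetical names, and the host state is reduced to a plain dictionary.

```python
from contextlib import contextmanager


class ConnectivityError(Exception):
    """Hypothetical stand-in for the 'connectivity check failed' error."""


@contextmanager
def rollback(state, snapshot):
    """Restore `state` from `snapshot` if the body raises, then re-raise."""
    try:
        yield
    except Exception:
        state.clear()
        state.update(snapshot)
        raise


def change_switch_type(state, new_type, connectivity_ok):
    """Apply the new switch type; undo it if connectivity is lost."""
    snapshot = dict(state)
    with rollback(state, snapshot):
        state['switch'] = new_type
        if not connectivity_ok():
            raise ConnectivityError('connectivity check failed')


host = {'switch': 'legacy'}
try:
    # Simulate the engine being unable to reach the host after the change.
    change_switch_type(host, 'ovs', connectivity_ok=lambda: False)
except ConnectivityError as e:
    print('sync failed: {}'.format(e))
print(host)  # the switch type is back to 'legacy'
```

In this simplified model a failed connectivity check leaves the host where it started, with the switch type still 'legacy' and the error reported to the caller.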

Comment 19 Michael Burman 2018-01-01 14:00:57 UTC
Created attachment 1375213 [details]
supervdsm on spm host_host lost connection and stay non-operational

Trying to sync ovirtmgmt when moving from legacy to ovs ends with the host losing connection and ovirtmgmt gone from the host.

Comment 20 Michael Burman 2018-01-01 14:34:34 UTC
To summarize this:
If you try to sync the SPM host ovs<>legacy, you end up with a non-operational host for a few minutes, and then the rollback brings you back to your starting point. The sync fails and the host remains legacy type.

The result is the same in the opposite direction, ovs>legacy.

Bottom line: you can't sync an SPM host no matter what.

Comment 21 Yaniv Lavi 2018-01-03 11:27:05 UTC
We will only support setting the switch type in cluster creation.

Comment 22 Mike Goodwin 2018-07-31 22:18:32 UTC
>We will only support setting the switch type in cluster creation.

That much is clear, but how do you migrate hosts into OVS switched clusters?

