Bug 1379115

Summary: [OVS] Use Linux bonds with OVS networks (instead of OVS Bonds)
Product: [oVirt] vdsm Reporter: Edward Haas <edwardh>
Component: General Assignee: Petr Horáček <phoracek>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.18.0 CC: bugs, danken, mburman, phoracek, ylavi
Target Milestone: ovirt-4.1.0-alpha Flags: rule-engine: ovirt-4.1+
Target Release: 4.19.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-01 14:49:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1195208    

Description Edward Haas 2016-09-25 06:19:51 UTC
The integrated OVS implementation needs to use the Linux Bond instead of the OVS Bond.

OVS bonds have several major limitations, which lead us to use Linux
bonds instead.

Some known limitations of OVS bonds:
- Unable to apply QoS rules.
- Does not support all bond mode options (compared to the Linux bond).
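
For illustration, the difference between the two approaches can be sketched as the commands each one would issue. This is a minimal sketch, not vdsm code: the function names, the bridge name `ovsbr0` and the NIC names are assumptions made for the example; only the `ip link` and `ovs-vsctl` subcommands are real. The functions only build the command lists and do not run anything.

```python
def ovs_bond_commands(bridge, bond, nics):
    """Commands for a native OVS bond: OVS itself aggregates the NICs."""
    return [["ovs-vsctl", "add-bond", bridge, bond] + list(nics)]


def linux_bond_commands(bridge, bond, nics, mode="active-backup"):
    """Commands for a Linux bond that is then attached to OVS as a plain port."""
    cmds = [["ip", "link", "add", bond, "type", "bond", "mode", mode]]
    for nic in nics:
        # Bring the NIC down first; enslaving an interface that is up
        # may be rejected by the kernel.
        cmds.append(["ip", "link", "set", nic, "down"])
        cmds.append(["ip", "link", "set", nic, "master", bond])
    cmds.append(["ip", "link", "set", bond, "up"])
    # From the OVS point of view the Linux bond is just a single port.
    cmds.append(["ovs-vsctl", "add-port", bridge, bond])
    return cmds


if __name__ == "__main__":
    # Hypothetical names, used only to print the resulting command lines.
    for cmd in linux_bond_commands("ovsbr0", "bond0", ["eth0", "eth1"]):
        print(" ".join(cmd))
```

With the Linux bond, the bond mode is handled by the kernel bonding driver rather than by OVS, which is the motivation stated in the description.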

Comment 1 Michael Burman 2016-10-05 14:43:13 UTC
We are not there yet.

- Add host to ovs cluster over bond - failed.
- Create bond and attach network to the bond on ovs cluster - failed.
The bond is broken and the network was not attached to the host.

Please contact me and I will provide the env for further investigation.

Comment 2 Red Hat Bugzilla Rules Engine 2016-10-05 14:43:17 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 3 Michael Burman 2016-11-09 15:27:36 UTC
Some more critical scenarios regarding bond+ovs should be fixed:

[1] - Bond mode options aren't implemented yet. When creating the bond it always ends up as mode=0, and VM networks can't be attached to it.

[2] - vdsm can't start after reboot - it fails to restore the bond

Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/share/vdsm/vdsm-restore-net-config", line 479, in <module>
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: restore(args)
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/share/vdsm/vdsm-restore-net-config", line 442, in restore
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: unified_restoration()
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/share/vdsm/vdsm-restore-net-config", line 134, in unified_restoration
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: changed_config = _filter_changed_nets_bonds(available_config)
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/share/vdsm/vdsm-restore-net-config", line 261, in _filter_changed_nets_bonds
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: kernel_config = kernelconfig.KernelConfig(NetInfo(netswitch.netinfo()))
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 308, in netinfo
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: ovs_netinfo, _netinfo, bridgeless_ovs_nets)
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/lib/python2.7/site-packages/vdsm/network/ovs/info.py", line 298, in fake_bridgeless
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: devtype_netinfo[iface_name].update(_shared_net_attrs(net_attrs))
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: KeyError: u'bond0'
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: Traceback (most recent call last):
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/bin/vdsm-tool", line 219, in main
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: return tool_command[cmd]["command"](*args)
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 41, in restore_command
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: exec_restore(cmd)
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 54, in exec_restore
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: raise EnvironmentError('Failed to restore the persisted networks')
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com vdsm-tool[13979]: EnvironmentError: Failed to restore the persisted networks
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com systemd[1]: vdsm-network.service: main process exited, code=exited, status=1/FAILURE
Nov 03 12:43:43 camel-vdsa.qa.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager network restoration.

Port "bond0"
            Interface "bond0"
                error: "could not open network device bond0 (No such device)"
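
The `KeyError: u'bond0'` in the traceback above is the kind of failure that occurs when network attributes refer to a device that is missing from the per-device info dictionary. A minimal sketch of that failure mode and of a defensive lookup follows. It does not reproduce vdsm's `fake_bridgeless` code or the actual fix; the function name `merge_net_attrs` and the data shapes are assumptions made for illustration.

```python
def merge_net_attrs(devtype_netinfo, iface_name, net_attrs):
    """Merge net_attrs into the entry for iface_name if the device is known.

    Returns True when the attributes were merged and False when the device
    is not reported, instead of raising KeyError.
    """
    dev_info = devtype_netinfo.get(iface_name)
    if dev_info is None:
        return False
    dev_info.update(net_attrs)
    return True


if __name__ == "__main__":
    bondings = {}  # bond0 is not reported, as in "No such device" above
    attrs = {"mtu": 1500}  # hypothetical attributes

    try:
        bondings["bond0"].update(attrs)  # unguarded access
    except KeyError as err:
        print("unguarded lookup failed:", err)

    print("guarded merge succeeded:", merge_net_attrs(bondings, "bond0", attrs))
```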


[3] - vdsm generates multiple duplicate NM_CONTROLLED=no and ONBOOT=no lines, each with its own comment, in ifcfg-*. It also generates ONBOOT=no although it is already there.

[root@zeus-vds1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-enp4s0
# This device is now owned by VDSM.
# Please do not do any changes here while the device is used by VDSM.
# Once it is detached from VDSM, remove this prefix before applying
# any changes.
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp4s0
UUID=2d0f6519-51c8-4421-927f-0832f68074a9
DEVICE=enp4s0
ONBOOT=no
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
NM_CONTROLLED=no  # Set by VDSM
ONBOOT=no  # Set by VDSM
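
One way to avoid the duplication shown above is to make the update idempotent: keep a key that already has the wanted value, drop later repeats, and append only what is missing. The sketch below illustrates that idea under stated assumptions. It is not vdsm's ifcfg writer; the function name `set_ifcfg_keys` and its signature are hypothetical, and only the `# Set by VDSM` marker is taken from the file above.

```python
def set_ifcfg_keys(lines, settings, marker="  # Set by VDSM"):
    """Return ifcfg lines with each key in settings present exactly once."""
    result = []
    seen = set()
    for line in lines:
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        key = line.split("=", 1)[0].strip()
        if is_assignment and key in settings:
            if key in seen:
                continue  # drop repeats of a managed key
            seen.add(key)
            # Compare the value without any trailing comment.
            value = line.split("=", 1)[1].split("#", 1)[0].strip()
            if value == settings[key]:
                result.append(line)  # already correct, keep untouched
            else:
                result.append("%s=%s%s" % (key, settings[key], marker))
        else:
            result.append(line)
    # Append managed keys that were not present at all.
    for key in sorted(settings):
        if key not in seen:
            result.append("%s=%s%s" % (key, settings[key], marker))
    return result


if __name__ == "__main__":
    current = [
        "DEVICE=enp4s0",
        "ONBOOT=no",
        "NM_CONTROLLED=no  # Set by VDSM",
        "ONBOOT=no  # Set by VDSM",
    ]
    wanted = {"NM_CONTROLLED": "no", "ONBOOT": "no"}
    for line in set_ifcfg_keys(current, wanted):
        print(line)
```

Applying the function to its own output returns the same lines, so repeated runs do not grow the file.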

Comment 4 Dan Kenigsberg 2016-12-05 09:08:21 UTC
Can be tested on 4.1.alpha builds.

Comment 5 Michael Burman 2016-12-11 16:13:59 UTC
Tested on - 4.1.0-0.2.master.20161210231201.git26a385e.el7.centos and vdsm-4.18.999-1128.git6b50e40.el7.centos

Scenarios that PASS:
[1] - Create ovs bond
[2] - Attach network to bond
[3] - Set static ip + prefix/netmask 
[4] - Change bond mode
[5] - Host survive reboot

Scenario that FAILED:
[1] - Adding a host over a bond to an ovs cluster fails

Dan, how would you like to go on with this? Do you want a separate bug for the scenario that failed, or should we keep it here?

Comment 6 Michael Burman 2016-12-11 16:21:50 UTC
Another relevant issue:

I get this error when trying to move a host with a bond from ovs to legacy:

2016-12-11 18:16:46,765 ERROR (jsonrpc/6) [vds] All bondings must be reconfigured on switch type change (API:1526)
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1523, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (21, 'All bondings must be reconfigured on switch type change')
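
One plausible reading of this error message is a validation rule of the following shape: when the switch type changes, every bond currently on the host must appear in the request. The sketch below only illustrates that reading. It is not taken from vdsm; `validate_switch_change` and its parameters are hypothetical, and `ConfigNetworkError` is a local stand-in so that the example is self-contained.

```python
class ConfigNetworkError(Exception):
    """Local stand-in for vdsm's error type, defined only for this sketch."""


def validate_switch_change(running_bonds, requested_bonds, switch_changes):
    """Reject a switch type change unless every running bond is in the request."""
    if not switch_changes:
        return
    missing = set(running_bonds) - set(requested_bonds)
    if missing:
        # Code 21 and the message text are copied from the log above.
        raise ConfigNetworkError(
            21, "All bondings must be reconfigured on switch type change")


if __name__ == "__main__":
    try:
        # bond1 exists on the host but is left out of the request.
        validate_switch_change(["bond0", "bond1"], ["bond0"], True)
    except ConfigNetworkError as err:
        print("rejected:", err.args)
```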

Comment 7 Michael Burman 2016-12-12 06:46:35 UTC
This happens in both directions:
ovs >> legacy
legacy >> ovs

Error while executing action SyncAllHostNetworks: Illegal Network parameters

I believe we should fail it at the moment.

Comment 8 Michael Burman 2016-12-12 09:49:42 UTC
As agreed with Dan, this bug can be considered as verified. 

- The add host over bond scenario will be covered in a new bug.

- The ConfigNetworkError: (21, 'All bondings must be reconfigured on switch type change') issue will be handled by BZ 1362399

Verified on 4.1.0-0.2.master.20161210231201.git26a385e.el7.centos

Comment 9 Sandro Bonazzola 2016-12-12 11:04:31 UTC
This bug is targeted to 4.1 but it appears that the fix has been included in 4.0.6. Please crosscheck and re-target if it's fixed in 4.0.6.

Comment 10 Michael Burman 2016-12-12 11:13:17 UTC
Sandro,

We are not testing ovs on 4.0.6.
This bug will be tested only on 4.1. Thanks

Comment 11 Dan Kenigsberg 2016-12-26 15:03:02 UTC
The native OvS feature has failed to reach 4.1 (let alone 4.0.6) even though this specific bug is fixed and verified.