Bug 1459229

Summary: Interface matching regular expression ignores interfaces with a '-' in the name
Product: [oVirt] ovirt-ansible-collection
Component: hosted-engine-setup
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: unspecified
Hardware: x86_64
OS: Linux
Target Milestone: ovirt-4.4.0
Target Release: 1.0.35
Fixed In Version: ovirt-ansible-hosted-engine-setup-1.0.35
Reporter: Artyom <alukiano>
Assignee: Asaf Rachmani <arachman>
QA Contact: Polina <pagranat>
CC: arachman, bugs, danken, dfediuck, didi, mailinglists, mavital, pagranat, stirabos
Keywords: Triaged
Flags: sbonazzo: ovirt-4.4?
Doc Type: If docs needed, set a value
Type: Bug
oVirt Team: Integration
Last Closed: 2020-05-20 20:00:54 UTC
Bug Depends On: 1776302
Bug Blocks: 1452243, 1455606
Attachments:
- ovirt-hosted-engine-setup log
- hosted-engine-setup.tar
- ovirt-hosted-engine-setup log

Description Artyom 2017-06-06 15:01:09 UTC
Created attachment 1285442 [details]
ovirt-hosted-engine-setup log

Description of problem:
The interface-matching regular expression ignores interfaces with a '-' in the name, so a bond named 'nm-bond1.122' will be ignored because of the '-' in its name.
The user could, however, configure a device with such a name from the NetworkManager/Cockpit UI.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.1.2-2.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a bond on the host with a '-' in its name, e.g.:
my-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:14:5e:dd:05:55 brd ff:ff:ff:ff:ff:ff
    inet 10.35.72.13/24 brd 10.35.72.255 scope global dynamic my-bond
...
2. Run hosted-engine deploy

Actual results:
Deployment fails with the following traceback:
2017-06-05 09:49:20 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:926 execute-output: ('/sbin/ip', 'addr', 'show', 'my-bond') stderr:


2017-06-05 09:49:20 DEBUG otopi.plugins.gr_he_common.vm.cloud_init cloud_init._getMyIPAddress:132 address: None
2017-06-05 09:49:20 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 781, in _customize_vm_networking
    self._customize_vm_addressing()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 215, in _customize_vm_addressing
    my_ip = self._getMyIPAddress()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 136, in _getMyIPAddress
    _('Cannot acquire nic/bridge address')
RuntimeError: Cannot acquire nic/bridge address
2017-06-05 09:49:20 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Environment customization': Cannot acquire nic/bridge address

Expected results:
Deployment succeeds

Additional info:

The problem is in the regular expression:
_INET_ADDRESS_RE = re.compile(
    flags=re.VERBOSE,
    pattern=r"""
        \s+
        inet
        \s
        (?P<address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})
        .+
        \s+
        (?P<interface>[a-zA-Z0-9_.]+)
        $
    """,
)
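
A minimal sketch of a possible fix, assuming the rest of the parser stays unchanged (the patch actually merged upstream may differ): adding '-' to the <interface> character class, placed last so it cannot be read as a range, lets names like 'my-bond' match.

import re

# Sketch of a fix: '-' added at the end of the <interface> character
# class; everything else is the original pattern.
_INET_ADDRESS_RE = re.compile(
    flags=re.VERBOSE,
    pattern=r"""
        \s+
        inet
        \s
        (?P<address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})
        .+
        \s+
        (?P<interface>[a-zA-Z0-9_.-]+)
        $
    """,
)

# Quick check against an 'ip addr show' line like the one above:
line = '    inet 10.35.72.13/24 brd 10.35.72.255 scope global dynamic my-bond'
m = _INET_ADDRESS_RE.search(line)
assert m and m.group('interface') == 'my-bond'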

Comment 1 Artyom 2017-06-15 12:13:59 UTC
Verified on ovirt-hosted-engine-setup-2.1.3-1.el7ev.noarch

It is looking fine from the hosted-engine perspective now, but the network team said that bond names with a '-' are not supported by VDSM and the engine.
See the error:
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 388, in _setupNetworks
    'message: "%s"' % (networks, code, message))
RuntimeError: Failed to setup networks {'ovirtmgmt': {'bonding': 'my-bond', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}. Error code: "25" message: "u'my-bond' is not a valid bonding device name"
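
For context, the restriction behind error code 25 amounts to requiring bond names of the form bondN (see Dan Kenigsberg's explanation in comment 6 below). A rough sketch of that kind of check, assuming a pattern equivalent to VDSM's rule; the exact upstream code may differ:

import re

# Assumed approximation of VDSM's bond-name rule: 'bond' followed by
# one or more decimal digits; anything else is a "bad bond".
BOND_NAME_RE = re.compile(r'^bond\d+$')

def validate_bond_name(name):
    if not BOND_NAME_RE.match(name):
        # Mirrors the shape of the failure reported above (error code 25).
        raise ValueError("%r is not a valid bonding device name" % name)

validate_bond_name('bond0')      # passes
# validate_bond_name('my-bond')  # would raise, as in the error above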

Comment 2 Red Hat Bugzilla Rules Engine 2017-07-05 11:58:35 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 3 Yaniv Kaul 2017-08-04 16:16:44 UTC
Is this on track to get into 4.1.5?

Comment 4 Simone Tiraboschi 2017-08-07 09:42:00 UTC
(In reply to Yaniv Kaul from comment #3)
> Is this on track to get into 4.1.5?

We fixed this once and reverted the fix, as per https://bugzilla.redhat.com/show_bug.cgi?id=1467733
Now we have a patch ( https://gerrit.ovirt.org/#/c/79012/ ) that properly fixes it on the hosted-engine-setup side, but an interface name with a dash will still cause an issue on the VDSM side, as per https://bugzilla.redhat.com/show_bug.cgi?id=1459229#c1, so we are still not ready to merge it.

Comment 5 Sandro Bonazzola 2017-11-24 15:07:38 UTC
(In reply to Simone Tiraboschi from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > Is this on track to get into 4.1.5?
> 
> We fixed this once and reverted the fix, as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1467733
> Now we have a patch ( https://gerrit.ovirt.org/#/c/79012/ ) that properly
> fixes it on the hosted-engine-setup side, but an interface name with a
> dash will still cause an issue on the VDSM side, as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1459229#c1, so we are still
> not ready to merge it.

Please open a bug on VDSM about this and make it block this one.

Comment 6 Sam McLeod 2018-01-07 23:32:13 UTC
FYI - I recently posted to the mailing list regarding a similar issue where our bonds are all named after the network / VLAN they exist on. I'll add my comments here to avoid creating another bug.

---

I'm having a problem where hosted-engine deployment fails, stating that the selected bond name is bad.

"code=25, message=bad bond name(s): mgmt)"

- Is there a problem similar to https://bugzilla.redhat.com/show_bug.cgi?id=1519807 that's known?
- If it seems to be this bug, is it preferred that I simply update the existing, closed issue as I have done, or open a new bug?

---

--> install logs provided to Doron Fediuck and a few others at Red Hat

---

Dan Kenigsberg from Red Hat:

"I see that you are trying to use a bond interface named "mgmt".
To avoid confusion while debugging a system, Vdsm has opted to allow
only bond names starting with "bond" followed by one or more decimal
digits. Anything else is considered "bad bond".

I prefer keeping the "bond" prefix compulsory, but I'd like to hear
why using different names is useful.

You can reopen this bug, but please move it to vdsm and rename it: it
should be something like "Allow any bondXYZ name for bonds" or "Allow
any bond name" and explain there why it is a good idea.

Dominik, is there an Engine-side limitation on bond names?"

---

Yedidyah Bar David from Red Hat:

"Please note that this is just but one bug in a series/tree of
related bugs, some of which are open. If you decide to follow
Dan's suggestion, perhaps reuse one of the others, or perhaps
even better - open a new one, and eventually one or more will
be closed as duplicate of one or more of the others. Sadly,
not all of them link properly to each other, and at least one
which was fixed caused another bug, so the fix was reverted.
See also e.g. all of the discussion in:

https://bugzilla.redhat.com/show_bug.cgi?id=1459229"

Comment 7 Sam McLeod 2018-01-07 23:36:13 UTC
By the way, it is very useful to give a bonded interface a name other than bondXYZ; for example, you might have six bonds, each on a different network or native VLAN.

It helps with debugging, troubleshooting, and logging if the interface is named after the (native) network. For example, your iSCSI storage network might have a bond called 'storage', while your management or hypervisor network might have a bond named 'mgmt'; then perhaps you have a 'data' bond carrying several VLANs such as 'db' (database), 'dmz', 'staff', etc., depending on how and where you chop your network up.

Comment 8 Sam McLeod 2018-01-17 22:41:59 UTC
FYI - Someone on #ovirt IRC just had a use case where they were also confused as to why this was failing: they had a bridge, as they're using it in a test environment / for a proof of concept, and didn't realise the name had to be bondXYZ.

Comment 9 Sandro Bonazzola 2019-01-21 08:28:41 UTC
Re-targeting to 4.3.1 since this BZ has not been proposed as a blocker for 4.3.0.
If you think this bug should block 4.3.0, please re-target and set the blocker flag.

Comment 10 Sandro Bonazzola 2019-02-18 07:54:54 UTC
Moving to 4.3.2, as this was not identified as a blocker for 4.3.1.

Comment 11 Sandro Bonazzola 2019-11-13 08:43:40 UTC
Is this still reproducible?

Comment 12 Artyom 2019-11-13 08:55:03 UTC
To be honest, I have no clue; I have not worked on HE for a long time. Maybe Polina can help with it.

Comment 13 Yedidyah Bar David 2019-11-13 10:04:24 UTC
From checking vdsm code, I think it's still reproducible [1].

I personally think there is no need to validate interface names. If the OS allows them, so should oVirt. At most, for sanity, use a very liberal validation, similar or identical to the OS's. I wonder if anyone can come up with a concrete flow/scenario where the current validation actually helped prevent confusion.

[1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/network/link/validator.py#L36
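
For illustration, "validation similar or identical to the OS's" could look like the sketch below, modeled on the rules the Linux kernel applies in dev_valid_name() (at most 15 bytes, not '.' or '..', no '/', ':' or whitespace). This is an assumption about what such a liberal check might look like, not code from VDSM:

# Hypothetical liberal validator mirroring the kernel's own rules.
IFNAMSIZ = 16

def is_valid_ifname(name):
    if not name or len(name) >= IFNAMSIZ:
        return False
    if name in ('.', '..'):
        return False
    return not any(c in '/:' or c.isspace() for c in name)

assert is_valid_ifname('my-bond')        # fine for the OS
assert not is_valid_ifname('bad/name')   # rejected, as the kernel would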

Comment 14 Polina 2019-11-24 09:04:48 UTC
Hi, I've tested the issue on the versions:  

rhvm-appliance-4.3-20191113.0.el7.x86_64
ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch
vdsm-4.30.38-1.el7ev.x86_64
baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.3/rhv-4.3.7-5/el$releasever 

It looks like the deployment now fails at an earlier step.

The bond is defined as follows:

[root@ocelot01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp23s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master my-bond state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
3: enp23s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master my-bond state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
11: my-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
20: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:65:6a:75 brd ff:ff:ff:ff:ff:ff
21: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:65:6a:75 brd ff:ff:ff:ff:ff:ff

Deployment fails:
2019-11-24 10:18:15,377+0200 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._customization:149 {u'otopi_host_net': {u'changed': False, u'ansible_facts': {u'otopi_host_net': []}, u'_ansible_no_log': False}}
2019-11-24 10:18:15,377+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 159, in _customization
    raise RuntimeError(_('A Network interface is required'))
RuntimeError: A Network interface is required
2019-11-24 10:18:15,379+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Environment customization': A Network interface is required
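
A plausible reading of this earlier failure (an assumption based on the log above, not confirmed against the role's code): the role builds its candidate interface list from free NICs plus validly named bonds, so a host whose only usable interface is 'my-bond' ends up with an empty otopi_host_net list:

import re

# Hypothetical sketch of filtering that could yield an empty candidate
# list: slave NICs are excluded, and bonds must be named bondN to survive.
BOND_NAME_RE = re.compile(r'^bond\d+$')

bonds = ['my-bond']   # from the 'ip link' output above
free_nics = []        # enp23s0f0 and enp23s0f1 are slaves of my-bond

candidates = free_nics + [b for b in bonds if BOND_NAME_RE.match(b)]
print(candidates)     # [] -> "A Network interface is required"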

Comment 15 Polina 2019-11-24 09:06:20 UTC
Created attachment 1639148 [details]
hosted-engine-setup.tar

Attached are all the logs from ovirt-hosted-engine-setup/.
The error is in ovirt-hosted-engine-setup-20191124101711-ng5psi.log.

Comment 17 Asaf Rachmani 2019-12-29 19:20:04 UTC
Created attachment 1648412 [details]
ovirt-hosted-engine-setup log

I reproduced it using versions:

vdsm-4.30.38-1.el7.x86_64
ovirt-hosted-engine-setup-2.3.12-1.el7.noarch
ovirt-release-host-node-4.3.7-1.el7.noarch


# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:3b:61:df brd ff:ff:ff:ff:ff:ff


During the installation, note that the offered NIC name is wrong (the host device is 'eth-0', but the setup suggests 'eth_0'):

[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
[ INFO  ] skipping: [localhost]
          Please indicate a nic to set ovirtmgmt bridge on: (eth_0) [eth_0]: 
          Please specify which way the network connectivity should be checked (ping, dns, tcp, none) [dns]: 


Deployment fails with the following error:

2019-12-29 14:48:27,024+0000 DEBUG otopi.context context._executeMethod:127 Stage customization METHOD otopi.plugins.gr_he_common.vm.cloud_init.Plugin._customize_vm_networking
2019-12-29 14:48:27,025+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init cloud_init._getMyIPAddrList:108 Acquiring 'eth_0' address
2019-12-29 14:48:27,026+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr', 'show', 'eth_0'), executable='None', cwd='None', env=None
2019-12-29 14:48:27,039+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr', 'show', 'eth_0'), rc=1
2019-12-29 14:48:27,040+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr', 'show', 'eth_0') stdout:


2019-12-29 14:48:27,040+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr', 'show', 'eth_0') stderr:
Device "eth_0" does not exist.

2019-12-29 14:48:27,041+0000 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 850, in _customize_vm_networking
    self._customize_vm_addressing()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 226, in _customize_vm_addressing
    my_ip = self._getMyIPAddrList()[0]
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 116, in _getMyIPAddrList
    device,
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/usr/sbin/ip' failed to execute
2019-12-29 14:48:27,043+0000 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Environment customization': Command '/usr/sbin/ip' failed to execute
2019-12-29 14:48:27,043+0000 DEBUG otopi.context context.dumpEnvironment:731 ENVIRONMENT DUMP - BEGIN
2019-12-29 14:48:27,044+0000 DEBUG otopi.context context.dumpEnvironment:741 ENV BASE/error=bool:'True'
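
A plausible explanation for the 'eth-0' -> 'eth_0' mangling (an assumption, not confirmed in this bug): Ansible normalizes interface names when building fact names, replacing characters such as '-' with '_', so a role that derives its NIC list from fact keys would offer the normalized name, which then fails when passed back to 'ip addr show'. A minimal illustration:

import re

# Hypothetical sketch: Ansible-style fact-name normalization replaces
# characters like '-' with '_' (fact keys look like 'ansible_eth_0').
# A role recovering interface names from those keys gets 'eth_0' back.
def normalize(name):
    return re.sub(r'[^A-Za-z0-9_]', '_', name)

assert normalize('eth-0') == 'eth_0'
# 'ip addr show eth_0' then fails: Device "eth_0" does not exist.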

Comment 18 Polina 2020-04-20 20:24:43 UTC
Verification on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-31.

It works with a bond named bond0 [1],

but it doesn't work with 'my-bond': according to https://bugzilla.redhat.com/show_bug.cgi?id=1776302#c3, rejecting bond names containing '-' is a Red Hat restriction.

[1]
[root@ocelot01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp23s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
3: enp23s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a4:53:1c brd ff:ff:ff:ff:ff:ff
6: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a4:53:1c brd ff:ff:ff:ff:ff:ff

Comment 19 Sandro Bonazzola 2020-05-20 20:00:54 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th, 2020.

Since the problem described in this bug report should be
resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.