Bug 1459229
| Summary: | Interface matching regular expression ignores interfaces with a '-' in the name | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-ansible-collection | Reporter: | Artyom <alukiano> |
| Component: | hosted-engine-setup | Assignee: | Asaf Rachmani <arachman> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Polina <pagranat> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | unspecified | CC: | arachman, bugs, danken, dfediuck, didi, mailinglists, mavital, pagranat, stirabos |
| Target Milestone: | ovirt-4.4.0 | Keywords: | Triaged |
| Target Release: | 1.0.35 | Flags: | sbonazzo: ovirt-4.4? |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-ansible-hosted-engine-setup-1.0.35 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-20 20:00:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1776302 | | |
| Bug Blocks: | 1452243, 1455606 | | |
| Attachments: | | | |

Verified on ovirt-hosted-engine-setup-2.1.3-1.el7ev.noarch

It looks fine from the hosted-engine perspective now, but the network team said that a bond name containing '-' is not supported by VDSM and the engine. See the error:

  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 388, in _setupNetworks
    'message: "%s"' % (networks, code, message))
RuntimeError: Failed to setup networks {'ovirtmgmt': {'bonding': 'my-bond', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}. Error code: "25" message: "u'my-bond' is not a valid bonding device name"

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Is this on track to get into 4.1.5?

(In reply to Yaniv Kaul from comment #3)
> Is this on track to get into 4.1.5?

We fixed this once and reverted it as per https://bugzilla.redhat.com/show_bug.cgi?id=1467733
Now we have a patch ( https://gerrit.ovirt.org/#/c/79012/ ) to properly fix it on the hosted-engine-setup side, but an interface name with a dash will still cause an issue on the vdsm side as per https://bugzilla.redhat.com/show_bug.cgi?id=1459229#c1, so we are still not ready to merge it.

(In reply to Simone Tiraboschi from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > Is this on track to get into 4.1.5?
>
> We fixed this once and reverted it as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1467733
> Now we have a patch ( https://gerrit.ovirt.org/#/c/79012/ ) to properly fix
> it on the hosted-engine-setup side, but an interface name with a dash will still
> cause an issue on the vdsm side as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1459229#c1, so we are still not
> ready to merge it.

Please open a bug on VDSM about this and make it block this one.

FYI - I recently posted to the mailing list regarding a similar issue where our bonds are all named after the network / VLAN they exist on. I'll add my comments here to avoid creating another bug.

---
I'm having a problem where setting up a hosted engine deployment fails, stating that the selected bond name is bad: "code=25, message=bad bond name(s): mgmt)"

- Is there a known problem similar to https://bugzilla.redhat.com/show_bug.cgi?id=1519807 ?
- If it seems to be this bug, is it preferred that I simply update the existing, closed issue as I have done, or open a new bug?

---
--> install logs provided to Doron Fediuck and a few others at Red Hat

---
Dan Kenigsberg from Red Hat:
"I see that you are trying to use a bond interface named "mgmt". To avoid confusion while debugging a system, Vdsm has opted to allow only bond names starting with "bond" followed by one or more decimal digits. Anything else is considered "bad bond". I prefer keeping the "bond" prefix compulsory, but I'd like to hear why using different names is useful. You can reopen this bug, but please move it to vdsm and rename it: it should be something like "Allow any bondXYZ name for bonds" or "Allow any bond name" and explain there why it is a good idea. Dominik, is there an Engine-side limitation on bond names?"

---
Yedidyah Bar David from Red Hat:
"Please note that this is just but one bug in a series/tree of related bugs, some of which are open. If you decide to follow Dan's suggestion, perhaps reuse one of the others, or perhaps even better - open a new one, and eventually one or more will be closed as duplicate of one or more of the others. Sadly, not all of them link properly to each other, and at least one which was fixed caused another bug, so the fix was reverted. See also e.g. all of the discussion in: https://bugzilla.redhat.com/show_bug.cgi?id=1459229"

By the way, it is very useful to name a bonded interface something other than bondXYZ. For example, you might have 6 bonds, each on a different network or native VLAN. It helps with debugging, troubleshooting and logging if the interface is named after the (native) network: your iSCSI storage network might have a bond called 'storage', your management or hypervisor network might have a bond named 'mgmt', and a 'data' bond might carry several VLANs such as 'db' (database), 'dmz', 'staff' etc., depending on how and where you divide your network.

FYI - Someone on #ovirt IRC just had a use case where they were also confused as to why this was failing. They had a bridge, as they are using it in a test environment / for a proof of concept, and didn't realise it had to be bondXYZ.

Re-targeting to 4.3.1 since this BZ has not been proposed as a blocker for 4.3.0. If you think this bug should block 4.3.0, please re-target and set the blocker flag.

Moving to 4.3.2 since this was not identified as a blocker for 4.3.1.

Is this still reproducible?

To be honest, I have no clue; I have not worked on HE for a long time. Maybe Polina can help with it.

From checking the vdsm code, I think it's still reproducible [1]. I personally think there is no need to validate interface names. If the OS allows them, so should oVirt. At most, for sanity, use very liberal validation, similar or identical to the OS's. I wonder if anyone can come up with a concrete flow/scenario where the current validation actually helped prevent confusion.

[1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/network/link/validator.py#L36

Hi, I've tested the issue on these versions:
rhvm-appliance-4.3-20191113.0.el7.x86_64
ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch
vdsm-4.30.38-1.el7ev.x86_64
baseurl=http://bob.eng.lab.tlv.redhat.com/builds/4.3/rhv-4.3.7-5/el$releasever

It looks like the deployment now fails at an earlier step.
The bond defined:

[root@ocelot01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp23s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master my-bond state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
3: enp23s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master my-bond state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
11: my-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
20: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:65:6a:75 brd ff:ff:ff:ff:ff:ff
21: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:65:6a:75 brd ff:ff:ff:ff:ff:ff

Deployment fails:

2019-11-24 10:18:15,377+0200 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._customization:149 {u'otopi_host_net': {u'changed': False, u'ansible_facts': {u'otopi_host_net': []}, u'_ansible_no_log': False}}
2019-11-24 10:18:15,377+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 159, in _customization
    raise RuntimeError(_('A Network interface is required'))
RuntimeError: A Network interface is required
2019-11-24 10:18:15,379+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Environment customization': A Network interface is required

Created attachment 1639148 [details]
hosted-engine-setup.tar
attached all the logs from ovirt-hosted-engine-setup/
The error is in ovirt-hosted-engine-setup-20191124101711-ng5psi.log
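
For readers following the thread, the bond-name rule described earlier ("bond" followed by one or more decimal digits) can be sketched as below. This is only an illustration of the rule as quoted in the comments; the function name is hypothetical and the code is not copied from VDSM, whose actual validator may differ.

```python
import re

# Hypothetical restatement of the rule quoted in this thread:
# the literal prefix "bond" followed by one or more decimal digits.
_STRICT_BOND_NAME = re.compile(r"bond\d+")


def is_strict_bond_name(name):
    """Return True if name satisfies the quoted 'bond<digits>' rule."""
    return _STRICT_BOND_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # Names taken from this bug report.
    for candidate in ("bond0", "my-bond", "mgmt"):
        print(candidate, is_strict_bond_name(candidate))
```

Under this rule both 'my-bond' and 'mgmt' are rejected, which is consistent with the "code=25" errors reported above.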
Added https://bugzilla.redhat.com/show_bug.cgi?id=1776302, which is related to https://bugzilla.redhat.com/show_bug.cgi?id=1459229#c14

Created attachment 1648412 [details]
ovirt-hosted-engine-setup log
I reproduced it using versions:
vdsm-4.30.38-1.el7.x86_64
ovirt-hosted-engine-setup-2.3.12-1.el7.noarch
ovirt-release-host-node-4.3.7-1.el7.noarch
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:3b:61:df brd ff:ff:ff:ff:ff:ff
During the installation, the NIC name offered is wrong (eth_0 instead of eth-0):
[ INFO ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
[ INFO ] skipping: [localhost]
Please indicate a nic to set ovirtmgmt bridge on: (eth_0) [eth_0]:
Please specify which way the network connectivityshould be checked (ping, dns, tcp, none) [dns]:
Deployment fails with the following error:
2019-12-29 14:48:27,024+0000 DEBUG otopi.context context._executeMethod:127 Stage customization METHOD otopi.plugins.gr_he_common.vm.cloud_init.Plugin._customize_vm_networking
2019-12-29 14:48:27,025+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init cloud_init._getMyIPAddrList:108 Acquiring 'eth_0' address
2019-12-29 14:48:27,026+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr', 'show', 'eth_0'), executable='None', cwd='None', env=None
2019-12-29 14:48:27,039+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr', 'show', 'eth_0'), rc=1
2019-12-29 14:48:27,040+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr', 'show', 'eth_0') stdout:
2019-12-29 14:48:27,040+0000 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr', 'show', 'eth_0') stderr:
Device "eth_0" does not exist.
2019-12-29 14:48:27,041+0000 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
method['method']()
File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 850, in _customize_vm_networking
self._customize_vm_addressing()
File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 226, in _customize_vm_addressing
my_ip = self._getMyIPAddrList()[0]
File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 116, in _getMyIPAddrList
device,
File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
command=args[0],
RuntimeError: Command '/usr/sbin/ip' failed to execute
2019-12-29 14:48:27,043+0000 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Environment customization': Command '/usr/sbin/ip' failed to execute
2019-12-29 14:48:27,043+0000 DEBUG otopi.context context.dumpEnvironment:731 ENVIRONMENT DUMP - BEGIN
2019-12-29 14:48:27,044+0000 DEBUG otopi.context context.dumpEnvironment:741 ENV BASE/error=bool:'True'
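
The log above shows the setup asking `ip` for `eth_0` while the host only has `eth-0`. A minimal sketch of one possible guard follows, assuming the name was altered by a '-' to '_' substitution somewhere before the prompt (the logs do not confirm where this happens). The helper `resolve_device` and its arguments are hypothetical and are not part of ovirt-hosted-engine-setup.

```python
def resolve_device(candidate, real_devices):
    """Map a possibly mangled interface name back to a real device name.

    candidate    -- the name offered to the user, e.g. 'eth_0'
    real_devices -- device names as reported by the OS, e.g. ['lo', 'eth-0']

    Returns the real name, or None if there is no match or the
    match is ambiguous.
    """
    # An exact match always wins.
    if candidate in real_devices:
        return candidate
    # Otherwise look for devices whose name becomes the candidate
    # once every '-' is replaced by '_'.
    matches = [d for d in real_devices if d.replace("-", "_") == candidate]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(resolve_device("eth_0", ["lo", "eth-0"]))
```

Returning None on ambiguity is a deliberate choice in this sketch: guessing between two devices that differ only in '-' versus '_' would be worse than failing early.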
Verification on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-31: it works with a bond named bond0 [1] and doesn't work with 'my-bond'. According to https://bugzilla.redhat.com/show_bug.cgi?id=1776302#c3, disallowing '-' in a bond name is a Red Hat restriction.

[1]
[root@ocelot01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp23s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
3: enp23s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:57:ae:82 brd ff:ff:ff:ff:ff:ff
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a4:53:1c brd ff:ff:ff:ff:ff:ff
6: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a4:53:1c brd ff:ff:ff:ff:ff:ff

This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.

Created attachment 1285442 [details]
ovirt-hosted-engine-setup log

Description of problem:
The interface matching regular expression ignores interfaces with a '-' in the name, so a bond named 'nm-bond1.122' will be ignored due to the '-' in its name. However, the user can configure a device with such a name from the NetworkManager/Cockpit UI.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.1.2-2.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a bond on the host with '-' in its name:
my-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:14:5e:dd:05:55 brd ff:ff:ff:ff:ff:ff
    inet 10.35.72.13/24 brd 10.35.72.255 scope global dynamic my-bond
    ...
2. Run hosted-engine deploy
3.

Actual results:
Deployment fails with the traceback

2017-06-05 09:49:20 DEBUG otopi.plugins.gr_he_common.vm.cloud_init plugin.execute:926 execute-output: ('/sbin/ip', 'addr', 'show', 'my-bond') stderr:
2017-06-05 09:49:20 DEBUG otopi.plugins.gr_he_common.vm.cloud_init cloud_init._getMyIPAddress:132 address: None
2017-06-05 09:49:20 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 781, in _customize_vm_networking
    self._customize_vm_addressing()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 215, in _customize_vm_addressing
    my_ip = self._getMyIPAddress()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/cloud_init.py", line 136, in _getMyIPAddress
    _('Cannot acquire nic/bridge address')
RuntimeError: Cannot acquire nic/bridge address
2017-06-05 09:49:20 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Environment customization': Cannot acquire nic/bridge address

Expected results:
Deployment succeeds

Additional info:
Problem in the regular expression

_INET_ADDRESS_RE = re.compile(
    flags=re.VERBOSE,
    pattern=r"""
        \s+
        inet
        \s
        (?P<address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})
        .+
        \s+
        (?P<interface>[a-zA-Z0-9_.]+)
        $
    """
)
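
To illustrate the report, here is a minimal sketch that runs the pattern above against the `inet` line from the steps to reproduce. The second pattern, which simply adds '-' to the interface character class, is an assumption made for illustration and is not necessarily the change that was eventually merged. The names `build_inet_re`, `parse_inet_line`, `ORIGINAL_RE` and `FIXED_RE` are hypothetical.

```python
import re

# Same structure as the pattern in the report; only the interface
# character class is substituted in.
_TEMPLATE = r"""
    \s+
    inet
    \s
    (?P<address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2})
    .+
    \s+
    (?P<interface>IFACE_CLASS+)
    $
"""


def build_inet_re(iface_class):
    """Compile the inet-line pattern with the given interface class."""
    return re.compile(_TEMPLATE.replace("IFACE_CLASS", iface_class), re.VERBOSE)


# Character class as quoted in the report: no '-' allowed.
ORIGINAL_RE = build_inet_re("[a-zA-Z0-9_.]")
# Assumed fix for illustration: also accept '-'.
FIXED_RE = build_inet_re("[a-zA-Z0-9_.-]")


def parse_inet_line(regex, line):
    """Return (interface, address) for an 'ip addr' inet line, or None."""
    match = regex.match(line)
    if match is None:
        return None
    return match.group("interface"), match.group("address")


if __name__ == "__main__":
    line = "    inet 10.35.72.13/24 brd 10.35.72.255 scope global dynamic my-bond"
    print(parse_inet_line(ORIGINAL_RE, line))
    print(parse_inet_line(FIXED_RE, line))
```

With the original class the line for 'my-bond' does not match at all, so no address is found, which is consistent with the "address: None" entry in the log above.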