Bug 1379833 - setupNetworks works only with existing vlan interfaces with VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD name type
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: ---
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Edward Haas
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks: 1547768
 
Reported: 2016-09-27 19:40 UTC by Jason Woods
Modified: 2021-09-09 11:59 UTC
6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-10-30 13:45:48 UTC
oVirt Team: Network
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43437 0 None None None 2021-09-09 11:59:08 UTC

Description Jason Woods 2016-09-27 19:40:37 UTC
Description of problem:
An OS with a network interface named 'rhevm' causes the RHV 4.0 hosted-engine deployment to fail during network setup with the following error:
[ INFO  ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': u'em1_1', 'vlan': 250, 'ipaddr': u'x.x.132.101', 'netmask': u'255.255.254.0', 'bootproto': u'none', 'gateway': u'x.x.132.1', 'defaultRoute': True}}. Error code: "29" message: "ERROR    : [/usr/sbin/ifup] ERROR: could not add vlan 250 as em1_1.250 on dev em1_1"
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160927184322.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160927184125-ninavq.log

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup 2.0.1.5-1 installed on RHVH-4.0-20160907.4-RHVH-x86_64-dvd1.iso deployed RHV host

How reproducible:
100%

Steps to Reproduce:
1. Use RHVH 4.0 ISO to build hypervisor
2. Name the network interface 'rhevm' (a VLAN on the interface was in use; this may not be required)

Actual results:
'hosted-engine --deploy' fails during stage 'Misc configuration' stating Error 29: "could not add vlan 250 as em1_1.250 on dev em1_1"

Expected results:
'hosted-engine --deploy' installs and configures hosted engine RHV manager

Additional info:
This was a new install of RHVH 4.0 host for a greenfield RHV environment.

The only way to fix the situation was to manually rename the interface 'rhevm' to 'em1_1.250', both as the device name in the config file and as the config file name in /etc/sysconfig/network-scripts/. For good measure, I also stopped and disabled NetworkManager on the RHVH 4.0 OS. After these changes, I was able to run 'hosted-engine --deploy' successfully.

Comment 1 Yedidyah Bar David 2016-09-28 05:08:33 UTC
Please attach: /var/log/ovirt-hosted-engine-setup/* /var/log/vdsm/* .

How did you originally rename it to 'rhevm'? Did you then reboot and verify that everything worked (before starting hosted-engine --deploy)?

Comment 2 Dan Kenigsberg 2016-09-28 13:58:59 UTC
Jason, may I also ask how you created the 'rhevm' interface, and why? Is it a bridge? Why is this of urgent severity?

Comment 3 Jason Woods 2016-09-28 16:50:49 UTC
The customer named the interface 'rhevm' during the original install via the RHVH 4.0 ISO, using the OS install anaconda interface.

The interface was not part of a bond; it was a single NIC (em1_1) with a single VLAN (VLAN 250 on em1_1) in use.

The system was rebooted after changing the NIC interface name from 'rhevm' to 'em1_1.250'. After the reboot, the network functioned correctly as before, and I again worked with the customer to run 'hosted-engine --deploy'. There were no problems with the RHVM deployment after the network interface was renamed and the system was rebooted.

I guess the severity could be lowered. But if a customer names an interface 'rhevm' during the install of the OS, the 'hosted-engine --deploy' process will fail without stating why, other than an error 29 about setting up the network.

IMHO, at a minimum, the hosted-engine installer should report why the deployment failed, stating that the interface uses a reserved name, 'rhevm'. The other option would be to have the hosted-engine installer automatically fix this issue during deployment (rename the network interface to the <NIC>.<VLAN ID> name).

I have not tested if this is the case when not using a VLAN (naming interface em1_1 as 'rhevm'). Nor have I tested this with any bonding or bridging on the NIC.

Steps from original problem to fix:
1. Install OS via RHVH 4.0 ISO.
2. During OS install, name the management network interface 'rhevm'.
3. After OS install, network functions, but hosted-engine fails to deploy manager.
4. Rename 'rhevm' network interface to 'em1_1.250' (both the ifcfg file name, and the DEVICE name in the ifcfg file).
5. Reboot OS.
6. Verified network is working.
7. Run 'hosted-engine --deploy' again, this time it correctly deploys a RHV manager.
8. RHVM 4.0 hosted engine boots and is available for use.
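The rename in step 4 boils down to moving the ifcfg file and rewriting its DEVICE= line. A minimal Python sketch of that fix, run against a scratch directory standing in for /etc/sysconfig/network-scripts (the file names and contents are assumed from this report; do not run anything like this against a live host without backups):

```python
import os
import tempfile

def rename_vlan_ifcfg(scripts_dir, old_name, new_name):
    """Rename an ifcfg file and rewrite its DEVICE= line,
    mirroring the manual fix from this report:
    ifcfg-rhevm -> ifcfg-em1_1.250, DEVICE=rhevm -> DEVICE=em1_1.250."""
    old_path = os.path.join(scripts_dir, 'ifcfg-' + old_name)
    new_path = os.path.join(scripts_dir, 'ifcfg-' + new_name)
    with open(old_path) as f:
        lines = f.readlines()
    with open(new_path, 'w') as f:
        for line in lines:
            if line.startswith('DEVICE='):
                f.write('DEVICE=%s\n' % new_name)
            else:
                f.write(line)
    os.remove(old_path)
    return new_path

# Demonstration in a temporary directory (contents assumed for illustration).
scripts = tempfile.mkdtemp()
with open(os.path.join(scripts, 'ifcfg-rhevm'), 'w') as f:
    f.write('VLAN=yes\nDEVICE=rhevm\nPHYSDEV=em1_1\nONBOOT=yes\n')
path = rename_vlan_ifcfg(scripts, 'rhevm', 'em1_1.250')
print(open(path).read())
```

After the rename, a reboot (step 5) brings the interface up under the name vdsm expects.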

I will be attaching the files requested.

Comment 6 Dan Kenigsberg 2016-09-29 05:58:40 UTC
Jason, I believe that you did not explain why the NIC was renamed, and how. It is highly unusual.

'rhevm' used to be the name of the management network. Having a NIC named that way is bound to fail at some point, just as trying to define a VM network named "eth0" is. We recommend that users stick to systemd-allocated NIC names. Can your customer do the same?

Comment 8 Dan Kenigsberg 2016-09-30 20:06:21 UTC
It would be nicer for hosted-engine --deploy to fail more verbosely.

But from my perspective, the more important bit is to deter users from playing with NIC names.

Comment 9 Sandro Bonazzola 2016-10-13 08:10:21 UTC
Simone, can you please review the logs and see what the real failure is here?

Comment 10 Simone Tiraboschi 2016-10-13 09:28:39 UTC
The issue seems to be here: the VLAN interface over em1_1 was named 'rhevm' instead of 'em1_1.250'.

The user chose to create the management bridge over 'rhevm', but then hosted-engine-setup sent:
 2016-09-27 18:00:35 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:400 networks: {'ovirtmgmt': {'nic': u'em1_1', 'vlan': 250, 'ipaddr': u'172.27.132.101', 'netmask': u'255.255.254.0', 'bootproto': u'none', 'gateway': u'172.27.132.1', 'defaultRoute': True}}

with 'nic': u'em1_1' and 'vlan': 250.
And it failed:
 2016-09-27 18:00:39 DEBUG otopi.context context._executeMethod:142 method exception
 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
     method['method']()
   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 403, in _misc
     _setupNetworks(conn, networks, bonds, options)
   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 421, in _setupNetworks
     'message: "%s"' % (networks, code, message))
 RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': u'em1_1', 'vlan': 250, 'ipaddr': u'172.27.132.101', 'netmask': u'255.255.254.0', 'bootproto': u'none', 'gateway': u'172.27.132.1', 'defaultRoute': True}}. Error code: "29" message: "ERROR    : [/usr/sbin/ifup] ERROR: could not add vlan 250 as em1_1.250 on dev em1_1"
 2016-09-27 18:00:39 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': u'em1_1', 'vlan': 250, 'ipaddr': u'172.27.132.101', 'netmask': u'255.255.254.0', 'bootproto': u'none', 'gateway': u'172.27.132.1', 'defaultRoute': True}}. Error code: "29" message: "ERROR    : [/usr/sbin/ifup] ERROR: could not add vlan 250 as em1_1.250 on dev em1_1"

If we check the supervdsm logs, it tried to remove /etc/sysconfig/network-scripts/ifcfg-em1_1.250 (but the VLAN interface was named 'rhevm'):

 MainProcess|jsonrpc.Executor/2::DEBUG::2016-09-27 18:00:39,787::ifcfg::404::root::(restoreAtomicBackup) Removing empty configuration backup /etc/sysconfig/network-scripts/ifcfg-em1_1.250
 MainProcess|jsonrpc.Executor/2::DEBUG::2016-09-27 18:00:39,787::ifcfg::302::root::(_removeFile) Removed file /etc/sysconfig/network-scripts/ifcfg-em1_1.250
 MainProcess|jsonrpc.Executor/2::INFO::2016-09-27 18:00:39,788::ifcfg::409::root::(restoreAtomicBackup) Restored /etc/sysconfig/network-scripts/ifcfg-em1_1.250

and so it failed when it tried to bring up em1_1.250, since VLAN 250 was still in use by the 'rhevm' interface:
 MainProcess|jsonrpc.Executor/2::ERROR::2016-09-27 18:00:39,789::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
 Traceback (most recent call last):
   File "/usr/share/vdsm/supervdsmServer", line 94, in wrapper
     res = func(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 248, in setupNetworks
     _setup_networks(networks, bondings, options)
   File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 275, in _setup_networks
     netswitch.setup(networks, bondings, options, in_rollback)
   File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 117, in setup
     _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
   File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 138, in _setup_legacy
     bondings, _netinfo)
   File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 471, in add_missing_networks
     _netinfo=_netinfo, **attrs)
   File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 180, in wrapped
     return func(network, configurator, **kwargs)
   File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 250, in _add_network
     net_ent_to_configure.configure(**options)
   File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 186, in configure
     self.configurator.configureBridge(self, **opts)
   File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 110, in configureBridge
     bridge.port.configure(**opts)
   File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 142, in configure
     self.configurator.configureVlan(self, **opts)
   File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 118, in configureVlan
     _ifup(vlan)
   File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 850, in _ifup
     _exec_ifup(iface, cgroup)
   File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 807, in _exec_ifup
     _exec_ifup_by_name(iface.name, cgroup)
   File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 793, in _exec_ifup_by_name
     raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
 ConfigNetworkError: (29, 'ERROR    : [/usr/sbin/ifup] ERROR: could not add vlan 250 as em1_1.250 on dev em1_1')

Comment 11 Simone Tiraboschi 2016-10-13 12:31:20 UTC
OK, I also reproduced this using a VLAN interface named vlan0015, which is perfectly valid according to the VLAN_NAME_TYPE_PLUS_VID name type (please see https://github.com/torvalds/linux/blob/master/net/8021q/vlan.c#L245 )
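For reference, the four kernel naming schemes from net/8021q/vlan.c can be sketched as follows (a Python paraphrase of the kernel's snprintf formats, not vdsm code):

```python
def vlan_ifname(name_type, real_dev, vlan_id):
    """Format a VLAN interface name the way net/8021q/vlan.c does
    for each VLAN_NAME_TYPE_* scheme."""
    formats = {
        'VLAN_NAME_TYPE_RAW_PLUS_VID':        '%s.%.4i' % (real_dev, vlan_id),  # eth1.0015
        'VLAN_NAME_TYPE_PLUS_VID_NO_PAD':     'vlan%i' % vlan_id,               # vlan15
        'VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD': '%s.%i' % (real_dev, vlan_id),    # eth1.15
        'VLAN_NAME_TYPE_PLUS_VID':            'vlan%.4i' % vlan_id,             # vlan0015
    }
    return formats[name_type]

# vdsm's legacy configurator only understands the eth1.15 form;
# this reproducer used the vlan0015 form.
print(vlan_ifname('VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD', 'eth1', 15))  # eth1.15
print(vlan_ifname('VLAN_NAME_TYPE_PLUS_VID', 'eth1', 15))             # vlan0015
```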

My initial configuration was:

 [root@c72he20160922h1 ~]# ip a
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host 
        valid_lft forever preferred_lft forever
 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 00:1a:4a:4f:bd:10 brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.124/24 brd 192.168.1.255 scope global dynamic eth0
        valid_lft 172783sec preferred_lft 172783sec
     inet6 fe80::21a:4aff:fe4f:bd10/64 scope link 
        valid_lft forever preferred_lft forever
 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 00:1a:4a:4f:bd:18 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::21a:4aff:fe4f:bd18/64 scope link 
        valid_lft forever preferred_lft forever
 4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ether 00:1a:4a:4f:bd:19 brd ff:ff:ff:ff:ff:ff
 6: vlan0015@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
     link/ether 00:1a:4a:4f:bd:18 brd ff:ff:ff:ff:ff:ff
     inet 192.168.3.124/24 brd 192.168.3.255 scope global vlan0015
        valid_lft forever preferred_lft forever
     inet6 fe80::21a:4aff:fe4f:bd18/64 scope link 
        valid_lft forever preferred_lft forever
        
 [root@c72he20160922h1 ~]# ls /etc/sysconfig/network-scripts/ifcfg-*
 /etc/sysconfig/network-scripts/ifcfg-eth0  /etc/sysconfig/network-scripts/ifcfg-eth1  /etc/sysconfig/network-scripts/ifcfg-lo  /etc/sysconfig/network-scripts/ifcfg-vlan0015
 
 [root@c72he20160922h1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-vlan0015
 VLAN=yes
 DEVICE=vlan0015
 VLAN_NAME_TYPE=VLAN_NAME_TYPE_PLUS_VID
 PHYSDEV=eth1
 BOOTPROTO=static
 ONBOOT=yes
 TYPE=Ethernet
 IPADDR=192.168.3.124
 NETMASK=255.255.255.0
 
 [root@c72he20160922h1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
 TYPE=Ethernet
 BOOTPROTO=static
 NAME=eth1
 DEVICE=eth1
 ONBOOT=yes

hosted-engine-setup (though I think this is not hosted-engine specific) correctly sent:
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,487::legacy_switch::461::root::(add_missing_networks) Adding network u'ovirtmgmt'
 MainProcess|jsonrpc/2::INFO::2016-10-13 14:07:37,488::netconfpersistence::58::root::(setNetwork) Adding network ovirtmgmt({'ipv6autoconf': False, 'nameservers': ['192.168.1.1', '0.0.0.0'], u'nic': u'eth1', u'vlan': 15, u'ipaddr': u'192.168.3.124', u'netmask': u'255.255.255.0', 'mtu': 1500, 'switch': 'legacy', 'dhcpv6': False, 'stp': False, 'bridged': True, u'gateway': u'192.168.3.1', u'defaultRoute': True})
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,488::legacy_switch::204::root::(_add_network) Validating network...
 MainProcess|jsonrpc/2::INFO::2016-10-13 14:07:37,488::legacy_switch::215::root::(_add_network) Adding network ovirtmgmt with vlan=15, bonding=None, nic=eth1, mtu=1500, bridged=True, defaultRoute=True, options={'switch': 'legacy', 'stp': False}

So supervdsm tried to write or rewrite /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt and /etc/sysconfig/network-scripts/ifcfg-eth1.15, but nothing removed /etc/sysconfig/network-scripts/ifcfg-vlan0015, hence the issue.

 MainProcess|jsonrpc/2::INFO::2016-10-13 14:07:37,488::legacy_switch::242::root::(_add_network) Configuring device ovirtmgmt
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,489::ifcfg::496::root::(_persistentBackup) backing up ifcfg-ovirtmgmt: # original file did not exist
 
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,489::ifcfg::401::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt (until next 'set safe config')
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,489::ifcfg::560::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt configuration:
 # Generated by VDSM version 4.18.999-725.gitcfbfec8.el7.centos
 DEVICE=ovirtmgmt
 TYPE=Bridge
 DELAY=0
 STP=off
 ONBOOT=yes
 IPADDR=192.168.3.124
 NETMASK=255.255.255.0
 GATEWAY=192.168.3.1
 BOOTPROTO=none
 MTU=1500
 DEFROUTE=yes
 NM_CONTROLLED=no
 IPV6INIT=no
 DNS1=192.168.1.1
 DNS2=0.0.0.0
 
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,492::commands::69::root::(execCmd) /usr/bin/taskset --cpu-list 0-3 /usr/sbin/ifdown ovirtmgmt (cwd None)
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,690::commands::93::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,691::ifcfg::496::root::(_persistentBackup) backing up ifcfg-eth1.15: # original file did not exist

The issue is here ------------------------------^
since instead we have /etc/sysconfig/network-scripts/ifcfg-vlan0015 (which is valid under the VLAN_NAME_TYPE_PLUS_VID schema), and nothing backs it up or removes it.

 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,692::ifcfg::401::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-eth1.15 (until next 'set safe config')
 MainProcess|jsonrpc/2::DEBUG::2016-10-13 14:07:37,692::ifcfg::560::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-eth1.15 configuration:
 # Generated by VDSM version 4.18.999-725.gitcfbfec8.el7.centos
 DEVICE=eth1.15
 VLAN=yes
 BRIDGE=ovirtmgmt
 ONBOOT=yes
 MTU=1500
 NM_CONTROLLED=no
 IPV6INIT=no

Comment 12 Dan Kenigsberg 2016-10-23 15:28:24 UTC
Correct, Vdsm is very picky about the ifcfg names it can process. Can you explain why you mind that? What created vlan0015? Can it use VLAN_NAME_TYPE_RAW_PLUS_VID instead?

Comment 13 Simone Tiraboschi 2016-10-24 08:42:55 UTC
(In reply to Dan Kenigsberg from comment #12)
> Correct, Vdsm is very picky about the ifcfg names it can process. Can you
> explain why you mind that? What created vlan0015? Can it use
> VLAN_NAME_TYPE_RAW_PLUS_VID instead?

I manually set it up before running hosted-engine-setup, and nothing alerted me; of course I could also use VLAN_NAME_TYPE_RAW_PLUS_VID: this was just to reproduce the user-reported behavior on a valid system configuration (VLAN_NAME_TYPE_PLUS_VID is in theory valid, while 'rhevm', as the user did it, wasn't).

IMO, if we don't want to support naming schemas other than VLAN_NAME_TYPE_RAW_PLUS_VID, we should at least check and fail with a clear error, so that it will be easier for the user to detect and fix the issue.
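A pre-flight check like the one proposed here could be as simple as the following sketch (a hypothetical helper for illustration, not actual vdsm code): verify that an existing VLAN device on the target NIC follows the <nic>.<vid> scheme, and fail with an actionable message otherwise.

```python
def check_vlan_name(vlan_ifname, nic, vlan_id):
    """Return None if the VLAN interface name follows the only scheme
    vdsm's legacy ifcfg configurator understands (<nic>.<vid>, no
    zero-padding); otherwise return a clear, actionable error string.
    Hypothetical pre-flight helper, not actual vdsm code."""
    expected = '%s.%d' % (nic, vlan_id)
    if vlan_ifname == expected:
        return None
    return (
        "VLAN interface '%s' on %s does not follow the "
        "VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD scheme; rename it to '%s' "
        "before running setupNetworks" % (vlan_ifname, nic, expected)
    )

print(check_vlan_name('eth1.15', 'eth1', 15))    # None: name is fine
print(check_vlan_name('vlan0015', 'eth1', 15))   # this bug's reproducer
print(check_vlan_name('rhevm', 'em1_1', 250))    # the original report
```

Running such a check before touching any ifcfg files would turn the opaque "error 29" into a message that names the actual problem.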

Comment 14 Dan Kenigsberg 2016-10-30 13:45:48 UTC
Thank you, Simone.

You are obviously correct, but as it stands, supporting anything other than the good old ifcfg-eth0.400 naming scheme is not on our current roadmap - unless Edy tells me that it's terribly easy to do.

Comment 15 Jason Woods 2016-10-30 14:17:43 UTC
"IMO, if we don't want to support naming schema other than VLAN_NAME_TYPE_RAW_PLUS_VID, we should at least check and fail with a clear error so that for the user will be easier to detect the issue and fix."

I fully agree with this statement. It is not important to support a new naming convention, but please have the engine setup report an error stating that this is the reason.

As it is now, the user is given a generic error and is left to decipher the true cause, which will likely generate a support call.

Thank you for the work put in to this.

