Description of problem:
When VDSM has configured a bond with DHCP and the host is then placed on a network without DHCP, vdsm fails to start, preventing any management of the host until it is fixed manually.

Version-Release number of selected component (if applicable):
vdsm-4.16.8.1-6.el6ev.x86_64 Thu Apr 2 09:57:15 2015

How reproducible:

Steps to Reproduce:
1. Configure the host with a bond on a network with DHCP, and have the bond obtain its IP via DHCP
2. Disable DHCP on the network
3. Try to activate the host from maintenance or reconfigure the network

Actual results:
VDSM tracebacks:

/etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Running run_final_hooks
vdsm stop [ OK ]
libvirtd start/running, process 12084
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running wait_for_network
vdsm: Running run_init_hooks
vdsm: Running upgraded_version_check
vdsm: Running check_is_configured
libvirt is already configured for vdsm
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running unified_network_persistence_upgrade
vdsm: Running restore_nets
libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-rhevm'
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 137, in <module>
    restore()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 123, in restore
    unified_restoration()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 69, in unified_restoration
    setupNetworks(nets, bonds, connectivityCheck=False, _inRollback=True)
  File "/usr/share/vdsm/network/api.py", line 680, in setupNetworks
    implicitBonding=True, _netinfo=_netinfo, **d)
  File "/usr/share/vdsm/network/api.py", line 226, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 315, in addNetwork
    netEnt.configure(**options)
  File "/usr/share/vdsm/network/models.py", line 169, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 88, in configureBridge
    ifup(bridge.name, bridge.ipConfig.async)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 824, in ifup
    rc, out, err = _ifup(iface)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 813, in _ifup
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
network.errors.ConfigNetworkError: (29, 'Determining IP information for XXXX ... failed.')
vdsm: stopped during execute restore_nets task (task returned with error code 1).
vdsm start [FAILED]

Expected results:
VDSM should fail to bring up the IP for this bond, but VDSM itself should still be running.

Additional info:
Fixed by manually removing 'dhcp' from /var/lib/vdsm/persistence/netconf/nets/$NETWORK as per https://access.redhat.com/solutions/1452363
But this still shouldn't happen.
I would say that the code in vdsm/vdsm/network/configurators/ifcfg.py needs some error handling, so that this failure does not propagate to the vdsm daemon:

    def _exec_ifup(iface_name, cgroup=dhclient.DHCLIENT_CGROUP):
        """Bring up an interface"""
        cmd = [constants.EXT_IFUP, iface_name]

        if cgroup is not None:
            cmd = cmdutils.systemd_run(cmd, scope=True, slice=cgroup)

        rc, out, err = utils.execCmd(cmd, raw=False)

        if rc != 0:
            # In /etc/sysconfig/network-scripts/ifup* the last line usually
            # contains the error reason.
            raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
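To illustrate the kind of error handling suggested here, the following is a minimal sketch, not vdsm code. It assumes a hypothetical wrapper, `ifup_best_effort`, around whatever callable brings the interface up, and it uses a stand-in `ConfigNetworkError` class so the example is self-contained. The error code 29 is taken from the traceback in the report; everything else is an assumption.

```python
import logging

# Value taken from the traceback in this report: ConfigNetworkError: (29, ...)
ERR_FAILED_IFUP = 29


class ConfigNetworkError(Exception):
    """Stand-in for vdsm's ConfigNetworkError (assumption, not the real class)."""

    def __init__(self, errCode, message):
        super().__init__(errCode, message)
        self.errCode = errCode
        self.message = message


def ifup_best_effort(exec_ifup, iface_name,
                     log=logging.getLogger("restore-net")):
    """Hypothetical wrapper: bring an interface up, tolerating ifup failure.

    exec_ifup is any callable taking the interface name and raising
    ConfigNetworkError on failure. Returns True on success, False if
    ifup failed. Any other error code is re-raised unchanged.
    """
    try:
        exec_ifup(iface_name)
    except ConfigNetworkError as e:
        if e.errCode != ERR_FAILED_IFUP:
            raise
        # Log and carry on, so a missing DHCP server does not abort startup.
        log.error("ifup of %s failed: %s", iface_name, e.message)
        return False
    return True
```

Whether swallowing the error at this level is appropriate depends on the caller: during normal setupNetworks calls the failure probably should still propagate, so a real change would likely apply this only on the restoration path.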
This would be less of an issue in 3.5.4, as in the typical case, Vdsm would not restore networks on its own, but rather expect to find them up by means of ONBOOT=yes.
(In reply to Dan Kenigsberg from comment #2)
> This would be less of an issue in 3.5.4, as in the typical case, Vdsm
> would not restore networks on its own, but rather expect to find them up by
> means of ONBOOT=yes.

Assuming a worse case where VDSM does need to restore at least one network, there are two aspects to consider if restoration fails:

1. Since there is a single API call to restore _all_ needed networks, this call either fails or succeeds as a whole. Although this happily happens transactionally, one network failure fails the whole restoration. This might be addressed by changing setupNetworks for the restoration case. This verb already knows whether it was called during restoration (_inRollback).
2. Network restoration takes place in a oneshot dependency called vdsm-network. If this service fails as a result of a network restoration failure, VDSM will not start.

While the first issue is worth a discussion on how to fix, the second makes sense, and 'fixing' the first will fix this scenario.
We could try to call setupNetworks from the restoration script separately (one net/bond at a time) and be ready to accept failures. However, if the management network happens to fail (VDSM does not know which one it is), the host will still be unmanageable. Also, bond creation failures might cause subsequent network creation failures.
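The one-at-a-time idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the restoration script itself: `greedy_restore` is a hypothetical name, and `setup_networks` is an injected callable assumed to accept `(nets, bonds, connectivityCheck=..., _inRollback=...)` as in the traceback from the original report.

```python
import logging

log = logging.getLogger("restore-net")


def greedy_restore(setup_networks, nets, bonds):
    """Restore each bond, then each network, in its own call.

    nets and bonds map a name to its attribute dict. A failure is
    logged and recorded, and restoration continues with the next entry.
    Returns the list of names that could not be restored.
    """
    failed = []
    options = {"connectivityCheck": False, "_inRollback": True}

    # Bonds go first, since networks may be built on top of them.
    for name, attrs in bonds.items():
        try:
            setup_networks({}, {name: attrs}, **options)
        except Exception:
            log.exception("Failed to setup bond %s", name)
            failed.append(name)

    for name, attrs in nets.items():
        try:
            setup_networks({name: attrs}, {}, **options)
        except Exception:
            log.exception("Failed to setup network %s", name)
            failed.append(name)

    return failed
```

This sketch does not address the two caveats raised above: if the failing entry is the management network the host stays unmanageable, and a network whose bond failed earlier will most likely fail as well. The returned list at least lets the caller report which entries were skipped.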
Verified on vdsm-4.19.2-2.el7ev.x86_64 and 4.1.0.2-0.1.el7.

Now, when one or more networks fail to be restored, vdsm continues with the rest, and vdsmd is running in the end.

restore-net::ERROR::2017-01-23 13:54:49,110::__init__::51::root::(__exit__) Failed rollback transaction last known good network.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 153, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 464, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 182, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 243, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 187, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 116, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 927, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 884, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 870, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '')
restore-net::ERROR::2017-01-23 13:54:49,111::vdsm-restore-net-config::162::root::(_greedy_setup_nets) Failed to setup n11
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 160, in _greedy_setup_nets
    {'connectivityCheck': False, '_inRollback': True})
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 261, in setupNetworks
    _setup_networks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 282, in _setup_networks
    netswitch.setup(networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 132, in setup
    _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 153, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 464, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 182, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 243, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 187, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 116, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 927, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 884, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 870, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '')
restore-net::INFO::2017-01-23 13:54:49,112::vdsm-restore-net-config::481::root::(restore) restoration completed successfully.