Bug 1242532 - vdsm fails to start if one network fails to be restored
Summary: vdsm fails to start if one network fails to be restored
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.0-beta
Target Release: ---
Assignee: Edward Haas
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-13 13:58 UTC by Pablo Iranzo Gómez
Modified: 2019-07-16 11:59 UTC (History)

Fixed In Version: vdsm-4.19.1-56.gitb2ac850
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-25 00:54:21 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:0998 | 0 | normal | SHIPPED_LIVE | VDSM bug fix and enhancement update 4.1 GA | 2017-04-18 20:11:39 UTC
oVirt gerrit | 44274 | 0 | 'None' | ABANDONED | net: try restore networks harder. | 2021-01-28 09:32:40 UTC
oVirt gerrit | 68912 | 0 | 'None' | MERGED | net: Networking restoration done in a greedy mode | 2021-01-28 09:32:40 UTC
oVirt gerrit | 70606 | 0 | 'None' | MERGED | net: Networking restoration done in a greedy mode | 2021-01-28 09:32:40 UTC

Description Pablo Iranzo Gómez 2015-07-13 13:58:10 UTC
Description of problem:

When VDSM has configured a bond to use DHCP and the host is later placed on a network without DHCP, vdsm fails to start, which prevents any management of the host until the configuration is fixed manually.

Version-Release number of selected component (if applicable):

vdsm-4.16.8.1-6.el6ev.x86_64                                Thu Apr  2 09:57:15 2015


How reproducible:

Steps to Reproduce:
1. Configure the host with a bond on a network that has DHCP, and have the bond obtain its IP via DHCP
2. Disable DHCP on network
3. Try to activate host from maintenance or reconfigure network

Actual results:

VDSM tracebacks:


/etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop                                         [  OK  ]
vdsm: not running                                          [FAILED]
vdsm: Running run_final_hooks
vdsm stop                                                  [  OK  ]
libvirtd start/running, process 12084
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running wait_for_network
vdsm: Running run_init_hooks
vdsm: Running upgraded_version_check
vdsm: Running check_is_configured
libvirt is already configured for vdsm
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running unified_network_persistence_upgrade
vdsm: Running restore_nets
libvirt: Network Driver error : Network not found: no network with matching name 'vdsm-rhevm'
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 137, in <module>
    restore()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 123, in restore
    unified_restoration()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 69, in unified_restoration
    setupNetworks(nets, bonds, connectivityCheck=False, _inRollback=True)
  File "/usr/share/vdsm/network/api.py", line 680, in setupNetworks
    implicitBonding=True, _netinfo=_netinfo, **d)
  File "/usr/share/vdsm/network/api.py", line 226, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 315, in addNetwork
    netEnt.configure(**options)
  File "/usr/share/vdsm/network/models.py", line 169, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 88, in configureBridge
    ifup(bridge.name, bridge.ipConfig.async)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 824, in ifup
    rc, out, err = _ifup(iface)
  File "/usr/share/vdsm/network/configurators/ifcfg.py", line 813, in _ifup
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
network.errors.ConfigNetworkError: (29, 'Determining IP information for XXXX ... failed.')
vdsm: stopped during execute restore_nets task (task returned with error code 1).
vdsm start                                                 [FAILED]

Expected results:

VDSM should fail to bring up the IP for this bond, but the VDSM daemon should still start and keep running.

Additional info:

Fixed by manually removing 'dhcp' from /var/lib/vdsm/persistence/netconf/nets/$NETWORK as per https://access.redhat.com/solutions/1452363


But this still shouldn't happen.
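
For reference, the manual workaround could be scripted roughly as below. This is only a sketch under assumptions that this report does not confirm: that the persisted network definition is a JSON file and that the DHCP setting is stored under a "bootproto" key. The helper name drop_dhcp is hypothetical. Check the file contents by hand before running anything like this against a real host.

```python
import json


def drop_dhcp(path):
    """Remove a DHCP setting from one persisted network definition.

    Assumption: the file at `path` is JSON and DHCP is expressed as
    {"bootproto": "dhcp"}. Returns True if the file was rewritten.
    """
    with open(path) as f:
        conf = json.load(f)

    # Leave the file untouched unless it actually requests DHCP.
    if conf.get("bootproto") != "dhcp":
        return False

    del conf["bootproto"]
    with open(path, "w") as f:
        json.dump(conf, f)
    return True
```

It would be called once per affected file, for example drop_dhcp("/var/lib/vdsm/persistence/netconf/nets/" + network_name), followed by a vdsmd restart.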

Comment 1 Pablo Iranzo Gómez 2015-07-13 14:00:37 UTC
I would say that the code in vdsm/vdsm/network/configurators/ifcfg.py needs error handling so that this failure does not propagate to the vdsm daemon:

def _exec_ifup(iface_name, cgroup=dhclient.DHCLIENT_CGROUP):
    """Bring up an interface"""
    cmd = [constants.EXT_IFUP, iface_name]

    if cgroup is not None:
        cmd = cmdutils.systemd_run(cmd, scope=True, slice=cgroup)

    rc, out, err = utils.execCmd(cmd, raw=False)

    if rc != 0:
        # In /etc/sysconfig/network-scripts/ifup* the last line usually
        # contains the error reason.
        raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')

Comment 2 Dan Kenigsberg 2015-07-20 14:59:33 UTC
This would be less of an issue in 3.5.4 because, in the typical case, Vdsm would not restore networks on its own, but would rather expect to find them up by means of ONBOOT=yes.

Comment 3 Ido Barkan 2015-07-22 12:20:39 UTC
(In reply to Dan Kenigsberg from comment #2)
> This would be a less of an issue in 3.5.4, as in the typical case, Vdsm
> would not restore networks on its own, but rather expect to find them up by
> means of ONBOOT=yes.

Assuming a worse case where VDSM does need to restore at least one network, there are two aspects to consider if restoration fails:

1. Since there is a single API call to restore _all_ needed networks, this call
   either fails or succeeds as a whole. Although this happily happens
   transactionally, one network failure fails the whole restoration.
   This might be addressed by changing setupNetworks for the restoration case.
   This verb already knows whether it was called during restoration or not
   (_inRollback).
2. Network restoration takes place in a oneshot dependency called vdsm-network.
   If this service fails as a result of a network restoration failure, VDSM will
   not start.

While the first issue deserves a discussion on how to fix it, the second behavior makes sense, and 'fixing' the first will fix this scenario.

Comment 4 Ido Barkan 2015-08-02 10:38:44 UTC
We could try to call setupNetworks from the restoration script separately (one net/bond at a time) and be ready to accept failures.

However, if the management network happens to fail (VDSM does not know which one it is), the host will still be unmanageable.

Also, bond creation failures might cause subsequent network creation failures.
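
A minimal sketch of the one-at-a-time idea, not the actual vdsm code: the `setup` callable is a hypothetical stand-in for setupNetworks, the ConfigNetworkError class here is a local placeholder, and the "bonding" attribute used to link a network to its bond is an assumption for illustration. Bonds are attempted first, and networks that sit on a failed bond are skipped instead of attempted.

```python
import logging


class ConfigNetworkError(Exception):
    """Local placeholder for vdsm's ConfigNetworkError (illustration only)."""


def greedy_restore(nets, bonds, setup):
    """Restore bonds, then networks, one entity per call.

    `nets` and `bonds` map names to attribute dicts. `setup(nets, bonds)` is
    a hypothetical callable standing in for setupNetworks. Failures are
    logged and collected instead of aborting the whole restoration.
    Returns (failed_bonds, failed_nets).
    """
    failed_bonds = []
    failed_nets = []

    for name, attrs in bonds.items():
        try:
            setup({}, {name: attrs})
        except ConfigNetworkError:
            logging.exception("Failed to setup bond %s", name)
            failed_bonds.append(name)

    for name, attrs in nets.items():
        # A network on top of a bond that failed cannot come up either.
        if attrs.get("bonding") in failed_bonds:
            logging.error("Skipping %s: its bond failed", name)
            failed_nets.append(name)
            continue
        try:
            setup({name: attrs}, {})
        except ConfigNetworkError:
            logging.exception("Failed to setup %s", name)
            failed_nets.append(name)

    return failed_bonds, failed_nets
```

With this shape the restoration script can log what failed and still exit successfully, so the service that vdsmd depends on does not block startup. It does not address the management-network concern above.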

Comment 7 Michael Burman 2017-01-23 11:57:40 UTC
Verified on - vdsm-4.19.2-2.el7ev.x86_64 and 4.1.0.2-0.1.el7

Now, when one or more networks fail to be restored, vdsm continues and vdsmd is running at the end.

restore-net::ERROR::2017-01-23 13:54:49,110::__init__::51::root::(__exit__) Failed rollback transaction last known good network.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 153, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 464, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 182, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 243, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 187, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 116, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 927, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 884, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 870, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '')
restore-net::ERROR::2017-01-23 13:54:49,111::vdsm-restore-net-config::162::root::(_greedy_setup_nets) Failed to setup n11
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 160, in _greedy_setup_nets
    {'connectivityCheck': False, '_inRollback': True})
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 261, in setupNetworks
    _setup_networks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 282, in _setup_networks
    netswitch.setup(networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 132, in setup
    _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 153, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 464, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 182, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 243, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 187, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 116, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 927, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 884, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 870, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '')
restore-net::INFO::2017-01-23 13:54:49,112::vdsm-restore-net-config::481::root::(restore) restoration completed successfully.

