Bug 1338818

Summary: vdsmd is not running and restore networks failed after server reboot
Product: [oVirt] vdsm
Reporter: Michael Burman <mburman>
Component: Core
Assignee: Petr Horáček <phoracek>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.18.0
CC: bugs, danken, gklein, phoracek
Target Milestone: ovirt-4.0.0-rc
Keywords: Regression
Target Release: ---
Flags: rule-engine: ovirt-4.0.0+
rule-engine: blocker+
rule-engine: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-05 07:56:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
vdsm logs (flags: none)

Description Michael Burman 2016-05-23 13:26:41 UTC
Created attachment 1160630
vdsm logs

Description of problem:
vdsmd is not running and restore networks failed after server reboot.

supervdsm.log --> 

restore-net::ERROR::2016-05-23 15:03:35,105::__init__::54::root::(__exit__) Failed rollback transaction last known good network.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 130, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 471, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 180, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 250, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 186, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 111, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 846, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 991, in _wait_for_event
    mon.stop()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/monitor.py", line 164, in stop
    raise MonitorError(E_NOT_RUNNING)
MonitorError: 1
restore-net::INFO::2016-05-23 15:03:35,107::netconfpersistence::198::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2016-05-23 15:03:35,107::netconfpersistence::193::root::(_clear_dir) No existent config to clear on /var/run/vdsm/netconf/bonds/
restore-net::ERROR::2016-05-23 15:03:35,107::vdsm-restore-net-config::447::root::(restore) unified restoration failed.
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm-restore-net-config", line 442, in restore
    unified_restoration()
  File "/usr/share/vdsm/vdsm-restore-net-config", line 142, in unified_restoration
    '_inRollback': True})
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 228, in setupNetworks
    _setup_networks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 255, in _setup_networks
    netswitch.setup(networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 109, in setup
    _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 130, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 471, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 180, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 250, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 186, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 111, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 846, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 991, in _wait_for_event
    mon.stop()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/monitor.py", line 164, in stop
    raise MonitorError(E_NOT_RUNNING)
MonitorError: 1
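
The traceback ends in contextlib's __exit__ resuming the _wait_for_event generator, where mon.stop() raises MonitorError(E_NOT_RUNNING). The following is a minimal sketch of that failure pattern, not vdsm's actual code: the Monitor class, wait_for_event and reproduce below are hypothetical stand-ins, and stopping the monitor inside the with body only simulates the monitor no longer running by the time cleanup happens. Why the real monitor was not running is not shown in the log. The value E_NOT_RUNNING = 1 is inferred from the "MonitorError: 1" line.

```python
import contextlib

# Inferred from the "MonitorError: 1" line in the log.
E_NOT_RUNNING = 1


class MonitorError(Exception):
    pass


class Monitor(object):
    """Hypothetical stand-in for a netlink monitor (not the vdsm class)."""

    def __init__(self):
        self._running = False

    def start(self):
        self._running = True

    def stop(self):
        # Stopping a monitor that is not running is treated as an error,
        # as in the last frame of the traceback.
        if not self._running:
            raise MonitorError(E_NOT_RUNNING)
        self._running = False


@contextlib.contextmanager
def wait_for_event(mon):
    """Hypothetical analogue of _wait_for_event: start, yield, then stop."""
    mon.start()
    yield mon
    # Runs when the with block exits normally; raises if the monitor
    # is no longer running at that point.
    mon.stop()


def reproduce():
    """Return the error code raised on exit from the with block."""
    mon = Monitor()
    try:
        with wait_for_event(mon):
            # Simulate the monitor having stopped before cleanup runs.
            mon.stop()
    except MonitorError as e:
        return e.args[0]
    return None


if __name__ == "__main__":
    print(reproduce())
```

In this sketch the exception escapes from the context manager's exit even though the body itself completed, which matches the shape of the traceback: the failure surfaces during cleanup after _exec_ifup, not during the ifup call itself.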


Version-Release number of selected component (if applicable):
vdsm-4.18.0-16.git51df339.el7.centos.x86_64
4.0.0-0.7.master.el7ev

How reproducible:
Not clear. For example, on the puma22.scl.lab.tlv.redhat.com hardware it happens on every second reboot.

Steps to Reproduce:
1. Install a 4.0 host in a 4.0 engine (latest downstream)
2. Reboot the server


Actual results:
Failed rollback transaction last known good network
unified restoration failed
vdsmd is not running and the host stays in a non-responsive state in the engine

Expected results:
restore-net should succeed and vdsmd should be running after every reboot.

Additional info:

Comment 1 Michael Burman 2016-06-07 09:12:48 UTC
Verified on 4.0.0.2-0.1.el7ev and vdsm-4.18.1-11.gita92976e.el7ev.x86_64

Comment 2 Sandro Bonazzola 2016-07-05 07:56:57 UTC
oVirt 4.0.0 has been released; closing as CURRENTRELEASE.