Bug 988004 - [vdsm] OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'
Summary: [vdsm] OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgm...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.3
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
: 3.3
Assignee: Assaf Muller
QA Contact: Haim
URL:
Whiteboard: network
Depends On: 988995
Blocks: 918494
TreeView+ depends on / blocked
 
Reported: 2013-07-24 14:15 UTC by Martin Pavlik
Modified: 2015-04-07 03:10 UTC (History)
9 users (show)

Fixed In Version: 4.12.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-07 14:22:21 UTC
oVirt Team: ---


Attachments (Terms of Use)
log_collector (748.58 KB, application/x-xz)
2013-07-24 14:15 UTC, Martin Pavlik
no flags Details
Events aggregated from messages, vdsm and supervdsm logs (3.11 KB, text/plain)
2013-07-28 14:19 UTC, Assaf Muller
no flags Details


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 17487 None None None Never

Description Martin Pavlik 2013-07-24 14:15:04 UTC
Created attachment 777806 [details]
log_collector

Description of problem:
When adding new (freshly installed) host (F19) to clean setup. Host install fails with following reason in GUI:
Host dell-08 installation failed. Failed to configure manamgent network on the host.

Version-Release number of selected component (if applicable):
engine:
oVirt Engine Version: 3.3.0-0.3.beta1.fc19 (running on F19)

host:
vdsm-4.12.0-0.1.rc3.fc19

How reproducible:
100%

Steps to Reproduce:
1. add clean host to ovirt

Actual results:
install fails

Expected results:
install successfull

Additional info:

vdsm.log 

Thread-35::INFO::2013-07-24 15:50:18,179::netinfo::428::root::(_getNetInfo) Obtaining info for net ovirtmgmt.
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 411, in _getNetInfo
    data.update({'ports': ports(iface), 'stp': bridge_stp_state(iface),
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 181, in ports
    return os.listdir('/sys/class/net/' + bridge + '/brif')
OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'

Comment 1 Dan Kenigsberg 2013-07-24 15:35:49 UTC
vdsm dies while doing the initial setupNetwork:

Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call ping with () {}
Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::986::vds::(wrapper) return ping with {'status': {'message': 'Done', 'code': 0}}
Thread-16::DEBUG::2013-07-24 15:48:31,148::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainThread::INFO::2013-07-24 15:48:33,896::vdsm::101::vds::(run) (PID: 9487) I am the actual vdsm 4.12.0-0.1.rc3.fc19 dell-r210ii-08.rhev.lab.eng.brq.redhat.com (3.9.5-301.fc19.x86_64)


There's a seeming race between setupNetwork and the response from dhclient, but this on its own right does not explain the vdsm crash.

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,151::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,152::libvirtconnection::124::libvirtconnection::(get) trying to connect libvirt
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,165::configNetwork::480::setupNetworks::(setupNetworks) Setting up network according to configuration: networks:{'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, bondings:{}, options:{'connectivityCheck': 'true', 'connectivityTimeout': 120}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::configNetwork::484::root::(setupNetworks) Validating configuration
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,168::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,172::configNetwork::487::setupNetworks::(setupNetworks) Applying...
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,173::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,175::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,179::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,181::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::configNetwork::524::setupNetworks::(setupNetworks) Adding network 'ovirtmgmt'
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,188::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,192::configNetwork::174::root::(addNetwork) validating network...
MainProcess|Thread-16::INFO::2013-07-24 15:48:31,192::configNetwork::188::root::(addNetwork) Adding network ovirtmgmt with vlan=None, bonding=None, nics=['em1'], bondingOptions=None, mtu=None, bridged=True, options={'bootproto': 'dhcp', 'STP': 'no', 'implicitBonding': True}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::362::root::(_persistentBackup) backing up ifcfg-ovirtmgmt: # original file did not exist

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,195::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown ovirtmgmt' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,347::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::289::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-em1
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::362::root::(_persistentBackup) backing up ifcfg-em1: # Generated by parse-kickstart
IPV6INIT=no
BOOTPROTO=dhcp
DEVICE=em1
ONBOOT=yes
UUID=c7b808a5-a299-4fa6-a488-69028ec2f56a
HWADDR=D0:67:E5:F0:7F:2F
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
NAME="System em1"

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-em1 (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown em1' (cwd None)
MainThread::DEBUG::2013-07-24 15:48:31,417::sourceRouteThread::18::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1374673711
MainThread::INFO::2013-07-24 15:48:31,418::sourceRouteThread::35::root::(process_IN_CLOSE_WRITE_filePath) interface em1 is not a libvirt interface
MainThread::WARNING::2013-07-24 15:48:31,419::utils::113::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/em1 already removed
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = 'bridge ovirtmgmt does not exist!\n'; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup em1' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,714::ifcfg::651::Storage.Misc.excCmd::(_ifup) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,715::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup ovirtmgmt' (cwd None)
MainProcess|storageRefresh::DEBUG::2013-07-24 15:48:32,376::misc::817::SamplingMethod::(__call__) Returning last result
MainProcess::DEBUG::2013-07-24 15:48:34,224::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to readMultipathConf with () {}


==============================
Unrelated rant: the introduction of the

MainProcess|Thread-16::DEBUG::2013-07-24 15:49:15,045::libvirtconnection::101::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 19 level: 2 message: operation failed: network 'vdsm-ovirtmgmt' already exists with uuid 9973dc75-a45a-4c84-b779-53041e899745

log by http://gerrit.ovirt.org/13990 is annoying an misleading.

==============================
Another unrelated rant: supervdsm.log should include the result of its functions, not only its calls.

Comment 2 Assaf Muller 2013-07-28 14:19:21 UTC
Created attachment 779367 [details]
Events aggregated from messages, vdsm and supervdsm logs

I reproduced the bug on a fresh F19 installation.

Two import distinctions:
1) It has NetworkManager on by default, like the host in the original bug report. However: I used a newer version of host_deploy which turns NetworkManager off. The bug still happened.
2) I disabled the source routing dhclient hook's ability to write to /var/run/vdsm/sourceRoutes, in effect neutralizing sourceRouteThread (A thread spawned by supervdsm which configures source routing). The bug still happened.

I added a file that aggregates all the events I find important from the messages, vdsm and supervdsm logs.

What I know so far:
1) VDSM is stopped immediately after its first successful initialization and started again.
2) The initial setupNetworks that configures the ovirtmgmt bridge fails and rolls back the config due to a libvirt exception as detailed in the log Martin attached.

Comment 3 Assaf Muller 2013-07-29 15:32:22 UTC
The cause has been discovered and a fix is being worked on by Yeela Kaplan.

Comment 4 Dan Kenigsberg 2013-07-31 11:13:24 UTC
Please verify that the solution for bug 988995 by vdsm 4.12.0 fixes this issue.


Note You need to log in before you can comment on or make changes to this bug.