Bug 988004 - [vdsm] OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'
[vdsm] OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgm...
Status: CLOSED NEXTRELEASE
Product: oVirt
Classification: Community
Component: vdsm (Show other bugs)
3.3
x86_64 Linux
unspecified Severity urgent
: ---
: 3.3
Assigned To: Assaf Muller
Haim
network
:
Depends On: 988995
Blocks: 918494
  Show dependency treegraph
 
Reported: 2013-07-24 10:15 EDT by Martin Pavlik
Modified: 2015-04-06 23:10 EDT (History)
9 users (show)

See Also:
Fixed In Version: 4.12.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-07 10:22:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
log_collector (748.58 KB, application/x-xz)
2013-07-24 10:15 EDT, Martin Pavlik
no flags Details
Events aggregated from messages, vdsm and supervdsm logs (3.11 KB, text/plain)
2013-07-28 10:19 EDT, Assaf Muller
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 17487 None None None Never

  None (edit)
Description Martin Pavlik 2013-07-24 10:15:04 EDT
Created attachment 777806 [details]
log_collector

Description of problem:
When adding new (freshly installed) host (F19) to clean setup. Host install fails with following reason in GUI:
Host dell-08 installation failed. Failed to configure manamgent network on the host.

Version-Release number of selected component (if applicable):
engine:
oVirt Engine Version: 3.3.0-0.3.beta1.fc19 (running on F19)

host:
vdsm-4.12.0-0.1.rc3.fc19

How reproducible:
100%

Steps to Reproduce:
1. add clean host to ovirt

Actual results:
install fails

Expected results:
install successfull

Additional info:

vdsm.log 

Thread-35::INFO::2013-07-24 15:50:18,179::netinfo::428::root::(_getNetInfo) Obtaining info for net ovirtmgmt.
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 411, in _getNetInfo
    data.update({'ports': ports(iface), 'stp': bridge_stp_state(iface),
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 181, in ports
    return os.listdir('/sys/class/net/' + bridge + '/brif')
OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'
Comment 1 Dan Kenigsberg 2013-07-24 11:35:49 EDT
vdsm dies while doing the initial setupNetwork:

Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call ping with () {}
Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::986::vds::(wrapper) return ping with {'status': {'message': 'Done', 'code': 0}}
Thread-16::DEBUG::2013-07-24 15:48:31,148::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainThread::INFO::2013-07-24 15:48:33,896::vdsm::101::vds::(run) (PID: 9487) I am the actual vdsm 4.12.0-0.1.rc3.fc19 dell-r210ii-08.rhev.lab.eng.brq.redhat.com (3.9.5-301.fc19.x86_64)


There's a seeming race between setupNetwork and the response from dhclient, but this on its own right does not explain the vdsm crash.

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,151::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,152::libvirtconnection::124::libvirtconnection::(get) trying to connect libvirt
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,165::configNetwork::480::setupNetworks::(setupNetworks) Setting up network according to configuration: networks:{'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, bondings:{}, options:{'connectivityCheck': 'true', 'connectivityTimeout': 120}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::configNetwork::484::root::(setupNetworks) Validating configuration
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,168::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,172::configNetwork::487::setupNetworks::(setupNetworks) Applying...
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,173::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,175::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,179::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,181::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::configNetwork::524::setupNetworks::(setupNetworks) Adding network 'ovirtmgmt'
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,188::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,192::configNetwork::174::root::(addNetwork) validating network...
MainProcess|Thread-16::INFO::2013-07-24 15:48:31,192::configNetwork::188::root::(addNetwork) Adding network ovirtmgmt with vlan=None, bonding=None, nics=['em1'], bondingOptions=None, mtu=None, bridged=True, options={'bootproto': 'dhcp', 'STP': 'no', 'implicitBonding': True}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::362::root::(_persistentBackup) backing up ifcfg-ovirtmgmt: # original file did not exist

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,195::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown ovirtmgmt' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,347::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::289::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-em1
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::362::root::(_persistentBackup) backing up ifcfg-em1: # Generated by parse-kickstart
IPV6INIT=no
BOOTPROTO=dhcp
DEVICE=em1
ONBOOT=yes
UUID=c7b808a5-a299-4fa6-a488-69028ec2f56a
HWADDR=D0:67:E5:F0:7F:2F
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
NAME="System em1"

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-em1 (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown em1' (cwd None)
MainThread::DEBUG::2013-07-24 15:48:31,417::sourceRouteThread::18::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1374673711
MainThread::INFO::2013-07-24 15:48:31,418::sourceRouteThread::35::root::(process_IN_CLOSE_WRITE_filePath) interface em1 is not a libvirt interface
MainThread::WARNING::2013-07-24 15:48:31,419::utils::113::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/em1 already removed
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = 'bridge ovirtmgmt does not exist!\n'; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup em1' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,714::ifcfg::651::Storage.Misc.excCmd::(_ifup) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,715::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup ovirtmgmt' (cwd None)
MainProcess|storageRefresh::DEBUG::2013-07-24 15:48:32,376::misc::817::SamplingMethod::(__call__) Returning last result
MainProcess::DEBUG::2013-07-24 15:48:34,224::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to readMultipathConf with () {}


==============================
Unrelated rant: the introduction of the

MainProcess|Thread-16::DEBUG::2013-07-24 15:49:15,045::libvirtconnection::101::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 19 level: 2 message: operation failed: network 'vdsm-ovirtmgmt' already exists with uuid 9973dc75-a45a-4c84-b779-53041e899745

log by http://gerrit.ovirt.org/13990 is annoying an misleading.

==============================
Another unrelated rant: supervdsm.log should include the result of its functions, not only its calls.
Comment 2 Assaf Muller 2013-07-28 10:19:21 EDT
Created attachment 779367 [details]
Events aggregated from messages, vdsm and supervdsm logs

I reproduced the bug on a fresh F19 installation.

Two import distinctions:
1) It has NetworkManager on by default, like the host in the original bug report. However: I used a newer version of host_deploy which turns NetworkManager off. The bug still happened.
2) I disabled the source routing dhclient hook's ability to write to /var/run/vdsm/sourceRoutes, in effect neutralizing sourceRouteThread (A thread spawned by supervdsm which configures source routing). The bug still happened.

I added a file that aggregates all the events I find important from the messages, vdsm and supervdsm logs.

What I know so far:
1) VDSM is stopped immediately after its first successful initialization and started again.
2) The initial setupNetworks that configures the ovirtmgmt bridge fails and rolls back the config due to a libvirt exception as detailed in the log Martin attached.
Comment 3 Assaf Muller 2013-07-29 11:32:22 EDT
The cause has been discovered and a fix is being worked on by Yeela Kaplan.
Comment 4 Dan Kenigsberg 2013-07-31 07:13:24 EDT
Please verify that the solution for bug 988995 by vdsm 4.12.0 fixes this issue.

Note You need to log in before you can comment on or make changes to this bug.