Bug 988004

Summary: [vdsm] OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'
Product: [Retired] oVirt Reporter: Martin Pavlik <mpavlik>
Component: vdsmAssignee: Assaf Muller <amuller>
Status: CLOSED NEXTRELEASE QA Contact: Haim <hateya>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.3CC: abaron, acathrow, bazulay, danken, gklein, iheim, mgoldboi, mpavlik, yeylon
Target Milestone: ---   
Target Release: 3.3   
Hardware: x86_64   
OS: Linux   
Whiteboard: network
Fixed In Version: 4.12.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-07 14:22:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 988995    
Bug Blocks: 918494    
Attachments:
Description Flags
log_collector
none
Events aggregated from messages, vdsm and supervdsm logs none

Description Martin Pavlik 2013-07-24 14:15:04 UTC
Created attachment 777806 [details]
log_collector

Description of problem:
When adding new (freshly installed) host (F19) to clean setup. Host install fails with following reason in GUI:
Host dell-08 installation failed. Failed to configure manamgent network on the host.

Version-Release number of selected component (if applicable):
engine:
oVirt Engine Version: 3.3.0-0.3.beta1.fc19 (running on F19)

host:
vdsm-4.12.0-0.1.rc3.fc19

How reproducible:
100%

Steps to Reproduce:
1. add clean host to ovirt

Actual results:
install fails

Expected results:
install successfull

Additional info:

vdsm.log 

Thread-35::INFO::2013-07-24 15:50:18,179::netinfo::428::root::(_getNetInfo) Obtaining info for net ovirtmgmt.
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 411, in _getNetInfo
    data.update({'ports': ports(iface), 'stp': bridge_stp_state(iface),
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 181, in ports
    return os.listdir('/sys/class/net/' + bridge + '/brif')
OSError: [Errno 2] No such file or directory: '/sys/class/net/ovirtmgmt/brif'

Comment 1 Dan Kenigsberg 2013-07-24 15:35:49 UTC
vdsm dies while doing the initial setupNetwork:

Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call ping with () {}
Thread-15::DEBUG::2013-07-24 15:48:31,146::BindingXMLRPC::986::vds::(wrapper) return ping with {'status': {'message': 'Done', 'code': 0}}
Thread-16::DEBUG::2013-07-24 15:48:31,148::BindingXMLRPC::979::vds::(wrapper) client [10.34.63.66]::call setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainThread::INFO::2013-07-24 15:48:33,896::vdsm::101::vds::(run) (PID: 9487) I am the actual vdsm 4.12.0-0.1.rc3.fc19 dell-r210ii-08.rhev.lab.eng.brq.redhat.com (3.9.5-301.fc19.x86_64)


There's a seeming race between setupNetwork and the response from dhclient, but this on its own right does not explain the vdsm crash.

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to setupNetworks with ({'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,149::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,151::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,152::libvirtconnection::124::libvirtconnection::(get) trying to connect libvirt
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,165::configNetwork::480::setupNetworks::(setupNetworks) Setting up network according to configuration: networks:{'ovirtmgmt': {'nic': 'em1', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, bondings:{}, options:{'connectivityCheck': 'true', 'connectivityTimeout': 120}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::configNetwork::484::root::(setupNetworks) Validating configuration
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,166::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,168::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,172::configNetwork::487::setupNetworks::(setupNetworks) Applying...
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,173::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,175::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,179::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,181::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::configNetwork::524::setupNetworks::(setupNetworks) Adding network 'ovirtmgmt'
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,186::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,188::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,192::configNetwork::174::root::(addNetwork) validating network...
MainProcess|Thread-16::INFO::2013-07-24 15:48:31,192::configNetwork::188::root::(addNetwork) Adding network ovirtmgmt with vlan=None, bonding=None, nics=['em1'], bondingOptions=None, mtu=None, bridged=True, options={'bootproto': 'dhcp', 'STP': 'no', 'implicitBonding': True}
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::362::root::(_persistentBackup) backing up ifcfg-ovirtmgmt: # original file did not exist

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,193::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,195::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown ovirtmgmt' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,345::utils::486::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,347::utils::505::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::289::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-em1
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,352::ifcfg::362::root::(_persistentBackup) backing up ifcfg-em1: # Generated by parse-kickstart
IPV6INIT=no
BOOTPROTO=dhcp
DEVICE=em1
ONBOOT=yes
UUID=c7b808a5-a299-4fa6-a488-69028ec2f56a
HWADDR=D0:67:E5:F0:7F:2F
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
NAME="System em1"

MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::233::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/ifcfg-em1 (until next 'set safe config')
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,353::ifcfg::644::Storage.Misc.excCmd::(ifdown) '/usr/sbin/ifdown em1' (cwd None)
MainThread::DEBUG::2013-07-24 15:48:31,417::sourceRouteThread::18::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1374673711
MainThread::INFO::2013-07-24 15:48:31,418::sourceRouteThread::35::root::(process_IN_CLOSE_WRITE_filePath) interface em1 is not a libvirt interface
MainThread::WARNING::2013-07-24 15:48:31,419::utils::113::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/em1 already removed
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::644::Storage.Misc.excCmd::(ifdown) SUCCESS: <err> = 'bridge ovirtmgmt does not exist!\n'; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,561::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup em1' (cwd None)
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,714::ifcfg::651::Storage.Misc.excCmd::(_ifup) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::DEBUG::2013-07-24 15:48:31,715::ifcfg::651::Storage.Misc.excCmd::(_ifup) '/usr/sbin/ifup ovirtmgmt' (cwd None)
MainProcess|storageRefresh::DEBUG::2013-07-24 15:48:32,376::misc::817::SamplingMethod::(__call__) Returning last result
MainProcess::DEBUG::2013-07-24 15:48:34,224::supervdsmServer::88::SuperVdsm.ServerCallback::(wrapper) calling to readMultipathConf with () {}


==============================
Unrelated rant: the introduction of the

MainProcess|Thread-16::DEBUG::2013-07-24 15:49:15,045::libvirtconnection::101::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 9 edom: 19 level: 2 message: operation failed: network 'vdsm-ovirtmgmt' already exists with uuid 9973dc75-a45a-4c84-b779-53041e899745

log by http://gerrit.ovirt.org/13990 is annoying an misleading.

==============================
Another unrelated rant: supervdsm.log should include the result of its functions, not only its calls.

Comment 2 Assaf Muller 2013-07-28 14:19:21 UTC
Created attachment 779367 [details]
Events aggregated from messages, vdsm and supervdsm logs

I reproduced the bug on a fresh F19 installation.

Two import distinctions:
1) It has NetworkManager on by default, like the host in the original bug report. However: I used a newer version of host_deploy which turns NetworkManager off. The bug still happened.
2) I disabled the source routing dhclient hook's ability to write to /var/run/vdsm/sourceRoutes, in effect neutralizing sourceRouteThread (A thread spawned by supervdsm which configures source routing). The bug still happened.

I added a file that aggregates all the events I find important from the messages, vdsm and supervdsm logs.

What I know so far:
1) VDSM is stopped immediately after its first successful initialization and started again.
2) The initial setupNetworks that configures the ovirtmgmt bridge fails and rolls back the config due to a libvirt exception as detailed in the log Martin attached.

Comment 3 Assaf Muller 2013-07-29 15:32:22 UTC
The cause has been discovered and a fix is being worked on by Yeela Kaplan.

Comment 4 Dan Kenigsberg 2013-07-31 11:13:24 UTC
Please verify that the solution for bug 988995 by vdsm 4.12.0 fixes this issue.