Bug 1466628

Summary: SetupNetworks failed after some time with 'IOError: [Errno 5] Out of memory' and 'OSError: [Errno 9] Bad file descriptor'
Product: [oVirt] vdsm Reporter: Meni Yakove <myakove>
Component: SuperVDSM Assignee: Edward Haas <edwardh>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Meni Yakove <myakove>
Severity: urgent Docs Contact:
Priority: high    
Version: 4.20.0 CC: bugs, myakove, ylavi
Target Milestone: --- Keywords: Automation, Regression
Target Release: --- Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-05 07:44:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm,supervdsm,messages and journalctl none

Description Meni Yakove 2017-06-30 05:49:56 UTC
Created attachment 1293088
vdsm,supervdsm,messages and journalctl

Description of problem:
After a few SetupNetworks calls, SetupNetworks failed with:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 95, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 202, in setupNetworks
    _setup_networks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 223, in _setup_networks
    netswitch.configurator.setup(networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 137, in setup
    _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 151, in _setup_legacy
    _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 331, in remove_networks
    keep_bridge=keep_bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 160, in wrapped
    return func(network, configurator, net_info, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 291, in _del_network
    net_ent_to_remove.remove()
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 193, in remove
    self.configurator.removeBridge(self)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 213, in removeBridge
    bridge.port.remove()
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 262, in remove
    self.configurator.removeBond(self)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 275, in removeBond
    bonding.configure()
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 244, in configure
    self.configurator.configureBond(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 131, in configureBond
    _ifup(bond)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 920, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/waitfor.py", line 41, in waitfor_linkup
    return
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/monitor.py", line 129, in __exit__
    self.stop()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/monitor.py", line 175, in stop
    os.write(self._pipetrick[1], b'c')
OSError: [Errno 9] Bad file descriptor

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 185, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/monitor.py", line 167, in _scan
    libnl.nl_recvmsgs_default(sock)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 437, in nl_recvmsgs_default
    raise IOError(-err, nl_geterror(err))
IOError: [Errno 5] Out of memory
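
Note on the first traceback: os.write raises "OSError: [Errno 9] Bad file descriptor" whenever the descriptor it is handed is no longer open. The sketch below reproduces only that generic behavior with a plain pipe. The WakeupPipe class and its method names are hypothetical and are not vdsm's monitor code, and the attached logs do not establish why the descriptor was invalid on the host.

```python
import errno
import os


class WakeupPipe(object):
    """Hypothetical stand-in for a self-pipe used to wake a polling thread."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        self._closed = False

    def wake(self):
        # Write one byte so that a poller watching the read end returns.
        os.write(self._write_fd, b'c')

    def close(self):
        # Close both ends once; later calls are no-ops.
        if not self._closed:
            os.close(self._read_fd)
            os.close(self._write_fd)
            self._closed = True


def wake_after_close():
    """Return the errno raised when waking a pipe that was already closed."""
    pipe = WakeupPipe()
    pipe.wake()       # succeeds: the pipe is still open
    pipe.close()
    try:
        pipe.wake()   # the write end is gone, so os.write fails
    except OSError as e:
        return e.errno
    return None
```

Calling wake_after_close() returns errno.EBADF, the same errno as in the traceback. Whether the monitor's pipe was closed early, or something else invalidated it, would need to be checked against the logs.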


Version-Release number of selected component (if applicable):
vdsm-4.20.1-89.git2912779.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run the host_network_api automation test (many SetupNetworks calls via REST)

Comment 1 Meni Yakove 2017-07-03 07:49:22 UTC
Working vdsm version: vdsm-4.20.1-66.git228c7be.el7.centos.x86_64
Non-working vdsm version: vdsm-4.20.1-89.git2912779.el7.centos.x86_64

Comment 2 Edward Haas 2017-07-03 12:22:52 UTC
I'm not sure this will help, but the following patch increases the buffer size for the netlink socket: https://gerrit.ovirt.org/#/c/78928

Could you please check whether it helps with the reported problem?
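
For readers unfamiliar with the knob in question, here is a rough sketch of enlarging a netlink socket's receive buffer using Python's standard socket module on Linux. It is not the content of the gerrit patch, which goes through libnl. The function names and the 64 KiB value are assumptions chosen for the example. As far as I can tell, libnl maps a full receive buffer (ENOBUFS) to NLE_NOMEM, whose message is "Out of memory", which would explain the wording of the second traceback.

```python
import socket


def set_rcvbuf(sock, size):
    """Request a receive buffer of `size` bytes and return the effective size.

    On Linux the kernel typically reports double the requested value and
    caps the request at net.core.rmem_max.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def open_route_socket(rcvbuf_size=64 * 1024):
    """Open a NETLINK_ROUTE socket (Linux only) with an enlarged buffer."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    effective = set_rcvbuf(sock, rcvbuf_size)
    return sock, effective
```

A larger buffer only makes it less likely that a burst of link events overflows the socket before the reader drains it; it does not remove the need to handle the overflow error.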

Comment 3 Meni Yakove 2017-07-04 14:29:21 UTC
I can't reproduce this on the latest master, vdsm-4.20.1-120.git28558d7.el7.centos.x86_64.

All our tests passed.
The test that attaches 100 networks to a host NIC also passed.

I'm fine with closing this bug until next time.