Bug 1256446

Summary: OSError: [Errno 24] Too many open files while running automation tests
Product: Red Hat Enterprise Virtualization Manager
Reporter: Meni Yakove <myakove>
Component: vdsm
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED ERRATA
QA Contact: Meni Yakove <myakove>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: bazulay, gklein, lsurette, mgoldboi, myakove, oourfali, pkliczew, pstehlik, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc3
Keywords: Automation, AutomationBlocker, ZStream
Target Release: 3.6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: v4.17.5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1265965 (view as bug list) Environment:
Last Closed: 2016-03-09 19:44:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1265965    
Attachments:
engine logs (Flags: none)
vdsm logs - host is host_mixed_1 - 10.35.128.28 (Flags: none)

Description Meni Yakove 2015-08-24 15:14:30 UTC
Description of problem:
While running automation tests, all setupNetworks operations fail.
Error from the engine:
Status: 400
Reason: Bad Request
Detail: [Unexpected exception]

The fd count reaches 1025, and then we start getting this error.

ll /proc/8344/fd | wc -l
1025
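
The count above could also be broken down by descriptor type, to show what is actually accumulating. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function name fd_summary is hypothetical and is not part of vdsm. Note that if ll is the usual alias for ls -l, its output includes a leading "total" line, so 1025 lines would correspond to 1024 descriptors.

```python
import os
from collections import Counter


def fd_summary(pid):
    """Count a process's open descriptors, grouped by kind.

    Hypothetical helper for illustration; reads /proc/<pid>/fd (Linux only).
    """
    fd_dir = "/proc/%d/fd" % pid
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    # Inspect the current process; pass another pid (e.g. 8344) as needed.
    print(dict(fd_summary(os.getpid())))
```

Running this periodically against the affected process and comparing the counts would show which kind of descriptor keeps growing.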



Version-Release number of selected component (if applicable):
vdsm-4.17.3-1.el7ev.noarch
rhevm-3.6.0-0.11.master.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run the host_network_api tests a few times, or run network tier1

Comment 1 Meni Yakove 2015-08-24 15:16:28 UTC
Created attachment 1066477 [details]
engine logs

Comment 2 Meni Yakove 2015-08-24 15:17:26 UTC
Created attachment 1066479 [details]
vdsm logs - host is host_mixed_1 - 10.35.128.28

Comment 3 Dima Kuznetsov 2015-08-24 15:31:35 UTC
I've looked at the VDSM logs, and VDSM runs out of its allowed 1024 file descriptors.
Following the open FDs during several runs of the tests, VDSM leaks FDs at a relatively steady pace while the tests are active. Furthermore, the leak is limited to a single type: VDSM is leaking TCP sockets.

I tried intercepting its syscalls and came across multiple accept(2) calls whose descriptors were never closed during the whole syscall trace (1-2 minutes). I'd suggest continuing the investigation there.
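
For illustration, the pattern described above (a descriptor returned by accept(2) that is never closed) and its usual remedy can be sketched as follows. This is not VDSM's actual code; serve_once and handler are hypothetical names, and the sketch only shows the general technique of releasing an accepted socket on every code path.

```python
import socket


def serve_once(listener, handler):
    """Accept one connection and guarantee its descriptor is released.

    Hypothetical helper for illustration, not taken from VDSM.
    """
    conn, _addr = listener.accept()
    try:
        handler(conn)
    finally:
        # Without this close(), every accepted connection would keep one
        # TCP socket descriptor open, even if the handler raised.
        conn.close()
```

If the handler raises, the exception still propagates to the caller, but the accepted socket is closed first, so the process does not accumulate descriptors toward its limit.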

Comment 4 Piotr Kliczewski 2015-09-16 09:04:37 UTC
It seems that it still happens randomly. We need to determine the steps to reproduce the issue again. It is related to the setupNetworks BZ #1262051.

Please provide the steps to reproduce.

Comment 5 Moran Goldboim 2015-09-20 14:56:04 UTC
Marked as a GA blocker for now, since there are no clear repro steps and the frequency seems to be down. Not a beta1 blocker.

Comment 6 Piotr Kliczewski 2015-09-21 07:29:01 UTC
I have access to the env, so I am working on it now.

Comment 9 Oved Ourfali 2015-09-24 11:49:54 UTC
This isn't a regression. Removing regression flag.
Cloned also to 3.5.Z.

Comment 11 errata-xmlrpc 2016-03-09 19:44:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html