Bug 1256446 - OSError: [Errno 24] Too many open files while running automation tests
Summary: OSError: [Errno 24] Too many open files while running automation tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Piotr Kliczewski
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks: 1265965
Reported: 2015-08-24 15:14 UTC by Meni Yakove
Modified: 2016-03-09 19:44 UTC (History)
CC List: 11 users

Fixed In Version: v4.17.5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1265965
Environment:
Last Closed: 2016-03-09 19:44:00 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine logs (433.72 KB, application/x-bzip)
2015-08-24 15:16 UTC, Meni Yakove
vdsm logs - host is host_mixed_1 - 10.35.128.28 (393.63 KB, application/x-bzip)
2015-08-24 15:17 UTC, Meni Yakove


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0362 | 0 | normal | SHIPPED_LIVE | vdsm 3.6.0 bug fix and enhancement update | 2016-03-09 23:49:32 UTC
oVirt gerrit | 45615 | 0 | master | MERGED | jsonrpc: fd leak | Never
oVirt gerrit | 45714 | 0 | ovirt-3.6 | MERGED | jsonrpc: fd leak | Never
oVirt gerrit | 46620 | 0 | master | MERGED | subscription: use different ids for different subscription | Never
oVirt gerrit | 46625 | 0 | master | MERGED | ssl: ssl socket may throw sslerror during reading | Never
oVirt gerrit | 47358 | 0 | ovirt-3.6 | MERGED | ssl: ssl socket may throw sslerror during reading | Never

Description Meni Yakove 2015-08-24 15:14:30 UTC
Description of problem:
While running automation tests, all setupNetworks operations fail.
Error from the engine:
Status: 400
Reason: Bad Request
Detail: [Unexpected exception]

The fd count reaches 1025, and then we start getting this error.

ll /proc/8344/fd | wc -l
1025
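
The same check can be done from Python, which also makes it easy to compare against the limit. This is only a rough sketch, assuming a Linux host with /proc mounted; the helper names `count_open_fds` and `soft_nofile_limit` are made up for illustration and are not part of vdsm. Note that if `ll` is the usual alias for `ls -l`, its output includes a "total" line, so 1025 lines would correspond to 1024 descriptors.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count the entries under /proc/<pid>/fd (Linux only).

    Hypothetical helper. When pid is "self", the directory handle used
    for the listing is itself counted, so treat the result as a relative
    measure rather than an exact one.
    """
    return len(os.listdir("/proc/{}/fd".format(pid)))


def soft_nofile_limit():
    """Return the soft RLIMIT_NOFILE of the *current* process.

    For another process, the limit would have to be read from
    /proc/<pid>/limits instead.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print("open fds:", count_open_fds(), "soft limit:", soft_nofile_limit())
```

Sampling `count_open_fds(8344)` periodically while the tests run (as root or as the vdsm user) should show whether the count climbs steadily toward the limit.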



Version-Release number of selected component (if applicable):
vdsm-4.17.3-1.el7ev.noarch
rhevm-3.6.0-0.11.master.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run the host_network_api tests a few times, or run network tier1.

Comment 1 Meni Yakove 2015-08-24 15:16:28 UTC
Created attachment 1066477 [details]
engine logs

Comment 2 Meni Yakove 2015-08-24 15:17:26 UTC
Created attachment 1066479 [details]
vdsm logs - host is host_mixed_1 - 10.35.128.28

Comment 3 Dima Kuznetsov 2015-08-24 15:31:35 UTC
I've looked at the VDSM logs, and VDSM runs out of its allowed 1024 file descriptors.
Following the open FDs across several runs of the tests, VDSM leaks FDs at a relatively steady pace while the tests are active. Furthermore, the leak is limited to a single type: VDSM is leaking TCP sockets.
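
One way to see which type of descriptor is growing is to group the /proc/<pid>/fd symlink targets by kind. The following is a minimal sketch, assuming Linux; `fd_kinds` is a hypothetical helper name, and it only distinguishes sockets from pipes and files, not TCP from other socket families.

```python
import os
from collections import Counter


def fd_kinds(pid="self"):
    """Group the open fds of a process by the kind of their link target.

    Hypothetical helper. On Linux, /proc/<pid>/fd/<n> is a symlink whose
    target looks like "socket:[inode]", "pipe:[inode]",
    "anon_inode:[...]", or a filesystem path.
    """
    fd_dir = "/proc/{}/fd".format(pid)
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    print(dict(fd_kinds()))
```

If only the "socket" count keeps rising between samples, that would be consistent with the observation above.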

I tried intercepting its syscalls and came across multiple accept(2) calls whose descriptors were never closed during the whole syscall trace (1-2 minutes). I'd suggest continuing the investigation there.
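
For illustration, the general pattern that avoids this kind of leak is to tie every accepted socket to a scope that always closes it. The sketch below is not vdsm's code and is not what the gerrit patches do (their content is not shown in this report); `serve_once` and `handler` are hypothetical names.

```python
import socket
from contextlib import closing


def serve_once(listener, handler):
    """Accept one connection and guarantee its descriptor is released.

    Hypothetical helper: closing() calls conn.close() even if handler()
    raises, so the descriptor returned by accept(2) cannot be leaked by
    an error path.
    """
    conn, _addr = listener.accept()
    with closing(conn):
        handler(conn)


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"ping")
    serve_once(listener, lambda conn: print(conn.recv(4)))
    client.close()
    listener.close()
```

A server that hands the accepted socket to another thread or reactor would need the equivalent guarantee wherever ownership ends up.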

Comment 4 Piotr Kliczewski 2015-09-16 09:04:37 UTC
It seems that it still happens randomly. We need to determine the steps to reproduce the issue again. It is related to the setupNetworks BZ #1262051.

Please provide the steps to reproduce.

Comment 5 Moran Goldboim 2015-09-20 14:56:04 UTC
Marked as a GA blocker for now, since there are no clear repro steps and the frequency seems to be down. Not a beta1 blocker.

Comment 6 Piotr Kliczewski 2015-09-21 07:29:01 UTC
I have access to the env, so I'm working on it now.

Comment 9 Oved Ourfali 2015-09-24 11:49:54 UTC
This isn't a regression. Removing regression flag.
Cloned also to 3.5.Z.

Comment 11 errata-xmlrpc 2016-03-09 19:44:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

