Description of problem:
While running automation tests, all setupNetworks operations fail with the following error from the engine:
Status: 400
Reason: Bad Request
Detail: [Unexpected exception]
The errors start once the FD count reaches 1025:
ll /proc/8344/fd | wc -l
1025

Version-Release number of selected component (if applicable):
vdsm-4.17.3-1.el7ev.noarch
rhevm-3.6.0-0.11.master.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run the host_network_api tests a few times, or run the network tier1 tests.
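A note on the count above: `ll` is usually an alias for `ls -l`, which prints a leading `total` line. So 1025 output lines most likely correspond to 1024 open descriptors, which matches the 1024 FD limit discussed later in this bug. The following is a minimal Python sketch (Linux /proc only; the helper names are hypothetical and not part of VDSM) that counts descriptors directly and reads the process's soft "Max open files" limit, avoiding the off-by-one from `ls`:

```python
import os
import sys


def count_fds(pid):
    """Number of open file descriptors of `pid`, read from /proc/<pid>/fd.

    When `pid` is the current process, the count may include the descriptor
    that os.listdir() itself holds open on /proc/<pid>/fd while reading it.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Soft 'Max open files' limit of `pid`, parsed from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: name (3 words), soft limit, hard limit, units
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("'Max open files' not found in /proc/<pid>/limits")


if __name__ == "__main__":
    # Usage: python count_fds.py <pid>   (e.g. the VDSM PID, 8344 in this report)
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {target}: {count_fds(target)} open fds, "
          f"soft limit {soft_nofile_limit(target)}")
```

Reading another user's process (such as VDSM) typically requires running as root or as the same user.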
Created attachment 1066477: engine logs
Created attachment 1066479: vdsm logs (host is host_mixed_1, 10.35.128.28)
I've looked at the VDSM logs: VDSM runs out of its allowed 1024 file descriptors. Tracking the open FDs over several test runs shows that VDSM leaks FDs at a relatively steady pace while the tests are active. Furthermore, the leak is limited to a single type: VDSM is leaking TCP sockets. I intercepted its syscalls and found multiple accept(2) calls whose descriptors were never closed during the whole syscall trace (1 to 2 minutes). I'd suggest continuing the investigation there.
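The per-type FD tracking described above could be approximated with a short script. This is a hedged sketch assuming a Linux host with /proc; `classify_fds`, `tcp_inodes` and `watch` are hypothetical names, not VDSM code. It resolves each entry in /proc/<pid>/fd with readlink and matches `socket:[inode]` targets against the inode column of /proc/net/tcp and /proc/net/tcp6. Those files reflect the network namespace of the process reading them, so run it in the same namespace as VDSM.

```python
import os
import re
import time
from collections import Counter

SOCKET_RE = re.compile(r"^socket:\[(\d+)\]$")


def tcp_inodes():
    """Set of socket inodes listed in /proc/net/tcp and /proc/net/tcp6."""
    inodes = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                for line in f:
                    inodes.add(line.split()[9])  # 10th column is the inode
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled
    return inodes


def classify_fds(pid):
    """Count the open FDs of `pid` by type: tcp, other-socket, file, pipe, ..."""
    tcp = tcp_inodes()
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # descriptor was closed between listdir() and readlink()
        m = SOCKET_RE.match(target)
        if m:
            counts["tcp" if m.group(1) in tcp else "other-socket"] += 1
        elif target.startswith("/"):
            counts["file"] += 1
        else:
            counts[target.split(":", 1)[0]] += 1  # e.g. pipe, anon_inode
    return counts


def watch(pid, samples=10, interval=5.0):
    """Sample FD counts by type; a steadily growing 'tcp' count hints at a leak."""
    history = []
    for i in range(samples):
        history.append(classify_fds(pid))
        if i < samples - 1:
            time.sleep(interval)
    return history
```

For example, `for c in watch(8344): print(dict(c))` while the tests run should show whether only the `tcp` bucket grows. The script only locates the leaking type; pinpointing the unclosed accept(2) descriptors still requires a syscall trace as described above.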
It seems that it still happens randomly. We need to determine how to reproduce the issue again. It is related to setupNetworks BZ #1262051. Please provide the steps to reproduce.
Marked as a GA blocker for now, since there are no clear reproduction steps and the frequency seems to be down. Not a beta1 blocker.
I have access to the environment, so I'm working on it now.
This isn't a regression, so I'm removing the regression flag. Also cloned to 3.5.Z.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html