Bug 470870 - condor_schedd running out of file descriptors
Summary: condor_schedd running out of file descriptors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: 1.1
: ---
Assignee: Ted Ross
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-10 18:08 UTC by Matthew Farrellee
Modified: 2009-02-04 16:05 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 16:05:00 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2009:0036
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Grid 1.1 Release
Last Updated: 2009-02-04 16:03:49 UTC

Description Matthew Farrellee 2008-11-10 18:08:05 UTC
Description of problem:

The condor_schedd runs out of file descriptors and EXCEPTs from within dprintf.


Version-Release number of selected component (if applicable):

7.0.4-0.4 (with qmf-plugins)


How reproducible:

Always


Steps to Reproduce:
1. run the condor_schedd with plugins enabled
2. submit 512 jobs
3. remove 512 jobs

  
Actual results:

condor_schedd EXCEPTs with a message in the SchedLog: **** PANIC -- OUT OF FILE DESCRIPTORS at line 783 in dprintf.c


Expected results:

A working schedd...


Additional info:

This appears to happen only when the QMF plugins are loaded.

In that case, listing /proc/<schedd pid>/fd shows many open sockets; lsof | grep <schedd pid> lists them as well.
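The /proc check above can be scripted. The following is a minimal Python sketch, assuming Linux /proc and not part of Condor or the QMF plugins, that reports how many descriptors a process has open, how many of them are sockets, and its soft "Max open files" limit, i.e. how close the process is to running out. The function name fd_usage is hypothetical.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, open_sockets, soft_limit) for a Linux process.

    pid may be an integer PID or "self". soft_limit is None when the
    "Max open files" soft limit is unlimited. (fd_usage is a hypothetical
    helper, not a Condor or qpid API.)
    """
    fd_dir = f"/proc/{pid}/fd"
    open_fds = 0
    open_sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor went away between listdir() and readlink();
            # for pid="self" this includes the one listdir() itself used.
            continue
        open_fds += 1
        if target.startswith("socket:["):
            open_sockets += 1

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, open_sockets, soft_limit
```

Calling fd_usage() with the schedd's PID every few seconds while the 512 jobs are removed would show whether open_sockets climbs toward soft_limit. Reading another process's /proc entries generally requires running as the same user or as root.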

Comment 1 Matthew Farrellee 2008-11-10 18:15:22 UTC
qpidc is r711740

Comment 2 Ted Ross 2008-11-10 18:25:21 UTC
I can reproduce a similar symptom when running the example qmf-agent with no
available broker.

It appears that when the connection fails (connection-refused), the FD is not
reclaimed and is not reused.

Comment 3 Ted Ross 2008-11-10 18:41:46 UTC
More specific information:

In the C++ client, when Connection.open() fails (i.e. throws an exception), it appears to leak a file descriptor.

Calling Connection.close() in the exception handler does not solve the problem.
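To illustrate the failure mode in Comments 2 and 3, here is a hypothetical Python analogue. It is not the qpid C++ client code, whose internals are not shown in this report. LeakyConnection registers its socket in a long-lived table before connecting and never releases it when connect() is refused, so calling close() on the connection afterwards cannot reclaim the descriptor. SafeConnection closes the socket before re-raising. The class names, unused_local_port and io_registry are assumptions made for illustration.

```python
import socket


def unused_local_port():
    """Return a loopback TCP port with (at the moment of return) no listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


class LeakyConnection:
    """Hypothetical client: open() registers the socket in a long-lived
    table before connecting and never removes it if connect() fails."""

    io_registry = []  # stands in for a process-wide poller or socket table

    def __init__(self):
        self._sock = None

    def open(self, host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        LeakyConnection.io_registry.append(sock)  # outlives a failed open()
        sock.connect((host, port))  # may raise; sock is then never closed
        self._sock = sock

    def close(self):
        # After a failed open() there is no socket here to close, so this
        # cannot reclaim the descriptor held by io_registry.
        if self._sock is not None:
            self._sock.close()
            self._sock = None


class SafeConnection:
    """Same interface, but open() releases the descriptor on any failure."""

    def __init__(self):
        self._sock = None

    def open(self, host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((host, port))
        except OSError:
            sock.close()  # hand the descriptor back before re-raising
            raise
        self._sock = sock

    def close(self):
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

Each failed LeakyConnection.open() leaves one more socket in /proc/self/fd, the same growth pattern seen in the schedd, even though close() is called in the exception handler; failed SafeConnection.open() calls leave the count unchanged.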

Comment 4 Ted Ross 2008-11-11 20:18:36 UTC
To verify, run the qmf-agent example with no running broker. qmf-agent will continually attempt to connect to the broker. Use "/usr/sbin/lsof | grep qmf" to check whether the number of file descriptors allocated to the qmf-agent process keeps increasing. The number of FDs should remain constant.
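The property checked in Comment 4 can also be expressed as an automated test. This minimal sketch assumes Linux /proc and uses Python's socket.create_connection(), which closes its socket when a connection attempt fails, as a stand-in for the agent's connect logic; it does not exercise qmf-agent itself. open_fd_count, retry_until_connected and fd_count_is_stable are hypothetical helpers. The sketch retries a host/port where no broker is listening and samples the descriptor count after each failure; a non-leaking client yields one constant value.

```python
import os
import socket


def open_fd_count():
    """Number of entries in /proc/self/fd (Linux).

    The count includes the descriptor os.listdir() uses while reading the
    directory; that extra entry is present in every sample, so it cancels
    out when samples are compared.
    """
    return len(os.listdir("/proc/self/fd"))


def retry_until_connected(host, port, attempts, timeout=1.0):
    """Mimic an agent that keeps retrying a broker that is not running.

    Returns the descriptor count sampled after each failed attempt.
    """
    samples = []
    for _ in range(attempts):
        try:
            # create_connection() closes its socket itself if connecting fails.
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            samples.append(open_fd_count())
            continue
        conn.close()  # a broker answered; stop retrying
        break
    return samples


def fd_count_is_stable(host, port, attempts=50):
    """True if every failed attempt left the same number of descriptors open."""
    return len(set(retry_until_connected(host, port, attempts))) <= 1
```

For instance, fd_count_is_stable("127.0.0.1", 5672), run on a host where nothing listens on 5672 (the conventional AMQP port, assumed here), is expected to return True.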

Comment 7 errata-xmlrpc 2009-02-04 16:05:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html

