Bug 428837

Summary: leaking file descriptors
Product: Red Hat Enterprise Linux 5
Reporter: Jack Neely <jjneely>
Component: nss_ldap
Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 5.2
CC: carenas, cward, dash, dslehman, jbastian, jbourne, jplans, kajtzu, mhuth, mmarcini, ohudlick, sputhenp, syeghiay, tao, vfalico
Target Milestone: rc
Keywords: OtherQA
Hardware: All
OS: Linux
Fixed In Version: 253-18.el5
Doc Type: Bug Fix
Clones: 491419
Last Closed: 2009-09-02 11:49:21 UTC
Attachments:
  Description                                                         Flags
  Debugging information from a busted machine.                        none
  This looks like it will address this bug...have not verified yet    none

Description Jack Neely 2008-01-15 15:45:16 UTC
Description of problem:
nscd against an LDAP directory service appears to leak file descriptors like
crazy.  Eventually, nscd hits the 1024 limit and goes into a loop consuming 100%
CPU attempting to reconnect to the LDAP server and getting EMFILE.  It appears
that the leaking is triggered by having to reconnect to the LDAP server.
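The failure mode described above (each reconnect opens a new socket while the stale one is never closed, until the process runs out of descriptors and socket() fails with EMFILE) can be illustrated with a small Python sketch. This is not nss_ldap's code: LeakyClient, FixedClient and demo() are hypothetical, the sockets are never actually connected (so no LDAP server is needed), and the soft RLIMIT_NOFILE is temporarily lowered to 64 instead of nscd's 1024 so the limit is reached quickly.

```python
import errno
import resource
import socket


class LeakyClient:
    """Hypothetical client that reconnects without closing the old socket."""

    def __init__(self):
        self.sock = None
        # Keep references to abandoned sockets; in CPython an unreferenced
        # socket object would be closed by the garbage collector, hiding the leak.
        self.abandoned = []

    def reconnect(self):
        if self.sock is not None:
            self.abandoned.append(self.sock)  # forgotten, never closed
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def close_all(self):
        for s in self.abandoned:
            s.close()
        if self.sock is not None:
            self.sock.close()  # close() is a no-op on an already-closed socket


class FixedClient:
    """Hypothetical client that closes the stale socket before reconnecting."""

    def __init__(self):
        self.sock = None

    def reconnect(self):
        if self.sock is not None:
            self.sock.close()  # release the stale descriptor first
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def close_all(self):
        if self.sock is not None:
            self.sock.close()


def reconnects_until_emfile(client, attempts):
    """Return how many reconnects succeeded before EMFILE, or None if none failed."""
    for i in range(attempts):
        try:
            client.reconnect()
        except OSError as e:
            if e.errno == errno.EMFILE:
                return i
            raise
    return None


def demo(limit=64):
    """Run both clients under a lowered soft RLIMIT_NOFILE, then restore it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    results = []
    try:
        for client in (LeakyClient(), FixedClient()):
            try:
                results.append(reconnects_until_emfile(client, limit * 2))
            finally:
                client.close_all()
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return tuple(results)


if __name__ == "__main__":
    leaky, fixed = demo()
    print("leaky client hit EMFILE after", leaky, "reconnects")
    print("fixed client hit EMFILE:", fixed is not None)
```

The leaky client runs out of descriptors after roughly as many reconnects as there are free slots below the limit, while the fixed client keeps reusing the same descriptor number and never fails, which matches the behavior the patch is reported to restore in comment 7.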

Test machine marvin has just rebooted.  Its lsof output shows that nscd has no
leaked FDs, only a TCP connection to the LDAP server.

My workstation, anduril, shows that nscd is using over 1,000 FDs, each shown as:

   nscd      24468      nscd 1001u     sock                0,5            11243699 can't identify protocol

and then a TCP connection to the LDAP server.

On the busted machine, uni01svn, lsof shows lots of the above, and the last file
descriptor is 1023.  It's done; it can't spare an FD to connect to the LDAP server.
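To watch a machine drift toward this state without lsof, a minimal Linux-only sketch can count a process's descriptors from /proc/<pid>/fd, where each socket descriptor appears as a symlink of the form socket:[inode], and read its soft "Max open files" limit from /proc/<pid>/limits. The function names fd_usage() and max_open_files() are hypothetical, and inspecting another user's process such as nscd generally requires root.

```python
import os
import sys


def fd_usage(pid):
    """Return (open_fds, socket_fds) for a process, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


def max_open_files(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits, or None."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: name (3 words), soft, hard, units
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    # Pass nscd's PID as the first argument; defaults to this script's own PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    total, sockets = fd_usage(pid)
    limit = max_open_files(pid)
    print(f"pid {pid}: {total} open fds ({sockets} sockets), soft limit {limit}")
    if limit is not None and total >= 0.9 * limit:
        print("warning: within 10% of the open-file limit")
```

Run periodically against nscd, a socket count that keeps climbing across LDAP reconnects is the same signal as the growing list of "can't identify protocol" entries in lsof.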

Version-Release number of selected component (if applicable):
[root@uni01svn log]# rpm -q nscd
nscd-2.5-18.el5_1.1

[root@uni01svn log]# rpm -q nss_ldap
nss_ldap-253-5.el5


How reproducible:
The leak seems to build up as LDAP reconnections are required.

Comment 1 Jack Neely 2008-01-15 15:45:16 UTC
Created attachment 291718
Debugging information from a busted machine.

Comment 2 Jack Neely 2008-01-15 15:46:56 UTC
My /etc/ldap.conf looks like this:

base dc=ncsu,dc=edu
timelimit 120
bind_timelimit 120
idle_timelimit 3600
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon
uri ldap://ldap.ncsu.edu/
ssl no
tls_cacertdir /etc/openldap/cacerts
pam_password md5


Comment 3 Jack Neely 2008-01-15 15:51:20 UTC
nscd has also consumed 254MB of virtual memory

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
26331 nscd      25   0  254m 137m 1224 R   99  6.8   1155:54 nscd 

Comment 4 Jack Neely 2008-03-18 20:19:01 UTC
Created attachment 298449
This looks like it will address this bug...have not verified yet

Comment 5 Jack Neely 2008-03-18 20:20:59 UTC
At this time I'm trying to verify whether the same behavior is present in

   nss_ldap-253-10.el5

from the RHEL 5.2 beta, and whether the patch in comment #4 fixes it.  On a
fully updated RHEL 5 box I am unable to build nss_ldap-253-5.el5.

Comment 6 Jack Neely 2008-03-18 21:05:17 UTC
I have confirmed that nss_ldap-253-10.el5 also leaks sockets the same as
nss_ldap-253-5.el5.

Comment 7 Jack Neely 2008-03-18 21:09:49 UTC
The attached patch also appears to reclaim old sockets properly.  I've watched
nscd develop a stale socket for the LDAP connection, ran getent passwd foo,
and used lsof to verify that it reconnected to the LDAP server on a different fd
and closed the stale socket.

Comment 8 Jack Neely 2008-03-20 17:04:06 UTC
My current src package:

http://www4.ncsu.edu/~jjneely/nss_ldap-253-10.ncsu.1.EL5.src.rpm

Comment 9 Jack Neely 2008-06-02 15:19:12 UTC
RHEL 5.2 does not include this patch.  I've rebased my nss_ldap package on
253-12 and added the patch to stop the evil countdown to RHEL 5 destruction.

http://www4.ncsu.edu/~jjneely/nss_ldap-253-12.ncsu.EL5.src.rpm

Comment 10 Veaceslav Falico 2009-02-09 22:27:27 UTC
Confirming multiple issues with nscd, tested the patch, it fixes the bug.

Comment 11 Issue Tracker 2009-02-25 02:51:37 UTC
Hi,

My customer contacted us again asking about the current status of the issue,
so I need an update on this one. I have two questions.

>Confirming multiple issues with nscd, tested the patch, it fixes the bug.

1. Is it possible to release the hotfix package for this? 

2. In which release are we planning to include the patch? Can you plan its
inclusion in RHEL 4.8?


Thanks,

Masahiro





This event sent from IssueTracker by mokubo 
 issue 262446

Comment 12 James Bourne 2009-02-26 18:21:53 UTC
FYI, we are also seeing this issue and have a ticket open with support.

Ticket number 1885792 under Mount Royal College.  We have currently set nscd to restart every 6 hours to work around the issue.  I'll cross-post this bug ID into our ticket so L2 can follow it.

This is causing issues with both RHEL 4.7 and RHEL 5.3.

Regards
James

Comment 14 Issue Tracker 2009-03-05 09:28:48 UTC
Hi,

Thank you for the test package offer, but what the customer is asking for is
a package supported by RH at this stage. So can you provide the hotfix, please?

This customer is detail-oriented, so can you also give me detailed information
about what the problem was and what was fixed?

Regards,

Masahiro


This event sent from IssueTracker by mokubo 
 issue 262446

Comment 16 Jeff Bastian 2009-03-20 21:20:49 UTC
See bug 491419 for RHEL 4 clone of this bug.

Comment 18 Chris Ward 2009-04-01 09:02:55 UTC
We understand the desire for the Hot Fix; currently this option is being considered. However, prior to that, we need to ensure that the fix we have provided in fact resolves the issues. We are unfortunately unable to easily reproduce this issue in-house, so we request to get confirmation from the reporter that the test packages provided resolve the issue. Please grab the test packages that I have provided below and report back whether they indeed fix the issue or not. 

If they fix the issue, then we will determine the next step to getting the updated packages out the door as quickly and conveniently as possible.

http://people.redhat.com/cward/5.4.0/nss_ldap/

Comment 26 Chris Ward 2009-06-14 23:13:57 UTC
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!

Comment 27 Chris Ward 2009-07-03 17:59:48 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 28 Chris Ward 2009-07-10 19:03:20 UTC
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.

Comment 31 Chris Ward 2009-08-25 13:27:38 UTC
Jack, please update us with the latest Beta test results confirming the
resolution of this request. Thank you.

Comment 32 Jack Neely 2009-08-25 14:53:10 UTC
I've examined the patches and they look good.  I've been testing on a VM for several days now and I cannot reproduce the bug with the nss_ldap-253-21 package from the RHEL 5.4 beta.

Thanks!

Comment 33 errata-xmlrpc 2009-09-02 11:49:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1379.html