Bug 99078 - nscd(8) not closing connections to ldap server in combination with nss:ldap 198
Status: CLOSED CANTFIX
Product: Red Hat Linux
Classification: Retired
Component: nss_ldap
Version: 8.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Nalin Dahyabhai
QA Contact: Jay Turner
Reported: 2003-07-14 04:16 EDT by Miroslav Zubcic
Modified: 2015-01-07 19:05 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-10-18 12:34:47 EDT

Attachments
a small patch to enable DEBUG in nss_ldap (389 bytes, patch)
2003-11-13 04:50 EST, Need Real Name
my spec file to active DEBUG (17.12 KB, text/plain)
2003-11-13 04:50 EST, Need Real Name

Description Miroslav Zubcic 2003-07-14 04:16:37 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
When using the name service switch with ldap, and nscd(8) to cache uids,
gids, services etc., nscd(8) does not close its connections to the ldap
server. After 20 hours there are about 1500 ESTABLISHED connections
(listed with the lsof(8) tool).

Things get worse if you list /home with `ls -l' or list files whose
uid has no existing name. nscd(8) then opens new connections five at a
time without closing them.

After a week the RH8 slapd(8) server started to refuse connections with
the error "Too many open files" (ENFILE). Killing nscd(8) and restarting
slapd(8) gets things back to normal. /proc/sys/fs/file-max is
tuned to 16384.

OTOH, on a SunOS 5.9 server with an identical role and padl.com's nss_ldap
library, nscd(8) was fine all the time, without wasting connections.
I compiled nss_ldap 204 on that SunOS server a couple of months ago.

I solved the problem by compiling a new nss_ldap library from padl.com
(version 207), linking it statically with the openldap 2.1.18, sasl2 and
openssl libs, and dynamically with libresolv and libc. The number of
connections is now exactly 5 (I have the threads parameter set to 4
in nscd.conf(5)), and this is correct behavior: one process
and four threads, IMHO.
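
A minimal sketch of the arithmetic above, assuming the reporter's reasoning holds (one connection for the main nscd process plus one per worker thread). The helper name expected_nscd_conns is hypothetical; it reads the `threads` value from an nscd.conf-style file and prints that value plus one.

```shell
# expected_nscd_conns FILE
# Hypothetical helper: print (threads + 1) for the given nscd.conf-style
# file. The "+ 1" follows the reporter's assumption of one connection for
# the main process plus one per worker thread; it is not documented nscd
# behaviour. Returns non-zero if no "threads" line is found.
expected_nscd_conns() {
    awk '$1 == "threads" { t = $2 }
         END { if (t == "") exit 1; print t + 1 }' "$1"
}
```

On a live system this would be called as `expected_nscd_conns /etc/nscd.conf` and the result compared with the connection count reported by lsof(8).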



Version-Release number of selected component (if applicable):
nss_ldap-198-3

How reproducible:
Always

Steps to Reproduce:
1. Set up a machine as an ldap name service switch client and turn on nscd(8)
2. Create some users and groups in the LDAP directory
3. chown(1) and chgrp(1) some files on the system to them
4. Do `ls -l' frequently
5. Watch the number of connections with lsof(8) after a couple of hours

Actual Results:  lsof -i tcp | egrep "nscd.*.ldap" | wc -l
      815

Expected Results:  lsof -i tcp | egrep "nscd.*.ldap" | wc -l
      5


Additional info:

The server runs the OpenLDAP server _and_ is an ldap client to
itself and to other machines on the network.
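
The connection count used in the results above can be narrowed to ESTABLISHED sockets only. This is a sketch, not the reporter's exact command: the function name count_nscd_ldap is hypothetical, and it assumes that lsof prints the process name as "nscd" in the first column and shows the remote port as the service name "ldap" (that is, lsof is run without -P).

```shell
# count_nscd_ldap
# Hypothetical helper: read `lsof -i tcp` style output on stdin and print
# the number of lines whose command is nscd, whose connection name ends in
# the "ldap" service port, and whose state is ESTABLISHED.
count_nscd_ldap() {
    awk '$1 == "nscd" && /:ldap / && /ESTABLISHED/ { n++ }
         END { print n + 0 }'
}

# Typical use on a live system (assumes lsof is installed):
#   lsof -i tcp | count_nscd_ldap
```

Filtering on the state keeps sockets in CLOSE_WAIT or TIME_WAIT out of the total, so the number can be compared directly with the expected value of 5.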
Comment 1 Alain RICHARD 2003-10-17 05:46:23 EDT
I have had exactly the same behaviour on various servers under Red Hat 8 and Red Hat 9. The only
solution was to kill the nscd daemon completely.

So currently my servers are underperforming because of the lack of caching of uid/gid information.

This bug is not corrected by any of the current (October 17, 2003) patches from Red Hat.
Comment 2 Need Real Name 2003-11-13 04:50:06 EST
Created attachment 95940 [details]
a small patch to enable DEBUG in nss_ldap

I patched ldap-nss.h to define DEBUG and DEBUG_SYSLOG, since my attempts to patch
config.h.in failed.
Comment 3 Need Real Name 2003-11-13 04:50:42 EST
Created attachment 95941 [details]
my spec file to active DEBUG
Comment 4 Need Real Name 2003-11-13 05:55:35 EST
Hello,

    We have the very same problem here and seem to have "fixed" it
(by chance, I guess), since I still do not understand what is going on.


At first we set idletimeout to 600 in /etc/openldap/slapd.conf to force
the LDAP server to close idle connections after 600 seconds (10 minutes);
otherwise, due to the bug, the server would not run more than 6-8 hours
without restarting the ldap service.
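
slapd.conf(5) documents idletimeout as a number of seconds, so it can help to convert the configured value before reasoning about it. A small sketch, with the hypothetical helper name idle_minutes; it assumes the directive appears as a plain `idletimeout N` line in the file:

```shell
# idle_minutes FILE
# Hypothetical helper: find the idletimeout directive in a slapd.conf-style
# file and print its value converted from seconds to whole minutes.
# Returns non-zero if the file has no idletimeout line.
idle_minutes() {
    awk '$1 == "idletimeout" { s = $2; found = 1 }
         END { if (!found) exit 1; print int(s / 60) }' "$1"
}
```

For a file containing `idletimeout 600` this prints 10.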


Then we downloaded the source rpm of glibc and studied the code of nscd.
Nothing strange there... nscd does not know about ldap. So the culprit
seems to be nss_ldap?

Here is an excerpt from my "log book" on the matter:

1) make sure nss_ldap is the latest for Red Hat 8
    apt-get install nss_ldap
    answer of freshrpms.net : nss_ldap is the newest....

2) content of current nss_ldap:
[root@pc107t-1 BUILD]# rpm -qpl /misc/regen/RedHat/RPMS/nss_ldap-198-3.i386.rpm
warning: /misc/regen/RedHat/RPMS/nss_ldap-198-3.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
/etc/ldap.conf
/lib/libnss_ldap-2.2.90.so    <-- note the version number 
/lib/security/pam_ldap.so
/usr/lib/libnss_ldap.so
/usr/share/doc/nss_ldap-198

3) downloaded source RPM & built it 
apt-get source nss_ldap
rpmbuild -ba SPECS/nss_ldap.spec
see what has been done ..

[root@pc107t-1 BUILD]# rpm -qpl ../RPMS/i386/nss_ldap-198-3.i386.rpm
/etc/ldap.conf
/lib/libnss_ldap-2.3.2.so   <--- strange
/lib/security/pam_ldap.so
/usr/lib/libnss_ldap.so
/usr/share/doc/nss_ldap-198
.....

Note that the version number is not the same: 2.3.2 instead of 2.2.90

4) installed the newly compiled one:
[root@pc107t-1 BUILD]# rpm -Uvh --force ../RPMS/i386/nss_ldap-198-3.i386.rpm
Preparing...                ########################################### [100%]
   1:nss_ldap               ########################################### [100%]

5) see what I got: two /lib/libnss_ldap libraries, but the links are good
[root@pc107t-1 BUILD]# ll /lib/libnss_l*
-rwxr-xr-x    1 root     root      1682620 aoû 27  2002 /lib/libnss_ldap-2.2.90.so
-rwxr-xr-x    1 root     root      1679482 nov 12 18:17 /lib/libnss_ldap-2.3.2.so
lrwxrwxrwx    1 root     root           20 nov 12 18:22 /lib/libnss_ldap.so.2 -> libnss_ldap-2.3.2.so
[root@pc107t-1 BUILD]#

6) restarted nscd and tried again to shake up the system:

[root@pc107t-1 BUILD]# ll /hetuds/pc
....  gives me a long listing of the students' homes on an NFS server

see what connections are left open:
[root@pc107t-1 BUILD]# lsof |grep ldap | grep cipc2 |wc -l
    352
the bug is still there !!!!

6bis) removed /lib/libnss_ldap-2.2.90.so and restarted the workstation:
the bug is still there...


7) Decided to activate debug mode in nss_ldap by applying a small
patch (see the attachment)
SOURCES/nss_ldap-pp.patch
SPECS/nss_ldap-pp.spec

rebuilt with rpmbuild -ba SPECS/nss_ldap-pp.spec
force installed: rpm -Uvh --force ../RPMS/i386/nss_ldap-198-3.i386.rpm

see what has been done: 

[root@pc107t-1 BUILD]# ll /lib/libnss_l*
-rwxr-xr-x    1 root     root      1687714 nov 12 16:05 /lib/libnss_ldap-2.3.2.so
lrwxrwxrwx    1 root     root           20 nov 12 18:22 /lib/libnss_ldap.so.2 -> libnss_ldap-2.3.2.so

My debug version is slightly bigger... and links still OK

restarted nscd

[root@pc107t-1 BUILD]# ll /hetuds/pc
.... shake the system again

[root@pc107t-1 BUILD]# lsof |grep ldap | grep cipc2 |wc -l
      9

????
so in debug mode connections are cleanly closed ???

8) removed my patch, rebuilt, reinstalled, restarted nscd: the bug is
back !!!!

So I distributed my "debug version" to all 220 RH80 workstations
and removed idletimeout 600 from the server's /etc/openldap/slapd.conf,
and for the first time our ldap server did not complain
after 6 hours about too many open files...

I did peek at the code of nss_ldap 198 but could not figure out
what is happening...

It looks like the bug does not happen on our 2 Red Hat 9 workstations,
which have nss_ldap-202-5.i386.rpm and /lib/libnss_ldap-2.3.1.so. Note
that on Red Hat 9 libnss_ldap is tagged 2.3.1, yet it is tagged 2.3.2
when built from the Red Hat 8 SRPM?

So I tried to install this RH9 RPM on an RH8 machine with:
rpm -Uvh /misc/regen/updates/postes/commun/tests/nss_ldap-202-5.i386.rpm

I had to fiddle with the symlinks in /usr/lib so that they read:
[root@pc107n-3 lib]# ll /usr/lib/libnss_l*
lrwxrwxrwx    1 root     root           30 nov 13 11:28 /usr/lib/libnss_ldap.so -> ../../lib/libnss_ldap-2.3.1.so
[root@pc107n-3 lib]# ll /lib/libnss_l*
-rwxr-xr-x    1 root     root      1855520 jan 25  2003 /lib/libnss_ldap-2.3.1.so
lrwxrwxrwx    1 root     root           20 nov 13 11:27 /lib/libnss_ldap-2.so.2 -> libnss_ldap-2.3.1.so

and to restart the workstation (otherwise getent passwd gives only
local accounts)... and the bug seems to be gone...

on RH8:
 nss_ldap-198 is OK in debug mode ?
 nss_ldap-202-5 is OK after some fiddling.

Hope this will help you to release a working binary RPM for RH8 that
installs cleanly and supersedes nss_ldap-198.
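
The symlink checks done by eye in steps 5 and 7 above can be scripted. This is only a sketch: the function name nss_ldap_target is hypothetical, it assumes the NSS module is reached through a libnss_ldap.so.2 symlink as in the listings above, and it relies on the readlink(1) utility being available.

```shell
# nss_ldap_target [DIR]
# Hypothetical helper: print the file that DIR/libnss_ldap.so.2 points to
# (DIR defaults to /lib). Returns non-zero if that path is not a symlink.
nss_ldap_target() {
    link="${1:-/lib}/libnss_ldap.so.2"
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2
        return 1
    fi
    readlink "$link"
}
```

Running `nss_ldap_target` before and after an rpm upgrade shows at a glance whether the library loaded by nscd has actually changed.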
Comment 5 Bill Nottingham 2006-08-07 15:10:22 EDT
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still
running Red Hat Linux, you are strongly advised to upgrade to a
current Fedora Core release or Red Hat Enterprise Linux or comparable.
Some information on which option may be right for you is available at
http://www.redhat.com/rhel/migrate/redhatlinux/.

Red Hat apologizes that these issues have not been resolved yet. We do
want to make sure that no important bugs slip through the cracks.
Please check if this issue is still present in a current Fedora Core
release. If so, please change the product and version to match, and
check the box indicating that the requested information has been
provided. Note that any bug still open against Red Hat Linux will be
closed as 'CANTFIX' on September 30, 2006. Thanks again for your help.
Comment 6 Bill Nottingham 2006-10-18 12:34:47 EDT
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still
running Red Hat Linux, you are strongly advised to upgrade to a
current Fedora Core release or Red Hat Enterprise Linux or comparable.
Some information on which option may be right for you is available at
http://www.redhat.com/rhel/migrate/redhatlinux/.

Closing as CANTFIX.
