From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
When the name service switch is configured to use LDAP and nscd(8) caches uids, gids, services and so on, nscd(8) does not close its connections to the LDAP server. After 20 hours there are about 1500 ESTABLISHED connections (as listed by lsof(8)). Things get worse if you list /home with `ls -l' or list files whose uid does not resolve to an existing name. nscd(8) opens new connections five at a time without closing them. After a week, the RH8 slapd(8) server started refusing connections with the error "Too many open files" (ENFILE). Killing nscd(8) and restarting slapd(8) brings things back to normal. /proc/sys/fs/file-max is tuned to 16384.

OTOH, a SunOS 5.9 server in an identical role, using padl.com's nss_ldap library, has been fine the whole time: nscd(8) there does not waste connections. I compiled nss_ldap 204 on that SunOS server a couple of months ago.

I solved the problem by compiling a new nss_ldap library from padl.com (version 207), linking it statically with the openldap 2.1.18, sasl2 and openssl libs, and dynamically with libresolv and libc. The number of connections is now exactly 5. I have 4 threads configured in nscd.conf(5), so this is the correct behavior IMHO: one process and four threads.

Version-Release number of selected component (if applicable):
nss_ldap-198-3

How reproducible:
Always

Steps to Reproduce:
1. Set up a machine as an LDAP name service switch client and turn on nscd(8)
2. Create some users and groups in the LDAP directory
3. chown(1) and chgrp(1) some files to them on the system
4. Do `ls -l' frequently
5. Watch the number of connections with lsof(8) after a couple of hours

Actual Results:
lsof -i tcp | egrep "nscd.*.ldap" | wc -l
815

Expected Results:
lsof -i tcp | egrep "nscd.*.ldap" | wc -l
5

Additional info:
The server is running the OpenLDAP server _and_ is an LDAP client to itself and to other machines on the network.
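For anyone who wants to repeat the measurement, here is a minimal sketch of the counting step as a reusable shell function. It is not part of the original report. It assumes that lsof prints the command name in the first column and shows the remote port as "ldap" (or "389" when run with -P). The function names and the "threads + 1" limit are illustrative; the limit only encodes the reporter's own reasoning (one process plus its worker threads).

```shell
#!/bin/sh
# Count nscd -> LDAP TCP connections from `lsof -i tcp` output on stdin.
# Assumption: field 1 is the command name and the NAME field looks like
# "client:port->server:ldap" (or ":389" with lsof -P).
count_nscd_ldap_conns() {
    awk '$1 == "nscd" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /->.*:(ldap|389)$/) { n++; break }
         }
         END { print n + 0 }'
}

# Compare the count with threads + 1, the figure the reporter considers
# correct (one process plus the nscd.conf thread count).
# $1 = number of nscd threads; lsof output is read from stdin.
nscd_ldap_leak_check() {
    threads=$1
    count=$(count_nscd_ldap_conns)
    limit=$((threads + 1))
    if [ "$count" -gt "$limit" ]; then
        echo "LEAK: $count connections (expected at most $limit)"
        return 1
    fi
    echo "OK: $count connections (limit $limit)"
}

# Typical use on a live system (not run here):
#   lsof -i tcp | nscd_ldap_leak_check 4
```

Matching on the command column rather than grepping the whole line avoids counting slapd's own sockets when the server is also an LDAP client to itself.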
I have had exactly the same behaviour on various servers under RedHat 8 and RedHat 9. The only solution was to kill the nscd daemon completely, so my servers are currently underperforming because uid/gid information is no longer cached. This bug is not fixed by any of the current (October 17, 2003) patches from Red Hat.
Created attachment 95940 [details]
a small patch to enable DEBUG in nss_ldap

I patched ldap_nns.h to define DEBUG and DEBUG_SYSLOG, since my attempts to patch config.h.in failed.
Created attachment 95941 [details]
my spec file to activate DEBUG
Hello,

We have the very same problem here and seem to have "fixed" it (by chance, I guess), since I still do not understand what is going on.

At first we set idletimeout to 600 in /etc/openldap/slapd.conf to force the LDAP server to close idle connections after 10 minutes (600 seconds); otherwise, because of the bug, the server would not run for more than 6-8 hours without restarting the ldap service.

Then we downloaded the glibc source rpm and studied the nscd code. Nothing strange there: nscd does not know about ldap. So the culprit seems to be nss_ldap?

Here is an excerpt from my "log book" on the matter:

1) Make sure nss_ldap is the latest for RedHat 8:

    apt-get install nss_ldap

   Answer from freshrpms.net: nss_ldap is the newest....

2) Content of the current nss_ldap:

    [root@pc107t-1 BUILD]# rpm -qpl /misc/regen/RedHat/RPMS/nss_ldap-198-3.i386.rpm
    warning: /misc/regen/RedHat/RPMS/nss_ldap-198-3.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
    /etc/ldap.conf
    /lib/libnss_ldap-2.2.90.so    <-- note the version number
    /lib/security/pam_ldap.so
    /usr/lib/libnss_ldap.so
    /usr/share/doc/nss_ldap-198

3) Downloaded the source RPM and built it:

    apt-get source nss_ldap
    rpmbuild -ba SPECS/nss_ldap.spec

   See what has been done:

    [root@pc107t-1 BUILD]# rpm -qpl ../RPMS/i386/nss_ldap-198-3.i386.rpm
    /etc/ldap.conf
    /lib/libnss_ldap-2.3.2.so    <--- strange
    /lib/security/pam_ldap.so
    /usr/lib/libnss_ldap.so
    /usr/share/doc/nss_ldap-198
    .....

   Note that the version number is not the same: 2.3.2 instead of 2.2.90.

4) Installed the newly compiled one:

    [root@pc107t-1 BUILD]# rpm -Uvh --force ../RPMS/i386/nss_ldap-198-3.i386.rpm
    Preparing...                ########################################### [100%]
       1:nss_ldap               ########################################### [100%]

5) See what I got: two /lib/libnss_ldap files, but the links are good:

    [root@pc107t-1 BUILD]# ll /lib/libnss_l*
    -rwxr-xr-x 1 root root 1682620 aoû 27 2002 /lib/libnss_ldap-2.2.90.so
    -rwxr-xr-x 1 root root 1679482 nov 12 18:17 /lib/libnss_ldap-2.3.2.so
    lrwxrwxrwx 1 root root 20 nov 12 18:22 /lib/libnss_ldap.so.2 -> libnss_ldap-2.3.2.so
    [root@pc107t-1 BUILD]#

6) Restarted nscd and tried again to shake up the system:

    [root@pc107t-1 BUILD]# ll /hetuds/pc ....

   This gives me a long listing of the students' home directories on an NFS server. See what connections are left open:

    [root@pc107t-1 BUILD]# lsof |grep ldap | grep cipc2 |wc -l
    352

   The bug is still there !!!!

6bis) Removed /lib/libnss_ldap-2.2.90.so and restarted the workstation: the bug is still there...

7) Decided to activate debug mode in nss_ldap by applying a small patch (see the attachments):

    SOURCES/nss_ldap-pp.patch
    SPECS/nss_ldap-pp.spec

   Rebuilt with:

    rpmbuild -ba SPECS/nss_ldap-pp.spec

   Force installed:

    rpm -Uvh --force ../RPMS/i386/nss_ldap-198-3.i386.rpm

   See what has been done:

    [root@pc107t-1 BUILD]# ll /lib/libnss_l*
    -rwxr-xr-x 1 root root 1687714 nov 12 16:05 /lib/libnss_ldap-2.3.2.so
    lrwxrwxrwx 1 root root 20 nov 12 18:22 /lib/libnss_ldap.so.2 -> libnss_ldap-2.3.2.so

   My debug version is slightly bigger, and the links are still OK. Restarted nscd and shook up the system again:

    [root@pc107t-1 BUILD]# ll /hetuds/pc ....
    [root@pc107t-1 BUILD]# lsof |grep ldap | grep cipc2 |wc -l
    9

   ???? So in debug mode connections are cleanly closed ???

8) Removed my patch, rebuilt, reinstalled, restarted nscd: the bug is back !!!!

So I distributed my "debug version" to all 220 RH 8.0 workstations and removed idletimeout 600 from the server's /etc/openldap/slapd.conf, and for the first time our ldap server did not complain about too many open files after 6 hours...

I did peek at the code of nss_ldap 198 but could not figure out what is happening...
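When comparing the debug and non-debug builds, it may help to see which TCP state the leftover connections are in, not only how many there are. The following is a rough sketch, not something from the log book above. It assumes that cipc2 is the LDAP server's host name (the comment does not say so) and that lsof prints the state in parentheses at the end of each line. The function name ldap_conn_states is made up for illustration.

```shell
#!/bin/sh
# Tally TCP states of connections towards one server, reading
# `lsof -i tcp` output from stdin.
# $1 = server host name as it appears after "->" in the NAME column
#      (assumption: "cipc2" in the log above is that host name).
ldap_conn_states() {
    server=$1
    awk -v srv="$server" '
        # keep only lines whose remote end starts with the server name
        # and that end with a "(STATE)" marker
        index($0, "->" srv) && match($0, /\([A-Z_0-9]+\)$/) {
            state[substr($0, RSTART + 1, RLENGTH - 2)]++
        }
        END { for (s in state) print s, state[s] }
    ' | sort
}

# Typical use on a live system (not run here):
#   lsof -i tcp | ldap_conn_states cipc2
```

A large number of ESTABLISHED entries would point at connections that nobody closes, while CLOSE_WAIT entries would mean the server has closed its side (for example through idletimeout) and the client has not yet closed its own.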
It looks like the bug does not happen on our 2 RedHat 9.0 workstations, which have nss_ldap-202-5.i386.rpm and /lib/libnss_ldap-2.3.1.so. Note that the RedHat 9 libnss_ldap is tagged 2.3.1, whereas the one built from the RedHat 8 SRPM is tagged 2.3.2?

So I tried to install this RH9 RPM on an RH8 machine with:

    rpm -Uvh /misc/regen/updates/postes/commun/tests/nss_ldap-202-5.i386.rpm

I had to fiddle with the symlinks in /usr/lib and /lib so that they read:

    [root@pc107n-3 lib]# ll /usr/lib/libnss_l*
    lrwxrwxrwx 1 root root 30 nov 13 11:28 /usr/lib/libnss_ldap.so -> ../../lib/libnss_ldap-2.3.1.so
    [root@pc107n-3 lib]# ll /lib/libnss_l*
    -rwxr-xr-x 1 root root 1855520 jan 25 2003 /lib/libnss_ldap-2.3.1.so
    lrwxrwxrwx 1 root root 20 nov 13 11:27 /lib/libnss_ldap-2.so.2 -> libnss_ldap-2.3.1.so

and to restart the workstation (otherwise getent passwd gives only local accounts)... and the bug seems to be gone.

So, on RH8:
    nss_ldap-198 is OK in debug mode?
    nss_ldap-202-5 is OK after some fiddling.

I hope this will help you release a working binary RPM for RH8 that installs cleanly and supersedes nss_ldap-198.
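Since the RH9 package only worked after the symlinks were fixed by hand, a quick check that each link resolves to an existing file can save a reboot. This is a small sketch under the assumption that readlink(1) is available. The function name check_nss_link is made up, and the paths in the usage comment are simply the ones from the listing above.

```shell
#!/bin/sh
# Report where a symlink points and fail if it is missing or dangling.
# $1 = path of the symlink to check.
check_nss_link() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "$link: not a symlink"
        return 1
    fi
    if [ ! -e "$link" ]; then
        # -e follows the link, so this branch means the target is missing
        echo "$link: dangling -> $(readlink "$link")"
        return 1
    fi
    echo "$link -> $(readlink "$link")"
}

# Typical use (paths taken from the listing above, not run here):
#   check_nss_link /usr/lib/libnss_ldap.so
#   check_nss_link /lib/libnss_ldap.so.2
```

If both links resolve and getent passwd still shows only local accounts, processes that loaded the old library probably need a restart, which matches the reboot described above.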
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still running Red Hat Linux, you are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Red Hat apologizes that these issues have not been resolved yet. We do want to make sure that no important bugs slip through the cracks. Please check if this issue is still present in a current Fedora Core release. If so, please change the product and version to match, and check the box indicating that the requested information has been provided.

Note that any bug still open against Red Hat Linux will be closed as 'CANTFIX' on September 30, 2006. Thanks again for your help.
Red Hat Linux is no longer supported by Red Hat, Inc. If you are still running Red Hat Linux, you are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/. Closing as CANTFIX.