From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827

Description of problem:
nscd stops servicing requests on one of our IMAP servers. When this happens, the server becomes totally unusable, the load starts to skyrocket, and any process waiting on a reply from nscd simply waits and never exits (e.g. we keep spawning new imap processes until we hit the maximum number specified in the xinetd config file). The problem tends to occur on weekday mornings as our server usage begins to increase. Under normal conditions, the load on this box is always under 4. An strace of a command such as ps -l shows that the command is waiting to read from the nscd socket in /var/run. The problem occurred with great frequency: every morning until I disabled nscd. All packages on our box are 100% up to date. An strace of nscd shows that it is waiting on a read (of what I don't know, since I have only run strace after the nscd process had hung). The problem occurred with both nscd-2.2.5-42 and nscd-2.2.5-39. A similar problem appears to have been reported in bugs 17519 and 13308.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
I can't reproduce the conditions on demand, but in production it happens regularly, almost every weekday between 10am and 12:30pm.

Additional info:

Hardware info:
CPU: 2x1.26GHz PIII
RAM: 1GB
Disk: 2x36G (RAID 1), 34G of SAN storage accessed via a Qlogic 2300 HBA
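The symptom described above (a client blocking forever on a read from the nscd socket) could be detected from outside by running a name-service lookup under a timeout. The following is only a minimal sketch, assuming Python 3 is available on the affected box, which the report does not state. The helper name `probe` and the 5 second default are hypothetical choices for illustration, not part of nscd or glibc.

```python
import subprocess


def probe(cmd, timeout=5.0):
    """Run cmd and classify the outcome as 'ok', 'failed', or 'hung'.

    'hung' means the command did not finish within `timeout` seconds,
    which is what a lookup blocked on an unresponsive nscd would look like.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except OSError:
        # The command could not be started at all (e.g. not installed).
        return "failed"
    return "ok" if result.returncode == 0 else "failed"
```

Calling `probe(["getent", "passwd", "root"])` from cron or a monitoring script would then report "hung" if the lookup never returns. Whether that particular lookup actually goes through nscd depends on the local nscd and nsswitch configuration, so treat the command as a placeholder.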
This problem should be a higher priority. I've noticed a similar problem on two of our servers. We are converting our authentication to LDAP. If the LDAP servers are unreachable for too long, nscd stops responding entirely, which in turn locks up the server. This is a very serious problem, as it can affect production environments and can quickly turn what should be a simple problem into an outage of every server. I did not see whether the original report involved LDAP, but I wonder if it also depends on a remote server.
I have exactly the same problem. nscd dies, and I am left with xemacs processes that eat all the memory on my server.
We're seeing this issue as well. We're running nscd 2.2.5-43 and OpenLDAP 2.0.27-2.7.3. All system processes go to sleep and no new processes are spawned. The system is, for all intents and purposes, locked. The only way for us to recover it is a reboot.
(The component was wrong: it should have been glibc, not nscd, which is why this bug went unnoticed.) There have been countless changes since 7.3 and glibc 2.2.5. With glibc-2.3.3-64 and up I don't expect any problems anymore, even with the nss_ldap module. Upgrade to this or a later version when available and retest. Open new bugs for any negative findings. This bug is outdated, so I am closing it.