After an unspecified amount of time nscd deadlocks, crippling the system. According to the nscd logs, each thread seems to hang shortly after receiving a request for something that can't be found in the cache. Once all threads are hung, anything that needs to access nscd blocks indefinitely. Logins are impossible, but if a root terminal is already open, nscd can be killed, which restores system functionality. nscd threads are able to process other cache misses, but for some reason each thread eventually receives one that causes it to hang. I'm submitting this as a high-priority, high-severity bug because it relates to a core system component (glibc) and can make a system unusable. While nscd is an optional service, disabling it isn't really a viable solution; the performance degradation is quite noticeable. Our backend is an LDAP server, and with around a hundred clients banging on the server, removing nscd creates some serious performance problems.
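The hang described above can be detected from outside nscd, since any lookup that goes through it stops returning. A minimal sketch in Python, assuming that `getent passwd root` is a reasonable probe and that five seconds is generous for a healthy system; the function name, the probe command, and the timeout are illustrative choices, not part of nscd or glibc:

```python
import subprocess


def lookup_hangs(cmd=("getent", "passwd", "root"), timeout=5.0):
    """Return True if cmd does not finish within timeout seconds.

    The default probe command and timeout are assumptions; pick a lookup
    that is actually served through nscd on the system in question.
    """
    try:
        # subprocess.run kills the child itself when the timeout expires.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

A timed-out probe only shows that the lookup blocked; it does not say which thread is stuck or why.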
assigned to jakub
I am seeing the same problems on our deployed Red Hat 7.2 servers, again with LDAP as a backend. The related packages (nss_ldap, pam_ldap, glibc) are either from the default Red Hat 7.2 install or, on most of the machines, the latest released Red Hat 7.2 updates. Is any progress being made here? Thanks
We are running Novell eDirectory on a Red Hat 7.3 server. Without nscd the server jams totally. The problem is that nscd is extremely unstable and has to be restarted from crontab about every minute. Here is a snippet of what I see with the "ps fax" command. Not a pretty sight:
 3475 ?  S  0:11 /usr/sbin/nscd
 3484 ?  Z  0:00  \_ [nscd <defunct>]
 3684 ?  S  0:09 /usr/sbin/nscd
 3687 ?  Z  0:00  \_ [nscd <defunct>]
 3816 ?  S  0:08 /usr/sbin/nscd
 3819 ?  Z  0:00  \_ [nscd <defunct>]
 3954 ?  S  0:07 /usr/sbin/nscd
 3961 ?  Z  0:00  \_ [nscd <defunct>]
 4147 ?  S  0:07 /usr/sbin/nscd
 4151 ?  Z  0:00  \_ [nscd <defunct>]
We also run nscd with an LDAP backend. We are fortunate in that the nscd daemon frequently dies abnormally but does not deadlock. When it dies it leaves behind /var/run/nscd.pid and /var/run/.nscd_socket, and these need to be removed before nscd can be restarted. I've tried increasing the number of nscd threads and enabling debug logging, but I am still not sure whether either resolves the problem. This happens on both an RH7.3 box and an RH7.1 box with the current nscd/glibc errata RPMs.
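The cleanup step mentioned above could be scripted along these lines. This is only a sketch in Python, assuming the pid file holds a single decimal PID and that it is safe to delete both files whenever that process no longer exists; the function names are hypothetical, and the two default paths are the ones named in the comment above:

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def remove_stale_nscd_files(pid_path="/var/run/nscd.pid",
                            socket_path="/var/run/.nscd_socket"):
    """Delete leftover files if the recorded process is gone.

    Returns the list of paths that were removed. Caveat: PIDs get reused,
    so a live PID is not proof that nscd itself is still running.
    """
    try:
        with open(pid_path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        pid = None  # missing or unreadable pid file: treat as stale
    if pid is not None and pid <= 0:
        pid = None  # never probe process groups via os.kill
    if pid is not None and pid_is_alive(pid):
        return []
    removed = []
    for path in (pid_path, socket_path):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass
    return removed
```

Running this before each restart attempt would leave a healthy nscd alone and only clear the files once the daemon has actually died.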
We've had this problem happen on Red Hat 7.3, Red Hat 8.0, Red Hat ES 2.1, and Red Hat ES 3. We kept our RH 7.3 and 8 systems up to date with patches, and Red Hat Network is keeping our ES 2.1 and ES 3 systems completely up to date, and we're still seeing the problem on multiple different systems. This problem is listed as "ASSIGNED", but that was more than a year ago. What's the holdup? Would nscd debug logs help?
The holdup is that the component is wrong. Somebody set this up for some reason, but none of the people responsible for the package even knew it existed. The bug should have been filed against glibc, since that is the package nscd is part of. There is a problem in nscd which is fixed in the current glibc at least. Use FC3t2 or later when it becomes available. Part of the blame is to be laid on the nss_ldap module, which far too often misbehaves. I won't analyze it since I at some point want to eat again. If you have problems with lockups in FC3, let me know by reopening. But we certainly won't touch any code in RHL9 or earlier, FC1, or FC2.