Bug 17519 - nscd deadlocks, halting system activity
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 6.2
Hardware: i386 Linux
Priority: high  Severity: high
Assigned To: Jakub Jelinek
Reported: 2000-09-14 18:17 EDT by shuey
Modified: 2005-10-31 17:00 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-10-04 02:50:41 EDT


Description shuey 2000-09-14 18:17:31 EDT
After an unspecified amount of time nscd will deadlock, crippling the
system.  Each thread seems to hang shortly after receiving a request
for something that can't be found in the cache (according to the
nscd logs).  Once all threads are hung anything needing to access nscd
blocks indefinitely.  Logins are impossible, but if a root terminal is
already open nscd can be killed, restoring system functionality.  nscd
threads are able to process other cache misses, but for some reason
they will eventually receive one that causes the thread to hang.

I'm submitting this as a high-priority, high-severity bug because it
relates to a core system component (glibc) and can cause a system to be
unusable.  While nscd is an optional service, disabling it isn't really
a viable solution; the performance degradation is quite noticeable.  Our
backend is an LDAP server - with around a hundred clients banging on the
server, removing nscd creates some serious performance problems.
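
The hang described above (anything that needs nscd blocks indefinitely) can be detected from outside by running a name lookup with a timeout. The following is only a minimal sketch, assuming that a lookup such as "getent passwd root" goes through nscd on the affected system; the function name lookup_responds and the 5-second default are hypothetical choices, not part of nscd or glibc.

```python
import subprocess


def lookup_responds(command=("getent", "passwd", "root"), timeout=5.0):
    """Return True if `command` exits within `timeout` seconds, False if it hangs.

    The exit status is ignored on purpose: the only question here is
    whether the lookup returns at all. The default command is an assumption.
    """
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return False
    return True


if __name__ == "__main__":
    if lookup_responds():
        print("lookup returned in time")
    else:
        print("lookup timed out; nscd may be hung")
```

A timeout alone does not prove that nscd is the culprit (a slow LDAP server would look the same), so this is a trigger for a closer look rather than a diagnosis.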
Comment 1 Cristian Gafton 2000-10-17 19:34:53 EDT
assigned to jakub
Comment 2 Ben Klang 2002-05-16 10:05:26 EDT
I am seeing the same problems on our deployed Red Hat 7.2 servers, again with 
LDAP as a backend. The related packages (nss_ldap, pam_ldap, glibc) are either 
from the default Red Hat 7.2 install or, on most of the machines, the latest 
released Red Hat 7.2 update packages.

Is any progress being made here?

Thanks
Comment 3 Petri T. Koistinen 2002-08-30 16:06:47 EDT
We are running Novell eDirectory on a Red Hat 7.3 server. Without nscd the
server will jam totally. The problem is that nscd is extremely unstable and it
has to be restarted from crontab about every minute.

Here is a snippet of what I see with the "ps fax" command. Not a pretty sight:

 3475 ?        S      0:11 /usr/sbin/nscd
 3484 ?        Z      0:00  \_ [nscd <defunct>]
 3684 ?        S      0:09 /usr/sbin/nscd
 3687 ?        Z      0:00  \_ [nscd <defunct>]
 3816 ?        S      0:08 /usr/sbin/nscd
 3819 ?        Z      0:00  \_ [nscd <defunct>]
 3954 ?        S      0:07 /usr/sbin/nscd
 3961 ?        Z      0:00  \_ [nscd <defunct>]
 4147 ?        S      0:07 /usr/sbin/nscd
 4151 ?        Z      0:00  \_ [nscd <defunct>]
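
To track how many defunct nscd children pile up between restarts, output like the above can be counted with a short script. This is only a sketch: it assumes the five-column layout shown here (PID, TTY, STAT, TIME, COMMAND), and the function name count_defunct_nscd is hypothetical.

```python
# A few lines copied from the "ps fax" output above.
SAMPLE = r"""
 3475 ?        S      0:11 /usr/sbin/nscd
 3484 ?        Z      0:00  \_ [nscd <defunct>]
 3684 ?        S      0:09 /usr/sbin/nscd
 3687 ?        Z      0:00  \_ [nscd <defunct>]
"""


def count_defunct_nscd(ps_output):
    """Count zombie (STAT starting with Z) lines whose command mentions nscd."""
    count = 0
    for line in ps_output.splitlines():
        # Split into PID, TTY, STAT, TIME and the rest of the line as COMMAND.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # blank or malformed line
        stat, command = fields[2], fields[4]
        if stat.startswith("Z") and "nscd" in command:
            count += 1
    return count


if __name__ == "__main__":
    print(count_defunct_nscd(SAMPLE))
```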
Comment 4 David Vu 2002-12-02 01:42:46 EST
We also run nscd with an LDAP backend.  We are fortunate in that the nscd daemon 
dies abnormally, and frequently, but does not deadlock.  The nscd daemon dies leaving 
behind /var/run/nscd.pid and /var/run/.nscd_socket - these need to be removed 
before nscd can be restarted again.
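
The cleanup step can be scripted so that a restart job does not trip over the leftovers. What follows is a rough sketch, not a tested recovery procedure: the two paths are the ones named above, the function name remove_stale_files is hypothetical, and it assumes nscd is confirmed dead before it runs (removing the socket under a live daemon would cut clients off).

```python
import os

# Paths as named above; other systems may keep them elsewhere.
STALE_PATHS = ("/var/run/nscd.pid", "/var/run/.nscd_socket")


def remove_stale_files(paths=STALE_PATHS):
    """Remove each leftover file that exists and return the paths removed."""
    removed = []
    for path in paths:
        try:
            os.unlink(path)  # works for regular files and socket files alike
        except FileNotFoundError:
            continue  # nothing left behind at this path
        removed.append(path)
    return removed


if __name__ == "__main__":
    for path in remove_stale_files():
        print("removed", path)
```

Run it as root after nscd has died and before starting the daemon again.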

I've tried increasing the number of nscd threads and enabling debug logging, 
but I am still not sure whether either resolves the problem.

This problem happens on both an RH7.3 box and an RH7.1 box with the current 
nscd/glibc errata RPMs.
Comment 5 Tim Mooney 2004-01-09 18:51:07 EST
We've had this problem happen on

  Red Hat 7.3
  Red Hat 8.0
  Red Hat ES 2.1
  Red Hat ES 3

We kept our RH 7.3 and 8 systems up to date with patches, and Red Hat
Network is keeping our ES 2.1 and ES 3 systems completely up to date,
and we're still seeing the problem, on multiple different systems.

This problem is listed as "ASSIGNED", but that was more than a year
ago.  What's the holdup?  Would nscd debug logs help?
Comment 6 Ulrich Drepper 2004-10-04 02:50:41 EDT
The holdup is that the component is wrong.  Somebody set this up for
some reason but none of the people responsible for the package even
knew it existed.  The bug should have been filed against glibc since
this is the package nscd is part of.

There is a problem in nscd which is fixed in the current glibc at
least.  Use FC3t2 or later when it becomes available.  Part of the
blame is to be laid on the nss_ldap module which far too often
misbehaves.  I won't analyze it since I at some point want to eat again.

If you have problems with lockups in FC3 let me know by reopening. 
But we certainly won't touch any code in RHL9 or earlier, FC1, or FC2.
