Bug 495083 - [RHEL4] nscd uses 100% cpu and stops responding
Status: CLOSED DUPLICATE of bug 495082
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: glibc
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Andreas Schwab
Reported: 2009-04-09 13:14 EDT by Alan Matsuoka
Modified: 2009-09-07 08:46 EDT
Doc Type: Bug Fix
Last Closed: 2009-09-07 08:45:57 EDT

Attachments
sosreport-MFrohm.1907516-249173-7f48db.tar.bz2 (2.22 MB, application/x-bzip2)
2009-04-09 13:14 EDT, Alan Matsuoka

Description Alan Matsuoka 2009-04-09 13:14:46 EDT
Created attachment 338949

> ##### General Escalation Information
> State the problem
> 1. Provide time and date of the problem


> 2. Indicate the platform(s) (architectures) the problem is being reported
> against.

RHEL 4.7 ES and AS i386

> 3. Provide clear and concise problem description as it is understood at the
> time of escalation
> * Observed behavior

nscd hangs on a futex call, and the nscd processes are using 100% CPU on
several of our machines. nscd isn't responding at all. 'service nscd restart'
is not able to stop the process, and nscd will only respond to a 'kill -9'. We
are currently restarting nscd in a daily cron job as a workaround.
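
One way to confirm that a given nscd process is actually spinning, rather than restarting it blindly from cron, is to sample its accumulated CPU time twice and compare. The following is only a minimal sketch under stated assumptions: it reads the standard Linux /proc/<pid>/stat format, the helper names (cpu_ticks_from_stat, cpu_fraction) are hypothetical, and how the pid is obtained is left to the caller.

```python
import os


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so cut at the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def cpu_fraction(ticks_before: int, ticks_after: int,
                 interval_s: float, hz: int) -> float:
    """CPU use over the interval as a fraction of one CPU (1.0 == 100%)."""
    return (ticks_after - ticks_before) / (hz * interval_s)


def read_ticks(pid: int) -> int:
    """Read the current CPU tick count for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def clock_hz() -> int:
    """Kernel clock ticks per second, as reported by sysconf."""
    return os.sysconf("SC_CLK_TCK")
```

Calling read_ticks(pid) twice, a few seconds apart, and passing both values to cpu_fraction with clock_hz() gives a value near 1.0 for a process pinning one CPU. Because nscd is multi-threaded, the value can exceed 1.0 on a multi-processor machine.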

We have also noticed that on the machines where the nscd processes are using
100% CPU, 'lsof' shows two fds open on /var/run/nscd/socket, whereas on the
machines with a normal nscd, 'lsof' shows only one.
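
The fd observation above could be checked mechanically across many machines. This is a rough sketch, not a verified diagnostic: it assumes lsof is run in field-output mode (for example 'lsof -F pfn -c nscd'), where each output line starts with a one-character tag (p = pid, f = fd, n = name), and the function name count_socket_fds is hypothetical. Whether a count above one reliably indicates the hang is not established in this report; it is only the symptom the customer observed.

```python
from collections import Counter

SOCKET_PATH = "/var/run/nscd/socket"


def count_socket_fds(lsof_field_output: str, path: str = SOCKET_PATH) -> Counter:
    """Count, per pid, how many open fds refer to `path`.

    Expects text in lsof field-output format: one field per line,
    with the first character identifying the field.
    """
    counts = Counter()
    pid = None
    for line in lsof_field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            # A 'p' line starts a new process section.
            pid = int(value)
        elif tag == "n" and pid is not None:
            # Some lsof versions append details such as " type=STREAM"
            # after the socket path, so compare only the first token.
            if value.split(" ")[0] == path:
                counts[pid] += 1
    return counts
```

A pid whose count is greater than 1 matches the pattern seen on the affected machines.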

This problem occurs on machines both with and without LDAP connection.

> * Desired behavior

nscd should not use 100% CPU and should respond normally to kill signals, etc.

> 4. State specific action requested of SEG

Analyse the problem and advise if we can gather any extra data.

> 5. State whether or not a defect in the product is suspected

This is suspected to be a bug in both RHEL 4.7 and CentOS. This customer and others have opened a bug directly in Bugzilla (which I have discouraged them from doing in future) and in the CentOS bug tracker as well:

> * Provide Bugzilla if one already exists

N.B. This has already been assigned to Jakub Jelinek

> 8. This is especially important for severity one and two issues. What is the
> impact to the customer when they experience this problem?

This happens frequently, affects users, and is frustrating the customer.

> ##### Provide supporting info
> 1. State other actions already taken in working the problem:
> * tech-list, google searches, fulltext, consulting with another engineer
> * Provide any relevant data found
> 2. Attach sosreport

Attached an sosreport from an example system. It looks like they might be using a custom kernel, so I'm going to ask if they can reproduce with the stock kernel. However, since they can reproduce on multiple architectures, on both AS and ES, and on CentOS, and other people have reported the same behaviour, my suspicion is that it's not related to the kernel version and that we should progress without quibbling.

> 3. Attach other supporting data

See Bugzilla referenced above
> 4. Provide issue repro information:

None applicable

> 5. List any known hot-fix packages on the system


> 6. List any customer applied changes from the last 30 days

Comment 2 Andreas Schwab 2009-09-07 08:45:57 EDT

*** This bug has been marked as a duplicate of bug 495082 ***
