Red Hat Bugzilla – Bug 495083
[RHEL4] nscd uses 100% cpu and stops responding
Last modified: 2009-09-07 08:46:04 EDT
Created attachment 338949
> ##### General Escalation Information
> State the problem
> 1. Provide time and date of the problem
> 2. Indicate the platform(s) (architectures) the problem is being reported
RHEL 4.7 ES and AS i386
> 3. Provide clear and concise problem description as it is understood at the
> time of escalation
> * Observed behavior
nscd hangs in a futex call, and the nscd processes use 100% CPU on
several of our machines. nscd does not respond at all: 'service nscd restart'
cannot stop the process, and nscd only responds to 'kill -9'. We
are currently restarting nscd from a daily cron job as a workaround.
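As a rough illustration of a narrower workaround than an unconditional daily restart, the sketch below only kills and restarts nscd when it appears to be spinning. It is not what the customer runs. The 90% threshold, the function names, and the availability of a Python 3.7+ interpreter are all assumptions (stock RHEL 4 does not ship one). Note also that the pcpu value reported by ps is averaged over the process lifetime, so a freshly hung nscd may take a while to cross the threshold.

```python
import os
import signal
import subprocess

CPU_THRESHOLD = 90.0  # assumed cutoff in percent; tune for the site


def parse_ps_output(text):
    """Parse 'ps -C nscd -o pid=,pcpu=' output into (pid, cpu_percent) pairs."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        samples.append((int(fields[0]), float(fields[1])))
    return samples


def spinning_pids(samples, threshold=CPU_THRESHOLD):
    """Return the PIDs whose reported CPU usage is at or above the threshold."""
    return [pid for pid, cpu in samples if cpu >= threshold]


def main():
    ps = subprocess.run(
        ["ps", "-C", "nscd", "-o", "pid=,pcpu="],
        capture_output=True,
        text=True,
    )
    stuck = spinning_pids(parse_ps_output(ps.stdout))
    if not stuck:
        return  # nscd looks healthy (or is not running); do nothing
    for pid in stuck:
        # The report states that the hung daemon only responds to SIGKILL.
        os.kill(pid, signal.SIGKILL)
    subprocess.run(["service", "nscd", "start"], check=False)


if __name__ == "__main__":
    main()
```

Run from cron as root, this would leave a healthy nscd and its caches alone and only intervene when the symptom described above is present.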
We have also noticed that on the machines where the nscd processes are using
100% CPU, 'lsof' shows two open file descriptors for /var/run/nscd/socket,
whereas on machines with a normal nscd, 'lsof' shows only one.
This problem occurs on machines both with and without an LDAP connection.
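The socket observation above could be collected the same way on every machine with something like the following sketch. It counts, per nscd PID, the descriptors that lsof reports for /var/run/nscd/socket, using lsof's field output ('-F pfn': lines prefixed p for PID, f for descriptor, n for name). The function names are hypothetical and a Python 3.7+ interpreter is assumed. A count above one is only the symptom noted in this report, not a confirmed diagnosis.

```python
import subprocess
from collections import Counter

NSCD_SOCKET = "/var/run/nscd/socket"


def count_socket_fds(lsof_field_output, path=NSCD_SOCKET):
    """Count descriptors per PID whose name starts with `path`.

    Expects the output of 'lsof -F pfn'. A prefix match is used because
    some lsof versions append extra text after the socket path.
    """
    counts = Counter()
    pid = None
    for line in lsof_field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            pid = int(value)  # start of a new process section
        elif tag == "n" and pid is not None and value.startswith(path):
            counts[pid] += 1
    return dict(counts)


def main():
    lsof = subprocess.run(
        ["lsof", "-c", "nscd", "-F", "pfn"],
        capture_output=True,
        text=True,
    )
    for pid, n in sorted(count_socket_fds(lsof.stdout).items()):
        note = " (more than one, matches the reported symptom)" if n > 1 else ""
        print(f"nscd pid {pid}: {n} descriptor(s) on {NSCD_SOCKET}{note}")


if __name__ == "__main__":
    main()
```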
> * Desired behavior
nscd should not use 100% CPU and should respond normally to kill signals, etc.
> 4. State specific action requested of SEG
Analyse the problem and advise if we can gather any extra data.
> 5. State whether or not a defect in the product is suspected
This is suspected to be a bug in both RHEL 4.7 and CentOS. This customer and others have already opened a bug directly in Bugzilla (which I have discouraged them from doing in future), as well as one in the CentOS bug tracker:
> * Provide Bugzilla if one already exists
N.B. This has already been assigned to Jakub Jelinek
> 8. This is especially important for severity one and two issues. What is the
> impact to the customer when they experience this problem?
This happens frequently, affects users, and is frustrating the customer.
> ##### Provide supporting info
> 1. State other actions already taken in working the problem:
> * tech-list, google searches, fulltext, consulting with another engineer
> * Provide any relevant data found
> 2. Attach sosreport
Attached an sosreport from an example system. It looks like they might be using a custom kernel, so I'm going to ask if they can reproduce with the stock kernel. However, since they can reproduce on multiple architectures, on both AS and ES, and on CentOS, and since other people have reported the same behaviour, my suspicion is that it's not related to the kernel version and we should progress without quibbling.
> 3. Attach other supporting data
See Bugzilla referenced above
> 4. Provide issue repro information:
> 5. List any known hot-fix packages on the system
> 6. List any customer applied changes from the last 30 days
*** This bug has been marked as a duplicate of bug 495082 ***