Bug 495083 - [RHEL4] nscd uses 100% cpu and stops responding
Summary: [RHEL4] nscd uses 100% cpu and stops responding
Keywords:
Status: CLOSED DUPLICATE of bug 495082
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: glibc
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Andreas Schwab
QA Contact: BaseOS QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-04-09 17:14 UTC by Alan Matsuoka
Modified: 2009-09-07 12:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-07 12:45:57 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport-MFrohm.1907516-249173-7f48db.tar.bz2 (2.22 MB, application/x-bzip2)
2009-04-09 17:14 UTC, Alan Matsuoka

Description Alan Matsuoka 2009-04-09 17:14:46 UTC
Created attachment 338949
sosreport-MFrohm.1907516-249173-7f48db.tar.bz2

> ##### General Escalation Information
>
> State the problem
>
> 1. Provide time and date of the problem

Sporadic

> 2. Indicate the platform(s) (architectures) the problem is being reported
> against.

RHEL 4.7 ES and AS i386

> 3. Provide clear and concise problem description as it is understood at the
> time of escalation
>
> * Observed behavior

nscd hangs on a futex call, and the nscd processes use 100% CPU on
several of our machines. nscd does not respond at all: 'service nscd restart'
is unable to stop the process, and nscd responds only to 'kill -9'. We
are currently restarting nscd from a daily cron job as a workaround.
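
The "100% CPU" condition described above can be detected from /proc rather than waiting for the daily restart. The following is a minimal sketch, not something taken from the customer's systems: it assumes a Python 3 interpreter is available on the host, and the helper names (`cpu_ticks`, `cpu_fraction`, `sample_cpu_fraction`) are hypothetical. It samples utime + stime from /proc/<pid>/stat twice and reports the fraction of one CPU used over the interval.

```python
import os
import time


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split only after the last ')'.
    rest = stat_text[stat_text.rfind(")") + 1:].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_fraction(ticks_before, ticks_after, interval_seconds, ticks_per_second):
    """Fraction of one CPU consumed between two samples (1.0 = one CPU fully busy)."""
    return (ticks_after - ticks_before) / (ticks_per_second * interval_seconds)


def sample_cpu_fraction(pid, interval_seconds=5.0):
    """Sample a live process twice and return its CPU fraction over the interval."""
    hz = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_fraction(before, after, interval_seconds, hz)
```

A value close to 1.0 from `sample_cpu_fraction(pid)` over several consecutive samples would match the spinning state reported here. What to do in response (for example 'kill -9' followed by a service start) is left to the operator.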

We have also noticed that on the machines where the nscd processes are using
100% CPU, 'lsof' shows two fds open on /var/run/nscd/socket, whereas on the
machines with a normal nscd 'lsof' shows only one open /var/run/nscd/socket.

This problem occurs on machines both with and without LDAP connection.
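
The 'lsof' observation above can be checked with a script, so that affected hosts can be found without inspecting each one by hand. This is a rough sketch under stated assumptions: Python 3 is available, the script runs as root, and the helper names (`socket_inodes`, `count_socket_fds`, `nscd_socket_fd_count`) are hypothetical. It looks up the inodes bound to /var/run/nscd/socket in /proc/net/unix and counts how many of a process's fds point at them.

```python
import os

NSCD_SOCKET = "/var/run/nscd/socket"


def socket_inodes(proc_net_unix_text, path=NSCD_SOCKET):
    """Return the set of inode numbers (as strings) whose bound path is `path`."""
    inodes = set()
    for line in proc_net_unix_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: Num RefCount Protocol Flags Type St Inode [Path]
        if len(fields) >= 8 and fields[7] == path:
            inodes.add(fields[6])
    return inodes


def count_socket_fds(fd_targets, inodes):
    """Count fd link targets of the form 'socket:[<inode>]' with inode in `inodes`."""
    prefix = "socket:["
    count = 0
    for target in fd_targets:
        if target.startswith(prefix) and target.endswith("]"):
            if target[len(prefix):-1] in inodes:
                count += 1
    return count


def nscd_socket_fd_count(pid):
    """Count the fds of process `pid` that refer to the nscd socket."""
    with open("/proc/net/unix") as f:
        inodes = socket_inodes(f.read())
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # the fd was closed between listdir and readlink
    return count_socket_fds(targets, inodes)
```

Going by the report, a result of 1 would correspond to a healthy nscd and 2 to the hung state. That mapping comes only from the customer's observation and has not been verified independently.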

>
> * Desired behavior

nscd should not use 100% CPU and should respond normally to kill signals, etc.

> 4. State specific action requested of SEG

Analyse the problem and advise if we can gather any extra data.

> 5. State whether or not a defect in the product is suspected

This is suspected to be a bug in both RHEL 4.7 and CentOS. This customer and others have already opened a bug directly in Bugzilla (which I have discouraged them from doing in future), as well as one in the CentOS bug tracker:

> * Provide Bugzilla if one already exists

https://bugzilla.redhat.com/show_bug.cgi?id=492581
N.B. This has already been assigned to Jakub Jelinek
http://bugs.centos.org/view.php?id=3373

> 8. This is especially important for severity one and two issues. What is the
> impact to the customer when they experience this problem?

This happens frequently, affects users, and is frustrating the customer.

> ##### Provide supporting info
>
> 1. State other actions already taken in working the problem:
>
> * tech-list, google searches, fulltext, consulting with another engineer
>
> * Provide any relevant data found
>
> 2. Attach sosreport

Attached an sosreport from an example system. It looks like they might be using a custom kernel, so I'm going to ask whether they can reproduce with the stock kernel. However, since they can reproduce on multiple architectures, on both AS and ES, and on CentOS, and since other people have reported the same behaviour, I suspect it is not related to the kernel version and that we should proceed without quibbling over the kernel.

> 3. Attach other supporting data

See Bugzilla referenced above
>
> 4. Provide issue repro information:

None applicable

> 5. List any known hot-fix packages on the system

None

> 6. List any customer applied changes from the last 30 days

None

Comment 2 Andreas Schwab 2009-09-07 12:45:57 UTC

*** This bug has been marked as a duplicate of bug 495082 ***

