Bug 495082 - [RHEL 4] NSCD high CPU utilization on systems with significant uptime
Status: CLOSED DUPLICATE of bug 496201
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: glibc
Version: 4.7
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Assigned To: Andreas Schwab
QA Contact: BaseOS QE
Duplicates: 492581 495083
Reported: 2009-04-09 13:08 EDT by Alan Matsuoka
Modified: 2016-11-24 11:09 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-09-07 08:41:31 EDT
Description Alan Matsuoka 2009-04-09 13:08:22 EDT
Description of problem:
top reports 100% CPU for the nscd process on machines with high uptime. Usage returns to normal after nscd is restarted.
[root@eqeuro1u ~]# top
top - 16:32:11 up 90 days,  7:30,  4 users,  load average: 1.03, 1.03, 1.00
Tasks: 222 total,   2 running, 220 sleeping,   0 stopped,   0 zombie
Cpu(s):12.6% us,  0.2% sy,  0.0% ni, 87.2% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   32942984k total, 22434060k used, 10508924k free,   532184k buffers
Swap:  8385920k total,        0k used,  8385920k free, 19947240k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
 4771 nscd      16   0  168m 1516 1040 S 100.2  0.0  40969:30  nscd
19859 frawarsv  37  18  145m  25m 1000 S   1.0  0.1  28:27.78  rvd
 1452 frawarsv  16   0  8500 1476 1012 S   0.3  0.0   0:32.93  top

[root@eqeuro1u ~]# mpstat -P ALL 10 10
Linux 2.6.9-78.0.1.ELsmp (eqeuro1u)     01/13/2009
04:34:31 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
04:34:41 PM  all   13.40    0.01    0.27    0.00    0.00    0.04   86.28   1626.60
04:34:41 PM    0    2.40    0.10    0.90    0.00    0.00    0.20   96.40    600.40
04:34:41 PM    1    0.10    0.00    0.10    0.00    0.00    0.00   99.80     16.60
04:34:41 PM    2    1.70    0.00    0.40    0.00    0.00    0.00   97.90    556.40
04:34:41 PM    3    0.70    0.00    0.40    0.00    0.00    0.00   98.90    452.80
04:34:41 PM    4    1.20    0.00    0.30    0.00    0.00    0.00   98.40      0.00
04:34:41 PM    5    0.20    0.00    0.10    0.00    0.00    0.00   99.70      0.20
04:34:41 PM    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00      0.10
04:34:41 PM    7    0.80    0.00    0.10    0.00    0.00    0.00   99.10      0.10

[root@eqeuro1u ~]#  sar -P 6 10 10
Linux 2.6.9-78.0.1.ELsmp (eqeuro1u)     01/13/2009
04:29:18 PM       CPU     %user     %nice   %system   %iowait     %idle
04:29:28 PM         6    100.00      0.00      0.00      0.00      0.00
04:29:38 PM         6    100.00      0.00      0.00      0.00      0.00
04:29:48 PM         6    100.00      0.00      0.00      0.00      0.00
04:29:58 PM         6    100.00      0.00      0.00      0.00      0.00
04:30:08 PM         6    100.00      0.00      0.00      0.00      0.00
04:30:18 PM         6    100.00      0.00      0.00      0.00      0.00
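
The figures above show one CPU pinned at 100% user time while nscd tops the process list. One way to narrow this down to a single thread is to read the per-thread CPU counters that Linux exposes under /proc/<pid>/task/<tid>/stat. The following is a minimal sketch, not something taken from this report: the function names are hypothetical, and it assumes the field layout documented in proc(5), where utime and stime are fields 14 and 15.

```python
import os


def parse_thread_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/.../stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so cut at the last ')' instead of splitting the whole line.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def thread_cpu_ticks(pid):
    """Map each thread ID of `pid` to its accumulated CPU ticks."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_thread_cpu_ticks(f.read())
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return ticks
```

Calling thread_cpu_ticks() twice a few seconds apart and comparing the two results should show which thread ID is accumulating ticks. With the PID from the top output above, that would be thread_cpu_ticks(4771).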

strace of the nscd process while in this state:
4771  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4774  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4775  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4777  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4778  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4779  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
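
When many nscd instances need to be compared, it can help to summarize a listing like the one above by futex address. The sketch below is an illustration only: the regular expression and the function name are assumptions that match the line format shown here and nothing more general.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   4771  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
FUTEX_WAIT_RE = re.compile(
    r"^\s*(\d+)\s+futex\((0x[0-9a-fA-F]+), FUTEX_WAIT, (\d+)"
)


def group_futex_waiters(strace_text):
    """Group thread IDs by the futex address they are waiting on."""
    waiters = defaultdict(list)
    for line in strace_text.splitlines():
        match = FUTEX_WAIT_RE.match(line)
        if match:
            tid, addr, _expected_value = match.groups()
            waiters[addr].append(int(tid))
    return dict(waiters)


SAMPLE = """\
4771  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4774  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4775  futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4777  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4778  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4779  futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
"""

if __name__ == "__main__":
    for addr, tids in group_futex_waiters(SAMPLE).items():
        print(addr, tids)
```

For the listing in this report, the grouping gives three threads waiting on 0x552abc5500 and three on 0x552abc55a4.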


I have requested a sysreport from the SA and will post it here when it is received.

Other reports: this is a known issue that has been seen in RHEL 4:
>  4-07-09   283558   nscd using 100% CPU on RHEL 4.7
>  3-13-09   275573   nscd using 100% CPU on RHEL 4.6
>  3-04-09   272426   nscd using 100% CPU on RHEL 4.6
>  2-10-09   264826   nscd using 100% CPU on RHEL 4.7
>  2-08-09   264249   nscd using 100% CPU on RHEL 4.5
> 11-06-08   237055   nscd using 100% CPU on RHEL 4
>  9-16-08   221175   nscd using 100% CPU on RHEL 4
Comment 6 Kostas Georgiou 2009-04-23 07:30:49 EDT
I also see this in our cluster every few weeks. At the moment I have about 20 machines with nscd stuck in the futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...> call, using 100% of the CPU.
Comment 7 Chris Ward 2009-05-14 02:51:29 EDT
Is this issue related to bug 496201?
Comment 8 Chris Ward 2009-05-14 02:54:29 EDT
It is believed that this issue is related to a kernel bug and should not need to be fixed in glibc.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f39894d1b5c253b10fcb8fbbbcf65a330f6cdc7  

Please apply this patch and test. Let us know asap if this resolves the issue.
Comment 10 Chris Ward 2009-06-08 11:25:25 EDT
Yes, sorry we didn't ping back here earlier. The issue has been resolved in kernel space. Grab the latest kernel from the link below and report back your test results.

http://people.redhat.com/vgoyal/rhel4/
Comment 20 Andreas Schwab 2009-09-07 08:41:31 EDT

*** This bug has been marked as a duplicate of bug 496201 ***
Comment 21 Andreas Schwab 2009-09-07 08:45:57 EDT
*** Bug 495083 has been marked as a duplicate of this bug. ***
Comment 22 Andreas Schwab 2009-09-07 08:46:29 EDT
*** Bug 492581 has been marked as a duplicate of this bug. ***
