Description of problem:

top is reporting 100% CPU for the nscd process on machines with high uptime. Usage returns to normal after restarting nscd.

[root@eqeuro1u ~]# top
top - 16:32:11 up 90 days,  7:30,  4 users,  load average: 1.03, 1.03, 1.00
Tasks: 222 total,   2 running, 220 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.6% us,  0.2% sy,  0.0% ni, 87.2% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:  32942984k total, 22434060k used, 10508924k free,   532184k buffers
Swap:  8385920k total,        0k used,  8385920k free, 19947240k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
 4771 nscd      16   0  168m 1516 1040 S 100.2  0.0  40969:30  nscd
19859 frawarsv  37  18  145m  25m 1000 S   1.0  0.1  28:27.78  rvd
 1452 frawarsv  16   0  8500 1476 1012 S   0.3  0.0   0:32.93  top

[root@eqeuro1u ~]# mpstat -P ALL 10 10
Linux 2.6.9-78.0.1.ELsmp (eqeuro1u)     01/13/2009

04:34:31 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
04:34:41 PM  all   13.40    0.01    0.27    0.00    0.00    0.04   86.28   1626.60
04:34:41 PM    0    2.40    0.10    0.90    0.00    0.00    0.20   96.40    600.40
04:34:41 PM    1    0.10    0.00    0.10    0.00    0.00    0.00   99.80     16.60
04:34:41 PM    2    1.70    0.00    0.40    0.00    0.00    0.00   97.90    556.40
04:34:41 PM    3    0.70    0.00    0.40    0.00    0.00    0.00   98.90    452.80
04:34:41 PM    4    1.20    0.00    0.30    0.00    0.00    0.00   98.40      0.00
04:34:41 PM    5    0.20    0.00    0.10    0.00    0.00    0.00   99.70      0.20
04:34:41 PM    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00      0.10
04:34:41 PM    7    0.80    0.00    0.10    0.00    0.00    0.00   99.10      0.10

[root@eqeuro1u ~]# sar -P 6 10 10
Linux 2.6.9-78.0.1.ELsmp (eqeuro1u)     01/13/2009

04:29:18 PM  CPU   %user   %nice %system %iowait   %idle
04:29:28 PM    6  100.00    0.00    0.00    0.00    0.00
04:29:38 PM    6  100.00    0.00    0.00    0.00    0.00
04:29:48 PM    6  100.00    0.00    0.00    0.00    0.00
04:29:58 PM    6  100.00    0.00    0.00    0.00    0.00
04:30:08 PM    6  100.00    0.00    0.00    0.00    0.00
04:30:18 PM    6  100.00    0.00    0.00    0.00    0.00

strace of the nscd process while in this state:

4771 futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4774 futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4775 futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...>
4777 futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4778 futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>
4779 futex(0x552abc55a4, FUTEX_WAIT, 4851653, NULL <unfinished ...>

I've requested a sysreport from the SA and will post it here when I receive it.

Other reporters: This is a known issue that has been seen in RHEL4:

>  4-07-09  283558  nscd using 100% CPU on RHEL 4.7
>  3-13-09  275573  nscd using 100% CPU on RHEL 4.6
>  3-04-09  272426  nscd using 100% CPU on RHEL 4.6
>  2-10-09  264826  nscd using 100% CPU on RHEL 4.7
>  2-08-09  264249  nscd using 100% CPU on RHEL 4.5
> 11-06-08  237055  nscd using 100% CPU on RHEL 4
>  9-16-08  221175  nscd using 100% CPU on RHEL 4
I also see this in our cluster every few weeks. At the moment I have about 20 machines where nscd is stuck in the futex(0x552abc5500, FUTEX_WAIT, 2, NULL <unfinished ...> call and using 100% of a CPU.
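For anyone who needs to check a large number of machines, here is a rough sketch in Python of how one might confirm that a given nscd is spinning right now. It samples the process's CPU ticks twice and computes usage over that interval, since a lifetime average would understate a process that only started spinning partway through its uptime. The function names (cpu_ticks, cpu_percent) are made up for illustration, and the sketch assumes the /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15. It has not been tested on the affected RHEL4 kernels.

```python
import os
import sys
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the rest on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """CPU usage over the interval, as a percentage of one CPU."""
    cpu_seconds = (ticks_after - ticks_before) / float(hz)
    return 100.0 * cpu_seconds / interval_s


def read_stat(pid):
    """Read the raw stat line for a pid (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    pid = int(sys.argv[1])
    interval = 10.0  # seconds between samples (arbitrary choice)
    hz = os.sysconf("SC_CLK_TCK")
    before = cpu_ticks(read_stat(pid))
    time.sleep(interval)
    after = cpu_ticks(read_stat(pid))
    print("pid %d: %.1f%% CPU over %.0fs" % (pid, cpu_percent(before, after, interval, hz), interval))
```

Saved under a hypothetical name such as nscd_cpu.py, it would be run as "python nscd_cpu.py 4771" with the pid of nscd on the machine in question. A result near 100% matches the symptom reported here.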
Is this issue related to bug 496201?
This issue is believed to be caused by a kernel bug, so it should not need a fix in glibc. The relevant upstream commit is:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f39894d1b5c253b10fcb8fbbbcf65a330f6cdc7

Please apply this patch and test. Let us know as soon as possible whether it resolves the issue.
Yes, sorry we didn't ping back here earlier. The issue has been resolved in kernel space. Grab the latest kernel from the link below and report back your test results. http://people.redhat.com/vgoyal/rhel4/
*** This bug has been marked as a duplicate of bug 496201 ***
*** Bug 495083 has been marked as a duplicate of this bug. ***
*** Bug 492581 has been marked as a duplicate of this bug. ***