I hope this is the right component. There is no component specifically for nscd, and since it's built from glibc, I figured this might be the best place to send this report.

Description of problem:
nscd is spinning, taking up all the CPU time it can. When everything else is idle, it sits there using 98% of the CPU. I cannot figure out what kind of work it's doing, and I don't see any network activity. I don't know why or when it started doing this, either. It hasn't been doing it the whole time; it started relatively recently. If I try to attach gdb to the running process, I get an SELinux denial saying that "signal" is not allowed.

Version-Release number of selected component (if applicable):
nscd-2.3.5-0.fc3.1

How reproducible:
It seems to happen every time. Even if I restart nscd, it (eventually) starts spinning again. There are no log messages to explain why it might be spinning.

Steps to Reproduce:
1. Boot into FC3.
2. Log in and wait.

Actual results:
Here is "top" from while I was running a compile. Notice that nscd is taking up even more CPU time than the compiler!

    Tasks: 122 total,   2 running, 120 sleeping,   0 stopped,   0 zombie
    Cpu(s): 92.1% us,  7.3% sy,  0.3% ni,  0.0% id,  0.0% wa,  0.3% hi,  0.0% si
    Mem:   1555136k total,   779052k used,   776084k free,    47388k buffers
    Swap:  2048248k total,        0k used,  2048248k free,   461908k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     4724 nscd      16   0 21140 1248  968 S 57.5  0.1 125:03.59 nscd
     4150 root      15   0  168m  35m  10m S  4.0  2.4   1:26.71 X
     6825 warlord   16   0  4888 1708  908 S  3.7  0.1   0:00.11 sh
     4882 warlord   16   0 35284  17m  11m S  3.0  1.2   0:11.52 gaim
     4875 warlord   15   0 43940  18m 8400 S  2.3  1.2   0:03.20 gnome-terminal
     4850 warlord   16   0 41724  18m 9876 S  0.3  1.2   0:00.96 nautilus

Expected results:
nscd shouldn't spin. It also appears to make gdm hang: when I log out of the root account, it takes an inordinate amount of time for X to actually die.

Additional info:
I'd provide some if I could. I'm happy to do whatever you want to help.
Does this happen when you boot with SELinux turned off? If so, can you try getting some information from gdb at that point?
Yes, it still happens without SELinux. Here's the backtrace. It's not very useful without the -debuginfo package, I'm afraid:

    (gdb) attach 4036
    Attaching to process 4036
    warning: The current VSYSCALL page code requires an existing execuitable.
    Use "add-symbol-file-from-memory" to load the VSYSCALL page by hand
    Reading symbols from /usr/sbin/nscd...(no debugging symbols found)...done.
    Using host libthread_db library "/lib/tls/libthread_db.so.1".
    (no debugging symbols found)...Loaded symbols for /usr/sbin/nscd
    Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/tls/librt.so.1
    Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    [New Thread -1210603840 (LWP 4036)]
    [New Thread -1218663504 (LWP 4078)]
    [New Thread -1217610832 (LWP 4077)]
    [New Thread -1216558160 (LWP 4076)]
    [New Thread -1215505488 (LWP 4075)]
    [New Thread -1214452816 (LWP 4074)]
    [New Thread -1213400144 (LWP 4073)]
    Loaded symbols for /lib/tls/libpthread.so.0
    Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libnsl.so.1
    Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libselinux.so.1
    Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib/tls/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/ld-linux.so.2
    Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib/libnss_files.so.2
    0xb7f1a402 in ?? ()
    (gdb) t a a bt

    Thread 7 (Thread -1213400144 (LWP 4073)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7ed1cfc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
    #2  0xb7f21b4b in main () from /usr/sbin/nscd

    Thread 6 (Thread -1214452816 (LWP 4074)):
    #0  0xb7f27d53 in gethostbyname2_r () from /usr/sbin/nscd
    #1  0xb7f21ba7 in main () from /usr/sbin/nscd

    Thread 5 (Thread -1215505488 (LWP 4075)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7ed1cfc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
    #2  0xb7f21b4b in main () from /usr/sbin/nscd

    Thread 4 (Thread -1216558160 (LWP 4076)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
    #2  0xb7f21a28 in main () from /usr/sbin/nscd

    Thread 3 (Thread -1217610832 (LWP 4077)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
    #2  0xb7f21a28 in main () from /usr/sbin/nscd

    Thread 2 (Thread -1218663504 (LWP 4078)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
    #2  0xb7f21a28 in main () from /usr/sbin/nscd

    Thread 1 (Thread -1210603840 (LWP 4036)):
    #0  0xb7f1a402 in ?? ()
    #1  0xb7e4361e in epoll_wait () from /lib/tls/libc.so.6
    #2  0xb7f22889 in main () from /usr/sbin/nscd
    (gdb)
Can you:
a) make a copy of /var/db/nscd,
b) remove all 3 files in it, and
c) restart nscd?
The most likely cause here is an unclean shutdown that left the persistent database in an inconsistent state.
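For anyone who lands here with the same symptom, steps a) and b) can be scripted. This is a minimal Python sketch, not an official tool: the function name `backup_and_clear` and the default backup location `/var/db/nscd.bak` are hypothetical choices. It copies every regular file in the cache directory into a backup directory and then deletes the original. It does not restart nscd (step c), and on a real system it would need to run as root.

```python
import os
import shutil


def backup_and_clear(cache_dir="/var/db/nscd", backup_dir="/var/db/nscd.bak"):
    """Copy each regular file in cache_dir into backup_dir, then delete it.

    Hypothetical helper covering steps a) and b) only; restarting nscd
    (step c) is left to the usual init script.
    Returns the sorted names of the files that were backed up and removed.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(cache_dir)):
        src = os.path.join(cache_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories, sockets and other non-regular entries
        # a) keep a copy (including timestamps and permissions) for later inspection
        shutil.copy2(src, os.path.join(backup_dir, name))
        # b) remove the possibly inconsistent original
        os.remove(src)
        moved.append(name)
    return moved
```

Copying before deleting keeps the corrupted files available in case someone wants to examine them, which is why step a) comes first.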
I just renamed all the files in the directory from foo -> foo.bak and restarted nscd. It is no longer spinning. Thank you.
Rawhide glibc (2.3.90-8) includes an nscd persistent database verifier, which runs at nscd startup. If the database is corrupted, nscd removes it and recreates it from scratch. If this works well in Rawhide, it will eventually be backported to FC4 and possibly FC3 as well.
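To illustrate the verify-then-rebuild pattern described above, here is a small Python sketch. It is not nscd's code and does not reproduce its on-disk format. The file layout (a 4-byte magic value followed by a 4-byte little-endian payload length) and the names `MAGIC`, `HEADER`, `verify_db` and `open_or_rebuild` are all hypothetical. The only point is the startup logic: check the file's internal consistency, and if that check fails, discard the file and start over with an empty database.

```python
import os
import struct

MAGIC = b"NSCD"                 # hypothetical magic value, not nscd's real header
HEADER = struct.Struct("<4sI")  # hypothetical layout: magic, payload length


def verify_db(path):
    """Return True if the file has the expected magic and declared payload size."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return False  # missing or unreadable file counts as unusable
    if len(data) < HEADER.size:
        return False  # truncated header, e.g. after an unclean shutdown
    magic, length = HEADER.unpack_from(data)
    return magic == MAGIC and len(data) == HEADER.size + length


def write_empty_db(path):
    """Create a fresh database that contains only a header and no entries."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, 0))


def open_or_rebuild(path):
    """Keep a consistent database; otherwise remove it and recreate it empty.

    Returns True if the existing file was kept, False if it was rebuilt.
    """
    if verify_db(path):
        return True
    if os.path.exists(path):
        os.remove(path)
    write_empty_db(path)
    return False
```

A real verifier would check much more than a length field (for example, that internal offsets stay within the file), but the keep-or-rebuild decision at startup has the same shape.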