Bug 164001

Summary: nscd is spinning. Cannot debug it due to selinux fault
Product: [Fedora] Fedora
Reporter: Derek Atkins <warlord>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3
CC: jaboutbo
Hardware: All   
OS: Linux   
Fixed In Version: 2.3.90-8
Doc Type: Bug Fix
Last Closed: 2005-08-09 07:53:51 UTC

Description Derek Atkins 2005-07-22 19:29:31 UTC
I hope this is the right component -- there is no component specifically for
nscd.  As it's built from glibc I figure it might be the best place to send this
report.

Description of problem:
nscd is spinning, taking up all the CPU time it can.  When all else is idle it
sits there and takes up 98% of the CPU.  I cannot figure out what kind of work
it's doing.  I don't see any network activity.  I don't know why it started
doing this, either, or when.  It hasn't been doing it the whole time; it started
relatively recently.

If I try to "gdb attach" to the running process I get an SELinux denial saying
that "signal" is not allowed.

Version-Release number of selected component (if applicable):
nscd-2.3.5-0.fc3.1

How reproducible:
It seems to do it every time.  Even if I restart nscd it will (eventually) start
spinning.  There are no log messages to explain why it might be spinning.

Steps to Reproduce:
1. boot into fc3
2. login and wait
  
Actual results:
Here is "top" from when I was performing a compile.  Notice that nscd is taking
up even more CPU time than the compiler!

Tasks: 122 total,   2 running, 120 sleeping,   0 stopped,   0 zombie
Cpu(s): 92.1% us,  7.3% sy,  0.3% ni,  0.0% id,  0.0% wa,  0.3% hi,  0.0% si
Mem:   1555136k total,   779052k used,   776084k free,    47388k buffers
Swap:  2048248k total,        0k used,  2048248k free,   461908k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4724 nscd      16   0 21140 1248  968 S 57.5  0.1 125:03.59 nscd
 4150 root      15   0  168m  35m  10m S  4.0  2.4   1:26.71 X
 6825 warlord   16   0  4888 1708  908 S  3.7  0.1   0:00.11 sh
 4882 warlord   16   0 35284  17m  11m S  3.0  1.2   0:11.52 gaim
 4875 warlord   15   0 43940  18m 8400 S  2.3  1.2   0:03.20 gnome-terminal
 4850 warlord   16   0 41724  18m 9876 S  0.3  1.2   0:00.96 nautilus


Expected results:
nscd shouldn't spin.  It also appears to make gdm hang: when I log out of the
root account, it takes an inordinate amount of time for X to actually exit.

Additional info:

I'd provide any if I could.  I'm happy to do whatever you want to help.

Comment 1 Jack Aboutboul 2005-07-22 19:38:14 UTC
Does this happen when you boot with SELinux turned off?  If so, can you try
getting some information from gdb at that point?

Comment 2 Derek Atkins 2005-07-22 19:48:42 UTC
Yes, it still happens without SELinux.  Here's the backtrace.  Not very useful
without the -debuginfo package, I'm afraid:

(gdb) attach 4036
Attaching to process 4036
warning: The current VSYSCALL page code requires an existing execuitable.
Use "add-symbol-file-from-memory" to load the VSYSCALL page by hand
Reading symbols from /usr/sbin/nscd...(no debugging symbols found)...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
(no debugging symbols found)...Loaded symbols for /usr/sbin/nscd
Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/librt.so.1
Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread -1210603840 (LWP 4036)]
[New Thread -1218663504 (LWP 4078)]
[New Thread -1217610832 (LWP 4077)]
[New Thread -1216558160 (LWP 4076)]
[New Thread -1215505488 (LWP 4075)]
[New Thread -1214452816 (LWP 4074)]
[New Thread -1213400144 (LWP 4073)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0xb7f1a402 in ?? ()
(gdb) t a a bt

Thread 7 (Thread -1213400144 (LWP 4073)):
#0  0xb7f1a402 in ?? ()
#1  0xb7ed1cfc in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb7f21b4b in main () from /usr/sbin/nscd

Thread 6 (Thread -1214452816 (LWP 4074)):
#0  0xb7f27d53 in gethostbyname2_r () from /usr/sbin/nscd
#1  0xb7f21ba7 in main () from /usr/sbin/nscd

Thread 5 (Thread -1215505488 (LWP 4075)):
#0  0xb7f1a402 in ?? ()
#1  0xb7ed1cfc in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb7f21b4b in main () from /usr/sbin/nscd

Thread 4 (Thread -1216558160 (LWP 4076)):
#0  0xb7f1a402 in ?? ()
#1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb7f21a28 in main () from /usr/sbin/nscd

Thread 3 (Thread -1217610832 (LWP 4077)):
#0  0xb7f1a402 in ?? ()
#1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb7f21a28 in main () from /usr/sbin/nscd

Thread 2 (Thread -1218663504 (LWP 4078)):
#0  0xb7f1a402 in ?? ()
#1  0xb7ed1a86 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0xb7f21a28 in main () from /usr/sbin/nscd

Thread 1 (Thread -1210603840 (LWP 4036)):
#0  0xb7f1a402 in ?? ()
#1  0xb7e4361e in epoll_wait () from /lib/tls/libc.so.6
#2  0xb7f22889 in main () from /usr/sbin/nscd
(gdb)


Comment 3 Jakub Jelinek 2005-07-25 12:51:51 UTC
Can you: a) make a copy of /var/db/nscd b) remove all 3 files in it c) restart
nscd?  The most likely reason in this case would be an unclean shutdown leaving
the persistent database in an inconsistent state.
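
The three steps above can be sketched as a small script. This is only a minimal sketch, not something taken from the report: the function name reset_nscd_db, the ".bak" backup location, and the restart command are assumptions made for illustration, and only the /var/db/nscd path comes from the comment. On a real system it would need to run as root.

```python
import shutil
import subprocess
from pathlib import Path


def reset_nscd_db(db_dir="/var/db/nscd",
                  restart_cmd=("service", "nscd", "restart")):
    """Back up db_dir, remove the database files in it, then restart nscd.

    restart_cmd is an assumption about how the daemon is restarted on the
    target system; pass None to skip the restart. Returns the backup path.
    """
    db_dir = Path(db_dir)
    # Hypothetical backup location: a sibling directory with a .bak suffix.
    backup = db_dir.with_name(db_dir.name + ".bak")

    # a) Keep a copy so the suspect files can still be inspected later.
    #    copytree raises FileExistsError if the backup already exists.
    shutil.copytree(db_dir, backup)

    # b) Remove the persistent database files from the live directory.
    for entry in db_dir.iterdir():
        if entry.is_file():
            entry.unlink()

    # c) Restart the daemon so it starts without the old databases.
    if restart_cmd is not None:
        subprocess.run(list(restart_cmd), check=True)

    return backup
```

Keeping the backup rather than deleting the files outright preserves the possibly corrupt databases in case someone wants to examine them afterwards.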

Comment 4 Derek Atkins 2005-07-25 13:13:02 UTC
I just renamed all the files in the directory from foo -> foo.bak and restarted
nscd.  It is no longer spinning.  Thank you.


Comment 5 Jakub Jelinek 2005-08-09 07:53:51 UTC
rawhide glibc (2.3.90-8) includes an nscd persistent database verifier, which is
run on nscd startup.  If the database is corrupted, nscd will remove it and
recreate it from scratch.  If this works well in rawhide, it will eventually be
backported to FC4 and maybe FC3 as well.
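
The startup behaviour described here (check the persistent database, and discard and recreate it if the check fails) follows a common pattern. The following is a minimal sketch of that pattern only, not the glibc implementation: the MAGIC signature, the header layout, and all function names are hypothetical and chosen purely for illustration.

```python
import os
import struct

MAGIC = b"DEMO"                 # hypothetical 4-byte file signature
HEADER = struct.Struct("<4sI")  # hypothetical layout: signature + payload length


def verify_db(path):
    """Return True if the file has a valid header and the declared payload size."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        return False
    if len(raw) < HEADER.size:
        return False
    magic, length = HEADER.unpack_from(raw)
    return magic == MAGIC and len(raw) == HEADER.size + length


def create_empty_db(path):
    """Write a fresh, empty database consisting of the header only."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, 0))


def open_db_on_startup(path):
    """Verify the database at startup; remove and recreate it if it is corrupt.

    Returns True if the existing file was kept, False if it was recreated.
    """
    if verify_db(path):
        return True
    if os.path.exists(path):
        os.remove(path)  # discard the corrupt file rather than trust its contents
    create_empty_db(path)
    return False
```

Losing the cached entries costs little, since a cache can be repopulated from its backing sources, whereas trusting a corrupt file risks the kind of spinning reported in this bug.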