Description of problem:
As mentioned in #163538, I am seeing this even with the latest FC4 testing glibc/nscd (glibc-2.3.5-10.2):

30677: Reloading "14339447" in password cache!
*** Segmentation fault
Register dump:

 EAX: 00000001   EBX: 00e7aca0   ECX: 0000008c   EDX: 00000005
 ESI: b726b3b8   EDI: 6e54a504   EBP: b7067db4   ESP: b7067bac

 EIP: 00e72836   EFLAGS: 00010a13

 CS: 0073   DS: 007b   ES: 007b   FS: 0000   GS: 0033   SS: 007b

 Trap: 0000000e   Error: 00000004   OldMask: 00000000
 ESP/signal: b7067bac   CR2: 6e54a518

Backtrace:
/lib/libSegFault.so[0x483115]
[0x1ad420]
nscd[0xe6d6a0]
/lib/libpthread.so.0[0xdb7b80]
/lib/libc.so.6(__clone+0x5e)[0x6d99ae]

The UID is legit... just quite a bit higher than ordinary UIDs. It smells like #163538 to me, hence my questions there.

Version-Release number of selected component (if applicable):
$ rpm -q glibc
glibc-2.3.5-10.2
$ rpm -q nscd
nscd-2.3.5-10.2
$ rpm -q glibc-debuginfo
glibc-debuginfo-2.3.5-10.2

How reproducible:
Always

Steps to Reproduce:
1. Start nscd
2. Run any program that involves a UID lookup

Actual results:
nscd crashes

Expected results:
No crash

Additional info:
I already had glibc-debuginfo installed at the time of my post in #163538, as the rpm output above shows, but I got no meaningful stack trace. I don't see the same happening on FC3 systems with the same nscd.conf and nsswitch.conf (passwd: files ldap and shadow: files ldap).
Have you wiped the old /var/db/nscd/* cache after you upgraded from 2.3.5-10? nscd-2.3.5-10 (the original FC4 nscd) was miscompiled, so it may have created broken persistent cache files. When that happens, even a fixed nscd crashes on them (a checker for nscd databases is still a work in progress).

The libSegFault.so output above is not very useful, because nscd is a PIE. I can only guess at the instruction where it crashed. The 3rd frame of the backtrace is most probably in nscd_run, right after the call to prune_cache, but the crash might very well be in gc (which would support the theory of broken database files from 2.3.5-10).

So, can you please remove /var/db/nscd/* (after making a backup copy)? If you can still reproduce the problem with 2.3.5-10.2 after that, try to reproduce it under gdb --args nscd -d and find more details.
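The backup-then-remove step suggested above can be sketched as follows. This is a minimal sketch, not an official procedure: the function name `backup_and_clear` is hypothetical, and it assumes nscd has already been stopped and that the caller has permission to modify the cache directory.

```python
import shutil
from pathlib import Path


def backup_and_clear(cache_dir, backup_dir):
    """Copy every regular file in cache_dir into backup_dir, then delete the originals.

    Returns the sorted names of the files that were moved aside.
    Hypothetical helper for illustration; it is not part of nscd or glibc.
    """
    cache = Path(cache_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(cache.iterdir()):
        if not entry.is_file():
            continue  # leave subdirectories and special files alone
        shutil.copy2(entry, backup / entry.name)  # keep a copy for later analysis
        entry.unlink()  # remove the possibly broken database file
        moved.append(entry.name)
    return moved
```

On an affected machine this would be called with "/var/db/nscd" as the cache directory and any backup location of your choice, before starting nscd again. Keeping the copies matters because the broken files are the only evidence of what the miscompiled nscd wrote.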
I tried running nscd under gdb as you suggested and I got these two dumps:

19575: Reloading "99" in password cache!
19575: remove GETPWBYNAME entry "pcap"
19575: remove GETPWBYUID entry "77"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1221579856 (LWP 19601)]
0x0036332f in gc (db=0x36b040) at mem.c:354
354             && (*next_data)->packet == off_alloc);

19628: remove GETGRBYGID entry "47"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1223939152 (LWP 19656)]
0x0025c836 in prune_cache (table=0x265140, now=1123507878) at cache.c:242
242       struct datahead *dh = (struct datahead *) (data + runp->packet);

I then removed the cache files and, voilà, it works again. Thanks for the help, I should have thought of that.

Marking as fixed in 2.3.5-10.2; I guess you might want to change it to FUTURERELEASE if you think the problem is more satisfactorily fixed by the checker, whenever that is going to be merged in.
rawhide glibc (2.3.90-8) includes an nscd persistent database verifier, which is run on nscd startup. If the database is corrupted, nscd will remove it and recreate it from scratch. If this works well in rawhide, it will eventually be backported to FC4 and maybe FC3 as well.
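The verify-on-startup behaviour described above can be illustrated with a minimal sketch. It does not reproduce the real nscd database format or the rawhide verifier: the toy layout (a 4-byte magic, a 4-byte entry count, then one 4-byte offset per entry), the magic value, and the names `verify_db` and `open_or_recreate` are all assumptions made for illustration. The idea shown is only that every stored offset is range-checked before it is used, and that a file failing the check is removed and rebuilt empty.

```python
import os
import struct

MAGIC = b"TOYC"                 # hypothetical magic, not the real nscd header
HEADER = struct.Struct("<4sI")  # magic, number of entries
OFFSET = struct.Struct("<I")    # one offset into the data area per entry


def verify_db(blob):
    """Return True if the toy database in blob is internally consistent."""
    if len(blob) < HEADER.size:
        return False
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        return False
    table_end = HEADER.size + count * OFFSET.size
    if table_end > len(blob):
        return False
    for i in range(count):
        (off,) = OFFSET.unpack_from(blob, HEADER.size + i * OFFSET.size)
        # Every offset must land inside the data area that follows the table.
        if not (table_end <= off < len(blob)):
            return False
    return True


def open_or_recreate(path):
    """Return (contents, recreated); a missing or corrupt file is replaced by an empty database."""
    try:
        with open(path, "rb") as f:
            blob = f.read()
        if verify_db(blob):
            return blob, False
        os.remove(path)  # corrupt: discard it rather than trust its offsets
    except FileNotFoundError:
        pass
    empty = HEADER.pack(MAGIC, 0)
    with open(path, "wb") as f:
        f.write(empty)
    return empty, True
```

The trade-off in this design is that a corrupt cache loses its contents, which is acceptable for a cache because the entries can be looked up again from the underlying sources.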