Description of problem:

# service nscd stop
# rm -f /var/db/nscd/*
# service nscd start
# nscd -g
nscd configuration:

0 server debug level
27s server runtime
-1 current number of threads
32 maximum number of threads
0 number of times clients had to wait
paranoia mode enabled
3600 restart internal

passwd cache:

cache is enabled
cache is persistent
cache is shared
211 suggested size
216064 total data pool size
0 used data pool size
600 seconds time to live for positive entries
20 seconds time to live for negative entries
10949039793983449144 cache hits on positive entries
11529188693938805532 cache hits on negative entries
11529188918037037044 cache misses on positive entries
12884901888 cache misses on negative entries
0% cache hit rate
2549596148 current number of cached values
2684348608 maximum number of cached values
3 maximum chain length searched
10948798150397887758 number of delays on rdlock
10950432075457239728 number of delays on wrlock
11529189379343912960 memory allocations failed
check /etc/passwd for changes

group cache:

cache is enabled
cache is persistent
cache is shared
211 suggested size
216064 total data pool size
0 used data pool size
3600 seconds time to live for positive entries
60 seconds time to live for negative entries
10949039793983449144 cache hits on positive entries
11529188693938805532 cache hits on negative entries
11529188918037037044 cache misses on positive entries
12884901888 cache misses on negative entries
0% cache hit rate
2549596148 current number of cached values
2684348608 maximum number of cached values
3 maximum chain length searched
10948798150397887758 number of delays on rdlock
10950432075457239728 number of delays on wrlock
11529189379343912960 memory allocations failed
check /etc/group for changes

hosts cache:

cache is enabled
cache is persistent
cache is shared
211 suggested size
216064 total data pool size
0 used data pool size
3600 seconds time to live for positive entries
20 seconds time to live for negative entries
10949039793983449144 cache hits on positive entries
11529188693938805532 cache hits on negative entries
11529188918037037044 cache misses on positive entries
12884901888 cache misses on negative entries
0% cache hit rate
2549596148 current number of cached values
2684348608 maximum number of cached values
3 maximum chain length searched
10948798150397887758 number of delays on rdlock
10950432075457239728 number of delays on wrlock
11529189379343912960 memory allocations failed
check /etc/hosts for changes

Version-Release number of selected component (if applicable):
nscd-2.3.4-21

How reproducible:
100%
Still reproducible with nscd-2.3.5-10.
Investigation with valgrind and the FC4 toolset gives:

| # rm -f /var/db/nscd/*
| # valgrind --db-attach=yes --tool=memcheck nscd -d
| ...
| ==23365== Syscall param write(buf) points to uninitialised byte(s)
| ==23365==    at 0x41C2A093: __write_nocancel (in /lib/libpthread-2.3.5.so)
| ==23365==    by 0x40AC: main (nscd.c:286)
| ==23365==  Address 0x43AFE6D0 is on thread 1's stack
| ==23365==
| ==23365== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
| ...
| (gdb) symbol-file /usr/sbin/nscd
| ...
| (gdb) bt
| #0  0x43aff022 in ?? ()
| #1  0x41c2a093 in ?? ()
| #2  0x00013cc0 in ?? ()
| #3  0x00005252 in nscd_init () at connections.c:425
| #4  0x000040ad in main (argc=2, argv=0x43afe914) at nscd.c:286
| (gdb) frame 3
| #3  0x00005252 in nscd_init () at connections.c:425
| 425         if ((TEMP_FAILURE_RETRY (write (fd, &head, sizeof (head)))
| (gdb) p head
| $1 = {version = 1, header_size = 104, gc_cycle = 1135601508, nscd_certainly_running = 1135601368,
|   timestamp = 0, module = 211, data_size = 216064, first_free = 0, nentries = 1102942196,
|   maxnentries = 1135601312, maxnsearched = 3, poshit = 5430568472, neghit = 4877371648521529937,
|   posmiss = 1624419140681999920, negmiss = 4737100662300926739, rdlockdelayed = 4735439317410855342,
|   wrlockdelayed = 4737100661576894536, addfailed = 4877370393255481344, array = 0x43afe730}
| (gdb) list 413,425
| 413         /* Create the header of the file.  */
| 414         struct database_pers_head head =
| 415           {
| 416             .version = DB_VERSION,
| 417             .header_size = sizeof (head),
| 418             .module = dbs[cnt].suggested_module,
| 419             .data_size = (dbs[cnt].suggested_module
| 420                           * DEFAULT_DATASIZE_PER_BUCKET),
| 421             .first_free = 0
| 422           };
| 423         void *mem;
| 424
| 425         if ((TEMP_FAILURE_RETRY (write (fd, &head, sizeof (head)))

The code around this place is

| 0x000051f8 <nscd_init+1944>: ja     0x547c <nscd_init+2588>
| 0x000051fe <nscd_init+1950>: test   %edi,%edi
| 0x00005200 <nscd_init+1952>: jne    0x55b7 <nscd_init+2903>
| 0x00005206 <nscd_init+1958>: mov    0xfffffe2c(%ebp),%edi
| 0x0000520c <nscd_init+1964>: mov    (%edi),%eax
| 0x0000520e <nscd_init+1966>: movl   $0x1,0xfffffed4(%ebp)
| 0x00005218 <nscd_init+1976>: movl   $0x68,0xfffffed8(%ebp)
| 0x00005222 <nscd_init+1986>: mov    %eax,0xfffffeec(%ebp)
| 0x00005228 <nscd_init+1992>: shl    $0xa,%eax
| 0x0000522b <nscd_init+1995>: mov    %eax,0xfffffef0(%ebp)
| 0x00005231 <nscd_init+2001>: movl   $0x0,0xfffffef4(%ebp)
| 0x0000523b <nscd_init+2011>: lea    0xfffffed4(%ebp),%esi
| 0x00005241 <nscd_init+2017>: lea    0x0(%esi),%esi
| 0x00005244 <nscd_init+2020>: push   $0x68
| 0x00005246 <nscd_init+2022>: push   %esi
| 0x00005247 <nscd_init+2023>: pushl  0xfffffe14(%ebp)
| 0x0000524d <nscd_init+2029>: call   0x31a0
| 0x00005252 <nscd_init+2034>: add    $0xc,%esp

Only the five explicitly named members are stored before the write() call; the members of the 'head' structure that are not explicitly initialized are never zeroed. So perhaps a gcc fault?

gcc-4.0.0-8.i386
nscd-2.3.5-10.i386
glibc-2.3.5-10.i686
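Until a fixed compiler is available, one way to avoid depending on the implicit zeroing is to clear the header explicitly before filling it in. The following is only a minimal sketch of that idea, not the fix that went into glibc: 'struct pers_head_sketch' is a hypothetical, heavily simplified stand-in for 'struct database_pers_head', 'init_head_sketch' and its parameters are made-up names, and the trailing array is written as a C99 flexible array member instead of the 'ref_t array[0]' used by nscd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct database_pers_head.
   Only a handful of the real members are reproduced here.  */
struct pers_head_sketch
{
  int32_t version;
  int32_t header_size;
  int32_t module;
  int32_t data_size;
  int32_t first_free;
  uint64_t poshit;   /* statistics counter, must start at 0 */
  uint64_t neghit;   /* statistics counter, must start at 0 */
  uint32_t array[];  /* trailing array (nscd declares 'ref_t array[0]') */
};

/* Fill *head without relying on the compiler to zero the members
   that are not named explicitly.  */
void
init_head_sketch (struct pers_head_sketch *head, int32_t module,
                  int32_t bytes_per_bucket)
{
  assert (head != NULL);

  /* Clear every byte of the header first, padding included.  */
  memset (head, 0, sizeof (*head));

  /* Then set the members that need non-zero values.  */
  head->version = 1;
  head->header_size = (int32_t) sizeof (*head);
  head->module = module;
  head->data_size = module * bytes_per_bucket;
  head->first_free = 0;
}
```

Called with a module of 211 and 1024 bytes per bucket (the 'shl $0xa' in the disassembly), this gives the data_size of 216064 seen in the gdb output, while the counters stay at zero regardless of what was on the stack before.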
I have seen the bogus values (and nscd crashes) on vanilla kernels only; on RH kernels it *seems* to work. But I think that is only because of a different memory mapping, which might leave the stack zeroed on the RH kernels.

You can still verify it on RH kernels with the valgrind command mentioned above:

| # service nscd stop
| # rm -f /var/db/nscd/*
| # valgrind --tool=memcheck nscd -d

When you see something like

==6356== Syscall param write(buf) points to uninitialised byte(s)
==6356==    at 0x47B093: __write_nocancel (in /lib/libpthread-2.3.5.so)
==6356==    by 0x40AC: main (in /usr/sbin/nscd)
==6356==  Address 0x52BFE760 is on thread 1's stack

you ran into the uninitialized 'head' case analysed above.
It is caused by the special 'struct database_pers_head' structure, which contains an element 'ref_t array[0];' at the end:

nscd-client.h:
| struct database_pers_head
| {
|   int32_t version;
|   ...
|   ref_t array[0];
| };

This seems to cause gcc not to zero-initialize the members that are not explicitly named in the initializer. E.g.

----
struct Foo
{
  int a;
  int b;
  char c[0];
};

int main()
{
  struct Foo a = { .a = 0 };
  return a.b;
}
----

$ gcc -O0 foo.c
$ ./a.out ; echo $?
112

Without the 'char c[0]', you get the expected '0'.
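Since the '112' above is just whatever happened to be on the stack, the reproducer can look fine by accident when the stack is clean (as it apparently is on the RH kernels). A slightly more deterministic variant is sketched below: it first overwrites a piece of the stack with a non-zero pattern and only then runs the initializer. The function names are made up for this sketch, and whether 'probe_b' really reuses the dirtied stack region depends on the compiler and options, so a result of 1 does not prove the compiler is correct; a result of 0 does show the bug.

```c
#include <assert.h>

struct Foo
{
  int a;
  int b;
  char c[0];  /* zero-length array: a GNU C extension, as in nscd */
};

/* Overwrite a chunk of the stack with a non-zero pattern so that a
   missing zero-initialization afterwards is likely to be visible.  */
int
dirty_stack (void)
{
  volatile unsigned char junk[256];
  unsigned int i;

  for (i = 0; i < sizeof junk; i++)
    junk[i] = 0xAA;
  return junk[0];
}

/* Same initializer as in the reproducer: 'b' is not named, so a
   conforming compiler must set it to 0.  */
int
probe_b (void)
{
  struct Foo a = { .a = 0 };
  return a.b;
}

/* Returns 1 if the unnamed member came back as 0, otherwise 0.  */
int
unnamed_member_is_zero (void)
{
  /* The zero-length array does not add to the size of the struct.  */
  assert (sizeof (struct Foo) == 2 * sizeof (int));

  (void) dirty_stack ();
  return probe_b () == 0;
}
```

Build it with the same flags as nscd and call unnamed_member_is_zero() from main; comparing -O0 and -O2 builds may also be worthwhile, given how much the behaviour depends on the options.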
The example above behaves really strangely. Compiling it with '-Wall -W -O2' warns

| foo.c:11: warning: 'a.b' is used uninitialized in this function

Adding a new member

|   int b;
|+  int d;
|   char c[0];

removes the warning and gives the expected '0'.
Indeed, that's a GCC bug. See http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01103.html Thanks for tracking this down, I could not reproduce it on my box before.
gcc seems to be fixed now so a rebuild of glibc might be in order?
*** Bug 155124 has been marked as a duplicate of this bug. ***
Should be fixed in nscd-2.3.5-10.2, which has just been pushed to FC4 testing.