Bug 154782

Summary: 'nscd -g' gives bogus results
Product: [Fedora] Fedora
Reporter: Enrico Scholz <rh-bugzilla>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4
CC: pierre-bugzilla
Hardware: All   
OS: Linux   
Fixed In Version: 2.3.5-10.2
Doc Type: Bug Fix
Last Closed: 2005-07-28 18:28:59 UTC
Bug Blocks: 150222, 158504    

Description Enrico Scholz 2005-04-14 03:30:37 UTC
Description of problem:

# service nscd stop
# rm -f /var/db/nscd/*
# service nscd start 


# nscd -g
nscd configuration:

              0  server debug level
            27s  server runtime
             -1  current number of threads
             32  maximum number of threads
              0  number of times clients had to wait
                 paranoia mode enabled
           3600  restart internal

passwd cache:

                 cache is enabled
                 cache is persistent
                 cache is shared
            211  suggested size
         216064  total data pool size
              0  used data pool size
            600  seconds time to live for positive entries
             20  seconds time to live for negative entries
10949039793983449144  cache hits on positive entries
11529188693938805532  cache hits on negative entries
11529188918037037044  cache misses on positive entries
    12884901888  cache misses on negative entries
              0% cache hit rate
     2549596148  current number of cached values
     2684348608  maximum number of cached values
              3  maximum chain length searched
10948798150397887758  number of delays on rdlock
10950432075457239728  number of delays on wrlock
11529189379343912960  memory allocations failed
                 check /etc/passwd for changes

group cache:

                 cache is enabled
                 cache is persistent
                 cache is shared
            211  suggested size
         216064  total data pool size
              0  used data pool size
           3600  seconds time to live for positive entries
             60  seconds time to live for negative entries
10949039793983449144  cache hits on positive entries
11529188693938805532  cache hits on negative entries
11529188918037037044  cache misses on positive entries
    12884901888  cache misses on negative entries
              0% cache hit rate
     2549596148  current number of cached values
     2684348608  maximum number of cached values
              3  maximum chain length searched
10948798150397887758  number of delays on rdlock
10950432075457239728  number of delays on wrlock
11529189379343912960  memory allocations failed
                 check /etc/group for changes

hosts cache:

                 cache is enabled
                 cache is persistent
                 cache is shared
            211  suggested size
         216064  total data pool size
              0  used data pool size
           3600  seconds time to live for positive entries
             20  seconds time to live for negative entries
10949039793983449144  cache hits on positive entries
11529188693938805532  cache hits on negative entries
11529188918037037044  cache misses on positive entries
    12884901888  cache misses on negative entries
              0% cache hit rate
     2549596148  current number of cached values
     2684348608  maximum number of cached values
              3  maximum chain length searched
10948798150397887758  number of delays on rdlock
10950432075457239728  number of delays on wrlock
11529189379343912960  memory allocations failed
                 check /etc/hosts for changes


Version-Release number of selected component (if applicable):

nscd-2.3.4-21


How reproducible:

100%

Comment 1 Enrico Scholz 2005-06-01 20:08:41 UTC
Still present in nscd-2.3.5-10.


Comment 2 Enrico Scholz 2005-06-12 18:00:50 UTC
Investigating with valgrind and the FC4 toolchain gives:

| # rm -f /var/db/nscd/*
| # valgrind --db-attach=yes --tool=memcheck nscd -d
| ...
| ==23365== Syscall param write(buf) points to uninitialised byte(s)
| ==23365==    at 0x41C2A093: __write_nocancel (in /lib/libpthread-2.3.5.so)
| ==23365==    by 0x40AC: main (nscd.c:286)
| ==23365==  Address 0x43AFE6D0 is on thread 1's stack
| ==23365== 
| ==23365== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
| ...
| (gdb) symbol-file /usr/sbin/nscd
| ...
| (gdb) bt
| #0  0x43aff022 in ?? ()
| #1  0x41c2a093 in ?? ()
| #2  0x00013cc0 in ?? ()
| #3  0x00005252 in nscd_init () at connections.c:425
| #4  0x000040ad in main (argc=2, argv=0x43afe914) at nscd.c:286
| (gdb) frame 3
| #3  0x00005252 in nscd_init () at connections.c:425
| 425                     if ((TEMP_FAILURE_RETRY (write (fd, &head, sizeof (head)))
| (gdb) p head
| $1 = {version = 1, header_size = 104, gc_cycle = 1135601508, nscd_certainly_running = 1135601368, 
|   timestamp = 0, module = 211, data_size = 216064, first_free = 0, nentries = 1102942196, 
|   maxnentries = 1135601312, maxnsearched = 3, poshit = 5430568472, neghit = 4877371648521529937, 
|   posmiss = 1624419140681999920, negmiss = 4737100662300926739, rdlockdelayed = 4735439317410855342, 
|   wrlockdelayed = 4737100661576894536, addfailed = 4877370393255481344, array = 0x43afe730}
| (gdb) list 413,425
| 413                     /* Create the header of the file.  */
| 414                     struct database_pers_head head =
| 415                       {
| 416                         .version = DB_VERSION,
| 417                         .header_size = sizeof (head),
| 418                         .module = dbs[cnt].suggested_module,
| 419                         .data_size = (dbs[cnt].suggested_module
| 420                                       * DEFAULT_DATASIZE_PER_BUCKET),
| 421                         .first_free = 0
| 422                       };
| 423                     void *mem;
| 424
| 425                     if ((TEMP_FAILURE_RETRY (write (fd, &head, sizeof (head)))


The disassembly around this location is:

| 0x000051f8 <nscd_init+1944>:    ja     0x547c <nscd_init+2588>
| 0x000051fe <nscd_init+1950>:    test   %edi,%edi
| 0x00005200 <nscd_init+1952>:    jne    0x55b7 <nscd_init+2903>
| 0x00005206 <nscd_init+1958>:    mov    0xfffffe2c(%ebp),%edi
| 0x0000520c <nscd_init+1964>:    mov    (%edi),%eax
| 0x0000520e <nscd_init+1966>:    movl   $0x1,0xfffffed4(%ebp)
| 0x00005218 <nscd_init+1976>:    movl   $0x68,0xfffffed8(%ebp)
| 0x00005222 <nscd_init+1986>:    mov    %eax,0xfffffeec(%ebp)
| 0x00005228 <nscd_init+1992>:    shl    $0xa,%eax
| 0x0000522b <nscd_init+1995>:    mov    %eax,0xfffffef0(%ebp)
| 0x00005231 <nscd_init+2001>:    movl   $0x0,0xfffffef4(%ebp)
| 0x0000523b <nscd_init+2011>:    lea    0xfffffed4(%ebp),%esi
| 0x00005241 <nscd_init+2017>:    lea    0x0(%esi),%esi
| 0x00005244 <nscd_init+2020>:    push   $0x68
| 0x00005246 <nscd_init+2022>:    push   %esi
| 0x00005247 <nscd_init+2023>:    pushl  0xfffffe14(%ebp)
| 0x0000524d <nscd_init+2029>:    call   0x31a0
| 0x00005252 <nscd_init+2034>:    add    $0xc,%esp



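It seems that the members of the 'head' structure that are not explicitly
initialized are not zeroed. The disassembly supports this: only five stores
into 'head' (at 0xfffffed4(%ebp)) precede the write() call. These are
version = 1, header_size = 0x68, module, data_size = module << 10 and
first_free = 0. All other members keep whatever was on the stack. So perhaps
a gcc fault?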


gcc-4.0.0-8.i386
nscd-2.3.5-10.i386
glibc-2.3.5-10.i686

Comment 3 Enrico Scholz 2005-06-12 20:25:14 UTC
I have seen the bogus values (and nscd crashes) only with vanilla kernels; with
RH kernels it *seems* to work. I suspect that is merely due to a different
memory mapping on the RH kernels, which may leave the relevant stack area zeroed.

But the bug can be verified on RH kernels too, with the valgrind command mentioned above:

| # service nscd stop
| # rm -f /var/db/nscd/*
| # valgrind  --tool=memcheck nscd -d

When you see something like

==6356== Syscall param write(buf) points to uninitialised byte(s)
==6356==    at 0x47B093: __write_nocancel (in /lib/libpthread-2.3.5.so)
==6356==    by 0x40AC: main (in /usr/sbin/nscd)
==6356==  Address 0x52BFE760 is on thread 1's stack

you ran into the uninitialized 'head' case analysed above.

Comment 4 Enrico Scholz 2005-06-12 20:46:16 UTC
It is caused by the special 'struct database_pers_head' structure which
contains an element 'ref_t array[0];' at the end:

nscd-client.h:
| struct database_pers_head
| {
|   int32_t version;
|   ...
|   ref_t array[0];
| };


This seems to cause gcc not to initialize members that are not explicitly
named in the initializer.

E.g.

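----
struct Foo
{
    int		a;
    int		b;
    char	c[0];	/* GNU zero-length array; without it the bug disappears */
};

int main(void)
{
  struct Foo	a = { .a = 0 };
  return a.b;	/* 'b' is not named, so it must be implicitly 0 */
}
----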

$ gcc -O0 foo.c
$ ./a.out ; echo $?
112


Without the 'char c[0]', you get the expected '0'.


Comment 5 Enrico Scholz 2005-06-12 21:52:00 UTC
The 'struct Foo' example above behaves really strangely. Compiling with '-Wall -W -O2' gives the warning

| foo.c:11: warning: 'a.b' is used uninitialized in this function


Adding a new member

|    int      b;
|+   int      d;
|    char     c[0];

removes the warning and gives the expected '0'.

Comment 6 Jakub Jelinek 2005-06-13 13:46:09 UTC
Indeed, that's a GCC bug.
See http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01103.html
Thanks for tracking this down; I could not reproduce it on my box before.

Comment 7 Pierre Ossman 2005-06-29 06:56:28 UTC
gcc seems to be fixed now, so perhaps a rebuild of glibc is in order?

Comment 8 Ulrich Drepper 2005-07-08 07:22:02 UTC
*** Bug 155124 has been marked as a duplicate of this bug. ***

Comment 9 Jakub Jelinek 2005-07-28 18:28:59 UTC
Should be fixed in nscd-2.3.5-10.2, which has just been pushed to FC4 testing.