Bug 443713 - [RHEL5] nscd SEGVs periodically
Status: CLOSED DUPLICATE of bug 483636
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: glibc (Show other bugs)
Version: 5.1
Hardware: x86_64 Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Law
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2008-04-22 19:09 EDT by Aaron Richton
Modified: 2012-01-20 03:04 EST (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-01-20 03:04:21 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---


Description Aaron Richton 2008-04-22 19:09:43 EDT
Description of problem:
nscd SEGVs periodically. I was hoping to catch it under valgrind, but
apparently valgrind is missing syscalls, so that's not in the cards. Maybe I can
use MALLOC_CHECK_...

Version-Release number of selected component (if applicable):
glibc-2.5-18.el5_1.1

How reproducible:
The crashes are pretty consistent...maybe a few days apart across six servers.

Steps to Reproduce:
1. "/sbin/service nscd start"
2. wait a few days...
  
Actual results:
core dump

Expected results:
no core dump

Additional info:
Core was generated by `/usr/sbin/nscd'.
Program terminated with signal 11, Segmentation fault.
#0  gc (db=0x55555576e330) at mem.c:96
96            mark[elem++] = ALLBITS;
Comment 1 Aaron Richton 2008-04-23 08:59:06 EDT
OK, it crashed with MALLOC_CHECK_=3. No change in the backtrace:
Core was generated by `/usr/sbin/nscd'.
Program terminated with signal 11, Segmentation fault.
#0  gc (db=0x55555576e330) at mem.c:96
96            mark[elem++] = ALLBITS;

#0  gc (db=0x55555576e330) at mem.c:96
#1  0xffffffffffffffff in ?? ()
#2  0xffffffffffffffff in ?? ()
#3  0xffffffffffffffff in ?? ()
[...]
#806 0xffffffffffffffff in ?? ()
#807 0xffffffffffffffff in ?? ()
#808 0x0000000000000000 in ?? ()
Comment 2 Aaron Richton 2008-04-23 16:41:54 EDT
I turned up the debug level and see that nscd crashed removing entries:

# tail -4 /var/log/nscd.log 
28398: remove GETHOSTBYADDR entry "218.57.182.136"
28398: remove GETHOSTBYADDR entry "77.210.97.189"
28398: remove GETHOSTBYNAME entry "adsl-79-81.ttk.if.ua"
28398: remove GETHOSTBYADDR entry "24.244.158.246"

The backtrace is different now and seems to match the log:
Core was generated by `/usr/sbin/nscd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00005555555637d1 in gc (db=0x55555576e330) at mem.c:303
303           new_move->from = db->data + off_alloc;
(gdb) where
#0  0x00005555555637d1 in gc (db=0x55555576e330) at mem.c:303
#1  0x0000555555562a0f in prune_cache (table=0x55555576e330, now=1208956471,
fd=-1) at cache.c:486
#2  0x000055555555c303 in nscd_run (p=0x1) at connections.c:1484
#3  0x00002aaaaaed52f7 in start_thread (arg=<value optimized out>) at
pthread_create.c:296
#4  0x00002aaaaba0185d in clone () from /lib64/libc.so.6

Note that this is still with MALLOC_CHECK_=3.
Comment 3 Alexandre Oliva 2012-01-20 02:34:53 EST
I'm pretty sure both of these stack traces are caused by excessive use of alloca(): for mark in one case, and for new_move in the other.  The second crash is certainly at the first use of new_move's storage, right after allocation, and the former is possibly at the first use of mark.  When the cache grows large enough, we end up allocating too much stack space for cache garbage collection, and since we start accessing that buffer from the bottom, we may touch unmapped pages below the bottom of the allocated stack.

glibc 2.5-36 and newer fix this problem by using alloca() only for small-enough allocations, and falling back to malloc() otherwise.
Comment 4 Jeff Law 2012-01-20 03:04:21 EST
As noted, this bug is a duplicate of 483636.  Bug 483636 was fixed by this errata:  http://rhn.redhat.com/errata/RHBA-2009-1415.html

*** This bug has been marked as a duplicate of bug 483636 ***
