Bug 443713
Summary: | [RHEL5] nscd SEGV's periodically | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Aaron Richton <richton> |
Component: | glibc | Assignee: | Jeff Law <law> |
Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 5.1 | CC: | aoliva, drepper, fweimer, jakub, law |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2012-01-20 08:04:21 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Aaron Richton
2008-04-22 23:09:43 UTC
OK, it crashed with MALLOC_CHECK_=3. No change in the backtrace:

Core was generated by `/usr/sbin/nscd'.
Program terminated with signal 11, Segmentation fault.
#0  gc (db=0x55555576e330) at mem.c:96
96          mark[elem++] = ALLBITS;
#0  gc (db=0x55555576e330) at mem.c:96
#1  0xffffffffffffffff in ?? ()
#2  0xffffffffffffffff in ?? ()
#3  0xffffffffffffffff in ?? ()
[...]
#806 0xffffffffffffffff in ?? ()
#807 0xffffffffffffffff in ?? ()
#808 0x0000000000000000 in ?? ()

I turned up the debug level and see that nscd crashed while removing entries:

# tail -4 /var/log/nscd.log
28398: remove GETHOSTBYADDR entry "218.57.182.136"
28398: remove GETHOSTBYADDR entry "77.210.97.189"
28398: remove GETHOSTBYNAME entry "adsl-79-81.ttk.if.ua"
28398: remove GETHOSTBYADDR entry "24.244.158.246"

The backtrace is different now and seems to match the log:

Core was generated by `/usr/sbin/nscd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00005555555637d1 in gc (db=0x55555576e330) at mem.c:303
303         new_move->from = db->data + off_alloc;
(gdb) where
#0  0x00005555555637d1 in gc (db=0x55555576e330) at mem.c:303
#1  0x0000555555562a0f in prune_cache (table=0x55555576e330, now=1208956471, fd=-1) at cache.c:486
#2  0x000055555555c303 in nscd_run (p=0x1) at connections.c:1484
#3  0x00002aaaaaed52f7 in start_thread (arg=<value optimized out>) at pthread_create.c:296
#4  0x00002aaaaba0185d in clone () from /lib64/libc.so.6

Note that this is still with MALLOC_CHECK_=3.

I'm pretty sure both of these stack traces are caused by excessive use of alloca(): for mark in the first case and for new_move in the second. The second crash is certainly at the first use of new_move's storage, right after it is allocated, and the first crash is possibly at the first use of mark. When the cache grows large enough, we end up allocating too much stack space for cache garbage collection, and since we start accessing that space from the bottom, we may end up touching unmapped pages below the bottom of the allocated stack.
glibc 2.5-36 and newer fix this problem by using alloca() only for small-enough allocations and malloc() otherwise. As noted, this bug is a duplicate of bug 483636, which was fixed by this erratum: http://rhn.redhat.com/errata/RHBA-2009-1415.html

*** This bug has been marked as a duplicate of bug 483636 ***
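The pattern behind the fix (alloca() only for small buffers, malloc() for large ones) can be sketched roughly as follows. This is a minimal illustration, not the actual glibc patch: the threshold SMALL_ALLOCA_LIMIT and the function mark_and_count are hypothetical, and glibc uses its own internal limit. The point is that the size is checked before choosing the allocator, so a large cache can no longer push a single stack allocation past the end of the thread's stack.

```c
#include <alloca.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold (an assumption, not glibc's actual value):
   buffers up to this many bytes come from the stack via alloca(),
   larger ones come from malloc().  */
#define SMALL_ALLOCA_LIMIT (16 * 1024)

/* Hypothetical helper: build a temporary bitmap of NWORDS words, set every
   other word to all ones (loosely mirroring nscd's mark[elem++] = ALLBITS),
   and store the number of set words in *COUNT.
   Returns 0 on success, -1 if the byte size overflows or malloc() fails.  */
static int
mark_and_count (size_t nwords, size_t *count)
{
  /* Reject word counts whose byte size would overflow size_t.  */
  if (nwords > SIZE_MAX / sizeof (uint32_t))
    return -1;

  size_t size = nwords * sizeof (uint32_t);
  bool use_malloc = size > SMALL_ALLOCA_LIMIT;
  uint32_t *mark;

  if (use_malloc)
    {
      /* Large buffer: malloc() can fail gracefully instead of the
         first write landing below the thread's stack.  */
      mark = malloc (size);
      if (mark == NULL)
        return -1;
    }
  else
    /* Small buffer: alloca() is cheap and is released automatically
       when this function returns.  */
    mark = alloca (size);

  memset (mark, 0, size);
  for (size_t i = 0; i < nwords; i += 2)
    mark[i] = UINT32_MAX;

  size_t n = 0;
  for (size_t i = 0; i < nwords; ++i)
    if (mark[i] == UINT32_MAX)
      ++n;

  /* Only heap memory has to be released explicitly.  */
  if (use_malloc)
    free (mark);

  *count = n;
  return 0;
}
```

For example, mark_and_count (8, &n) needs 32 bytes, takes the alloca() path and sets n to 4, while mark_and_count ((size_t) 1 << 20, &n) needs 4 MiB and goes through malloc(). The size check has to live in the function that uses the buffer: a helper that merely returned an alloca() pointer would hand back stack memory that is released as soon as the helper returns.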