Bug 430324

Summary: nscd crashes with SIGABRT
Product: [Fedora] Fedora
Reporter: Valdis Kletnieks <valdis.kletnieks>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 9
CC: bnocera, csnook, deknuydt, drepper, intrep, jansen, javiplx, jorton, j, k.georgiou, matteo, ndbecker2, roth
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.8-8
Doc Type: Bug Fix
Last Closed: 2008-08-03 03:59:45 UTC

Description Valdis Kletnieks 2008-01-26 03:33:36 UTC
Description of problem: nscd crashes after some amount of uptime with a SIGABRT.
It's quite repeatable - it *always* dies after 10-30 minutes.


Version-Release number of selected component (if applicable):
nscd-2.7.90-4

Additional info:
I attached gdb to it and waited for a crash, and it said:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x4082c950 (LWP 6498)]
0x00007fc883769f95 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007fc883769f95 in raise () from /lib64/libc.so.6
#1  0x00007fc88376ba40 in abort () from /lib64/libc.so.6
#2  0x00007fc88376310f in __assert_fail () from /lib64/libc.so.6
#3  0x00007fc884934a75 in gc () from /usr/sbin/nscd
#4  0x00007fc884933507 in prune_cache () from /usr/sbin/nscd
#5  0x00007fc884929430 in nscd_run_prune () from /usr/sbin/nscd
#6  0x00007fc8842e7447 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fc88380fbdd in clone () from /lib64/libc.so.6

Running 'nscd --debug' ends with:
7409:   GETFDPW
7409: provide access to FD 6, for passwd
7409: handle_request: request received (Version = 2) from PID 7789
7409:   GETFDPW
7409: provide access to FD 6, for passwd
7409: Reloading "valdis" in password cache!
7409: remove GETPWBYUID entry "967"
7409: remove GETPWBYNAME entry "valdis"
nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.

Another run:
7822: handle_request: request received (Version = 2) from PID 8253
7822:   GETFDPW
7822: provide access to FD 6, for passwd
7822: Reloading "0" in password cache!
7822: Reloading "valdis" in password cache!
7822: remove GETPWBYUID entry "967"
7822: remove GETPWBYNAME entry "valdis"
nscd: mem.c:399: gc: Assertion `next_hash == &he[db->head->nentries]' failed.
Aborted
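
Both runs above end with the same glibc assertion message. When comparing reports like this one against possible duplicates, it can help to reduce that message to a (file, line, function, expression) signature. The following is a minimal sketch, assuming the usual glibc message layout "program: file:line: function: Assertion `expr' failed."; the names parse_assert and ASSERT_RE are hypothetical helpers, not part of nscd or glibc.

```python
import re

# Assumed layout of a glibc assert() failure message:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>[^:]+):(?P<line>\d+): "
    r"(?P<func>[^:]+): Assertion `(?P<expr>.*)' failed\.$"
)


def parse_assert(message):
    """Split an assertion message into its fields, or return None."""
    match = ASSERT_RE.match(message.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # line number as an integer
    return fields


if __name__ == "__main__":
    msg = ("nscd: mem.c:399: gc: Assertion "
           "`next_hash == &he[db->head->nentries]' failed.")
    print(parse_assert(msg))
```

Two reports whose parsed file, line, function, and expression all match are likely hitting the same check, although the underlying cause may still differ.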

Comment 1 Bug Zapper 2008-05-14 04:53:26 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Kostas Georgiou 2008-05-22 13:14:51 UTC
*** Bug 443388 has been marked as a duplicate of this bug. ***

Comment 3 Kostas Georgiou 2008-05-22 13:16:46 UTC
*** Bug 445656 has been marked as a duplicate of this bug. ***

Comment 4 Ben Herrick 2008-05-24 01:19:14 UTC
I get the same results using nscd under F9/i386:

May 19 12:35:34 mgmt1 kernel: nscd[16709]: segfault at fffc1ea0 ip b80a9025 sp ad4c1d88 error 6 in nscd[b8098000+1e000]
May 19 14:51:23 mgmt1 kernel: nscd[11695]: segfault at ec80d426 ip b80d2b82 sp ad0e6c14 error 5 in nscd[b80c2000+1e000]
May 19 16:00:38 mgmt1 kernel: nscd[3555]: segfault at b81ff0fc ip b8011178 sp ad428e08 error 4 in nscd[b8000000+1e000]
May 19 21:49:37 mgmt1 kernel: nscd[6838]: segfault at fffe7e90 ip 0021bf46 sp ad187e68 error 6 in libc-2.8.so[1a5000+163000]
May 20 01:43:15 mgmt1 kernel: nscd[3736]: segfault at 18 ip b80272fd sp ad43eef4 error 4 in nscd[b8016000+1e000]
May 20 12:11:38 mgmt1 kernel: nscd[15682]: segfault at b7f1fec2 ip b7f20a4f sp ad135ea0 error 7 in nscd[b7f0f000+1e000]
May 20 13:17:58 mgmt1 kernel: nscd[9917]: segfault at 15a0b22c ip b7f89b82 sp aaccec14 error 4 in nscd[b7f79000+1e000]
May 20 23:05:36 mgmt1 kernel: nscd[31898]: segfault at 12acf62d ip b805e236 sp ad275078 error 4 in nscd[b804e000+1e000]
May 21 01:01:01 mgmt1 kernel: nscd[25308]: segfault at 1773c42c ip b8067b82 sp ad07ac14 error 4 in nscd[b8057000+1e000]
May 21 11:39:20 mgmt1 kernel: nscd[20512]: segfault at 18 ip b7f632fd sp ab3afec4 error 4 in nscd[b7f52000+1e000]
May 21 14:24:25 mgmt1 kernel: nscd[2447]: segfault at fffe50f8 ip b80172de sp ad42ee18 error 4 in nscd[b8006000+1e000]
May 21 15:23:36 mgmt1 kernel: nscd[6218]: segfault at ed0ffedc ip b7f0a2c6 sp ad11fe2c error 5 in nscd[b7ef9000+1e000]
May 22 14:40:12 mgmt1 kernel: nscd[26487]: segfault at ffffe0f8 ip b7f102de sp ad327df4 error 4 in nscd[b7eff000+1e000]
May 22 15:49:21 mgmt1 kernel: nscd[3267]: segfault at ca8 ip b7ff5185 sp ab23fe38 error 4 in nscd[b7fe4000+1e000]
May 23 17:08:37 mgmt1 kernel: nscd[3308]: segfault at f777124c ip b7fe121e sp ad1f8078 error 5 in nscd[b7fd1000+1e000]
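
The kernel lines above carry more information than is obvious at first glance: the number after "error" is the x86 page-fault error code (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode), and the bracketed pair is the load address and size of the mapping containing the faulting instruction. A minimal sketch for decoding one such line follows; parse_segfault and decode_error are hypothetical helper names, and the regular expression only assumes the layout seen in this log.

```python
import re

# Layout assumed from the log lines in this report:
#   comm[pid]: segfault at ADDR ip IP sp SP error CODE in OBJ[BASE+SIZE]
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Parse one kernel segfault line; wrapped lines are re-joined first."""
    match = SEGV_RE.search(" ".join(line.split()))
    if match is None:
        return None
    result = {
        "comm": match["comm"],
        "pid": int(match["pid"]),
        "addr": int(match["addr"], 16),
        "ip": int(match["ip"], 16),
        "obj": match["obj"],
        "offset": None,
    }
    result.update(decode_error(int(match["err"], 16)))
    if match["base"] is not None:
        base = int(match["base"], 16)
        size = int(match["size"], 16)
        # Offset of the faulting instruction inside the mapped object,
        # which stays comparable across runs with different load addresses.
        if base <= result["ip"] < base + size:
            result["offset"] = result["ip"] - base
    return result


if __name__ == "__main__":
    sample = ("May 19 14:51:23 mgmt1 kernel: nscd[11695]: segfault at "
              "ec80d426 ip b80d2b82 sp ad0e6c14 error 5 in "
              "nscd[b80c2000+1e000]")
    info = parse_segfault(sample)
    print(info["obj"], hex(info["offset"]), info["access"], info["cause"])
```

Because the load address changes from run to run, the offset is the useful value for comparison. In the log above, the entries from May 19 14:51:23, May 20 13:17:58 and May 21 01:01:01 all resolve to offset 0x10b82 within nscd, which suggests the same instruction is faulting each time.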


Comment 5 Matteo Corti 2008-05-27 07:14:50 UTC
You should change the Hardware tag to "All" instead of x86_64, since I am
experiencing the same problem on i386, as does Andrew (comment #4) and one of
the bugs marked as a duplicate of this one (Bug 443388).

Otherwise we should mark the duplicates as non-duplicates and continue with two
bug reports (one per architecture).

Comment 6 Matteo Corti 2008-05-29 12:54:54 UTC
This is the info I get from gdb:

29408: handle_request: request received (Version = 2) from PID 29633
29408: 	GETFDHST
29408: provide access to FD 11, for hosts
29408: handle_request: request received (Version = 2) from PID 29633
29408: 	GETFDPW
29408: provide access to FD 7, for passwd
29408: handle_request: request received (Version = 2) from PID 29633
29408: 	GETFDGR
29408: provide access to FD 9, for group
29408: handle_request: request received (Version = 2) from PID 29633
29408: 	GETFDHST
29408: provide access to FD 11, for hosts
29408: handle_request: request received (Version = 2) from PID 29634
29408: 	GETFDPW
29408: provide access to FD 7, for passwd
29408: handle_request: request received (Version = 2) from PID 29634
29408: 	GETFDGR
29408: provide access to FD 9, for group
29408: handle_request: request received (Version = 2) from PID 29634
29408: 	GETFDHST
29408: provide access to FD 11, for hosts
29408: handle_request: request received (Version = 2) from PID 29625
29408: 	GETPWBYUID (0)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xaf86fb90 (LWP 29417)]
0xb80adb82 in cache_search () from /usr/sbin/nscd

Is there a way to build an nscd-debuginfo package? With rpm --rebuild glib....
I only get the glib-debuginfo packages.



Comment 7 Bastien Nocera 2008-06-06 21:01:44 UTC
(In reply to comment #6)
<snip>
> Is there a way to build an nscd-debuginfo package? With rpm --rebuild glib....
> I only get the glib-debuginfo packages.

That's because the packages you're looking for are called "glibc", not "glib".
Debuginfo packages are most likely already available. See:
http://fedoraproject.org/wiki/StackTraces

Comment 8 Ben Herrick 2008-06-06 22:00:44 UTC
A little more info:

I have several systems using LDAP + NSCD on x86 (32-bit) hardware.
The most heavily loaded system seems to crash the most often.
NSCD pegs the CPU at 100% just before crashing.

Bug ID 448916 may be a duplicate of this bug.

Comment 9 Chris Snook 2008-07-10 15:30:38 UTC
I'm seeing this too on x86_64.  Not sure if this is related, but I also started
seeing a bunch of SELinux denials (I'm running in permissive mode) shortly
thereafter, which I think was after the next reboot.  The denial always pops up
when NetworkManager connects, and sometimes nscd crashes around this time.  I'm
using local authentication only.

Comment 10 Ulrich Drepper 2008-08-03 03:59:45 UTC
Should work fine in the current release.