162326 – nscd dies after a few seconds

Summary: nscd dies after a few seconds

Keywords:
Status:	CLOSED DUPLICATE of bug 162712
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	nss_ldap
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-07-02 16:35 UTC by Andre Robatino
Modified:	2019-01-02 11:48 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-07-22 15:18:35 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	162712	0	medium	CLOSED	nscd segfaulting using nss_ldap	2021-02-22 00:41:40 UTC

Description Andre Robatino 2005-07-02 16:35:07 UTC

Description of problem:
  After nscd is started, it dies within a few seconds.

Version-Release number of selected component (if applicable):
nscd-2.3.5-10

How reproducible:
always

Steps to Reproduce:
1. su -
2. nscd
3. wait a few seconds
  
Actual results:
nscd dies

Expected results:
nscd should stay up

Additional info:
clean install of FC4

Comment 1 Dan Cox 2005-07-02 18:56:47 UTC

I'm having the same problem. I'm using nss_ldap for LDAP authentication with
SELinux disabled. Also tried with persistent = no and shared = no settings with
the same results. This setup was stable under FC3.

I can't seem to get it to dump a core file either, which seems strange:
# ulimit -c
unlimited

Here's the tail end of an strace -f nscd

[pid   440] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid   440] futex(0x3682a4, FUTEX_WAKE, 1) = 0
[pid   440] time(NULL)                  = 1120330271
[pid   440] stat64("/etc/passwd", {st_mode=S_IFREG|0444, st_size=2516, ...}) = 0
[pid   440] clock_gettime(CLOCK_MONOTONIC, {49842, 645350000}) = 0
[pid   440] clock_gettime(CLOCK_MONOTONIC, {49842, 646743000}) = 0
[pid   440] futex(0x3682e4, FUTEX_WAIT, 111, {14, 998607000} <unfinished ...>
[pid   443] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid   443] futex(0x3682a4, FUTEX_WAKE, 1) = 0
[pid   443] time(NULL)                  = 1120330271
[pid   443] stat64("/etc/hosts", {st_mode=S_IFREG|0444, st_size=382, ...}) = 0
[pid   443] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 437 detached
Process 443 detached
[pid   441] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid   441] +++ killed by SIGSEGV +++
PANIC: handle_group_exit: 441 leader 437
[pid   440] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid   440] +++ killed by SIGSEGV +++
PANIC: handle_group_exit: 440 leader 437
Process 437 detached

Comment 2 Jakub Jelinek 2005-07-04 19:45:08 UTC

You can use LD_PRELOAD=libSegFault.so nscd -d to see a backtrace.
If you are using LDAP, it could be either an nss_ldap bug or a glibc bug.
If the latter, I'd be interested to know if using glibc-2.3.5-11 nscd
(rawhide) cures it (then it could be nscd miscompilation by GCC - #154782).

Comment 3 Andre Robatino 2005-07-04 19:56:39 UTC

[root@localhost ~]# LD_PRELOAD=libSegFault.so nscd -d
3084: Access Vector Cache (AVC) started
3084: Reloading "0" in password cache!
*** Segmentation fault
Register dump:

 EAX: 00000005   EBX: 00b40cc0   ECX: 000000cb   EDX: 00000005
 ESI: b73993b8   EDI: f9e5beb7   EBP: b732ddb4   ESP: b732dbac

 EIP: 00b38732   EFLAGS: 00210296

 CS: 0073   DS: 007b   ES: 007b   FS: 0000   GS: 0033   SS: 007b

 Trap: 0000000e   Error: 00000005   OldMask: 00000000
 ESP/signal: b732dbac   CR2: f9e5becb

Backtrace:
3084: Reloading "0" in group cache!
3084: Reloading "ftp.freshrpms.net" in hosts cache!
3084: Reloading "lp" in group cache!
3084: Reloading "1" in group cache!
3084: Reloading "2" in group cache!
3084: Reloading "slocate" in group cache!
3084: Reloading "3" in group cache!
3084: Reloading "4" in group cache!
3084: Reloading "root" in group cache!
3084: Reloading "10" in group cache!
3084: Reloading "5" in group cache!
3084: Reloading "users" in group cache!
3084: Reloading "500" in group cache!
3084: Reloading "6" in group cache!
3084: Reloading "12" in group cache!
/lib/libSegFault.so[0x8ac115]
[0x11f420]
nscd[0xb33616]
/lib/libpthread.so.0[0xac8b80]
/lib/libc.so.6(__clone+0x5e)[0x1eadee]
Segmentation fault
[root@localhost ~]#

Comment 4 Andre Robatino 2005-07-04 20:46:28 UTC

  I don't even know what LDAP is.  I'm using nscd to try to speed up DNS as
recommended by

http://www.fedoraforum.org/forum/showthread.php?t=42943

  Essentially the same thing happens with nscd-2.3.5-11 (just the one package
upgraded).

Comment 5 Jakub Jelinek 2005-07-04 20:52:26 UTC

Ok, can you as root:
mkdir -p ~/db-nscd/
cp -a /var/db/nscd/* ~/db-nscd/
rm -f /var/db/nscd/*
and retry?

LDAP notice was in response to comment #1.
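
The backup-and-clear steps above can be sketched as a small shell function. This is only an illustration: the name backup_and_clear and its two arguments are hypothetical, it assumes a POSIX shell with a cp that supports -a, and nscd should be stopped before it is pointed at the real cache directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of nscd): copy every file from a cache
# directory into a backup directory, preserving attributes, and delete
# each original only after its copy succeeded.
backup_and_clear() {
    src=$1   # cache directory, e.g. /var/db/nscd
    dst=$2   # backup directory, e.g. ~/db-nscd
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -e "$f" ] || continue      # empty directory: the glob did not match
        cp -a "$f" "$dst"/ || return 1
        rm -f "$f"
    done
}

# Example call (as root, with nscd stopped):
#   backup_and_clear /var/db/nscd ~/db-nscd
```

Removing each file only after a successful copy means a failed backup leaves the original cache files in place for later inspection.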

Comment 6 Andre Robatino 2005-07-04 21:02:07 UTC

  After following the instructions, nscd stays up (I restored the original nscd
package first).

Comment 7 Jakub Jelinek 2005-07-04 21:10:50 UTC

Ok, can you now stop nscd, copy the ~/db-nscd/* files back and retry?
If that crashes again, I'd be very much interested in those 3 db files, to make
nscd more robust when it sees broken cache files.  You can mail the files to me
or attach here.

Comment 8 Andre Robatino 2005-07-04 21:27:31 UTC

  It crashes again after restoring the 3 files, which I've emailed to you.  I
noticed that after doing a cp -p for the 3 files back to /var/db/nscd, even
though the modification times are the same (June 12), the contents differ. 
Presumably the modification times are being saved and restored.
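
Since cp -p preserves timestamps, an identical modification time says little about whether two copies still hold the same bytes. A minimal sketch of comparing by content instead, assuming a POSIX shell and cmp; the function name changed_files and the directory arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every regular file in directory A
# whose contents differ from (or which is missing in) directory B.
# Timestamps are ignored; only the bytes are compared.
changed_files() {
    a=$1
    b=$2
    for f in "$a"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}                       # strip the directory part
        cmp -s "$f" "$b/$name" || printf '%s\n' "$name"
    done
}

# Example call:
#   changed_files ~/db-nscd /var/db/nscd
```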

Comment 9 Dan Cox 2005-07-05 00:36:04 UTC

Still couldn't get a core dump with LD_PRELOAD=libSegFault.so. I'm testing this
under a Xen guest and host; could that be preventing it? I also tried upgrading
glibc and nscd to no avail:

# rpm -q glibc nscd
glibc-2.3.5-11
nscd-2.3.5-11

Here's some valgrind output instead. Note that I'm also using LDAP for NIS
netgroup lookups. I'm not sure why I get the fatal error on /proc/self/maps.

# valgrind --db-attach=no --tool=memcheck --error-limit=no nscd -d

7463: handle_request: request received (Version = 2) from PID 7474
7463:   GETFDPW
7463: provide access to FD 5, for passwd
7463: handle_request: request received (Version = 2) from PID 7474
7463:   GETPWBYNAME (dcox)
7463: Haven't found "dcox" in password cache!
==7463==
==7463== Thread 3:
==7463== Syscall param write(buf) points to uninitialised byte(s)
==7463==    at 0x1B9330BB: (within /lib/libpthread-2.3.5.so)
==7463==    by 0x1BBDC597: sb_debug_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBD04E1: sb_tls_bio_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BCCB119: BIO_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BC9C100: ssl3_write_pending (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BC9C6AA: ssl3_write_bytes (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BCBAFA3: ssl3_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BCA240B: SSL_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBD0323: sb_tls_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBDC597: sb_debug_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBDBA6B: ber_int_sb_write (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBD87FD: ber_flush (in /lib/libnss_ldap-2.3.5.so)
==7463==  Address 0x1BF2BFCD is 5 bytes inside a block of size 18698 alloc'd
==7463==    at 0x1B909222: malloc (vg_replace_malloc.c:130)
==7463==    by 0x1BCBCC39: default_malloc_ex (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BCBD1E6: CRYPTO_malloc (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BC9E621: ssl3_setup_buffers (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BC9F4EE: ssl23_connect (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BCA384B: SSL_connect (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBD1D9F: ldap_int_tls_start (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBD2282: ldap_start_tls_s (in /lib/libnss_ldap-2.3.5.so)
==7463==    by 0x1BBAC0AA: do_open (ldap-nss.c:1274)
==7463==    by 0x1BBAC28E: do_init2 (ldap-nss.c:960)
==7463==    by 0x1BBAEDF7: _nss_ldap_initgroups_dyn (ldap-grp.c:1050)
==7463==    by 0x1B9DED43: internal_getgrouplist (in /lib/libc-2.3.5.so)
==7463== FATAL: can't open /proc/self/maps

Comment 10 Andre Robatino 2005-07-05 03:46:14 UTC

  I found that even after deleting the 3 files and letting them be recreated,
nscd eventually dies, though not quickly.  Once it dies, the 3 files in
/var/db/nscd at that time cause it to crash quickly again.

Comment 11 Ulrich Drepper 2005-07-07 21:10:20 UTC

Stop adding LDAP-related comments here.  As can be seen in the backtrace in
comment #9, this seems to be a problem in the LDAP code.  It might not be the
only problem.  Eliminate the use of LDAP if you want to add anything to this bug.

Besides, your Xen domain seems to be severely crippled.  No /proc is mounted;
that is fatal these days.

Comment 12 Eric Doutreleau 2005-07-21 12:22:02 UTC

I have the same problem without LDAP.

Here is the output of valgrind:

valgrind --db-attach=no --tool=memcheck --error-limit=no nscd -d
==24122== Memcheck, a memory error detector for x86-linux.
==24122== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==24122== Using valgrind-2.4.0, a program supervision framework for x86-linux.
==24122== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==24122== For more details, rerun with: -v
==24122==
==24122== Syscall param write(buf) points to uninitialised byte(s)
==24122==    at 0x525093: __write_nocancel (in /lib/libpthread-2.3.5.so)
==24122==    by 0x40AC: main (in /usr/sbin/nscd)
==24122==  Address 0x52BFE7F0 is on thread 1's stack



24122: handle_request: request received (Version = 2) from PID 24132
24122:  GETFDPW
24122: provide access to FD 4, for passwd
24122: handle_request: request received (Version = 2) from PID 24132
24122:  GETPWBYNAME (sshd)
24122: Haven't found "sshd" in password cache!
24122: handle_request: request received (Version = 2) from PID 24133
24122:  GETPWBYNAME (sshd)
........
[some lines removed ]
..........
24122: provide access to FD 4, for passwd
24122: handle_request: request received (Version = 2) from PID 24179
24122:  GETFDPW
24122: provide access to FD 4, for passwd
24122: handle_request: request received (Version = 2) from PID 24181
24122:  GETFDPW
24122: provide access to FD 4, for passwd
24122: remove GETPWBYNAME entry "test"
==24122==
==24122== Thread 2:
==24122== Invalid write of size 4
==24122==    at 0xBECE: (within /usr/sbin/nscd)
==24122==  Address 0x1AE26AC0 is on thread 2's stack
==24122== Stack overflow in thread 2: can't grow stack to 0x1AE26AC0
==24122==
==24122== Process terminating with default action of signal 11 (SIGSEGV)
==24122==  Access not within mapped region at address 0x1AE26AC0
==24122==    at 0xBECE: (within /usr/sbin/nscd)
==24122==
==24122== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 26 from 1)
==24122== malloc/free: in use at exit: 13665 bytes in 28 blocks.
==24122== malloc/free: 256 allocs, 228 frees, 90239 bytes allocated.
==24122== For counts of detected errors, rerun with: -v
==24122== searching for pointers to 28 not-freed blocks.
==24122== checked 6436176 bytes.
==24122==
==24122== LEAK SUMMARY:
==24122==    definitely lost: 0 bytes in 0 blocks.
==24122==      possibly lost: 816 bytes in 6 blocks.
==24122==    still reachable: 12849 bytes in 22 blocks.
==24122==         suppressed: 0 bytes in 0 blocks.
==24122== Reachable blocks (those to which a pointer was found) are not shown.
==24122== To see them, rerun with: --show-reachable=yes
==24122== FATAL: can't open /proc/self/maps

Comment 13 Jack Aboutboul 2005-07-22 15:18:35 UTC


*** This bug has been marked as a duplicate of 162712 ***
