Bug 164464

Summary:	nscd 'segmentation fault' with LDAP authentication
Product:	[Fedora] Fedora	Reporter:	Giovani <giovani>
Component:	glibc	Assignee:	Jakub Jelinek <jakub>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4	CC:	drepper
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
URL:	http://www.mrinformatica.com.br/nscd.txt
Whiteboard:
Fixed In Version:	2.3.5-10.3	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-08-18 09:23:15 UTC	Type:	---

Description Giovani 2005-07-28 00:59:43 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Description of problem:
When using nscd and LDAP authentication, nscd crashes with a segmentation fault after a successful login.

Version-Release number of selected component (if applicable):
nscd-2.3.5-10

How reproducible:
Always

Steps to Reproduce:
1. Set up a system with LDAP authentication
2. Start nscd
3. Log in and wait

Actual Results:  After about 30 to 60 seconds, nscd crashes. 

Expected Results:  Nscd should stay up and running.

Additional info:

All the debug information I could collect is available at http://www.mrinformatica.com.br/nscd.txt

Comment 1 Jakub Jelinek 2005-07-28 13:15:58 UTC

The valgrind output is not very useful. Much more useful would be the
backtrace at the point where the program segfaulted, and gdb is a far better
tool for that.
Just start gdb --args /usr/sbin/nscd -d
and do whatever you do to trigger the failure.
Note that this might very well be an nss_ldap bug, not an nscd bug.

Comment 2 Giovani 2005-07-28 17:46:10 UTC

Done.

You can see the output at http://www.mrinformatica.com.br/nscd-debug.txt (nscd
debug information) and http://www.mrinformatica.com.br/nscd-gdb.txt (gdb
information, including the backtrace).

It really could be an nss_ldap bug, but it seems that the segmentation
fault occurs while nscd is cleaning up the cache, more precisely right after the
"remove GETHOSTBYADDR entry "192.168.0.100"" message. But I'll leave the
conclusions to those who know better than me.

P.S.: My system is in Brazilian Portuguese, so some of the output might be in
Portuguese. FYI, "Falha de Segmentação" means "Segmentation fault".

Comment 3 Ulrich Drepper 2005-07-28 19:39:01 UTC

None of this information is really useful.  There is no way to locate the
instruction which fails.

Try installing the debuginfo package for glibc:

  yum install glibc-debuginfo

and then run the program again in gdb.  This should provide a better backtrace
with line numbers.  When you see the segv also disassemble the code around the
location of the error.

There is also a test release of a new glibc out now.  I don't think any of the
fixed problems apply here, but it's certainly a more recent code base.

Comment 4 Giovani 2005-07-29 16:48:41 UTC

Ok, done.

Sorry for taking so long. I was having trouble with the debuginfo packages, so
I rebuilt the glibc source RPM.

The gdb output is available at http://www.mrinformatica.com.br/nscd-gdb2.txt

While testing, I realized that nscd crashes even without LDAP. I ran
authconfig, unchecked all LDAP options, and left "Cache Information" checked.

Even without logging in to my system at all, nscd crashed after a few seconds.

Comment 5 Ulrich Drepper 2005-07-29 16:59:19 UTC

One thing to do before we go further: remove the cache files /var/db/nscd/* and
restart nscd.  Maybe the databases are corrupted; this is known to cause problems.

Comment 6 Giovani 2005-07-29 17:10:06 UTC

Still the same.

cd /var/db/nscd 
rm -rf *

No LDAP in authconfig, only Cache Information.

Ran gdb --args /usr/sbin/nscd -d, then "run" at the gdb prompt.

Logged in with a valid passwd account. After a few seconds:

22677: provide access to FD 7, for passwd
22677: handle_request: request received (Version = 2) from PID 22707
22677:  GETFDPW
22677: provide access to FD 7, for passwd
22677: remove GETHOSTBYADDR entry "192.168.0.100"
22677: remove GETHOSTBYADDR entry "192.168.0.100"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1213248592 (LWP 22682)]
0x009c8da9 in gc (db=0x9d1140) at mem.c:171
171       qsort (he_data, cnt, sizeof (struct hashentry *), sort_he_data);
(gdb) bt
#0  0x009c8da9 in gc (db=0x9d1140) at mem.c:171
#1  0x009c8948 in prune_cache (table=0x9d1140, now=1122656713) at cache.c:429
#2  0x009c3616 in nscd_run (p=0x2) at connections.c:1179
#3  0x0089b947 in start_thread (arg=0xb7af4bb0) at pthread_create.c:261
#4  0x0055a55e in ?? () from /lib/libc.so.6

Comment 7 Ulrich Drepper 2005-07-30 06:43:44 UTC

The crash is certainly due to a corruption of the cache.  It probably happens
while removing a host entry (both backtraces show this).  There was a compiler
bug which might be the cause.

Try the test glibc from

http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/4/

Version 2.3.5-10.2 is the current test release.  This binary is known to have
been compiled with a good compiler.
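The backtrace in Comment 6 ends in qsort() called from gc() while prune_cache() is expiring entries, which fits the diagnosis of a corrupt cache above: a sort comparator that dereferences every entry pointer it is handed will fault on a single garbage pointer or offset left behind by the miscompiled nscd. The sketch below is a simplified, hypothetical model of that step, not glibc's mem.c; the names struct entry, sort_by_packet and prune_and_sort, and the offset/length sanity check, are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified cache entry (NOT glibc's struct hashentry). */
struct entry {
    size_t packet;   /* offset of this entry's data inside the data area */
    size_t len;      /* length of that data */
    time_t timeout;  /* time after which the entry has expired */
};

/* qsort comparator over an array of entry pointers. Every call
   dereferences both pointers, so one bogus pointer in the array is
   enough to crash inside qsort. */
static int sort_by_packet(const void *p1, const void *p2)
{
    const struct entry *e1 = *(const struct entry *const *) p1;
    const struct entry *e2 = *(const struct entry *const *) p2;
    return (e1->packet > e2->packet) - (e1->packet < e2->packet);
}

/* Hypothetical prune + gc step: drops entries that expired at or before
   `now`, then sorts the survivors by data offset (the order a compacting
   collector would move them in). Returns the number of live entries, or
   -1 without modifying the array if any entry is NULL or points outside
   the data area. This only catches NULL and bad offsets; a wild pointer
   still faults as soon as the check dereferences it. */
long prune_and_sort(struct entry **entries, size_t cnt, size_t data_size,
                    time_t now)
{
    for (size_t i = 0; i < cnt; ++i) {
        const struct entry *e = entries[i];
        if (e == NULL || e->packet > data_size
            || e->len > data_size - e->packet)
            return -1;
    }

    size_t live = 0;
    for (size_t i = 0; i < cnt; ++i)
        if (entries[i]->timeout > now)
            entries[live++] = entries[i];

    qsort(entries, live, sizeof(struct entry *), sort_by_packet);
    return (long) live;
}
```

In this model one call happens per pruning pass, e.g. prune_and_sort(table, n, size, time(NULL)). Note that the fix adopted in this bug was not a runtime check but rebuilding glibc with a working compiler (2.3.5-10.2, then 2.3.5-10.3) and discarding the corrupt /var/db/nscd files.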

Comment 8 Giovani 2005-07-30 13:37:28 UTC

All right! It appears to be working with glibc-2.3.5-10.2 and nscd 2.3.5-10.2. 
I'll run some more tests and let you know.

Comment 9 Petr Krištof 2005-08-05 13:01:23 UTC

I noticed one problem with both glibc-2.3.5-10 and glibc-2.3.5-10.2:
the getent program is not working properly. It doesn't return info about a known user.

[root@server ~]# id joe
uid=1009(joe) gid=513(Domain Users) groups=513(Domain Users)
[root@server ~]#
[root@server ~]# getent passwd | grep joe
[root@server ~]#

Even worse, after several hours of uptime some internal glibc call
(pwent()?) stops working, so all daemons (sshd, dovecot) are unable
to look up users and grind to a halt.

I can't provide more information about this problem because the remote
system finally died just now.
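One possible reason for the id/getent mismatch in Comment 9, not confirmed anywhere in this report: `id joe` resolves the user with a direct lookup (getpwnam), whereas `getent passwd` with no key enumerates the whole database through setpwent/getpwent/endpwent, and an NSS backend can answer lookups while returning nothing, or only a partial list, when enumerated. The sketch below uses only those standard <pwd.h> calls to check both paths for one name; the helper names found_by_lookup and found_by_enumeration are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* feature-test macro for getpwent() and friends on modern glibc */
#endif
#include <pwd.h>
#include <stdbool.h>
#include <string.h>

/* Direct lookup by name: the path `id joe` and `getent passwd joe` take. */
bool found_by_lookup(const char *name)
{
    return getpwnam(name) != NULL;
}

/* Full enumeration of the passwd database: the path a bare
   `getent passwd` takes. Returns true if `name` shows up. */
bool found_by_enumeration(const char *name)
{
    bool found = false;
    struct passwd *pw;

    setpwent();                        /* rewind to the first entry */
    while ((pw = getpwent()) != NULL) {
        if (strcmp(pw->pw_name, name) == 0) {
            found = true;
            break;
        }
    }
    endpwent();                        /* release the enumeration state */
    return found;
}
```

On the server from Comment 9, found_by_lookup("joe") returning true while found_by_enumeration("joe") returns false would match the output shown there; from the shell, `getent passwd joe` exercises the lookup path for comparison. If both paths succeed, the problem lies elsewhere, for example in the corrupt nscd cache discussed in the other comments.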

Comment 10 David Trask 2005-08-11 01:05:55 UTC

I can verify that this is indeed happening.  I have installed Samba/LDAP on an
FC4 box and nscd keeps dying.  Can I help provide any info?  Let me know what
you need.

David Trask

Comment 11 Jakub Jelinek 2005-08-11 07:01:18 UTC

First, certainly start by upgrading to nscd-2.3.5-10.2 (in FC4 updates-testing),
then rm -rf /var/db/nscd/* after the upgrade and restart nscd.  nscd-2.3.5-10
is miscompiled and creates corrupt databases, so any crash is possible because
of that.

Comment 12 Jakub Jelinek 2005-08-18 09:23:15 UTC

Or better yet upgrade to nscd-2.3.5-10.3 (in FC4 updates testing), then
manual removal of /var/db/nscd/* should be unnecessary.  Please reopen
only if you can reproduce it with 2.3.5-10.3.