Bug 164464 - nscd 'segmentation fault' with LDAP authentication
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: glibc
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
Brian Brock
http://www.mrinformatica.com.br/nscd.txt
Reported: 2005-07-27 20:59 EDT by Giovani
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: 2.3.5-10.3
Doc Type: Bug Fix
Last Closed: 2005-08-18 05:23:15 EDT


Description Giovani 2005-07-27 20:59:43 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Description of problem:
When using nscd with LDAP authentication, nscd hits a segmentation fault and crashes after a successful login.

Version-Release number of selected component (if applicable):
nscd-2.3.5-10

How reproducible:
Always

Steps to Reproduce:
1. Set up a system with LDAP authentication
2. Start nscd
3. Log in and wait
  

Actual Results:  After about 30 to 60 seconds, nscd crashes. 

Expected Results:  Nscd should stay up and running.

Additional info:

All the debug information I could collect is available at http://www.mrinformatica.com.br/nscd.txt
Comment 1 Jakub Jelinek 2005-07-28 09:15:58 EDT
The valgrind output is not very useful; a backtrace from the point where the
program segfaulted would be much more useful, and gdb is a far better tool
for that.
Just start gdb --args /usr/sbin/nscd -d
and do whatever you do to trigger the failure.
Note that this might very well be a nss_ldap bug, not nscd.
Comment 2 Giovani 2005-07-28 13:46:10 EDT
Done.

You can see the outputs at http://www.mrinformatica.com.br/nscd-debug.txt (nscd 
debug information) and http://www.mrinformatica.com.br/nscd-gdb.txt (gdb 
information, including backtrace)

Well, it really could be an nss_ldap bug, but it seems that the segmentation 
fault occurs while nscd is cleaning up the cache, more precisely at the "remove 
GETHOSTBYADDR entry "192.168.0.100"" step. But I'll leave the conclusions 
to those who know better than I do.

P.S.: My system is in Brazilian Portuguese, so some of the output might be in 
Portuguese. FYI, "Falha de Segmentação" means segmentation fault.
Comment 3 Ulrich Drepper 2005-07-28 15:39:01 EDT
None of this information is really useful.  There is no way to locate the
instruction which fails.

Try installing the debuginfo package for glibc:

  yum install glibc-debuginfo

and then run the program again in gdb.  This should provide a better backtrace
with line numbers.  When you see the segv also disassemble the code around the
location of the error.

There is also a test release of a new glibc out there now.  I don't think any
of the problems it fixes applies here, but it's certainly a more recent code base.
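
The debugging steps in this comment can be scripted so that the backtrace and
the disassembly are captured in one run. What follows is a minimal sketch, not
something posted in this report: the command-file path and the function name
capture_nscd_crash are hypothetical, and it assumes gdb and the glibc debuginfo
package are already installed.

```shell
#!/bin/sh
# Sketch only: the file name and function name are illustrative assumptions.
GDB_CMDS="${GDB_CMDS:-/tmp/nscd-gdb.cmds}"

# Write the gdb commands to execute; everything after "run" is processed
# once the program stops (for example on SIGSEGV).
cat > "$GDB_CMDS" <<'EOF'
run
bt
info registers
x/16i $pc
EOF

# Run nscd in debug mode (-d, stays in the foreground) under gdb in batch
# mode. gdb executes the commands above in order and then exits.
capture_nscd_crash() {
    gdb -batch -x "$GDB_CMDS" --args /usr/sbin/nscd -d
}
```

Calling capture_nscd_crash as root and then triggering the failure from another
terminal should print the backtrace, the registers, and the instructions at the
faulting address.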
Comment 4 Giovani 2005-07-29 12:48:41 EDT
Ok, done.

Sorry for taking so long. I was having trouble with the debuginfo packages, so 
I rebuilt the glibc source RPM.

The gdb output is available at http://www.mrinformatica.com.br/nscd-gdb2.txt

While testing, I realized that nscd crashes even without LDAP. I ran 
authconfig, unchecked all LDAP references, and left "Cache Information" checked.

Even without logging in to my system at all, nscd crashed after a few seconds.
Comment 5 Ulrich Drepper 2005-07-29 12:59:19 EDT
One thing to do before we go further: remove the cache files /var/db/nscd/* and
restart nscd.  Maybe the databases are corrupted.  This is known to cause problems.
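
The cache reset suggested here could be wrapped up as follows. This is a rough
sketch under stated assumptions: clear_nscd_cache is a hypothetical helper name,
and the stop/start commands shown in the comments assume the stock nscd init
script on Fedora Core 4.

```shell
#!/bin/sh
# Sketch only: clear_nscd_cache is a hypothetical helper name.
NSCD_DB_DIR="${NSCD_DB_DIR:-/var/db/nscd}"

# Delete the regular files directly inside the cache directory, leaving the
# directory itself in place. Returns non-zero if the directory is missing.
clear_nscd_cache() {
    dir="$1"
    [ -d "$dir" ] || return 1
    for f in "$dir"/*; do
        [ -f "$f" ] && rm -f -- "$f"
    done
    return 0
}

# Typical sequence (run as root), with the daemon stopped while the
# persistent databases are removed:
#   service nscd stop
#   clear_nscd_cache "$NSCD_DB_DIR"
#   service nscd start
```

Stopping the daemon first avoids deleting the files while nscd still has them
mapped.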
Comment 6 Giovani 2005-07-29 13:10:06 EDT
Still the same.

cd /var/db/nscd 
rm -rf *

No LDAP in authconfig, only Cache Information.

Run: gdb --args /usr/sbin/nscd -d, then run

Log in with a valid passwd account. After a few seconds:

22677: provide access to FD 7, for passwd
22677: handle_request: request received (Version = 2) from PID 22707
22677:  GETFDPW
22677: provide access to FD 7, for passwd
22677: remove GETHOSTBYADDR entry "192.168.0.100"
22677: remove GETHOSTBYADDR entry "192.168.0.100"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1213248592 (LWP 22682)]
0x009c8da9 in gc (db=0x9d1140) at mem.c:171
171       qsort (he_data, cnt, sizeof (struct hashentry *), sort_he_data);
(gdb) bt
#0  0x009c8da9 in gc (db=0x9d1140) at mem.c:171
#1  0x009c8948 in prune_cache (table=0x9d1140, now=1122656713) at cache.c:429
#2  0x009c3616 in nscd_run (p=0x2) at connections.c:1179
#3  0x0089b947 in start_thread (arg=0xb7af4bb0) at pthread_create.c:261
#4  0x0055a55e in ?? () from /lib/libc.so.6
Comment 7 Ulrich Drepper 2005-07-30 02:43:44 EDT
The crash is certainly due to corruption of the cache.  It probably happens
while removing a host entry (both backtraces show this).  There was a compiler
bug which might be the cause.

Try the test glibc from

http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/4/

Version 2.3.5-10.2 is the current test release.  This binary is known to be
compiled with a good compiler.
Comment 8 Giovani 2005-07-30 09:37:28 EDT
All right! It appears to be working with glibc-2.3.5-10.2 and nscd 2.3.5-10.2. 
I'll run some more tests and let you know.
Comment 9 Petr Krištof 2005-08-05 09:01:23 EDT
I noticed one problem with both glibc-2.3.5-10 and glibc-2.3.5-10.2.
The getent program is not working properly. It does not return information
about a known user.

[root@server ~]# id joe
uid=1009(joe) gid=513(Domain Users) groups=513(Domain Users)
[root@server ~]#
[root@server ~]# getent passwd | grep joe
[root@server ~]#

Even worse, after several hours of uptime some internal glibc call
(pwent() ???) stops working, so all daemons (sshd, dovecot) are unable
to look up users and fail.

I cannot provide more information about this problem because the remote
system finally died just now.
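
A hedged note on the getent result in this comment: getent passwd with no key
enumerates the whole database, while id joe and getent passwd joe look the user
up by name. Some NSS backends restrict or disable enumeration, so an empty grep
does not by itself show that lookups are broken. The sketch below compares the
two paths; the function name compare_lookups is hypothetical, and nothing in
this report confirms which case applied on the affected server.

```shell
#!/bin/sh
# Sketch only: compare_lookups is a hypothetical helper name.
# Reports whether a user is found by name and by enumeration.
compare_lookups() {
    user="$1"
    # Keyed lookup: asks NSS for this one user.
    if getent passwd "$user" >/dev/null; then
        echo "by-name: found"
    else
        echo "by-name: missing"
    fi
    # Enumeration: lists every entry the backends are willing to return.
    if getent passwd | grep -q "^$user:"; then
        echo "enumeration: found"
    else
        echo "enumeration: missing"
    fi
}
```

Running compare_lookups joe would show whether only the enumeration path comes
back empty, which is a different symptom from name lookups failing outright.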
Comment 10 David Trask 2005-08-10 21:05:55 EDT
I can verify that this is indeed happening.  I have installed Samba/LDAP on an
FC4 box and nscd keeps dying.  Can I help provide any info?  Let me know what
you need.

David Trask
Comment 11 Jakub Jelinek 2005-08-11 03:01:18 EDT
First, start by upgrading to nscd-2.3.5-10.2 (in FC4 updates testing) and run
rm -rf /var/db/nscd/* after the upgrade, then restart nscd.  nscd-2.3.5-10
is miscompiled and creates a corrupt database, so any crash is possible because of
that.
Comment 12 Jakub Jelinek 2005-08-18 05:23:15 EDT
Or better yet, upgrade to nscd-2.3.5-10.3 (in FC4 updates testing); then
manual removal of /var/db/nscd/* should be unnecessary.  Please reopen
only if you can reproduce it with 2.3.5-10.3.
