Bug 164464
Summary: | nscd 'segmentation fault' with LDAP authentication | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Giovani <giovani> |
Component: | glibc | Assignee: | Jakub Jelinek <jakub> |
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | drepper |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
URL: | http://www.mrinformatica.com.br/nscd.txt | ||
Whiteboard: | |||
Fixed In Version: | 2.3.5-10.3 | Doc Type: | Bug Fix |
Last Closed: | 2005-08-18 09:23:15 UTC | Type: | --- |
Description
Giovani
2005-07-28 00:59:43 UTC
The valgrind output is not very useful; much more useful would be the backtrace at the point where the program segfaulted, and gdb is a far better tool for that. Just start gdb --args /usr/sbin/nscd -d and do whatever you do to trigger the failure. Note that this might very well be an nss_ldap bug, not an nscd one.

Done. You can see the outputs at http://www.mrinformatica.com.br/nscd-debug.txt (nscd debug information) and http://www.mrinformatica.com.br/nscd-gdb.txt (gdb information, including backtrace).
Well, it really could be an nss_ldap bug, but it seems that the segmentation fault occurs when nscd is cleaning up the cache, more precisely at the 'remove GETHOSTBYADDR entry "192.168.0.100"' step. But I'll leave the conclusions to those who know better than me.
P.S.: My system is in Brazilian Portuguese, so some of the output might be in Portuguese. FYI, "Falha de Segmentação" means segmentation fault.

None of this information is really useful. There is no way to locate the instruction which fails. Try installing the debuginfo package for glibc:
    yum install glibc-debuginfo
and then run the program again in gdb. This should provide a better backtrace with line numbers. When you see the segv, also disassemble the code around the location of the error.
There is also a test release for a new glibc out there now. I don't think any of the problems fixed apply, but it's certainly a more recent code base.

Ok, done. Sorry for taking so long. I was having trouble with the debuginfo packages, so I rebuilt the glibc source rpm. The gdb output is available at http://www.mrinformatica.com.br/nscd-gdb2.txt
While testing, I realized that nscd crashes even without LDAP. I ran authconfig, unchecked all LDAP references, and left "Cache Information" checked. Even without logging in on my system at all, nscd crashed after a few seconds.

One thing to do before we go further: remove the cache files /var/db/nscd/* and restart nscd.
Maybe the databases are corrupted. This is known to cause problems.

Still the same.
    cd /var/db/nscd
    rm -rf *
No LDAP in authconfig, only Cache Information.
run: gdb --args /usr/sbin/nscd -d, run
Log in with a valid passwd account. After a few seconds:
    22677: provide access to FD 7, for passwd
    22677: handle_request: request received (Version = 2) from PID 22707
    22677: GETFDPW
    22677: provide access to FD 7, for passwd
    22677: remove GETHOSTBYADDR entry "192.168.0.100"
    22677: remove GETHOSTBYADDR entry "192.168.0.100"

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1213248592 (LWP 22682)]
    0x009c8da9 in gc (db=0x9d1140) at mem.c:171
    171 qsort (he_data, cnt, sizeof (struct hashentry *), sort_he_data);
    (gdb) bt
    #0 0x009c8da9 in gc (db=0x9d1140) at mem.c:171
    #1 0x009c8948 in prune_cache (table=0x9d1140, now=1122656713) at cache.c:429
    #2 0x009c3616 in nscd_run (p=0x2) at connections.c:1179
    #3 0x0089b947 in start_thread (arg=0xb7af4bb0) at pthread_create.c:261
    #4 0x0055a55e in ?? () from /lib/libc.so.6

The crash is certainly due to a corruption of the cache. It probably happens while removing a host entry (both backtraces show this). There was a compiler bug which might be the cause. Try the test glibc from
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/4/
Version 2.3.5-10.2 is the current test release. This binary is known to be compiled with a good compiler.

All right! It appears to be working with glibc-2.3.5-10.2 and nscd 2.3.5-10.2. I'll run some more tests and let you know.

I did notice one problem on both glibc-2.3.5-10 and glibc-2.3.5-10.2. The getent program is not working properly. It doesn't return info about a known user.
    [root@server ~]# id joe
    uid=1009(joe) gid=513(Domain Users) groups=513(Domain Users)
    [root@server ~]#
    [root@server ~]# getent passwd | grep joe
    [root@server ~]#
Even worse: after several hours of uptime, some glibc internal call (pwent() ???)
stops working, so all daemons (sshd, dovecot) are unable to check users and fail. I cannot provide more info about this problem because the remote system finally died just now.

I can verify that this is indeed happening. I have installed Samba/LDAP on an FC4 box and nscd keeps dying. Can I help provide any info? Let me know what you need.
David Trask

First, certainly start with upgrading to nscd-2.3.5-10.2 (in FC4 updates testing) and rm -rf /var/db/nscd/* after the upgrade, then restart nscd. nscd-2.3.5-10 is miscompiled and creates a corrupt database, so any crash is possible because of that.

Or better yet, upgrade to nscd-2.3.5-10.3 (in FC4 updates testing); then manual removal of /var/db/nscd/* should be unnecessary. Please reopen only if you can reproduce it with 2.3.5-10.3.
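
For anyone following the 2.3.5-10.2 advice above, the cache cleanup step might be scripted roughly as follows. This is only a sketch: clear_nscd_cache is a hypothetical helper name, and the commented-out service calls assume the stock init script is how nscd is stopped and started on the machine. It takes the directory as an argument so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# Sketch only: clear nscd's persistent cache files so a corrupt database
# written by a miscompiled nscd is not reused after the upgrade.
# clear_nscd_cache is a hypothetical helper name, not part of nscd or glibc.

clear_nscd_cache() {
    dir="$1"
    # Refuse to run without an existing target directory.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "clear_nscd_cache: not a directory: $dir" >&2
        return 1
    fi
    # Same effect as "cd /var/db/nscd; rm -rf *" from the thread,
    # but without changing the working directory.
    rm -rf -- "$dir"/*
}

# Intended use on the affected machine, as root (assumes the stock
# "service" wrapper and nscd init script are present):
#   service nscd stop
#   clear_nscd_cache /var/db/nscd
#   service nscd start
```

Stopping nscd before removing the files is a precaution added here, not something the thread spells out; the thread only says to remove the files and restart.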