Bug 155124
| Summary: | nscd segfaults | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Enrico Scholz <rh-bugzilla> | ||||
| Component: | glibc | Assignee: | Jakub Jelinek <jakub> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 4 | CC: | jbourne, pierre-bugzilla | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2005-07-08 07:21:49 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 136451 | ||||||
Description
Enrico Scholz
2005-04-16 17:51:55 UTC
Created attachment 113272 [details]
'catchsegv nscd -d' output
The stack trace in gdb is:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1750770768 (LWP 9419)]
0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
(gdb) bt
#0 0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
#1 0x6aab263e in sighup_handler () from /usr/sbin/nscd
#2 0x97fc4943 in start_thread () from /lib/libpthread.so.0
#3 0x97f3ed4e in clone () from /lib/libc.so.6
I can reproduce this with glibc i686 on FC4 test 3.
I got a similar backtrace before installing the debuginfo RPMs. After installing
glibc-debuginfo-common i386 and glibc-debuginfo i686, I get:
(gdb) bt full
#0 prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
runp = (struct hashentry *) 0xb72f64d9
dh = (struct datahead *) 0x9b2f63b8
run = Variable "run" is not available.
(gdb) bt
#0 prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
#1 0x007ae63a in nscd_run (p=0x0) at connections.c:1179
#2 0x00547b80 in start_thread (arg=0xb72c0bb0) at pthread_create.c:261
#3 0x00c47b9e in ?? () from /lib/libc.so.6
This is on Fedora Core 4 test 3 (this entry should be updated to reflect that).
I'm finding this is caused ONLY when ssl is set to start_tls. If ssl is set to
on, authentication fails to work and turning off ssl fixes the problem.
#0 0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
(gdb) bt
#0 0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
#1 0x0036bc1a in ldap_pvt_tls_inplace () from /lib/libnss_ldap.so.2
#2 0x0036d917 in ldap_start_tls_s () from /lib/libnss_ldap.so.2
#3 0x00347e3d in do_open () at ldap-nss.c:1273
#4 0x00348025 in do_init2 () at ldap-nss.c:959
#5 0x0034a8b5 in _nss_ldap_initgroups_dyn (
user=0x3 <Address 0x3 out of bounds>, group=3, start=0x3, size=0x3,
groupsp=0x3, limit=3, errnop=0x3) at ldap-grp.c:912
#6 0x0028fbe4 in internal_getgrouplist (user=0x8d38cc8 "nscd", group=28,
size=0xbfac5b80, groupsp=0xbfac5b84, limit=-1) at initgroups.c:104
#7 0x0028fde1 in getgrouplist (user=0x8d38cc8 "nscd", group=28, groups=0x3,
ngroups=0xca1344) at initgroups.c:158
#8 0x00c91aed in nscd_init () at connections.c:1598
#9 0x00c910ad in main (argc=1, argv=0xbfac5ef4) at nscd.c:286
Hope that helps.
Regards
James
A crash in /lib/libnss_ldap.so.2 is almost surely a bug in nss_ldap (until proven otherwise), so please file that separately, under the nss_ldap component.
Still happening with nscd-2.3.5-10.
Same problem here. It crashes in the garbage collector. Version 2.3.5-10.
Chances are high that it is related to bug #154782. It would be nice to see an erratum soon...
With ssl turned off (in this case) it is still happening.
Now nscd (FC4 release) is crashing. Using catchsegv I get:
14140: Reloading "0" in password cache!
14140: Reloading "89" in password cache!
14140: Reloading "101" in password cache!
14140: remove INITGROUPS entry "mailman"
14140: remove INITGROUPS entry "cacti"
14140: remove GETHOSTBYADDR entry "198.161.98.242"
*** Segmentation fault
Register dump:
EAX: b7f45708 EBX: 008c1cc0 ECX: b7465af0 EDX: 00000350
ESI: b7465af0 EDI: 008c2140 EBP: b7d41ba0 ESP: b6b89ad4
EIP: 008b9ece EFLAGS: 00010282
CS: 0073 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Trap: 0000000e Error: 00000006 OldMask: 00000000
ESP/signal: b6b89ad4 CR2: b7465af0
Backtrace:
/lib/libSegFault.so[0x908115]
[0x53a420]
nscd[0x8b9948]
nscd[0x8b4616]
/lib/libpthread.so.0[0x685b80]
/lib/libc.so.6(__clone+0x5e)[0xc8bdee]
When I run nscd inside gdb I get:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208730704 (LWP 14254)]
0x00126ece in gc (db=0x12f040) at mem.c:143
143 he[cnt] = (struct hashentry *) (db->data + run);
(gdb) bt
#0 0x00126ece in gc (db=0x12f040) at mem.c:143
#1 0x00126948 in prune_cache (table=0x12f040, now=1119985124) at cache.c:429
#2 0x00121616 in nscd_run (p=0x0) at connections.c:1179
#3 0x00764b80 in start_thread (arg=0xb7f43bb0) at pthread_create.c:261
#4 0x001fadee in ?? () from /lib/libc.so.6
I personally now view this as critical, as this is a production system and the problem occurs with or without ssl. nscd at this point is completely unusable.
I now get the exact same backtrace on a second machine.
I've also discovered two other things. First, this only happens after shutting down nscd, removing the contents of /var/db/nscd and then starting nscd. Second, dropping back to nscd from FC3 fixes the issue, even after deleting the cache in /var/db/nscd/. I'm thinking this is not the same issue. Comments?
You could try the valgrind command from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154782#c3 and see whether it reports the same uninitialized data. I would really like to see an updated 'nscd' package; then it would be easy to check whether this bug disappears as well.
I installed nscd-2.3.5-11 from rawhide (it can be installed alone without additional dependencies) and cleared the database with 'rm -f /var/db/nscd/*' (do not forget that!!). 'nscd' has now been running for nearly a day on several machines where it crashed before.
I think this is the same issue as bug 154782 (i.e., miscompiled code due to a gcc bug). This bug can cause all kinds of problems.
*** This bug has been marked as a duplicate of 154782 ***
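For anyone reading the catchsegv register dump quoted earlier in this report: the Trap and Error fields can be decoded with the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode). The Python sketch below is only an illustration; the helper names `parse_registers` and `describe_fault` are made up for this note and are not part of glibc, nscd or catchsegv. The result appears consistent with the store at mem.c:143 shown in the gdb session, and CR2 matches the ESI and ECX values in the dump, though that reading is an interpretation rather than something confirmed in this report.

```python
import re

# Bits of the x86 page-fault error code (trap number 0x0e).
PF_PRESENT = 0x1  # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2    # 0 = read access, 1 = write access
PF_USER = 0x4     # 0 = kernel mode, 1 = user mode

# Register lines copied from the catchsegv output in this report.
DUMP = """
EAX: b7f45708 EBX: 008c1cc0 ECX: b7465af0 EDX: 00000350
ESI: b7465af0 EDI: 008c2140 EBP: b7d41ba0 ESP: b6b89ad4
EIP: 008b9ece EFLAGS: 00010282
CS: 0073 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Trap: 0000000e Error: 00000006 OldMask: 00000000
ESP/signal: b6b89ad4 CR2: b7465af0
"""


def parse_registers(dump):
    """Collect 'NAME: hexvalue' pairs from the register lines (hypothetical helper)."""
    pairs = re.findall(r"([A-Za-z0-9/]+):[ \t]+([0-9a-fA-F]+)\b", dump)
    return {name: int(value, 16) for name, value in pairs}


def describe_fault(regs):
    """Summarise a page fault from the Trap, Error and CR2 fields (hypothetical helper)."""
    if regs["Trap"] != 0x0E:
        return "not a page fault"
    err = regs["Error"]
    mode = "user" if err & PF_USER else "kernel"
    access = "write" if err & PF_WRITE else "read"
    cause = "protection violation" if err & PF_PRESENT else "page not present"
    return f"{mode}-mode {access} to 0x{regs['CR2']:08x}: {cause}"


if __name__ == "__main__":
    print(describe_fault(parse_registers(DUMP)))
```

For the dump above this reports a user-mode write to an unmapped address, 0xb7465af0.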