Description of problem:
| # nscd -d
| ...
| Segmentation fault

Version-Release number of selected component (if applicable):
nscd-2.3.4-21
glibc-2.3.4-21 (i386 arch)

How reproducible:
100%

Additional information:
Can be reproduced with the i386 version of glibc only; i686 seems to work.
Created attachment 113272 [details] 'catchsegv nscd -d' output
The stack trace in gdb is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1750770768 (LWP 9419)]
0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
(gdb) bt
#0  0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
#1  0x6aab263e in sighup_handler () from /usr/sbin/nscd
#2  0x97fc4943 in start_thread () from /lib/libpthread.so.0
#3  0x97f3ed4e in clone () from /lib/libc.so.6
I can reproduce this with glibc i686 on FC4 test 3. I got a similar backtrace before installing the debuginfo RPMs. After installing glibc-debuginfo-common i386 and glibc-debuginfo i686, I get:

(gdb) bt full
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
        runp = (struct hashentry *) 0xb72f64d9
        dh = (struct datahead *) 0x9b2f63b8
        run = Variable "run" is not available.
(gdb) bt
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
#1  0x007ae63a in nscd_run (p=0x0) at connections.c:1179
#2  0x00547b80 in start_thread (arg=0xb72c0bb0) at pthread_create.c:261
#3  0x00c47b9e in ?? () from /lib/libc.so.6
This is on Fedora Core 4 test 3 (this entry should be updated to reflect that). I'm finding this is caused ONLY when ssl is set to start_tls. If ssl is set to on, authentication fails to work, and turning off ssl fixes the problem.

#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
(gdb) bt
#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
#1  0x0036bc1a in ldap_pvt_tls_inplace () from /lib/libnss_ldap.so.2
#2  0x0036d917 in ldap_start_tls_s () from /lib/libnss_ldap.so.2
#3  0x00347e3d in do_open () at ldap-nss.c:1273
#4  0x00348025 in do_init2 () at ldap-nss.c:959
#5  0x0034a8b5 in _nss_ldap_initgroups_dyn (
    user=0x3 <Address 0x3 out of bounds>, group=3, start=0x3, size=0x3,
    groupsp=0x3, limit=3, errnop=0x3) at ldap-grp.c:912
#6  0x0028fbe4 in internal_getgrouplist (user=0x8d38cc8 "nscd", group=28,
    size=0xbfac5b80, groupsp=0xbfac5b84, limit=-1) at initgroups.c:104
#7  0x0028fde1 in getgrouplist (user=0x8d38cc8 "nscd", group=28, groups=0x3,
    ngroups=0xca1344) at initgroups.c:158
#8  0x00c91aed in nscd_init () at connections.c:1598
#9  0x00c910ad in main (argc=1, argv=0xbfac5ef4) at nscd.c:286

Hope that helps.

Regards,
James
Crash in /lib/libnss_ldap.so.2 is almost surely a bug in nss_ldap (until proven otherwise), so please file that separately, under nss_ldap component.
Still with nscd-2.3.5-10
Same problem here. It crashes in the garbage collector. Version 2.3.5-10.
Chances are high that it is related to bug #154782. It would be nice to see an erratum soon...
With ssl turned off (in this case) it is still happening. Now nscd (FC4 release) is crashing. Using catchsegv I get:

14140: Reloading "0" in password cache!
14140: Reloading "89" in password cache!
14140: Reloading "101" in password cache!
14140: remove INITGROUPS entry "mailman"
14140: remove INITGROUPS entry "cacti"
14140: remove GETHOSTBYADDR entry "198.161.98.242"
*** Segmentation fault
Register dump:

 EAX: b7f45708   EBX: 008c1cc0   ECX: b7465af0   EDX: 00000350
 ESI: b7465af0   EDI: 008c2140   EBP: b7d41ba0   ESP: b6b89ad4

 EIP: 008b9ece   EFLAGS: 00010282

 CS: 0073   DS: 007b   ES: 007b   FS: 0000   GS: 0033   SS: 007b

 Trap: 0000000e   Error: 00000006   OldMask: 00000000
 ESP/signal: b6b89ad4   CR2: b7465af0

Backtrace:
/lib/libSegFault.so[0x908115]
[0x53a420]
nscd[0x8b9948]
nscd[0x8b4616]
/lib/libpthread.so.0[0x685b80]
/lib/libc.so.6(__clone+0x5e)[0xc8bdee]

When I run nscd inside gdb I get:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208730704 (LWP 14254)]
0x00126ece in gc (db=0x12f040) at mem.c:143
143             he[cnt] = (struct hashentry *) (db->data + run);
(gdb) bt
#0  0x00126ece in gc (db=0x12f040) at mem.c:143
#1  0x00126948 in prune_cache (table=0x12f040, now=1119985124) at cache.c:429
#2  0x00121616 in nscd_run (p=0x0) at connections.c:1179
#3  0x00764b80 in start_thread (arg=0xb7f43bb0) at pthread_create.c:261
#4  0x001fadee in ?? () from /lib/libc.so.6

I personally now view this as critical: this is a production system, and the problem occurs with or without ssl. nscd at this point is completely unusable.
I now get the exact same backtrace on a second machine. I've also discovered two other things. First, this only happens after shutting down nscd, removing the contents of /var/db/nscd, and then starting nscd. Second, dropping back to the nscd from FC3 fixes the issue, even after deleting the cache in /var/db/nscd/. I'm thinking this is not the same issue. Comments?
You could try the valgrind command from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154782#c3 and see whether it reports the same uninitialized data. I would really like to see an updated 'nscd' package; then it would be easy to check whether this bug disappears as well.
I installed nscd-2.3.5-11 from rawhide (it can be installed alone, without additional dependencies) and cleared the database with 'rm -f /var/db/nscd/*' (do not forget that!!). 'nscd' has now been running for nearly a day on several machines where it crashed before.
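For anyone repeating the cache-clearing step on several machines, here is a minimal sketch of it as a shell function. The function name clear_nscd_cache is hypothetical (it is not part of nscd or glibc); it only wraps the 'rm -f /var/db/nscd/*' command quoted above and refuses to run if the directory does not exist.

```shell
# Hypothetical helper, not shipped with nscd: remove the persistent
# cache files from the nscd database directory.
# Usage: clear_nscd_cache [DIR]   (DIR defaults to /var/db/nscd)
clear_nscd_cache() {
    dir="${1:-/var/db/nscd}"
    # Bail out instead of running rm against a path that is not there.
    if [ ! -d "$dir" ]; then
        echo "clear_nscd_cache: no such directory: $dir" >&2
        return 1
    fi
    # Same effect as the 'rm -f /var/db/nscd/*' step mentioned above.
    rm -f "$dir"/*
}
```

This should be run as root while nscd is stopped. Assuming the stock init script is installed, something like 'service nscd stop', then 'clear_nscd_cache', then 'service nscd start' would cover the whole sequence.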
I think this is the same issue as bug 154782 (i.e., miscompiled code due to gcc bug). This bug can cause all kinds of problems. *** This bug has been marked as a duplicate of 154782 ***