Bug 155124 - nscd segfaults
Status: CLOSED DUPLICATE of bug 154782
Product: Fedora
Classification: Fedora
Component: glibc
Version: 4
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Depends On:
Blocks: FC4Target
Reported: 2005-04-16 13:51 EDT by Enrico Scholz
Modified: 2007-11-30 17:11 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-07-08 03:21:49 EDT

Attachments
'catchsegv nscd -d' output (4.90 KB, text/plain)
2005-04-16 13:51 EDT, Enrico Scholz

Description Enrico Scholz 2005-04-16 13:51:55 EDT
Description of problem:

| # nscd -d
| ...
| Segmentation fault


Version-Release number of selected component (if applicable):

nscd-2.3.4-21
glibc-2.3.4-21 (i386 arch)


How reproducible:

100%


Additional information:

Can be reproduced only with the i386 build of glibc; the i686 build seems to work.
Comment 1 Enrico Scholz 2005-04-16 13:51:55 EDT
Created attachment 113272
'catchsegv nscd -d' output
Comment 2 Enrico Scholz 2005-04-16 14:05:08 EDT
The stack trace in gdb is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1750770768 (LWP 9419)]
0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
(gdb) bt
#0  0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
#1  0x6aab263e in sighup_handler () from /usr/sbin/nscd
#2  0x97fc4943 in start_thread () from /lib/libpthread.so.0
#3  0x97f3ed4e in clone () from /lib/libc.so.6
Comment 3 Mark Goodman 2005-05-16 17:13:49 EDT
I can reproduce this with glibc i686 on FC4 test 3.

I got a similar backtrace before installing the debuginfo RPMs. After installing
glibc-debuginfo-common i386 and glibc-debuginfo i686, I get:

(gdb) bt full
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
        runp = (struct hashentry *) 0xb72f64d9
        dh = (struct datahead *) 0x9b2f63b8
        run = Variable "run" is not available.
(gdb) bt
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
#1  0x007ae63a in nscd_run (p=0x0) at connections.c:1179
#2  0x00547b80 in start_thread (arg=0xb72c0bb0) at pthread_create.c:261
#3  0x00c47b9e in ?? () from /lib/libc.so.6
Comment 4 James Bourne 2005-05-19 11:43:12 EDT
Fedora Core 4 test 3 (this entry should be updated to reflect that).
I'm finding this happens ONLY when ssl is set to start_tls.  If ssl is set to
on, authentication fails to work; turning ssl off fixes the problem.

#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
(gdb) bt
#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
#1  0x0036bc1a in ldap_pvt_tls_inplace () from /lib/libnss_ldap.so.2
#2  0x0036d917 in ldap_start_tls_s () from /lib/libnss_ldap.so.2
#3  0x00347e3d in do_open () at ldap-nss.c:1273
#4  0x00348025 in do_init2 () at ldap-nss.c:959
#5  0x0034a8b5 in _nss_ldap_initgroups_dyn (
    user=0x3 <Address 0x3 out of bounds>, group=3, start=0x3, size=0x3, 
    groupsp=0x3, limit=3, errnop=0x3) at ldap-grp.c:912
#6  0x0028fbe4 in internal_getgrouplist (user=0x8d38cc8 "nscd", group=28, 
    size=0xbfac5b80, groupsp=0xbfac5b84, limit=-1) at initgroups.c:104
#7  0x0028fde1 in getgrouplist (user=0x8d38cc8 "nscd", group=28, groups=0x3, 
    ngroups=0xca1344) at initgroups.c:158
#8  0x00c91aed in nscd_init () at connections.c:1598
#9  0x00c910ad in main (argc=1, argv=0xbfac5ef4) at nscd.c:286

Hope that helps.

Regards
James
Comment 5 Jakub Jelinek 2005-05-19 12:15:36 EDT
Crash in /lib/libnss_ldap.so.2 is almost surely a bug in nss_ldap (until proven
otherwise), so please file that separately, under nss_ldap component.
Comment 6 Enrico Scholz 2005-06-01 16:09:09 EDT
Still happens with nscd-2.3.5-10.
Comment 7 Pierre Ossman 2005-06-20 13:50:19 EDT
Same problem here. It crashes in the garbage collector. Version 2.3.5-10.
Comment 8 Enrico Scholz 2005-06-21 06:07:32 EDT
Chances are high that this is related to bug #154782.

It would be nice to see an erratum soon...
Comment 9 James Bourne 2005-06-28 15:03:21 EDT
With ssl turned off (in this case) it is still happening.  Now nscd (FC4
release) is crashing.  Using catchsegv I get:
14140: Reloading "0" in password cache!
14140: Reloading "89" in password cache!
14140: Reloading "101" in password cache!
14140: remove INITGROUPS entry "mailman"
14140: remove INITGROUPS entry "cacti"
14140: remove GETHOSTBYADDR entry "198.161.98.242"
*** Segmentation fault
Register dump:

 EAX: b7f45708   EBX: 008c1cc0   ECX: b7465af0   EDX: 00000350
 ESI: b7465af0   EDI: 008c2140   EBP: b7d41ba0   ESP: b6b89ad4

 EIP: 008b9ece   EFLAGS: 00010282

 CS: 0073   DS: 007b   ES: 007b   FS: 0000   GS: 0033   SS: 007b

 Trap: 0000000e   Error: 00000006   OldMask: 00000000
 ESP/signal: b6b89ad4   CR2: b7465af0

Backtrace:
/lib/libSegFault.so[0x908115]
[0x53a420]
nscd[0x8b9948]
nscd[0x8b4616]
/lib/libpthread.so.0[0x685b80]
/lib/libc.so.6(__clone+0x5e)[0xc8bdee]

When I run nscd inside gdb I get:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208730704 (LWP 14254)]
0x00126ece in gc (db=0x12f040) at mem.c:143
143               he[cnt] = (struct hashentry *) (db->data + run);
(gdb) bt
#0  0x00126ece in gc (db=0x12f040) at mem.c:143
#1  0x00126948 in prune_cache (table=0x12f040, now=1119985124) at cache.c:429
#2  0x00121616 in nscd_run (p=0x0) at connections.c:1179
#3  0x00764b80 in start_thread (arg=0xb7f43bb0) at pthread_create.c:261
#4  0x001fadee in ?? () from /lib/libc.so.6

I personally now view this as critical: this is a production system, and the
problem occurs with or without ssl.  At this point nscd is completely unusable.
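
For readers of the catchsegv register dump above: on i386, a `Trap` value of `0000000e` is the page-fault vector, and the `Error` field is the hardware page-fault error code. The following is a minimal Python sketch that decodes the low three bits of that code. The function name `decode_page_fault_error` is illustrative only and is not part of catchsegv or glibc; the numeric values are copied from the dump.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "page_present": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = kernel mode, 1 = user mode
        "user_mode": bool(code & 0x4),
    }


# Values copied from the register dump in this comment.
error = 0x00000006
cr2 = 0xB7465AF0  # faulting address
esi = 0xB7465AF0

info = decode_page_fault_error(error)
print(info)
print("faulting address equals ESI:", cr2 == esi)
```

For `Error: 00000006` this reports a user-mode write to a page that is not mapped, and `CR2` equals `ESI` (and `ECX`). That is consistent with a faulting store at the assignment gdb shows at mem.c:143, but it does not by itself identify the root cause.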

Comment 10 James Bourne 2005-06-29 02:46:38 EDT
Exact same backtrace on a second machine now.  I've also discovered two other
things.  First, this only happens after shutting down nscd, removing the
contents of /var/db/nscd and then starting nscd.  Second, dropping back to nscd
from FC3 fixes the issue, even after deleting the cache in /var/db/nscd/.

I'm thinking this is not the same issue.  Comments?
Comment 11 Enrico Scholz 2005-06-29 03:59:56 EDT
You could try the valgrind command from

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154782#c3

and see whether it reports the same uninitialized data. I would really like to
see an updated 'nscd' package; then it would be easy to check whether this bug
disappears as well.
Comment 12 Enrico Scholz 2005-07-03 04:55:24 EDT
I installed nscd-2.3.5-11 from rawhide (can be installed alone without
additional dependencies) and cleared the database with 'rm -f /var/db/nscd/*'
(do not forget that!!). 

'nscd' has now been running for nearly one day on several machines where it crashed before.
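
The cache-clearing step described above can be sketched in Python as follows. This is only an illustration: the helper name `clear_nscd_cache` is hypothetical, the default path is the `/var/db/nscd` directory mentioned in this report, and nscd is assumed to have been stopped beforehand (comment 10 describes shutting nscd down before removing the files).

```python
from pathlib import Path


def clear_nscd_cache(cache_dir: str = "/var/db/nscd") -> list:
    """Remove the files directly inside cache_dir, roughly like 'rm -f cache_dir/*'.

    Returns the names of the removed entries. Assumes nscd is not running.
    """
    removed = []
    directory = Path(cache_dir)
    if not directory.is_dir():
        return removed
    for entry in sorted(directory.iterdir()):
        # A shell '*' glob does not match dotfiles, and 'rm -f' without -r
        # does not remove directories, so both are skipped here.
        if entry.name.startswith("."):
            continue
        if entry.is_dir() and not entry.is_symlink():
            continue
        entry.unlink()
        removed.append(entry.name)
    return removed
```

Calling `clear_nscd_cache()` with no argument acts on the real cache directory and needs the corresponding permissions; passing a different path makes it possible to try the helper on a scratch directory first.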
Comment 13 Ulrich Drepper 2005-07-08 03:21:49 EDT
I think this is the same issue as bug 154782 (i.e., miscompiled code due to a gcc
bug).  This bug can cause all kinds of problems.

*** This bug has been marked as a duplicate of 154782 ***
