Bug 155124

Summary:

nscd segfaults

Product:

[Fedora] Fedora

Reporter:

Enrico Scholz <rh-bugzilla>

Component:

glibc

Assignee:

Jakub Jelinek <jakub>

Status:

CLOSED DUPLICATE

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

CC:

jbourne, pierre-bugzilla

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2005-07-08 07:21:49 UTC

Bug Depends On:

Bug Blocks:

136451

Attachments:

Description	Flags
'catchsegv nscd -d' output	none

Description Enrico Scholz 2005-04-16 17:51:55 UTC

Description of problem:

| # nscd -d
| ...
| Segmentation fault


Version-Release number of selected component (if applicable):

nscd-2.3.4-21
glibc-2.3.4-21 (i386 arch)


How reproducible:

100%


Additional information:

Can be reproduced only with the i386 build of glibc; the i686 build seems to work.

Comment 1 Enrico Scholz 2005-04-16 17:51:55 UTC

Created attachment 113272
'catchsegv nscd -d' output

Comment 2 Enrico Scholz 2005-04-16 18:05:08 UTC

The stack trace in gdb is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1750770768 (LWP 9419)]
0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
(gdb) bt
#0  0x6aab7756 in gethostbyname2_r () from /usr/sbin/nscd
#1  0x6aab263e in sighup_handler () from /usr/sbin/nscd
#2  0x97fc4943 in start_thread () from /lib/libpthread.so.0
#3  0x97f3ed4e in clone () from /lib/libc.so.6

Comment 3 Mark Goodman 2005-05-16 21:13:49 UTC

I can reproduce this with glibc i686 on FC4 test 3.

I got a similar backtrace before installing the debuginfo RPMs. After installing
glibc-debuginfo-common i386 and glibc-debuginfo i686, I get:

(gdb) bt full
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
        runp = (struct hashentry *) 0xb72f64d9
        dh = (struct datahead *) 0x9b2f63b8
        run = Variable "run" is not available.
(gdb) bt
#0  prune_cache (table=0x7bc040, now=1116277037) at cache.c:245
#1  0x007ae63a in nscd_run (p=0x0) at connections.c:1179
#2  0x00547b80 in start_thread (arg=0xb72c0bb0) at pthread_create.c:261
#3  0x00c47b9e in ?? () from /lib/libc.so.6

Comment 4 James Bourne 2005-05-19 15:43:12 UTC

Fedora Core 4 test 3 (this entry should be updated to reflect that).
I'm finding that this happens ONLY when ssl is set to start_tls.  If ssl is set
to on, authentication fails, and turning ssl off fixes the problem.

#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
(gdb) bt
#0  0x00376f1e in ber_sockbuf_ctrl () from /lib/libnss_ldap.so.2
#1  0x0036bc1a in ldap_pvt_tls_inplace () from /lib/libnss_ldap.so.2
#2  0x0036d917 in ldap_start_tls_s () from /lib/libnss_ldap.so.2
#3  0x00347e3d in do_open () at ldap-nss.c:1273
#4  0x00348025 in do_init2 () at ldap-nss.c:959
#5  0x0034a8b5 in _nss_ldap_initgroups_dyn (
    user=0x3 <Address 0x3 out of bounds>, group=3, start=0x3, size=0x3, 
    groupsp=0x3, limit=3, errnop=0x3) at ldap-grp.c:912
#6  0x0028fbe4 in internal_getgrouplist (user=0x8d38cc8 "nscd", group=28, 
    size=0xbfac5b80, groupsp=0xbfac5b84, limit=-1) at initgroups.c:104
#7  0x0028fde1 in getgrouplist (user=0x8d38cc8 "nscd", group=28, groups=0x3, 
    ngroups=0xca1344) at initgroups.c:158
#8  0x00c91aed in nscd_init () at connections.c:1598
#9  0x00c910ad in main (argc=1, argv=0xbfac5ef4) at nscd.c:286

Hope that helps.

Regards
James

Comment 5 Jakub Jelinek 2005-05-19 16:15:36 UTC

Crash in /lib/libnss_ldap.so.2 is almost surely a bug in nss_ldap (until proven
otherwise), so please file that separately, under nss_ldap component.

Comment 6 Enrico Scholz 2005-06-01 20:09:09 UTC

Still happens with nscd-2.3.5-10.

Comment 7 Pierre Ossman 2005-06-20 17:50:19 UTC

Same problem here. It crashes in the garbage collector. Version 2.3.5-10.

Comment 8 Enrico Scholz 2005-06-21 10:07:32 UTC

Chances are high that it is related to bug #154782.

It would be nice to see an erratum soon...

Comment 9 James Bourne 2005-06-28 19:03:21 UTC

With ssl turned off (in this case) it is still happening.  Now nscd (FC4
release) is crashing.  Using catchsegv I get:
14140: Reloading "0" in password cache!
14140: Reloading "89" in password cache!
14140: Reloading "101" in password cache!
14140: remove INITGROUPS entry "mailman"
14140: remove INITGROUPS entry "cacti"
14140: remove GETHOSTBYADDR entry "198.161.98.242"
*** Segmentation fault
Register dump:

 EAX: b7f45708   EBX: 008c1cc0   ECX: b7465af0   EDX: 00000350
 ESI: b7465af0   EDI: 008c2140   EBP: b7d41ba0   ESP: b6b89ad4

 EIP: 008b9ece   EFLAGS: 00010282

 CS: 0073   DS: 007b   ES: 007b   FS: 0000   GS: 0033   SS: 007b

 Trap: 0000000e   Error: 00000006   OldMask: 00000000
 ESP/signal: b6b89ad4   CR2: b7465af0

Backtrace:
/lib/libSegFault.so[0x908115]
[0x53a420]
nscd[0x8b9948]
nscd[0x8b4616]
/lib/libpthread.so.0[0x685b80]
/lib/libc.so.6(__clone+0x5e)[0xc8bdee]
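
For reading the register dump above: on x86, trap 0xe is a page fault, the low bits of the error word describe the access that faulted, and CR2 holds the faulting address. The following is a minimal sketch of decoding that error word. The struct and function names are hypothetical and are not part of catchsegv or glibc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the x86 page-fault error code. */
struct pf_info {
    bool present; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;   /* bit 1: 1 = write access, 0 = read access */
    bool user;    /* bit 2: 1 = fault happened in user mode */
};

/* Hypothetical helper: split the error word into its three flags. */
struct pf_info decode_pf_error(uint32_t err)
{
    struct pf_info info;
    info.present = (err & 0x1u) != 0;
    info.write   = (err & 0x2u) != 0;
    info.user    = (err & 0x4u) != 0;
    return info;
}
```

With the values from the dump (Trap 0000000e, Error 00000006), `decode_pf_error(0x6)` reports not present, write, user mode: a user-mode write to an unmapped page at the CR2 address b7465af0.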

When I run nscd inside gdb I get:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208730704 (LWP 14254)]
0x00126ece in gc (db=0x12f040) at mem.c:143
143               he[cnt] = (struct hashentry *) (db->data + run);
(gdb) bt
#0  0x00126ece in gc (db=0x12f040) at mem.c:143
#1  0x00126948 in prune_cache (table=0x12f040, now=1119985124) at cache.c:429
#2  0x00121616 in nscd_run (p=0x0) at connections.c:1179
#3  0x00764b80 in start_thread (arg=0xb7f43bb0) at pthread_create.c:261
#4  0x001fadee in ?? () from /lib/libc.so.6

I personally now view this as critical, as this is a production system and the
problem occurs with or without ssl.  At this point nscd is completely unusable.
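
The faulting line in gc() forms a pointer by adding an offset (`run`) to the start of the database's data area (`db->data`). The trace does not establish which operand was bad, so the following is only a sketch of the general pattern, not glibc's code: validating an offset before turning it into an entry pointer. `struct toy_db`, `struct toy_entry`, and `entry_at` are hypothetical names, and the data area is assumed to start on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cache database: a data area of known size. */
struct toy_db {
    char  *data;      /* start of the data area (assumed 4-byte aligned) */
    size_t data_size; /* number of bytes in the data area */
};

/* Hypothetical stand-in for an entry stored inside the data area. */
struct toy_entry {
    uint32_t next;    /* offset of the next entry */
    uint32_t key_len; /* length of the key that follows */
};

/* Return the entry at offset `off`, or NULL if the offset is misaligned
   or the entry would not fit entirely inside the data area. */
struct toy_entry *entry_at(const struct toy_db *db, size_t off)
{
    if (db == NULL || db->data == NULL)
        return NULL;
    if (off % sizeof(uint32_t) != 0)
        return NULL;
    if (off > db->data_size
        || db->data_size - off < sizeof(struct toy_entry))
        return NULL;
    return (struct toy_entry *) (db->data + off);
}
```

A check like this turns a corrupted offset into a NULL return that the caller can log, instead of a segmentation fault at the point of use. It would not help if the compiler miscompiled the surrounding code.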

Comment 10 James Bourne 2005-06-29 06:46:38 UTC

I now get the exact same backtrace on a second machine.  I've also discovered
two other things.  First, this only happens after shutting down nscd, removing
the contents of /var/db/nscd, and then starting nscd.  Second, dropping back to
the nscd from FC3 fixes the issue, even after deleting the cache in
/var/db/nscd/.

I'm thinking this is not the same issue.  Comments?

Comment 11 Enrico Scholz 2005-06-29 07:59:56 UTC

You could try the valgrind command from

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154782#c3

and see whether it reports the same uninitialized data. I would really like to
see an updated 'nscd' package; then it would be easy to check whether this bug
disappears as well.

Comment 12 Enrico Scholz 2005-07-03 08:55:24 UTC

I installed nscd-2.3.5-11 from rawhide (it can be installed alone, without
additional dependencies) and cleared the database with 'rm -f /var/db/nscd/*'
(do not forget that!!).

'nscd' has now been running for nearly a day on several machines where it
crashed before.

Comment 13 Ulrich Drepper 2005-07-08 07:21:49 UTC

I think this is the same issue as bug 154782 (i.e., miscompiled code due to gcc
bug).  This bug can cause all kinds of problems.

*** This bug has been marked as a duplicate of 154782 ***