208831 – [RHEL4] Numerous Segfaults after Installation of Update 4

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	4.4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jakub Jelinek
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-10-02 10:00 UTC by Thomas Scheunemann
Modified:	2016-11-24 15:04 UTC
CC List:	4 users
Fixed In Version:	RHBA-2007-0210
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-05-01 23:09:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0210	0	normal	SHIPPED_LIVE	glibc bug fix update	2007-04-28 17:26:43 UTC

Description Thomas Scheunemann 2006-10-02 10:00:31 UTC

Description of problem:
After installing Update 4 we encountered numerous segmentation faults in
sendmail (from Red Hat), nscd (from Red Hat), mimedefang, uxmon (bigsister) and
dsmc (from Tivoli). The main culprit seems to be nscd-2.3.4-2.25, which was
enabled with the default configuration. Disabling the hosts cache in nscd
"cured" the problem (at least I haven't seen a related segfault since).

Version-Release number of selected component (if applicable):
nscd-2.3.4-2.25

How reproducible:
It happened on three Dell PowerEdge servers (2850 and 1850). One is mainly used
as a web server, the other two as mail servers. Since these are production
servers, testing is a bit difficult. On one machine I installed only the nscd
and kernel updates and saw 2 segfaults before rebooting the machine, then
several thousand in sendmail after rebooting. Three other machines with the
i386 architecture, running the corresponding i386 packages, have not shown any
remotely comparable problems after installing Update 4. The segmentation
faults occur mostly right after boot; their frequency dropped after several
hours of uptime. I am not entirely sure whether that drop coincided with a crash
of nscd itself, which in at least one case seemed to be the reason.

Steps to Reproduce:
1. Install RedHat 4AS x86_64 Update 3
2. Enable nscd with default configuration
3. Install Update 4
  
Actual results:
segfaults in sendmail up to sendmail crashing

Expected results:
No segfaults, as before the update.

Additional info:

Comment 1 Jakub Jelinek 2006-10-02 15:44:31 UTC

By any chance, could this be related to the nscd database growing (i.e. do you
have so many concurrent hosts lookups that the default database size is too
small)?
We've just been able to reproduce such an issue today and are still working on
a fix.
You could try increasing suggested-size hosts to a (much) bigger (prime) value,
say 8191, then rm -f /var/db/nscd/hosts and restart nscd to see if that's the case.
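
For readers who want to script the workaround above, here is a minimal Python sketch of the configuration edit only. It assumes the usual nscd.conf line layout (`suggested-size <service> <number>`); the helper names `is_prime` and `set_suggested_size` are hypothetical and not part of glibc or nscd. It rewrites the text in memory; writing the result back to nscd.conf, removing /var/db/nscd/hosts and restarting nscd still have to be done as described in the comment.

```python
import re


def is_prime(n: int) -> bool:
    """Trial division; good enough for hash-table sizes of this magnitude."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def set_suggested_size(conf_text: str, service: str, size: int) -> str:
    """Return conf_text with 'suggested-size <service> <n>' set to size.

    Hypothetical helper: assumes one directive per line and leaves
    commented-out lines (starting with '#') untouched.
    """
    if not is_prime(size):
        raise ValueError(f"{size} is not prime")
    pattern = re.compile(
        r"^([ \t]*suggested-size[ \t]+" + re.escape(service) + r"[ \t]+)\d+[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new size.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(size), conf_text)
    if count == 0:
        # No existing directive for this service: append one.
        new_text = conf_text.rstrip("\n") + f"\nsuggested-size {service} {size}\n"
    return new_text


# Example input mimicking two lines of a stock configuration (an assumption).
sample = "enable-cache\t\thosts\t\tyes\nsuggested-size\t\thosts\t\t211\n"
print(set_suggested_size(sample, "hosts", 8191))
```

The primality check is there only because the comment asks for a prime value; 8191 passes it.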

Comment 2 Thomas Scheunemann 2006-10-02 18:23:30 UTC

I am currently trying your suggestion on one of our servers.

It is a medium-sized mail server, serving about 30000 mail addresses but not
handling the mailboxes itself. It uses mimedefang and spamassassin for spam
detection, so it does a fair number of host lookups, but not that many
concurrently. Typically there are no more than 30 sendmail processes running at
the same time.

It may take a few hours before I can tell whether this makes a difference; I
will report back then.

Comment 3 Thomas Scheunemann 2006-10-03 20:16:12 UTC

The server has now been running for over 24 hours with the increased hosts cache
size and there has not been a single segfault. So it looks like the database
size was too small.

Comment 4 Jakub Jelinek 2006-10-04 09:38:41 UTC

http://people.redhat.com/jakub/glibc/2.3.4-2.27/
contains a testing glibc that should hopefully fix this problem.  Note this
hasn't gone through QA, no guarantees about it.
To test, you'd need to:
a) decrease suggested-size back in nscd.conf
b) rm -f /var/db/nscd/*
c) restart nscd
so that the database keeps growing again.

Comment 5 Thomas Scheunemann 2006-10-04 12:30:57 UTC

So I downloaded only nscd-2.3.4-2.27.x86_64.rpm and installed it on one of our
servers with the default configuration.

It took about 30 minutes until the server started logging segfaults again. So it
did not help.

I am not entirely sure whether downloading the entire glibc would make a difference.

Comment 6 Jakub Jelinek 2006-10-04 13:27:08 UTC

Yes, you need not only the new nscd, but also the new glibc.  Most of the
changes were actually on the glibc side (in libc.so.6) and affect the
applications that connect to nscd; only one fix was in nscd itself.
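
A mixed install like the one in comment 5 can be spotted by comparing package builds. The following is a rough Python sketch, assuming name-version-release strings of the kind an rpm query typically prints; the helpers `split_nvr` and `same_build` are hypothetical, and the glibc string in the example is an assumed value for an Update 4 system, not taken from this report.

```python
def split_nvr(package: str):
    """Split 'name-version-release' into (name, version, release).

    Splits from the right because package names may themselves contain '-'.
    """
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def same_build(pkg_a: str, pkg_b: str) -> bool:
    """True if both packages carry the same version and release."""
    _, ver_a, rel_a = split_nvr(pkg_a)
    _, ver_b, rel_b = split_nvr(pkg_b)
    return (ver_a, rel_a) == (ver_b, rel_b)


# New nscd next to an older glibc (glibc string is an assumption).
print(same_build("nscd-2.3.4-2.27", "glibc-2.3.4-2.25"))  # prints False
```

A False result here would indicate that only one side of the fix is installed.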

Comment 7 Thomas Scheunemann 2006-10-04 14:18:55 UTC

Sorry for the misunderstanding on my part.

I have now replaced the entire glibc with the patched version and booted the
machine. I will report back later about success or failure.

Comment 8 Thomas Scheunemann 2006-10-05 20:27:10 UTC

The server has now been running 30 hours without a single segfault, so it is
looking good.

Comment 11 RHEL Program Management 2006-10-06 21:02:42 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 30 Red Hat Bugzilla 2007-05-01 23:09:53 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0210.html
