Description of problem:
After installing Update 4 we encountered several segmentation faults in sendmail (from Red Hat), nscd (from Red Hat), mimedefang, uxmon (bigsister) and dsmc (from Tivoli). The main culprit seems to be nscd-2.3.4-2.25, which was enabled with the default configuration. Disabling the hosts cache portion of nscd "cured" this problem (at least I haven't seen a related segfault since).

Version-Release number of selected component (if applicable):
nscd-2.3.4-2.25

How reproducible:
It happened on three Dell PowerEdge servers (2850 and 1850). One is mainly used as a web server, the other two as mail servers. Since these are production servers, testing is a bit difficult. On one machine I deliberately installed only the nscd and kernel updates and encountered 2 segfaults before rebooting the machine and several thousand in sendmail after rebooting it. Three other machines of i386 architecture, using the corresponding version, have not shown any remotely comparable problems after installation of Update 4. The segmentation faults occur mostly after a boot of the machine. The frequency dropped after several hours of uptime. I am not entirely sure whether this coincided with a crash of nscd itself, which at least in one case seemed to be the reason.

Steps to Reproduce:
1. Install Red Hat 4AS x86_64 Update 3
2. Enable nscd with default configuration
3. Install Update 4

Actual results:
Segfaults in sendmail, up to sendmail crashing.

Expected results:
No segfaults, as before.

Additional info:
By any chance, could this be related to the nscd database growing (i.e. do you have so many concurrent host lookups that the default database size is too small)? We were able to reproduce such an issue just today and are still working on a fix. To see whether that's the case, you could try increasing suggested-size for hosts to a (much) bigger (prime) value, say 8191, then rm -f /var/db/nscd/hosts and restart nscd.
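The workaround suggested above could be scripted roughly as follows. This is only a sketch: it assumes the configuration lives in /etc/nscd.conf, that it already contains a "suggested-size hosts <number>" line, and that nscd is restarted through the init script via `service`. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the suggested workaround. The function names are hypothetical;
# the /etc/nscd.conf path and "service nscd restart" are assumptions.

# Replace the number on the "suggested-size hosts N" line of an nscd.conf.
# Other services (passwd, group) are left untouched.
set_hosts_suggested_size() {
    conf=$1
    size=$2
    # Keep the directive and its whitespace, swap only the trailing number.
    sed "s/^\([[:space:]]*suggested-size[[:space:]][[:space:]]*hosts[[:space:]][[:space:]]*\)[0-9][0-9]*/\1${size}/" \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}

# Drop the persistent hosts database and restart the daemon (needs root).
reset_hosts_cache() {
    rm -f /var/db/nscd/hosts
    service nscd restart
}

# Example, run as root:
#   set_hosts_suggested_size /etc/nscd.conf 8191
#   reset_hosts_cache
```

Keeping the edit and the restart in separate functions makes it possible to try the edit on a copy of the configuration file first.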
I am currently trying your suggestion on one of our servers. It is a medium-sized mail server, serving about 30000 mail addresses but not handling the mailboxes itself. It uses mimedefang and spamassassin for spam detection, so it sees a fair number of host lookups, but not that many concurrently. Typically there are no more than 30 sendmail processes running at the same time. It may take a few hours before I can tell whether this makes a difference; I will report back then.
The server has now been running for over 24 hours with the increased host cache size and there has not been a single segfault. So it looks like the database size was too small.
http://people.redhat.com/jakub/glibc/2.3.4-2.27/ contains a testing glibc that should hopefully fix this problem. Note that this hasn't gone through QA, so there are no guarantees about it. To test it, the database has to be allowed to grow again, so you'd need to: a) decrease suggested-size back in nscd.conf b) rm -f /var/db/nscd/* c) restart nscd.
So I downloaded only nscd-2.3.4-2.27.x86_64.rpm and installed it on one of our servers with the default configuration. It took about 30 minutes until the server started logging segfaults again. So it did not help. I am not entirely sure whether downloading the entire glibc would make a difference.
Yes, you need not only the new nscd but also the new glibc. Most of the changes were actually on the glibc side (in libc.so.6) and affect the applications that connect to nscd; only one fix was in nscd itself.
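Since the fix spans both packages, it may be worth confirming that glibc and nscd are at the same version-release before testing. A minimal sketch using rpm's query format; the function names are hypothetical, and `sort -u` is there because an x86_64 system may have more than one glibc architecture installed:

```shell
#!/bin/sh
# Sketch: compare the installed glibc and nscd version-release strings.
# Function names are hypothetical helpers, not part of any package.

# Print the version-release of an installed package, one line per
# distinct value (multilib systems may have several glibc architectures).
pkg_vr() {
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' "$1" | sort -u
}

# Succeed only if the two version-release strings are identical.
check_matching() {
    if [ "$1" = "$2" ]; then
        echo "glibc and nscd match: $1"
    else
        echo "mismatch: glibc=$1 nscd=$2"
        return 1
    fi
}

# Example:
#   check_matching "$(pkg_vr glibc)" "$(pkg_vr nscd)"
```

A mismatch here would correspond to the situation above, where only the nscd package had been updated.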
Sorry for the misunderstanding on my part. I have now replaced the entire glibc with the patched version and rebooted the machine. I will report back later about success or failure.
The server has now been running for 30 hours without a single segfault, so it is looking good.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0210.html