Bug 137140 - postfix segfaults accessing nscd hosts database
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-10-26 01:38 UTC by Matthew Costello
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-11-10 08:40:52 UTC
Type: ---
Embargoed:


Attachments
YUM update log with list of packages updated (4.35 KB, text/plain)
2004-10-26 14:49 UTC, Matthew Costello

Description Matthew Costello 2004-10-26 01:38:52 UTC
Description of problem:
After updating FC3t3 (via yum) to
 nscd-2.3.3-73, glibc-2.3.3-73, etc.
postfix failed to start when the machine was rebooted because it was
segfaulting.  The 'newaliases' (postfix version) command also
segfaulted.  I strace'd the newaliases command:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"hosts\0", 6}],
msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, {5}}, msg_flags=0}, 0) = 6
fstat64(5, {st_mode=S_IFREG|0600, st_size=217016, ...}) = 0
pread(5, "\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\333T}A\0\0\0\0\323\0\0"...,
104, 0) = 104
mmap2(NULL, 217016, PROT_READ, MAP_SHARED, 5, 0) = 0xf6fb7000
close(5)                                = 0
close(4)                                = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

The file being passed in the recvmsg() was /var/db/nscd/hosts, so I
stopped nscd.  This fixed the problem, so I removed the 3 files in
/var/run/nscd/ and restarted nscd. Everything still worked.  My guess
is that there was a change in the binary format of the nscd database files.
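
For illustration, the strace above shows a file descriptor arriving as
SCM_RIGHTS ancillary data on a Unix socket, alongside the 6-byte payload
"hosts\0". The following is a minimal sketch of that general mechanism,
not glibc's actual nscd client code. The helper names send_fd and recv_fd
are hypothetical; only standard socket calls are used.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor plus a small tag over a
 * Unix-domain socket. Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd_to_send, const char *tag, size_t tag_len)
{
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct iovec iov = { .iov_base = (void *)tag, .iov_len = tag_len };
    struct msghdr msg;

    assert(tag != NULL);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)tag_len ? 0 : -1;
}

/* Hypothetical helper: receive the tag and the passed descriptor.
 * Returns the new descriptor, or -1 if none arrived intact. */
int recv_fd(int sock, char *tag, size_t tag_cap)
{
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct iovec iov = { .iov_base = tag, .iov_len = tag_cap };
    struct msghdr msg;
    int fd = -1;

    assert(tag != NULL);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)      /* control data was cut short */
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

The receiver gets its own descriptor for the same open file, which is why
the trace can fstat64(), pread() and mmap2() descriptor 5 directly.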

I am reporting this incident because it indicates a possible security
problem.  Postfix & newaliases should not have segfaulted due to the
contents of an nscd file.  The access library (in /lib/libresolv.so.2?)
should check the file and not segfault if the file is in the wrong
format.
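
As a rough sketch of the kind of check being asked for, a reader of a
mapped cache file can validate the header and bounds-check every offset
before dereferencing it. The struct layout, CACHE_VERSION, and the
function names below are hypothetical and are not glibc's real nscd
database format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only. */
struct cache_header {
    uint32_t version;        /* format version the reader understands */
    uint32_t header_size;    /* must equal sizeof(struct cache_header) */
    uint32_t table_entries;  /* number of uint32_t hash slots after header */
    uint32_t data_size;      /* bytes of record data after the hash table */
};

#define CACHE_VERSION 1u

/* Return 1 if the mapping is large enough for what the header claims,
 * 0 if the file looks truncated, corrupted, or of another version. */
int cache_header_ok(const void *map, size_t map_len)
{
    struct cache_header h;

    assert(map != NULL);
    if (map_len < sizeof h)
        return 0;
    memcpy(&h, map, sizeof h);           /* avoid alignment assumptions */
    if (h.version != CACHE_VERSION || h.header_size != sizeof h)
        return 0;

    /* 64-bit arithmetic so a huge table_entries cannot wrap around. */
    uint64_t need = (uint64_t)sizeof h
                  + (uint64_t)h.table_entries * sizeof(uint32_t)
                  + (uint64_t)h.data_size;
    return need <= (uint64_t)map_len;
}

/* Return a pointer to len bytes at offset off, or NULL if that range
 * would fall outside the mapping. */
const void *cache_record(const void *map, size_t map_len,
                         uint32_t off, uint32_t len)
{
    if (!cache_header_ok(map, map_len))
        return NULL;
    if ((uint64_t)off + (uint64_t)len > (uint64_t)map_len)
        return NULL;
    return (const unsigned char *)map + off;
}
```

A caller that gets 0 or NULL back can fall back to an uncached lookup
instead of crashing on a bad file.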

Additionally, updating nscd should remove the cache files in
/var/db/nscd when the format of those files changes.

Version-Release number of selected component (if applicable):
nscd-2.3.3-73
glibc-2.3.3-73

How reproducible:
Probably not easily reproducible.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
This bug should probably be filed under "nscd", which does exist on
the bugzilla query form, but not on the bug entry form.

Comment 1 Thomas Woerner 2004-10-26 09:51:14 UTC
Assigning this to glibc for now. 

If this is fixed in nscd, please reassign to me if there is something
to fix in postfix, too.


Comment 2 Jakub Jelinek 2004-10-26 10:20:12 UTC
From which exact glibc/nscd version did you update?

Comment 3 Matthew Costello 2004-10-26 14:47:02 UTC
I "yum update"d late on October 24.  The previous "yum update" was on
October 20. This updated a total of 85 packages, but the only other
libraries updated are the X11 packages, and this box does not run X.

from glibc-2.3.3-70.i686.rpm
to   glibc-2.3.3-73.i686.rpm

also upgraded with same versions (but i386 arch)
  glibc-common, glibc-devel, glibc-headers & nscd

The box is an EPIA CL10000 w/ a VIA Nehemiah processor.

Comment 4 Matthew Costello 2004-10-26 14:49:43 UTC
Created attachment 105797
YUM update log with list of packages updated

Comment 5 Jakub Jelinek 2004-10-26 14:58:07 UTC
There were no nscd related changes between -70 and -73 at all.
By any chance, have you saved a copy of the problematic /var/db/nscd/hosts?

Comment 6 Matthew Costello 2004-10-27 15:19:35 UTC
I was still in fix-it mode and didn't think to save a copy of
/var/db/nscd/hosts until afterwards.  Don't waste your time trying to
reproduce this problem. I assume the file got corrupted somehow.  It
wouldn't be the first time... I had to remove the /var/db/nscd/users
file a week earlier because nscd would not let me access my own
userid; it had somehow cached an incorrect negative lookup and would
not flush it out.  This box is also my master LDAP server, although
that should not matter; the DNS entry for ldap-r.ottix.com has the A
records for both the master and the copy (on FC2).  I did reorder the
entries in /etc/rc3.d/S* to start LDAP before nscd.

This bug report should probably be passed to the nscd maintainers to
do a code audit of that portion of nscd in glibc to verify that a
corrupted nscd database does not cause innocent applications to crash.

I've not looked at the nscd code, but a common problem with mmap'd
databases is that dirty pages are not flushed/written in any
particular order, so it is very easy to get a corrupted database if
the kernel hangs or panics.  If particular pages are being accessed on
a frequent basis then the page flushing algorithm may not write out
those pages for a long long time.

Comment 7 Ulrich Drepper 2004-11-10 08:40:52 UTC
> I've not looked at the nscd code, but a common problem with mmap'd
> databases is that dirty pages are not flushed/written

I know that very well.  This is why I have msync calls all over the
place.  They are async which means there is a bigger chance that
kernel crashes will cause problems.  This is nothing which can be
prevented.
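
To illustrate the trade-off described here, the sketch below writes into
a file-backed MAP_SHARED mapping and then calls msync() with a
caller-chosen flag. MS_ASYNC only schedules the write-back and returns,
while MS_SYNC waits for it to complete. The helper name write_and_flush
is hypothetical and this is not nscd's code.

```c
#define _XOPEN_SOURCE 700   /* for ftruncate() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store msg (including its NUL) at the start of a
 * file through a shared mapping, then flush with sync_flag, which should
 * be MS_ASYNC or MS_SYNC. Returns 0 on success, -1 on failure. */
int write_and_flush(const char *path, const char *msg, int sync_flag)
{
    assert(path != NULL && msg != NULL);

    size_t len = strlen(msg) + 1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, msg, len);     /* dirties the page in the page cache */

    /* MS_ASYNC: queue the write-back and return at once.
     * MS_SYNC:  block until the data has been written out. */
    int rc = msync(map, len, sync_flag);

    munmap(map, len);
    return rc;
}
```

With MS_ASYNC, a kernel crash shortly after the call can still lose or
partially write the update; MS_SYNC narrows that window at the cost of
blocking the writer.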

The glibc code has some tests to check for corrupted caches.  Still,
if you have problems again, look at the actual location of the crash.
This would help determine which checks must be added.

There is not much we can do here.  Corrupt data led to a crash and we
cannot reproduce it.  So I'm closing the bug as UPSTREAM since this is
where the new tests I added will come from.

Comment 8 Jakub Jelinek 2004-11-10 13:29:27 UTC
The extra checks are in nscd-2.3.3-76 in rawhide.

