Description of problem:
After updating FC3t3 (via yum) to nscd-2.3.3-73, glibc-2.3.3-73, etc., postfix failed to start when the machine was rebooted because it was segfaulting. The postfix version of the 'newaliases' command also segfaulted. I strace'd the newaliases command:

    recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"hosts\0", 6}], msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {5}}, msg_flags=0}, 0) = 6
    fstat64(5, {st_mode=S_IFREG|0600, st_size=217016, ...}) = 0
    pread(5, "\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\333T}A\0\0\0\0\323\0\0"..., 104, 0) = 104
    mmap2(NULL, 217016, PROT_READ, MAP_SHARED, 5, 0) = 0xf6fb7000
    close(5) = 0
    close(4) = 0
    --- SIGSEGV (Segmentation fault) @ 0 (0) ---

The file being passed in the recvmsg() was /var/db/nscd/hosts, so I stopped nscd. This fixed the problem, so I removed the 3 files in /var/run/nscd/ and restarted nscd. Everything still worked. My guess is that there was a change in the binary format of the nscd database files.

I am reporting this incident because it indicates a possible security problem. Postfix and newaliases should not have segfaulted because of the contents of an nscd file. The access library (in /lib/libresolv.so.2 ?) should check the file and not segfault if the file is in the wrong format. Additionally, updating nscd should remove the cache files in /var/db/nscd when the format of those files changes.

Version-Release number of selected component (if applicable):
nscd-2.3.3-73
glibc-2.3.3-73

How reproducible:
Probably not easily reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This bug should probably be filed under "nscd", which does exist on the bugzilla query form, but not on the bug entry form.
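As a minimal sketch of the kind of check the report asks for, a client can bounds-check a mapped cache file before following any size or offset stored in it. The header layout below (cache_header, EXPECTED_VERSION) and the helpers validate_cache and record_ptr are hypothetical and do not reflect the real nscd on-disk format; the point is only that every value read from the file is compared against the mapping length, with the arithmetic arranged so it cannot wrap around, and that a file with an unexpected format version is rejected instead of being trusted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, NOT the real nscd on-disk format. */
struct cache_header {
    uint32_t version;     /* format version written by the daemon */
    uint32_t header_size; /* size of this header in bytes */
    uint64_t data_size;   /* bytes of record data after the header */
};

#define EXPECTED_VERSION 1u

/* Return 0 if the mapping looks usable, -1 if it is too short,
   has an unknown format version, or claims more data than it holds. */
int validate_cache(const void *map, size_t map_len)
{
    struct cache_header h;

    if (map == NULL || map_len < sizeof h)
        return -1;
    memcpy(&h, map, sizeof h); /* copy out to avoid unaligned reads */

    if (h.version != EXPECTED_VERSION || h.header_size != sizeof h)
        return -1;
    /* map_len >= sizeof h here, so the subtraction cannot wrap. */
    if (h.data_size > map_len - h.header_size)
        return -1;
    return 0;
}

/* Return a pointer to n bytes at offset off, or NULL if that range
   would extend past the end of the mapping. */
const unsigned char *record_ptr(const void *map, size_t map_len,
                                size_t off, size_t n)
{
    if (off > map_len || n > map_len - off)
        return NULL;
    return (const unsigned char *)map + off;
}
```

In this sketch a caller would run validate_cache once right after mmap and then fetch every record through record_ptr, treating a failure as "cache unusable" rather than dereferencing a bad offset.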
Assigning this to glibc for now. If this gets fixed in nscd and there is also something to fix in postfix, please reassign it to me.
From which exact glibc/nscd version did you update?
I ran "yum update" late on October 24; the previous "yum update" was on October 20. This updated a total of 85 packages, but the only other libraries updated were the X11 packages, and this box does not run X.

glibc went from glibc-2.3.3-70.i686.rpm to glibc-2.3.3-73.i686.rpm. glibc-common, glibc-devel, glibc-headers and nscd were also upgraded between the same versions (but i386 arch).

The box is an EPIA CL10000 with a VIA Nehemiah processor.
Created attachment 105797: YUM update log with list of packages updated
There were no nscd-related changes between -70 and -73 at all. By any chance, did you save a copy of the problematic /var/db/nscd/hosts?
I was still in fix-it mode and didn't think to save a copy of /var/db/nscd/hosts until afterwards. Don't waste your time trying to reproduce this problem. I assume the file got corrupted somehow. It wouldn't be the first time: a week earlier I had to remove the /var/db/nscd/users file because nscd would not let me access my own userid; it had somehow cached an incorrect negative lookup and would not flush it out.

This box is also my master LDAP server, although that should not matter; the DNS entry for ldap-r.ottix.com has the A records for both the master and the copy (on FC2). I did reorder the entries in /etc/rc3.d/S* to start LDAP before nscd.

This bug report should probably be passed to the nscd maintainers for a code audit of that portion of nscd in glibc, to verify that a corrupted nscd database does not cause innocent applications to crash. I've not looked at the nscd code, but a common problem with mmap'd databases is that dirty pages are not flushed/written in any particular order, so it is very easy to get a corrupted database if the kernel hangs or panics. If particular pages are accessed frequently, the page-flushing algorithm may not write them out for a very long time.
> I've not looked at the nscd code, but a common problem with mmap'd
> databases is that dirty pages are not flushed/written

I know that very well. This is why I have msync calls all over the place. They are asynchronous, which means there is a bigger chance that kernel crashes will cause problems. This cannot be prevented. The glibc code has some tests to check for corrupted caches. Still, if you have problems again, look at the actual location of the crash. This would help determine which checks must be added.

There is not much we can do here. Corrupt data led to a crash and we cannot reproduce it. So I'm closing the bug as UPSTREAM, since that is where the new tests I added will come from.
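The msync approach mentioned above can be illustrated with a small sketch. This is not nscd's actual code: the helpers map_cache_file and update_record are hypothetical, and the sketch assumes a POSIX system such as Linux. After modifying bytes in a MAP_SHARED mapping, the writer calls msync on the touched page range; MS_ASYNC only schedules writeback and returns immediately, which is the trade-off described above, while MS_SYNC blocks until the write has completed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create or open `path`, size it to `len` bytes and map it shared,
   so stores through the mapping reach the file via the page cache.
   Returns MAP_FAILED on error. */
void *map_cache_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p;
}

/* Copy n bytes into the mapping at offset off, then ask the kernel to
   write the touched pages back. durable == 0 uses MS_ASYNC (schedule
   writeback and return); otherwise MS_SYNC (wait for the write).
   Returns 0 on success, -1 on a bad range or msync failure. */
int update_record(void *map, size_t map_len, size_t off,
                  const void *data, size_t n, int durable)
{
    if (off > map_len || n > map_len - off)
        return -1;
    memcpy((unsigned char *)map + off, data, n);

    /* msync needs a page-aligned start address. */
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)map + off;
    uintptr_t aligned = start & ~(page - 1);
    size_t span = (size_t)(start + n - aligned);

    return msync((void *)aligned, span, durable ? MS_SYNC : MS_ASYNC);
}
```

The MS_ASYNC path is cheap enough to call after every update but gives no guarantee that the pages reach disk before a crash; MS_SYNC gives that guarantee for the touched range at the cost of blocking the writer.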
The extra checks are in nscd-2.3.3-76 in rawhide.