Description of problem:
After the F18 to F19 upgrade via FedUP, I can't use NFS because rpcbind crashes with a segfault (error 4) in libc-2.17.so. sshd produces exactly the same crash, though not every time: it works fine when I ssh directly into the machine, but always crashes when I try through a reverse tunnel.

Version-Release number of selected component (if applicable):
glibc-2.17-11.fc19.x86_64
rpcbind-0.2.0-21.fc19.x86_64
openssh-server-6.2p2-3.fc19.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. cd to any NFS mount point (automounted via autofs, e.g. /net -hosts), or
2. ssh out with a reverse tunnel set up, then try to ssh back in through the tunnel.

Actual results:
rpcbind and sshd crash; NFS is unusable and the reverse tunnel is unusable.

Expected results:
Working as it did before the upgrade. This setup is not new; it has worked for several Fedora versions.

Additional info:
/var/log/messages entries for several crashes:

Jul 19 10:33:21 kernel: [ 1428.721350] rpcbind[920]: segfault at 4e595040 ip 000000385de48e29 sp 00007fff4e592820 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 10:47:00 kernel: [ 68.001945] rpcbind[929]: segfault at af411e20 ip 000000385de48e29 sp 00007fffaf40f600 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 11:03:05 kernel: [ 55.334272] rpcbind[947]: segfault at f368b210 ip 000000385de48e29 sp 00007ffff36889f0 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 11:08:41 kernel: [ 140.148559] rpcbind[937]: segfault at 18fb2080 ip 000000385de48e29 sp 00007fff18faf860 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 10:38:42 kernel: [ 1749.268885] sshd[3985]: segfault at f1480c0 ip 00007ff4631a5e29 sp 00007fff0f1458a0 error 4 in libc-2.17.so[7ff46315d000+1b5000]

At first I suspected bad RAM (which may still be the case), so I powered down the machine and tried three different 2 GB modules, one at a time (the first three rpcbind lines above), and then all 6 GB installed (the fourth line). I only tried the sshd case together with the first rpcbind line shown.
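The kernel lines above can be decoded mechanically. Below is a minimal C sketch, assuming the x86 page-fault error-code layout (bit 0 set means a protection violation rather than a not-present page, bit 1 a write, bit 2 user mode); struct segv_info and the helper functions are hypothetical names. Subtracting the load address from ip puts every rpcbind and sshd fault at the same offset, 0x48e29, inside libc-2.17.so, which looks more like a deterministic code path than random bad RAM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "segfault at ..." log line (hypothetical struct). */
struct segv_info {
    unsigned long long fault_addr; /* address whose access faulted */
    unsigned long long ip;         /* instruction pointer at the fault */
    unsigned long long sp;         /* stack pointer at the fault */
    int error;                     /* x86 page-fault error code */
    char object[64];               /* mapped object containing ip */
    unsigned long long base;       /* load address of that object */
    unsigned long long size;       /* length of that mapping */
};

/* Parse one log line; returns 0 on success, -1 if it is not a segfault line. */
static int parse_segfault_line(const char *line, struct segv_info *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;
    /* %63[^[] reads the object name up to the '[' that starts "[base+size]". */
    int n = sscanf(p, "segfault at %llx ip %llx sp %llx error %d in %63[^[][%llx+%llx]",
                   &out->fault_addr, &out->ip, &out->sp, &out->error,
                   out->object, &out->base, &out->size);
    return n == 7 ? 0 : -1;
}

/* Offset of the faulting instruction within the mapped object. */
static unsigned long long segv_offset(const struct segv_info *info)
{
    return info->ip - info->base;
}

/* Describe the low three bits of the x86 page-fault error code. */
static void describe_error(int error, char *buf, size_t len)
{
    snprintf(buf, len, "%s of a %s page in %s mode",
             (error & 2) ? "write" : "read",
             (error & 1) ? "protected" : "not-present",
             (error & 4) ? "user" : "kernel");
}
```

For the lines in this report, describe_error(4, ...) yields "read of a not-present page in user mode", and segv_offset() returns 0x48e29 for both the rpcbind and the sshd entries.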
I've got similar errors for alsactl and mission-control:

[ 16.010949] alsactl[1482]: segfault at 1 ip 00007f9f2a55aa7d sp 00007ffff28e3a50 error 4 in libc-2.17.so[7f9f2a512000+1b5000]
[ 115.715330] mission-control[2740]: segfault at 1 ip 00007f1617866e29 sp 00007fffcc26c540 error 4 in libc-2.17.so[7f161781e000+1b5000]

A quick check shows I still have a bunch of Fedora 17 packages installed, including the glibc 2.15 debuginfo packages:

rpm -qa | grep fc17
libfreebob-1.0.11-11.fc17.x86_64
gnome-pilot-devel-2.91.93-5.fc17.x86_64
smolt-firstboot-1.4.3-6.fc17.noarch
kernel-3.6.7-4.fc17.x86_64
gnome-shell-extension-remove-accessibility-icon-20111008-3.fc17.noarch
gnome-pilot-2.91.93-5.fc17.x86_64
kernel-modules-extra-3.6.7-4.fc17.x86_64
libfaac0-1.28-6.fc17.x86_64
preupgrade-1.1.10-2.fc17.noarch
pbm2l7k-990321-9.fc17.x86_64
kernel-devel-3.4.4-5.fc17.x86_64
libreplaygain-0.9.1-0.1.svn453.fc17.x86_64
printer-filters-1.1-6.fc17.noarch
mtpfs-1.1-0.2.svn20120510.fc17.x86_64
system-config-lvm-1.1.18-1.fc17.noarch
c2050-0.3b-5.fc17.x86_64
pbm2l2030-1.4-7.fc17.x86_64
kernel-devel-3.4.0-1.fc17.x86_64
kernel-3.6.3-1.fc17.x86_64
xmms-libs-1.2.11-40.fc17.x86_64
glibc-debuginfo-common-2.15-58.fc17.x86_64
libx264_118-0.118-17_20111111.2245.fc17.x86_64
libgoom2-0-3.fc17.x86_64
twinkle-1.4.2-17.fc17.x86_64
cjet-0.8.9-11.fc17.x86_64
lx-20030328-7.fc17.x86_64
grub-efi-0.97-93.fc17.x86_64
nss-softokn-debuginfo-3.13.6-2.fc17.x86_64
gnome-mag-0.16.2-4.fc17.x86_64
ar9170-firmware-2009.05.28-4.fc17.noarch
musepack-tools-sv8-3.svn435.fc17.x86_64
libmms-0.6.2-4.fc17.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.3.2.fc17.i686
kernel-devel-3.6.7-4.fc17.x86_64
smolt-1.4.3-6.fc17.noarch
kernel-3.6.6-1.fc17.x86_64
vcdimager-0.7.24-10.fc17.x86_64
c2070-0.99-8.fc17.x86_64
libvcdinfo0-0.7.24-10.fc17.x86_64
fedup-0.7.3-5.fc17.noarch
glibc-debuginfo-2.15-58.fc17.x86_64
This by itself is insufficient to determine the root cause of the problem, or even whether it is a problem in glibc at all. Can someone please provide a core file for any of the segfaults?
I would if I could find one. My abrtd seems not to be behaving properly; at least, /var/spool/abrtd is empty. What's the best way to get a core dump?
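When abrt collects nothing, a common reason is a core-file size limit of zero for the crashing process, or a kernel core_pattern that pipes cores to a helper program. A minimal C sketch, assuming a Linux system that exposes /proc/sys/kernel/core_pattern; raise_core_limit() and read_core_pattern() are hypothetical helpers, not part of abrt or of any Fedora tool.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Raise this process's soft core-size limit to its hard limit. The new limit
 * is inherited by children, so call it before exec'ing the daemon.
 * Returns 0 on success, -1 on failure. If the hard limit is 0, no core is
 * possible without privileges. */
static int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* raising soft up to hard needs no privilege */
    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}

/* Copy /proc/sys/kernel/core_pattern into buf without the trailing newline.
 * A pattern starting with '|' means the kernel pipes cores to a helper
 * program instead of writing a file. Returns 0 on success, -1 on failure. */
static int read_core_pattern(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

A small wrapper could call raise_core_limit() and then execv() the daemon; the shell equivalent is `ulimit -c unlimited` before starting it. Daemons that change user IDs may still be marked non-dumpable by the kernel (see the fs.suid_dumpable sysctl), which is one reason attaching gdb and using gcore can be the more reliable route.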
Found the cause of the crash for both rpcbind and sshd. I couldn't get a core from either program, so I ran rpcbind as root under gdb and dumped a core with gcore. I'm not going to attach the core file here: running strings on it showed names of internal machines that I don't want to post in a public forum. However:

% gdb /usr/sbin/rpcbind
...
(gdb) run -a -d -f
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7415e29 in vfprintf () from /lib64/libc.so.6
(gdb) t
[Current thread is 1 (Thread 0x7ffff7fca840 (LWP 30575))]
(gdb) bt
#0 0x00007ffff7415e29 in vfprintf () from /lib64/libc.so.6
#1 0x00007ffff74d78a6 in __vfprintf_chk () from /lib64/libc.so.6
#2 0x00007ffff74bc245 in __vsyslog_chk () from /lib64/libc.so.6
#3 0x00007ffff7793b61 in tcpd_diag.constprop.0 () from /lib64/libwrap.so.0
#4 0x00007ffff7793c48 in tcpd_warn () from /lib64/libwrap.so.0
#5 0x00007ffff7792feb in sock_hostname () from /lib64/libwrap.so.0
#6 0x00007ffff7792539 in eval_hostname () from /lib64/libwrap.so.0
#7 0x00007ffff7790ee5 in host_match () from /lib64/libwrap.so.0
#8 0x00007ffff779054b in list_match () from /lib64/libwrap.so.0
#9 0x00007ffff7790729 in table_match () from /lib64/libwrap.so.0
#10 0x00007ffff7790869 in hosts_access () from /lib64/libwrap.so.0
#11 0x000055555555ce23 in check_access ()
#12 0x0000555555557f9c in pmap_service ()
#13 0x00007ffff7bc9361 in svc_getreq_common () from /lib64/libtirpc.so.1
#14 0x00007ffff7bc94cb in svc_getreq_poll () from /lib64/libtirpc.so.1
#15 0x000055555555b4e3 in my_svc_run ()
#16 0x00005555555572e0 in main ()
(gdb) gcore
warning: target file /proc/30816/cmdline contained unexpected null characters
Saved corefile core.30816
(gdb) quit

Looks like we're in tcp wrappers land.
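The backtrace shows host_match() calling eval_hostname() and then sock_hostname(), which is where libwrap resolves the client's host name and, in this case, issues a warning through tcpd_warn(). The "host name/address mismatch" wording of that warning suggests a double lookup: reverse-resolve the client address, forward-resolve the resulting name, and complain if the original address is not among the results. A minimal C sketch of that technique with getnameinfo()/getaddrinfo(); it illustrates the idea and is not libwrap's code, and addr_in_list() and confirm_hostname() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Return 1 if the client's address appears in the forward-lookup results. */
static int addr_in_list(const struct sockaddr *client, const struct addrinfo *list)
{
    assert(client != NULL);
    for (const struct addrinfo *ai = list; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family != client->sa_family)
            continue;
        if (client->sa_family == AF_INET) {
            const struct sockaddr_in *a = (const struct sockaddr_in *)client;
            const struct sockaddr_in *b = (const struct sockaddr_in *)ai->ai_addr;
            if (a->sin_addr.s_addr == b->sin_addr.s_addr)
                return 1;
        } else if (client->sa_family == AF_INET6) {
            const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)client;
            const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)ai->ai_addr;
            if (memcmp(&a->sin6_addr, &b->sin6_addr, sizeof a->sin6_addr) == 0)
                return 1;
        }
    }
    return 0;
}

/* Reverse-resolve the client, then forward-resolve that name and require the
 * original address to be among the results. Returns 1 if confirmed, 0 on a
 * name/address mismatch, -1 if either lookup fails. */
static int confirm_hostname(const struct sockaddr *client, socklen_t len,
                            char *name, size_t namelen)
{
    struct addrinfo hints, *res = NULL;
    if (getnameinfo(client, len, name, (socklen_t)namelen, NULL, 0, NI_NAMEREQD) != 0)
        return -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = client->sa_family;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;
    int ok = addr_in_list(client, res);
    freeaddrinfo(res);
    return ok;
}
```

In this sketch a mismatch is reported to the caller as a plain return value; a real implementation would then log it, which is the step where the rpcbind backtrace ends up in vsyslog().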
Looking at the output of:

% strings core.30816 | more

I see the following two curious lines:

<27>Jul 30 13:27:05 rpcbind: warning: /etc/hosts.deny, line 15: host name/address mismatch: America/Los_Angeles

It does seem that the date and the warning message got mixed up somehow, which is possible if we're looking at randomly allocated string memory, but it led me to look at hosts.deny. Apart from commented-out and blank lines, it contained a single directive:

ALL : <decommissioned_machine.my_company_domain.com>

where <decommissioned_machine.my_company_domain.com> is the FQDN of a machine that no longer exists. Just for fun I commented that line out, reran rpcbind, and IT WORKS!! Both rpcbind and sshd. Then I replaced that name with the name of a machine that actually exists, and it dumped core again. There's a problem with tcp_wrappers somewhere.

After running debuginfo-install rpcbind-0.2.0-21.fc19.x86_64, I get:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7415e29 in _IO_vfprintf_internal (s=s@entry=0x555555774dd0, format=<optimized out>, format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=<optimized out>) at vfprintf.c:1635
1635 process_string_arg (((struct printf_spec *) NULL));
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.5-4.fc19.x86_64 krb5-libs-1.11.3-2.fc19.x86_64 libcom_err-1.42.7-2.fc19.x86_64 libselinux-2.1.13-15.fc19.x86_64 nss-mdns-0.10-12.fc19.x86_64 pcre-8.32-7.fc19.x86_64
(gdb) bt
#0 0x00007ffff7415e29 in _IO_vfprintf_internal (s=s@entry=0x555555774dd0, format=<optimized out>, format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=<optimized out>) at vfprintf.c:1635
#1 0x00007ffff74d78a6 in ___vfprintf_chk (fp=fp@entry=0x555555774dd0, flag=flag@entry=1, format=format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=ap@entry=0x7fffffffa648) at vfprintf_chk.c:34
#2 0x00007ffff74bc245 in __GI___vsyslog_chk (pri=<optimized out>, pri@entry=3, flag=flag@entry=1, fmt=fmt@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=ap@entry=0x7fffffffa648) at ../misc/syslog.c:222
#3 0x00007ffff7793b61 in vsyslog (__ap=0x7fffffffa648, __fmt=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", __pri=3) at /usr/include/bits/syslog.h:47
#4 tcpd_diag (tag=tag@entry=0x7ffff77940e9 "warning", format=format@entry=0x7ffff7794418 "host name/address mismatch: %s != %.*s", ap=ap@entry=0x7fffffffa648, severity=3) at diag.c:45
#5 0x00007ffff7793c48 in tcpd_warn (format=format@entry=0x7ffff7794418 "host name/address mismatch: %s != %.*s") at diag.c:55
#6 0x00007ffff7792feb in sock_hostname (host=0x7fffffffb1f0) at socket.c:277
#7 0x00007ffff7792539 in eval_hostname (host=host@entry=0x7fffffffb1f0) at eval.c:77
#8 0x00007ffff7790ee5 in host_match (tok=0x7fffffffa875 "MACHINE.COMPANY.com", host=0x7fffffffb1f0) at hosts_access.c:378
#9 0x00007ffff779054b in list_match (list=<optimized out>, request=0x7fffffffb0e0, match_fn=0x7ffff7791080 <client_match>) at hosts_access.c:211
#10 0x00007ffff7790729 in table_match (table=<optimized out>, request=request@entry=0x7fffffffb0e0) at hosts_access.c:168
#11 0x00007ffff7790869 in hosts_access (request=request@entry=0x7fffffffb0e0) at hosts_access.c:127
#12 0x000055555555ce23 in check_access (xprt=xprt@entry=0x555555766c10, proc=proc@entry=3, prog=100024, rpcbvers=rpcbvers@entry=2) at src/security.c:110
#13 0x0000555555557f9c in pmapproc_getport (rqstp=0x7fffffffb520, xprt=0x555555766c10) at src/pmap_svc.c:279
#14 pmap_service (rqstp=0x7fffffffb520, xprt=0x555555766c10) at src/pmap_svc.c:111
#15 0x00007ffff7bc9361 in svc_getreq_common (fd=<optimized out>) at svc.c:678
#16 0x00007ffff7bc94cb in svc_getreq_poll (pfdp=pfdp@entry=0x7fffffffbcb0, pollretval=1) at svc.c:761
#17 0x000055555555b4e3 in my_svc_run () at src/rpcb_svc_com.c:1166
#18 0x00005555555572e0 in main (argc=<optimized out>, argv=<optimized out>) at src/rpcbind.c:257

Is this enough to pinpoint where the error is?
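Both backtraces die in vfprintf's string handling (process_string_arg) while formatting "host name/address mismatch: %s != %.*s" on behalf of sock_hostname(). That is consistent with, though not proof of, a caller passing arguments that don't match the format. As a reminder of what that format consumes, here is a minimal C sketch; diag_format(), format_mismatch_warning() and the 128-byte HOSTNAME_FIELD_MAX cap are hypothetical, and the sketch writes to a buffer instead of syslog so the result can be checked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_FIELD_MAX 128   /* hypothetical cap for the %.*s field */

/* Variadic front end that forwards a va_list, the same shape as the
 * tcpd_warn() -> tcpd_diag() -> vsyslog() chain in the backtrace. */
static int diag_format(char *buf, size_t len, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* "%s != %.*s" consumes three arguments, in this order:
 *   const char *  for %s,
 *   int           for the '*' precision,
 *   const char *  for the %.*s string.
 * Too few arguments, or arguments of the wrong type, is undefined behaviour:
 * vfprintf may read an unrelated value from the argument area and
 * dereference it as a string pointer. */
static int format_mismatch_warning(char *buf, size_t len,
                                   const char *addr, const char *name)
{
    return diag_format(buf, len, "host name/address mismatch: %s != %.*s",
                       addr, (int)HOSTNAME_FIELD_MAX, name);
}
```

With correctly typed arguments, format_mismatch_warning(buf, sizeof buf, "192.0.2.10", "gone.example.com") produces "host name/address mismatch: 192.0.2.10 != gone.example.com", and names longer than the precision are truncated rather than overrunning anything.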
Possible duplicate of bug 977995

*** This bug has been marked as a duplicate of bug 977995 ***