Bug 986427 - segfault error 4 at libc-2.17.so
Status: CLOSED DUPLICATE of bug 977995
Product: Fedora
Classification: Fedora
Component: glibc
Version: 19
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-07-19 14:45 EDT by Henrique Martins
Modified: 2016-11-24 10:58 EST
Doc Type: Bug Fix
Last Closed: 2013-07-30 17:40:51 EDT
Type: Bug


Description Henrique Martins 2013-07-19 14:45:36 EDT
Description of problem:
After upgrading from F18 to F19 via FedUP, I can't use NFS because rpcbind crashes with a segfault, error 4, in libc-2.17.so.
sshd crashes the same way, though not every time: it works fine when I ssh directly into the machine, but it always crashes when I connect through a reverse tunnel.


Version-Release number of selected component (if applicable):
glibc-2.17-11.fc19.x86_64
rpcbind-0.2.0-21.fc19.x86_64
openssh-server-6.2p2-3.fc19.x86_64


How reproducible:
Always.


Steps to Reproduce:
1. cd to any nfs mountpoint (automounted via autofs and e.g. /net -hosts) or
2. ssh out with reverse tunnel setup, then try to ssh back in using the tunnel

Actual results:
rpcbind and sshd crash, nfs unusable, reverse tunnel unusable

Expected results:
Everything works as it did before the upgrade. This setup is not new and has worked across several Fedora versions.

Additional info:
/var/log/messages corresponding to several crashes:
Jul 19 10:33:21 kernel: [ 1428.721350] rpcbind[920]: segfault at 4e595040 ip 000000385de48e29 sp 00007fff4e592820 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 10:47:00 kernel: [   68.001945] rpcbind[929]: segfault at af411e20 ip 000000385de48e29 sp 00007fffaf40f600 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 11:03:05 kernel: [   55.334272] rpcbind[947]: segfault at f368b210 ip 000000385de48e29 sp 00007ffff36889f0 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 11:08:41 kernel: [  140.148559] rpcbind[937]: segfault at 18fb2080 ip 000000385de48e29 sp 00007fff18faf860 error 4 in libc-2.17.so[385de00000+1b5000]
Jul 19 10:38:42 kernel: [ 1749.268885] sshd[3985]: segfault at f1480c0 ip 00007ff4631a5e29 sp 00007fff0f1458a0 error 4 in libc-2.17.so[7ff46315d000+1b5000]

At first I suspected bad RAM (which may still be the case), so I powered down the machine and tried three different 2 GB modules, one at a time (the first three rpcbind lines above), then all 6 GB installed (the fourth line). I only tried the sshd case in the same configuration as the first rpcbind line shown.
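
The kernel lines above give both the faulting instruction pointer (ip) and the base address of the libc mapping, so the offset of the crash inside the mapping can be computed directly. The following is a minimal sketch, assuming the log format shown above; the regular expression and the helper name `fault_offset` are illustrative, not part of any tool.

```python
import re

# Matches the kernel's segfault report in the form it takes in the log lines above.
SEGFAULT_RE = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (program, library, offset of the faulting ip within the mapping)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    return m["prog"], m["lib"], int(m["ip"], 16) - int(m["base"], 16)


# Two of the lines from /var/log/messages quoted above.
LINES = [
    "Jul 19 10:33:21 kernel: [ 1428.721350] rpcbind[920]: segfault at 4e595040 "
    "ip 000000385de48e29 sp 00007fff4e592820 error 4 in "
    "libc-2.17.so[385de00000+1b5000]",
    "Jul 19 10:38:42 kernel: [ 1749.268885] sshd[3985]: segfault at f1480c0 "
    "ip 00007ff4631a5e29 sp 00007fff0f1458a0 error 4 in "
    "libc-2.17.so[7ff46315d000+1b5000]",
]

for line in LINES:
    prog, lib, off = fault_offset(line)
    print(f"{prog}: {lib}+{off:#x}")
```

Both rpcbind and sshd fault at the same offset in the mapping (0x48e29), which suggests they hit the same instruction in libc even though the two processes load the library at different addresses.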
Comment 1 Steven Ellis 2013-07-27 07:14:35 EDT
I'm seeing similar errors for alsactl and mission-control:

[   16.010949] alsactl[1482]: segfault at 1 ip 00007f9f2a55aa7d sp 00007ffff28e3a50 error 4 in libc-2.17.so[7f9f2a512000+1b5000]
[  115.715330] mission-control[2740]: segfault at 1 ip 00007f1617866e29 sp 00007fffcc26c540 error 4 in libc-2.17.so[7f161781e000+1b5000]
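
The "error 4" in these kernel lines is the x86 page-fault error code, a small bit field. The sketch below decodes the low bits under the standard x86 layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch); the function name `decode_pf_error` is made up for illustration.

```python
def decode_pf_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
        "instruction_fetch": bool(code & 16),
    }


print(decode_pf_error(4))
```

Under that layout, error 4 means a user-mode read from an address that is not mapped, which fits a library function dereferencing a bad pointer.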


A quick check shows I still have a bunch of fc17 packages installed, including two glibc debuginfo packages:
rpm -qa | grep fc17

libfreebob-1.0.11-11.fc17.x86_64
gnome-pilot-devel-2.91.93-5.fc17.x86_64
smolt-firstboot-1.4.3-6.fc17.noarch
kernel-3.6.7-4.fc17.x86_64
gnome-shell-extension-remove-accessibility-icon-20111008-3.fc17.noarch
gnome-pilot-2.91.93-5.fc17.x86_64
kernel-modules-extra-3.6.7-4.fc17.x86_64
libfaac0-1.28-6.fc17.x86_64
preupgrade-1.1.10-2.fc17.noarch
pbm2l7k-990321-9.fc17.x86_64
kernel-devel-3.4.4-5.fc17.x86_64
libreplaygain-0.9.1-0.1.svn453.fc17.x86_64
printer-filters-1.1-6.fc17.noarch
mtpfs-1.1-0.2.svn20120510.fc17.x86_64
system-config-lvm-1.1.18-1.fc17.noarch
c2050-0.3b-5.fc17.x86_64
pbm2l2030-1.4-7.fc17.x86_64
kernel-devel-3.4.0-1.fc17.x86_64
kernel-3.6.3-1.fc17.x86_64
xmms-libs-1.2.11-40.fc17.x86_64
glibc-debuginfo-common-2.15-58.fc17.x86_64
libx264_118-0.118-17_20111111.2245.fc17.x86_64
libgoom2-0-3.fc17.x86_64
twinkle-1.4.2-17.fc17.x86_64
cjet-0.8.9-11.fc17.x86_64
lx-20030328-7.fc17.x86_64
grub-efi-0.97-93.fc17.x86_64
nss-softokn-debuginfo-3.13.6-2.fc17.x86_64
gnome-mag-0.16.2-4.fc17.x86_64
ar9170-firmware-2009.05.28-4.fc17.noarch
musepack-tools-sv8-3.svn435.fc17.x86_64
libmms-0.6.2-4.fc17.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.3.2.fc17.i686
kernel-devel-3.6.7-4.fc17.x86_64
smolt-1.4.3-6.fc17.noarch
kernel-3.6.6-1.fc17.x86_64
vcdimager-0.7.24-10.fc17.x86_64
c2070-0.99-8.fc17.x86_64
libvcdinfo0-0.7.24-10.fc17.x86_64
fedup-0.7.3-5.fc17.noarch
glibc-debuginfo-2.15-58.fc17.x86_64
Comment 2 Carlos O'Donell 2013-07-30 14:24:15 EDT
This by itself is insufficient to determine the root cause of the problem or if it is even a problem in glibc.

Can someone please provide a core file for any of the segfaults?
Comment 3 Henrique Martins 2013-07-30 15:03:21 EDT
I would if I could find one.  Seems my abrtd is not behaving properly, or at least /var/spool/abrtd is empty.  What's the best way to get a core dump?
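
One thing worth checking when no core file appears is the core size limit of the crashing process and where the kernel sends cores. The following is a rough sketch for Linux, not a fix: it raises the soft core limit of the current process (child processes started from it inherit the limit) and prints the kernel's core_pattern. The helper names are illustrative. A daemon started by the init system does not inherit limits set this way, and a core_pattern beginning with "|" means cores are piped to a handler program rather than written to the working directory.

```python
import resource
from pathlib import Path


def enable_core_dumps():
    """Raise the soft core-size limit of this process to its hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel's core_pattern setting, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


soft, hard = enable_core_dumps()
print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("core_pattern:", core_pattern())
```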
Comment 4 Henrique Martins 2013-07-30 17:30:06 EDT
Found the cause of the crash for both rpcbind and sshd. I couldn't get a core from either program, so I ran rpcbind as root under gdb and dumped a core with gcore. I'm not attaching the core file here because strings showed that it contains names of internal machines, which I don't want to post in a public forum.

However:
% gdb /usr/sbin/rpcbind
...
(gdb)  run -a -d -f
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7415e29 in vfprintf () from /lib64/libc.so.6
(gdb) t
[Current thread is 1 (Thread 0x7ffff7fca840 (LWP 30575))]
(gdb) bt
#0  0x00007ffff7415e29 in vfprintf () from /lib64/libc.so.6
#1  0x00007ffff74d78a6 in __vfprintf_chk () from /lib64/libc.so.6
#2  0x00007ffff74bc245 in __vsyslog_chk () from /lib64/libc.so.6
#3  0x00007ffff7793b61 in tcpd_diag.constprop.0 () from /lib64/libwrap.so.0
#4  0x00007ffff7793c48 in tcpd_warn () from /lib64/libwrap.so.0
#5  0x00007ffff7792feb in sock_hostname () from /lib64/libwrap.so.0
#6  0x00007ffff7792539 in eval_hostname () from /lib64/libwrap.so.0
#7  0x00007ffff7790ee5 in host_match () from /lib64/libwrap.so.0
#8  0x00007ffff779054b in list_match () from /lib64/libwrap.so.0
#9  0x00007ffff7790729 in table_match () from /lib64/libwrap.so.0
#10 0x00007ffff7790869 in hosts_access () from /lib64/libwrap.so.0
#11 0x000055555555ce23 in check_access ()
#12 0x0000555555557f9c in pmap_service ()
#13 0x00007ffff7bc9361 in svc_getreq_common () from /lib64/libtirpc.so.1
#14 0x00007ffff7bc94cb in svc_getreq_poll () from /lib64/libtirpc.so.1
#15 0x000055555555b4e3 in my_svc_run ()
#16 0x00005555555572e0 in main ()
(gdb) gcore
warning: target file /proc/30816/cmdline contained unexpected null characters
Saved corefile core.30816
(gdb) quit

Looks like we're in tcp wrappers land. Looking at the output of:
% strings core.30816 |more 
I see the following curious two lines:
<27>Jul 30 13:27:05 rpcbind: warning: /etc/hosts.deny, line 15: host name/address mismatch:
America/Los_Angeles

It does seem that the date and the warning message got mixed up somehow, which is possible if we're looking at random allocated string memory, but it led me to look at hosts.deny, which, apart from commented-out and blank lines, contained a single directive:

ALL : <decommissioned_machine.my_company_domain.com>

where that <decommissioned_machine.my_company_domain.com> is the FQDN of a machine that no longer exists.
Just for fun I commented that line out, reran rpcbind, and IT WORKS!! Both rpcbind and sshd.
Then I replaced that name with the name of a machine that actually exists, and it core dumped again.
There's a problem with tcp_wrappers somewhere.
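
Since a single active host-name rule was enough to trigger the crash, it can help to list the rules in hosts.deny that are actually in effect. The following is a simplified sketch, assuming the basic `daemon_list : client_list` layout; it ignores backslash continuations, optional shell commands, and bracketed IPv6 addresses, and the sample file content uses a placeholder host name.

```python
import re


def _items(field):
    """Split a daemon or client list on blanks and/or commas."""
    return [item for item in re.split(r"[\s,]+", field.strip()) if item]


def active_rules(text):
    """Yield (line_number, daemons, clients) for each non-comment, non-blank rule."""
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 2:
            continue  # not in daemon_list : client_list form
        yield lineno, _items(fields[0]), _items(fields[1])


# Placeholder content; the real file and host name are not shown in this report.
SAMPLE = """\
# hosts.deny
#ALL : old.example.com

ALL : decommissioned.example.com
"""

for lineno, daemons, clients in active_rules(SAMPLE):
    print(f"line {lineno}: daemons={daemons} clients={clients}")
```

Running it against the real file would be `active_rules(open("/etc/hosts.deny").read())`. Any client entry that is a host name rather than an address or wildcard makes the access check resolve the connecting peer's name, which is the code path (sock_hostname) seen in the backtrace.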

After
  debuginfo-install rpcbind-0.2.0-21.fc19.x86_64
I get

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7415e29 in _IO_vfprintf_internal (s=s@entry=0x555555774dd0, format=<optimized out>, 
    format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=<optimized out>) at vfprintf.c:1635
1635              process_string_arg (((struct printf_spec *) NULL));
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.5-4.fc19.x86_64 krb5-libs-1.11.3-2.fc19.x86_64 libcom_err-1.42.7-2.fc19.x86_64 libselinux-2.1.13-15.fc19.x86_64 nss-mdns-0.10-12.fc19.x86_64 pcre-8.32-7.fc19.x86_64
(gdb) bt
#0  0x00007ffff7415e29 in _IO_vfprintf_internal (s=s@entry=0x555555774dd0, format=<optimized out>, 
    format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=<optimized out>) at vfprintf.c:1635
#1  0x00007ffff74d78a6 in ___vfprintf_chk (fp=fp@entry=0x555555774dd0, flag=flag@entry=1, 
    format=format@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=ap@entry=0x7fffffffa648) at vfprintf_chk.c:34
#2  0x00007ffff74bc245 in __GI___vsyslog_chk (pri=<optimized out>, pri@entry=3, flag=flag@entry=1, 
    fmt=fmt@entry=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", ap=ap@entry=0x7fffffffa648) at ../misc/syslog.c:222
#3  0x00007ffff7793b61 in vsyslog (__ap=0x7fffffffa648, 
    __fmt=0x7fffffff8610 "warning: /etc/hosts.deny, line 15: host name/address mismatch: %s != %.*s", __pri=3) at /usr/include/bits/syslog.h:47
#4  tcpd_diag (tag=tag@entry=0x7ffff77940e9 "warning", 
    format=format@entry=0x7ffff7794418 "host name/address mismatch: %s != %.*s", 
    ap=ap@entry=0x7fffffffa648, severity=3) at diag.c:45
#5  0x00007ffff7793c48 in tcpd_warn (
    format=format@entry=0x7ffff7794418 "host name/address mismatch: %s != %.*s") at diag.c:55
#6  0x00007ffff7792feb in sock_hostname (host=0x7fffffffb1f0) at socket.c:277
#7  0x00007ffff7792539 in eval_hostname (host=host@entry=0x7fffffffb1f0) at eval.c:77
#8  0x00007ffff7790ee5 in host_match (tok=0x7fffffffa875 "MACHINE.COMPANY.com", 
    host=0x7fffffffb1f0) at hosts_access.c:378
#9  0x00007ffff779054b in list_match (list=<optimized out>, request=0x7fffffffb0e0, 
    match_fn=0x7ffff7791080 <client_match>) at hosts_access.c:211
#10 0x00007ffff7790729 in table_match (table=<optimized out>, request=request@entry=0x7fffffffb0e0)
    at hosts_access.c:168
#11 0x00007ffff7790869 in hosts_access (request=request@entry=0x7fffffffb0e0) at hosts_access.c:127
#12 0x000055555555ce23 in check_access (xprt=xprt@entry=0x555555766c10, proc=proc@entry=3, 
    prog=100024, rpcbvers=rpcbvers@entry=2) at src/security.c:110
#13 0x0000555555557f9c in pmapproc_getport (rqstp=0x7fffffffb520, xprt=0x555555766c10)
    at src/pmap_svc.c:279
#14 pmap_service (rqstp=0x7fffffffb520, xprt=0x555555766c10) at src/pmap_svc.c:111
#15 0x00007ffff7bc9361 in svc_getreq_common (fd=<optimized out>) at svc.c:678
#16 0x00007ffff7bc94cb in svc_getreq_poll (pfdp=pfdp@entry=0x7fffffffbcb0, pollretval=1)
    at svc.c:761
#17 0x000055555555b4e3 in my_svc_run () at src/rpcb_svc_com.c:1166
#18 0x00005555555572e0 in main (argc=<optimized out>, argv=<optimized out>) at src/rpcbind.c:257

Is this enough to point out where the error is?
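
For readers following the backtrace: the format string that reaches vfprintf, "host name/address mismatch: %s != %.*s" (with the "warning: /etc/hosts.deny, line 15: " prefix added by tcpd_diag), consumes three arguments: a string, an int precision for the `*`, and another string. The sketch below only illustrates that argument layout using Python's printf-style formatting, which follows the same convention for `*`. It does not establish what went wrong here; that is tracked in bug 977995. The host names are placeholders.

```python
FMT = "host name/address mismatch: %s != %.*s"


def render(*args):
    """Apply the tcp_wrappers warning format to the given arguments."""
    return FMT % args


# Correct use: string, int precision, string. The precision truncates the
# second string to its first 5 characters.
print(render("alpha.example.com", 5, "alpha.example.com"))

# Omitting the precision makes the arguments no longer line up with the format.
# Python raises TypeError; in C the same mismatch is undefined behavior, and
# vfprintf may read an arbitrary value as the string pointer.
try:
    render("alpha.example.com", "alpha.example.com")
except TypeError as exc:
    print("TypeError:", exc)
```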
Comment 5 Henrique Martins 2013-07-30 17:40:51 EDT
Possible duplicate of bug 977995

*** This bug has been marked as a duplicate of bug 977995 ***
