Bug 1108615 - [abrt] nfs-utils: strcmp(): rpc.gssd killed by SIGSEGV
Summary: [abrt] nfs-utils: strcmp(): rpc.gssd killed by SIGSEGV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 20
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:d1581b4e97f300e68b5e13c4ba9...
Depends On:
Blocks:
Reported: 2014-06-12 10:29 UTC by Brian J. Murrell
Modified: 2014-10-15 17:40 UTC (History)

Fixed In Version: nfs-utils-1.3.0-7.0.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-27 09:59:15 UTC
Type: ---


Attachments
File: backtrace (69.40 KB, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: cgroup (161 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: core_backtrace (2.69 KB, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: dso_list (1.67 KB, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: environ (215 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: exploitable (82 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: limits (1.29 KB, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: maps (8.52 KB, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: open_fds (969 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: proc_pid_status (903 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell
File: var_log_messages (720 bytes, text/plain), 2014-06-12 10:29 UTC, Brian J. Murrell

Description Brian J. Murrell 2014-06-12 10:29:27 UTC
Description of problem:
Happens at system startup.

Version-Release number of selected component:
nfs-utils-1.3.0-0.1.fc20

Additional info:
reporter:       libreport-2.2.2
backtrace_rating: 4
cmdline:        /usr/sbin/rpc.gssd
crash_function: strcmp
executable:     /usr/sbin/rpc.gssd
kernel:         3.14.4-200.fc20.x86_64
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (6 frames)
 #0 strcmp at ../sysdeps/x86_64/strcmp.S:210
 #1 find_keytab_entry
 #2 gssd_refresh_krb5_machine_credential
 #3 process_krb5_upcall
 #4 handle_gssd_upcall
 #5 gssd_run

Comment 1 Brian J. Murrell 2014-06-12 10:29:31 UTC
Created attachment 908050 [details]
File: backtrace

Comment 2 Brian J. Murrell 2014-06-12 10:29:32 UTC
Created attachment 908051 [details]
File: cgroup

Comment 3 Brian J. Murrell 2014-06-12 10:29:35 UTC
Created attachment 908052 [details]
File: core_backtrace

Comment 4 Brian J. Murrell 2014-06-12 10:29:36 UTC
Created attachment 908053 [details]
File: dso_list

Comment 5 Brian J. Murrell 2014-06-12 10:29:38 UTC
Created attachment 908054 [details]
File: environ

Comment 6 Brian J. Murrell 2014-06-12 10:29:39 UTC
Created attachment 908055 [details]
File: exploitable

Comment 7 Brian J. Murrell 2014-06-12 10:29:41 UTC
Created attachment 908056 [details]
File: limits

Comment 8 Brian J. Murrell 2014-06-12 10:29:43 UTC
Created attachment 908057 [details]
File: maps

Comment 9 Brian J. Murrell 2014-06-12 10:29:44 UTC
Created attachment 908058 [details]
File: open_fds

Comment 10 Brian J. Murrell 2014-06-12 10:29:46 UTC
Created attachment 908059 [details]
File: proc_pid_status

Comment 11 Brian J. Murrell 2014-06-12 10:29:48 UTC
Created attachment 908060 [details]
File: var_log_messages

Comment 12 Steve Dickson 2014-07-02 11:55:43 UTC
How reproducible is this? 

Also are you setting a realm with the -R flag?

Comment 13 Brian J. Murrell 2014-07-02 12:37:04 UTC
(In reply to Steve Dickson from comment #12)
> How reproducible is this? 

100%.  It happens without fail on every login[1]

> Also are you setting a realm with the -R flag?

I'm just letting systemd start it as it sees fit, both on its own during system boot (before it crashes) and afterward when I restart it.  It doesn't look like systemd is setting the realm flag:

$ ps -efwww | grep gss
root      1371     1  0 Jun25 ?        00:00:00 /usr/sbin/rpc.svcgssd
root      2763     1  0 Jun25 ?        00:00:07 /usr/sbin/rpc.gssd


[1] at least every login after a system restart -- but I typically only log in once between reboots and never log out until the next reboot, so I don't know what the behaviour is if I just log out and back in again

Comment 14 Steve Dickson 2014-07-02 14:12:58 UTC
(In reply to Brian J. Murrell from comment #13)
> (In reply to Steve Dickson from comment #12)
> > How reproducible is this? 
> 
> 100%.  It happens without fail on every login[1]
> 
> > Also are you setting a realm with the -R flag?
> 
> I'm just letting systemd start it as it sees fit, both on its own during
> system boot (before it crashes) and afterward when I restart it.  It doesn't
> look like systemd is setting the realm flag:
> 
> $ ps -efwww | grep gss
> root      1371     1  0 Jun25 ?        00:00:00 /usr/sbin/rpc.svcgssd
> root      2763     1  0 Jun25 ?        00:00:07 /usr/sbin/rpc.gssd
> 
> 
> [1] at least every login after a system restart -- but I typically only
> log in once between reboots and never log out until the next reboot, so I
> don't know what the behaviour is if I just log out and back in again

I think you might be running into this problem

commit 25e83c2270b2d2966c992885faed0b79be09f474
Author: Jeff Layton <jlayton>
Date:   Thu May 1 11:15:16 2014 -0400

    mountd: fix segfault in add_name with newer gcc compilers

but in rpc.gssd... 

This is fixed in the latest nfs-utils release
  nfs-utils-1.3.0-2.1.fc20

Would it be possible to upgrade to that package?

Comment 15 Brian J. Murrell 2014-07-29 13:39:36 UTC
I have upgraded.  I guess we'll see if it's fixed on next reboot.

Comment 16 Brian J. Murrell 2014-08-10 19:40:16 UTC
Still a problem.  It crashed on log in to my completely up-to-date system this morning.  I had to:

# systemctl restart rpcgssd

after logging in, as systemctl status was reporting:

# systemctl status rpcgssd
nfs-secure.service - Secure NFS
   Loaded: loaded (/usr/lib/systemd/system/nfs-secure.service; enabled)
   Active: active (running) since Sun 2014-08-10 10:45:39 EDT; 4min 32s ago
  Process: 1251 ExecStart=/usr/sbin/rpc.gssd $RPCGSSDARGS (code=exited, status=0/SUCCESS)
 Main PID: 1310 (rpc.gssd)
   CGroup: /system.slice/nfs-secure.service
           └─1310 /usr/sbin/rpc.gssd

Aug 10 10:45:39 pc.interlinx.bc.ca systemd[1]: Started Secure NFS.
Aug 10 10:46:48 pc.interlinx.bc.ca rpc.gssd[1310]: WARNING: forked child was killed with signal 11

Here's my version of nfs-utils:

$ rpm -q nfs-utils
nfs-utils-1.3.0-2.1.fc20.x86_64

Comment 17 Brian J. Murrell 2014-08-21 22:13:30 UTC
Just had to log out/in again today and got one of these.  abrt's not sending any more since it's detecting the dupe.

Comment 18 Brian J. Murrell 2014-08-21 23:06:34 UTC
Still seeing this on up-to-date (as of right now) Fedora 20.

Comment 19 Brian J. Murrell 2014-08-25 00:17:23 UTC
Is there anything further that we can do about this, given that it happens every single time I reboot and log in and is quite annoying?

Comment 20 Steve Dickson 2014-08-25 12:08:01 UTC
(In reply to Brian J. Murrell from comment #19)
> Is there anything further that we can do about this, given that it happens
> every single time I reboot and log in and is quite annoying?

I can imagine this is pretty annoying... but unfortunately I just
can't seem to reproduce this...  By chance, are you using an AD
vs. a KDC for your Kerberos?

Comment 21 Steve Dickson 2014-08-25 14:51:14 UTC
Also, how many realms are defined in your krb5.conf file? It could
be something in your configuration that is causing this...

Comment 22 J. Bruce Fields 2014-08-25 18:52:11 UTC
Could we get better backtraces with "debuginfo-install nfs-utils"?  (Following http://fedoraproject.org/wiki/StackTraces#debuginfo).

find_keytab_entry() is unfortunately a rather long function with at least 4 strcmp's in it, we'd have a better chance if we could narrow this down a little.

Comment 23 Brian J. Murrell 2014-08-26 00:24:19 UTC
(In reply to J. Bruce Fields from comment #22)
> Could we get better backtraces with "debuginfo-install nfs-utils"? 

I wish we could, but this is abrt doing the backtracing and it seems to have completely fallen down at managing its /var/cache/abrt-di cache: it is not freeing up space in it to allow newer/more relevant debuginfo packages to be installed.  This is bug #811978.

> find_keytab_entry() is unfortunately a rather long function with at least 4
> strcmp's in it, we'd have a better chance if we could narrow this down a
> little.

Agreed.  It would probably be useful all around, for many bug reports, if bug #811978 were fixed so that everyone with a full /var/cache/abrt-di isn't sending useless bug reports.  Surely this has enough value internally to give bug #811978 some priority, yes?

(In reply to Steve Dickson from comment #21)
> Also, how many realms are defined in your krb5.conf file? It could
> be something in your configuration that is causing this...

$ cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = true
 dns_lookup_kdc = true
 allow_weak_crypto = true
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = true
# default_realm = EXAMPLE.COM
# default_ccache_name = KEYRING:persistent:%{uid}
 default_ccache_name = FILE:/tmp/krb5cc_%{uid}

[realms]
# EXAMPLE.COM = {
#  kdc = kerberos.example.com
#  admin_server = kerberos.example.com
# }

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM

With the relevant realms (just one though) defined in DNS here.

Comment 24 Brian J. Murrell 2014-08-27 10:18:13 UTC
I think I have the backtrace you are looking for:

(gdb) where
#0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:210
#1  0x00007f60e0f6d36d in find_keytab_entry (context=0x7f60e2c7d920, 
    kt=0x7f60e2c77a60, 
    tgtname=tgtname@entry=0x7f60e2c768a0 "linux.interlinx.bc.ca", 
    kte=kte@entry=0x7fff49345840, svcnames=svcnames@entry=0x7fff49345810)
    at krb5_util.c:866
#2  0x00007f60e0f6e09d in gssd_refresh_krb5_machine_credential (
    hostname=0x7f60e2c768a0 "linux.interlinx.bc.ca", ple=ple@entry=0x0, 
    service=service@entry=0x7f60e2c7d110 "*") at krb5_util.c:1284
#3  0x00007f60e0f6afaf in process_krb5_upcall (clp=clp@entry=0x7f60e2c777c0, 
    uid=0, fd=16, tgtname=tgtname@entry=0x0, 
    service=service@entry=0x7f60e2c7d110 "*") at gssd_proc.c:1130
#4  0x00007f60e0f6c093 in handle_gssd_upcall (clp=clp@entry=0x7f60e2c777c0)
    at gssd_proc.c:1352
#5  0x00007f60e0f69b69 in scan_poll_results (ret=1) at gssd_main_loop.c:85
#6  gssd_poll (nfds=<optimized out>, fds=<optimized out>)
    at gssd_main_loop.c:197
#7  gssd_run () at gssd_main_loop.c:253
#8  0x00007f60e0f6860a in main (argc=<optimized out>, argv=0x7fff49345d38)
    at gssd.c:212

Comment 25 J. Bruce Fields 2014-08-27 14:00:24 UTC
Thanks!  I'm assuming steved will look at it, but meanwhile I'm curious, for future reference--how did you work around the abrt problem?

Comment 26 Brian J. Murrell 2014-08-27 14:25:02 UTC
(In reply to J. Bruce Fields from comment #25)
> I'm curious, for
> future reference--how did you work around the abrt problem?

I installed the required debuginfo packages on the main system using debuginfo-install and then went hunting (i.e. brute force search) for the abrt report in /var/tmp/abrt and then used gdb on the corefile in the found dir.

Comment 27 Steve Dickson 2014-09-02 17:55:01 UTC
(In reply to Brian J. Murrell from comment #23)
> (In reply to Steve Dickson from comment #21)
> > Also, how many realms are defined in your krb5.conf file? It could
> > be something in your configuration that is causing this...
> 
> $ cat /etc/krb5.conf
> [logging]
>  default = FILE:/var/log/krb5libs.log
>  kdc = FILE:/var/log/krb5kdc.log
>  admin_server = FILE:/var/log/kadmind.log
> 
> [libdefaults]
>  dns_lookup_realm = true
>  dns_lookup_kdc = true
>  allow_weak_crypto = true
>  ticket_lifetime = 24h
>  renew_lifetime = 7d
>  forwardable = true
>  rdns = true
> # default_realm = EXAMPLE.COM
> # default_ccache_name = KEYRING:persistent:%{uid}
>  default_ccache_name = FILE:/tmp/krb5cc_%{uid}
> 
> [realms]
> # EXAMPLE.COM = {
> #  kdc = kerberos.example.com
> #  admin_server = kerberos.example.com
> # }
> 
> [domain_realm]
> # .example.com = EXAMPLE.COM
> # example.com = EXAMPLE.COM
> 
> With the relevant realms (just one though) defined in DNS here.
I bet this is the problem... 

Just to be clear, you have SRV records that look similar to

_kerberos._udp.realm.redhat.com IN SRV 0 0 88 kdc1.realm.redhat.com
_kerberos._tcp.realm.redhat.com IN SRV 0 0 88 kdc1.realm.redhat.com
_kpasswd._udp.realm.redhat.com IN SRV 0 0 88 kdc1.realm.redhat.com
_kpasswd._tcp.realm.redhat.com IN SRV 0 0 88 kdc1.realm.redhat.com

If so, let me set something up and give it a try...

Comment 28 Brian J. Murrell 2014-09-02 18:24:26 UTC
I only have the _udp... records:

_kerberos._udp.interlinx.bc.ca.	60 IN	SRV	0 0 88 linux.interlinx.bc.ca.
_kpasswd._udp.interlinx.bc.ca. 60 IN	SRV	0 0 464 linux.interlinx.bc.ca.

The Kerberos V5 System Administrator's Guide, in section 4.2 Hostnames for KDCs says:

_kerberos._tcp
This is for contacting any KDC by TCP. The MIT KDC by default will not listen on any TCP ports, so unless you've changed the configuration or you're running another KDC implementation, you should leave this unspecified. If you do enable TCP support, normally you should use port 88.

Comment 29 Steve Dickson 2014-09-04 15:29:23 UTC
(In reply to Brian J. Murrell from comment #28)
> I only have the _udp... records:
> 
> _kerberos._udp.interlinx.bc.ca.	60 IN	SRV	0 0 88 linux.interlinx.bc.ca.
> _kpasswd._udp.interlinx.bc.ca. 60 IN	SRV	0 0 464 linux.interlinx.bc.ca.

I went ahead and reconfigured my DNS server to hand out the
realms and I still am not seeing this crash.... 

There has to be something in your configuration that is making rpc.gssd
tip over... I just don't know what it is...

Comment 30 Jeff Layton 2014-09-04 15:43:53 UTC
Might be best to try and analyze the core. The strcmp in question is here, I think:

        if (strcmp (realm, preferred_realm) != 0) {
                realm = preferred_realm;
                /* resetting the realmnames index */
                i = -1;
        }


...which suggests that either "realm" or "preferred_realm" is bogus. Brian, perhaps you can do a bit of poking with gdb to figure out which it is?

Comment 31 Steve Dickson 2014-09-04 16:09:14 UTC
(In reply to Jeff Layton from comment #30)
> Might be best to try and analyze the core. The strcmp in question is here, I
> think:
> 
>         if (strcmp (realm, preferred_realm) != 0) {
>                 realm = preferred_realm;
>                 /* resetting the realmnames index */
>                 i = -1;
>         }
> 
> 
> ...which suggests that either "realm" or "preferred_realm" is bogus. Brian,
> perhaps you can do a bit of poking with gdb to figure out which it is?
Right... I was thinking the realm(s) coming back from the DNS 
query were bad... hoping the bug was in the kerberos code ;-) 

But things work just fine in my world...

Comment 32 Brian J. Murrell 2014-09-04 16:55:50 UTC
Damn.  I don't think I have that core file around any more.  I will have to wait until an opportune time to reboot and create a new one.  Hopefully shortly.

Comment 33 Brian J. Murrell 2014-09-07 13:15:39 UTC
OK.  Here it is:

(gdb) frame 1
#1  0x00007f22dc34a36d in find_keytab_entry (context=0x7f22dc88ca80, 
    kt=0x7f22dc885b30, 
    tgtname=tgtname@entry=0x7f22dc88a9d0 "linux.interlinx.bc.ca", 
    kte=kte@entry=0x7fff4b1b5b20, svcnames=svcnames@entry=0x7fff4b1b5af0)
    at krb5_util.c:866
866		if (strcmp (realm, preferred_realm) != 0) {
(gdb) print realm
$1 = 0x7f22dc88d110 ""
(gdb) print preferred_realm
$2 = 0x0

Comment 34 Jeff Layton 2014-09-07 13:42:48 UTC
Ok, looks like a fairly straightforward bug. "preferred_realm" is NULL, most likely because default_realm in your krb5.conf is commented out. If you uncomment that and set it to something sane, then it will probably work around the bug.

To fix the segfault, I think that if statement should read:

    if (preferred_realm && strcmp (realm, preferred_realm) != 0)
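
A minimal sketch of the guard proposed above, pulled out into a standalone helper so it can be tried in isolation. The function name should_switch_realm() is hypothetical and is not part of nfs-utils; the extra check for a NULL "realm" is also an assumption that goes beyond the one-line fix, which only tests "preferred_realm".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not nfs-utils code.
 * Returns 1 when a preferred realm is set and differs from `realm`,
 * 0 otherwise. strcmp() is only reached when both pointers are non-NULL.
 */
int should_switch_realm(const char *realm, const char *preferred_realm)
{
    /* No preferred realm configured: nothing to switch to. */
    if (preferred_realm == NULL)
        return 0;

    /* Assumption: a missing realm counts as "different" from a set one. */
    if (realm == NULL)
        return 1;

    return strcmp(realm, preferred_realm) != 0;
}
```

With the values from the gdb session in comment 33 (realm is "" and preferred_realm is 0x0), this returns 0 instead of passing a NULL pointer to strcmp().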

Comment 35 Steve Dickson 2014-09-08 14:30:38 UTC
(In reply to Brian J. Murrell from comment #33)
> OK.  Here it is:
> 
> (gdb) frame 1
> #1  0x00007f22dc34a36d in find_keytab_entry (context=0x7f22dc88ca80, 
>     kt=0x7f22dc885b30, 
>     tgtname=tgtname@entry=0x7f22dc88a9d0 "linux.interlinx.bc.ca", 
>     kte=kte@entry=0x7fff4b1b5b20, svcnames=svcnames@entry=0x7fff4b1b5af0)
>     at krb5_util.c:866
> 866		if (strcmp (realm, preferred_realm) != 0) {
> (gdb) print realm
> $1 = 0x7f22dc88d110 ""
> (gdb) print preferred_realm
> $2 = 0x0
Yep... that is the needed info... thanks!

Comment 36 Brian J. Murrell 2014-09-09 20:04:33 UTC
(In reply to Jeff Layton from comment #34)
> If you
> uncomment that and set it to something sane, then it will probably work
> around the bug.

But which is preferred, the krb5.conf:default_realm or what's in my DNS SRV records?  Ultimately I don't want to usurp what my network operators are telling me.

Comment 37 Jeff Layton 2014-09-09 20:08:42 UTC
AIUI, the SRV records just tell you which host serves the given service. That doesn't tell you anything about the realm.

Comment 38 Jeff Layton 2014-09-09 20:14:36 UTC
Oh and fwiw, I sent steved a patch for this yesterday (I meant to cc Brian, but dropped the ball):

    http://article.gmane.org/gmane.linux.nfs/66217

...if you can test that patch then that would be nice too.

Comment 39 Brian J. Murrell 2014-09-09 20:16:19 UTC
Sorry, not SRV records but realm TXT record mappings such as when you have:

[libdefaults]
 dns_lookup_realm = true

and have a _kerberos.$domain TXT record as such:

_kerberos.example.com. 60	IN	TXT	"EXAMPLE.COM"

So to be clear, I don't mind having a "default_realm = EXAMPLE.COM" in my krb5.conf so long as it's secondary to what DNS is telling kerberos.

Comment 40 Brian J. Murrell 2014-09-09 20:17:36 UTC
(In reply to Jeff Layton from comment #38)
> Oh and fwiw, I sent steved a patch for this yesterday (I meant to cc Brian,
> but dropped the ball):
> 
>     http://article.gmane.org/gmane.linux.nfs/66217

Ahhh.  Nice.
 
> ...if you can test that patch then that would be nice too.

Isn't there a way for people with access to do "scratch builds"?  I have not looked into what I would need to be able to do one of those, but if you already knew and had the access and it was easy enough, I'd be happy to install a scratch build.  :-)

Comment 41 Jeff Layton 2014-09-09 20:24:31 UTC
Ahh ok. I see what you're saying now. TBH, my experience with multiple realms is pretty limited, so I can't really speak authoritatively here.

My suggestion to add a default_realm was simply to work around the segfault -- not to provide a long-term solution to your problem. The real fix for that problem is to patch gssd.

Based on how the gssd code is structured, I imagine you could put a bogus realm name in there and it should end up falling back to finding other realms, but I'd recommend testing that theory to make sure.
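
The fallback idea in the previous paragraph can be illustrated with a toy model. This is not gssd's code, and it does not confirm how gssd behaves: pick_realm(), has_entry() and the realm lists are all hypothetical names used only to show "try the preferred realm first, then fall back to the other candidates".

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model only, not taken from nfs-utils.
 * Returns 1 if `realm` appears in the NULL-terminated list `keytab_realms`.
 */
static int has_entry(const char *const *keytab_realms, const char *realm)
{
    for (size_t i = 0; keytab_realms[i] != NULL; i++)
        if (strcmp(keytab_realms[i], realm) == 0)
            return 1;
    return 0;
}

/*
 * Try `preferred` first (it may be NULL), then each realm in the
 * NULL-terminated `candidates` list. Returns the first realm that has a
 * keytab entry, or NULL if none does.
 */
const char *pick_realm(const char *preferred,
                       const char *const *candidates,
                       const char *const *keytab_realms)
{
    if (preferred != NULL && has_entry(keytab_realms, preferred))
        return preferred;

    for (size_t i = 0; candidates[i] != NULL; i++)
        if (has_entry(keytab_realms, candidates[i]))
            return candidates[i];

    return NULL;
}
```

In this model a bogus preferred realm simply fails the first lookup and the search moves on to the candidate list. Whether gssd does the same would still need to be tested, as noted above.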

Comment 42 Fedora Update System 2014-09-16 14:33:06 UTC
nfs-utils-1.3.0-2.2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/nfs-utils-1.3.0-2.2.fc20

Comment 43 Fedora Update System 2014-09-16 14:34:14 UTC
nfs-utils-1.3.0-7.0.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/nfs-utils-1.3.0-7.0.fc21

Comment 44 Fedora Update System 2014-09-16 18:42:58 UTC
Package nfs-utils-1.3.0-7.0.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing nfs-utils-1.3.0-7.0.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10842/nfs-utils-1.3.0-7.0.fc21
then log in and leave karma (feedback).

Comment 45 Fedora Update System 2014-09-27 09:59:15 UTC
nfs-utils-1.3.0-2.2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 46 Fedora Update System 2014-10-14 04:31:21 UTC
nfs-utils-1.3.0-7.0.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 47 Brian J. Murrell 2014-10-15 17:40:49 UTC
Rebooted this morning and got logged in without reproducing this issue.

Thanks much!

