Description of problem:
Happens at system startup.

Version-Release number of selected component:
nfs-utils-1.3.0-0.1.fc20

Additional info:
reporter:         libreport-2.2.2
backtrace_rating: 4
cmdline:          /usr/sbin/rpc.gssd
crash_function:   strcmp
executable:       /usr/sbin/rpc.gssd
kernel:           3.14.4-200.fc20.x86_64
runlevel:         N 5
type:             CCpp
uid:              0

Truncated backtrace:
Thread no. 1 (6 frames)
 #0 strcmp at ../sysdeps/x86_64/strcmp.S:210
 #1 find_keytab_entry
 #2 gssd_refresh_krb5_machine_credential
 #3 process_krb5_upcall
 #4 handle_gssd_upcall
 #5 gssd_run
Created attachment 908050 [details] File: backtrace
Created attachment 908051 [details] File: cgroup
Created attachment 908052 [details] File: core_backtrace
Created attachment 908053 [details] File: dso_list
Created attachment 908054 [details] File: environ
Created attachment 908055 [details] File: exploitable
Created attachment 908056 [details] File: limits
Created attachment 908057 [details] File: maps
Created attachment 908058 [details] File: open_fds
Created attachment 908059 [details] File: proc_pid_status
Created attachment 908060 [details] File: var_log_messages
How reproducible is this? Also are you setting a realm with the -R flag?
(In reply to Steve Dickson from comment #12)
> How reproducible is this?

100%. It happens without fail on every login[1]

> Also are you setting a realm with the -R flag?

I'm just letting systemd start it as it sees fit, both on its own during system boot, before it crashes, and afterward when I restart it. It doesn't look like systemd is setting the realm flag:

$ ps -efwww | grep gss
root      1371     1  0 Jun25 ?        00:00:00 /usr/sbin/rpc.svcgssd
root      2763     1  0 Jun25 ?        00:00:07 /usr/sbin/rpc.gssd

[1] at least every login after a system restart -- but I typically only log in once between reboots and never log out until the next reboot, so I don't know what the behaviour is if I just log out and back in again.
(In reply to Brian J. Murrell from comment #13)
> (In reply to Steve Dickson from comment #12)
> > How reproducible is this?
>
> 100%. It happens without fail on every login[1]
>
> > Also are you setting a realm with the -R flag?
>
> I'm just letting systemd start it as it sees fit, both on its own during
> system boot, before it crashes, and afterward when I restart it. It doesn't
> look like systemd is setting the realm flag:
>
> $ ps -efwww | grep gss
> root      1371     1  0 Jun25 ?        00:00:00 /usr/sbin/rpc.svcgssd
> root      2763     1  0 Jun25 ?        00:00:07 /usr/sbin/rpc.gssd
>
> [1] at least every login after a system restart -- but I typically only log
> in once between reboots and never log out until the next reboot, so I don't
> know what the behaviour is if I just log out and back in again.

I think you might be running into this problem:

commit 25e83c2270b2d2966c992885faed0b79be09f474
Author: Jeff Layton <jlayton>
Date:   Thu May 1 11:15:16 2014 -0400

    mountd: fix segfault in add_name with newer gcc compilers

but in rpc.gssd... This is fixed in the latest nfs-utils release, nfs-utils-1.3.0-2.1.fc20. Would it be possible to upgrade to that package?
I have upgraded. I guess we'll see if it's fixed on next reboot.
Still a problem. It crashed on login to my completely up-to-date system this morning. I had to:

# systemctl restart rpcgssd

after logging in, as systemctl status was reporting:

# systemctl status rpcgssd
nfs-secure.service - Secure NFS
   Loaded: loaded (/usr/lib/systemd/system/nfs-secure.service; enabled)
   Active: active (running) since Sun 2014-08-10 10:45:39 EDT; 4min 32s ago
  Process: 1251 ExecStart=/usr/sbin/rpc.gssd $RPCGSSDARGS (code=exited, status=0/SUCCESS)
 Main PID: 1310 (rpc.gssd)
   CGroup: /system.slice/nfs-secure.service
           └─1310 /usr/sbin/rpc.gssd

Aug 10 10:45:39 pc.interlinx.bc.ca systemd[1]: Started Secure NFS.
Aug 10 10:46:48 pc.interlinx.bc.ca rpc.gssd[1310]: WARNING: forked child was killed with signal 11

Here's my version of nfs-utils:

$ rpm -q nfs-utils
nfs-utils-1.3.0-2.1.fc20.x86_64
Just had to log out/in again today and got another one of these. abrt isn't sending any more reports since it's detecting the duplicate.
Still seeing this on up-to-date (as of right now) Fedora 20.
Is there anything further that we can do about this, given that it happens 100% of the time on every reboot/login and is quite annoying?
(In reply to Brian J. Murrell from comment #19)
> Is there anything further that we can do about this, given that it happens
> 100% of the time on every reboot/login and is quite annoying?

I can imagine this is pretty annoying... but unfortunately I just can't seem to reproduce it... By chance, are you using an AD vs. a KDC for your kerberos?
Also, how many realms are defined in your krb5.conf file? It could be something in your configuration that's causing this...
Could we get better backtraces with "debuginfo-install nfs-utils"? (Following http://fedoraproject.org/wiki/StackTraces#debuginfo). find_keytab_entry() is unfortunately a rather long function with at least 4 strcmp's in it; we'd have a better chance if we could narrow this down a little.
(In reply to J. Bruce Fields from comment #22)
> Could we get better backtraces with "debuginfo-install nfs-utils"?

I wish we could, but this is abrt doing the backtracing and it seems to have completely fallen down at managing its /var/cache/abrt-di cache, not freeing up space in it to allow newer/more relevant debuginfo packages to be installed. This is bug #811978.

> find_keytab_entry() is unfortunately a rather long function with at least 4
> strcmp's in it; we'd have a better chance if we could narrow this down a
> little.

Agreed. It would probably be useful all around, for many bug reports, if bug #811978 were fixed so that everyone with a full /var/cache/abrt-di isn't sending useless bug reports. Surely this has enough value internally to give bug #811978 some priority, yes?

(In reply to Steve Dickson from comment #21)
> Also, how many realms are defined in your krb5.conf file? It could
> be something in your configuration that's causing this...

$ cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = true
 dns_lookup_kdc = true
 allow_weak_crypto = true
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = true
# default_realm = EXAMPLE.COM
# default_ccache_name = KEYRING:persistent:%{uid}
 default_ccache_name = FILE:/tmp/krb5cc_%{uid}

[realms]
# EXAMPLE.COM = {
#  kdc = kerberos.example.com
#  admin_server = kerberos.example.com
# }

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM

With the relevant realms (just one though) defined in DNS here.
I think I have the backtrace you are looking for:

(gdb) where
#0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:210
#1  0x00007f60e0f6d36d in find_keytab_entry (context=0x7f60e2c7d920, kt=0x7f60e2c77a60,
    tgtname=tgtname@entry=0x7f60e2c768a0 "linux.interlinx.bc.ca",
    kte=kte@entry=0x7fff49345840, svcnames=svcnames@entry=0x7fff49345810)
    at krb5_util.c:866
#2  0x00007f60e0f6e09d in gssd_refresh_krb5_machine_credential (
    hostname=0x7f60e2c768a0 "linux.interlinx.bc.ca", ple=ple@entry=0x0,
    service=service@entry=0x7f60e2c7d110 "*") at krb5_util.c:1284
#3  0x00007f60e0f6afaf in process_krb5_upcall (clp=clp@entry=0x7f60e2c777c0, uid=0, fd=16,
    tgtname=tgtname@entry=0x0, service=service@entry=0x7f60e2c7d110 "*") at gssd_proc.c:1130
#4  0x00007f60e0f6c093 in handle_gssd_upcall (clp=clp@entry=0x7f60e2c777c0) at gssd_proc.c:1352
#5  0x00007f60e0f69b69 in scan_poll_results (ret=1) at gssd_main_loop.c:85
#6  gssd_poll (nfds=<optimized out>, fds=<optimized out>) at gssd_main_loop.c:197
#7  gssd_run () at gssd_main_loop.c:253
#8  0x00007f60e0f6860a in main (argc=<optimized out>, argv=0x7fff49345d38) at gssd.c:212
Thanks! I'm assuming steved will look at it, but meanwhile I'm curious, for future reference--how did you work around the abrt problem?
(In reply to J. Bruce Fields from comment #25)
> I'm curious, for future reference--how did you work around the abrt problem?

I installed the required debuginfo packages on the main system using debuginfo-install, then went hunting (i.e. a brute-force search) for the abrt report in /var/tmp/abrt, and then used gdb on the corefile in the directory I found.
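Roughly, the sequence was something like the following (the dump directory name here is just an example; abrt names them by crash time and PID, and the core is typically saved in a file called "coredump" under that directory):

# debuginfo-install nfs-utils
# ls /var/tmp/abrt/
ccpp-2014-08-10-10:46:48-1310
# gdb /usr/sbin/rpc.gssd /var/tmp/abrt/ccpp-2014-08-10-10:46:48-1310/coredump
(gdb) bt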
(In reply to Brian J. Murrell from comment #23)
> (In reply to Steve Dickson from comment #21)
> > Also, how many realms are defined in your krb5.conf file? It could
> > be something in your configuration that's causing this...
>
> $ cat /etc/krb5.conf
> [krb5.conf as posted in comment #23]
>
> With the relevant realms (just one though) defined in DNS here.

I bet this is the problem... Just to be clear, do you have SRV records that look similar to:

_kerberos._udp.realm.redhat.com    IN SRV 0 0 88 kdc1.realm.redhat.com
_kerberos._tcp.realm.redhat.com    IN SRV 0 0 88 kdc1.realm.redhat.com
_kpasswd._udp.realm.redhat.com     IN SRV 0 0 88 kdc1.realm.redhat.com
_kpasswd._tcp.realm.redhat.com     IN SRV 0 0 88 kdc1.realm.redhat.com

If so, let me set something up and give it a try...
I only have the _udp... records:

_kerberos._udp.interlinx.bc.ca. 60 IN SRV 0 0 88 linux.interlinx.bc.ca.
_kpasswd._udp.interlinx.bc.ca.  60 IN SRV 0 0 464 linux.interlinx.bc.ca.

The Kerberos V5 System Administrator's Guide, in section 4.2 "Hostnames for KDCs", says:

  _kerberos._tcp
    This is for contacting any KDC by TCP. The MIT KDC by default will not
    listen on any TCP ports, so unless you've changed the configuration or
    you're running another KDC implementation, you should leave this
    unspecified. If you do enable TCP support, normally you should use port 88.
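(For reference, those records are easy to double-check from the client with dig; the answers below simply reflect the records above:)

$ dig +short -t SRV _kerberos._udp.interlinx.bc.ca
0 0 88 linux.interlinx.bc.ca.
$ dig +short -t SRV _kpasswd._udp.interlinx.bc.ca
0 0 464 linux.interlinx.bc.ca.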
(In reply to Brian J. Murrell from comment #28)
> I only have the _udp... records:
>
> _kerberos._udp.interlinx.bc.ca. 60 IN SRV 0 0 88 linux.interlinx.bc.ca.
> _kpasswd._udp.interlinx.bc.ca.  60 IN SRV 0 0 464 linux.interlinx.bc.ca.

I went ahead and reconfigured my DNS server to hand out the realms and I am still not seeing this crash... There has to be something in your configuration that's making rpc.gssd tip over... I just don't know what it is...
Might be best to try and analyze the core. The strcmp in question is here, I think:

		if (strcmp (realm, preferred_realm) != 0) {
			realm = preferred_realm;
			/* resetting the realmnames index */
			i = -1;
		}

...which suggests that either "realm" or "preferred_realm" is bogus. Brian, perhaps you can do a bit of poking with gdb to figure out which it is?
(In reply to Jeff Layton from comment #30)
> Might be best to try and analyze the core. The strcmp in question is here, I
> think:
>
> 	if (strcmp (realm, preferred_realm) != 0) {
> 		realm = preferred_realm;
> 		/* resetting the realmnames index */
> 		i = -1;
> 	}
>
> ...which suggests that either "realm" or "preferred_realm" is bogus. Brian,
> perhaps you can do a bit of poking with gdb to figure out which it is?

Right... I was thinking the realm(s) coming back from the DNS query were bad... hoping the bug was in the kerberos code ;-) But things work just fine in my world...
Damn. I don't think I have that core file around any more. I will have to wait until an opportune time to reboot and create a new one. Hopefully shortly.
OK. Here it is:

(gdb) frame 1
#1  0x00007f22dc34a36d in find_keytab_entry (context=0x7f22dc88ca80, kt=0x7f22dc885b30,
    tgtname=tgtname@entry=0x7f22dc88a9d0 "linux.interlinx.bc.ca",
    kte=kte@entry=0x7fff4b1b5b20, svcnames=svcnames@entry=0x7fff4b1b5af0)
    at krb5_util.c:866
866			if (strcmp (realm, preferred_realm) != 0) {
(gdb) print realm
$1 = 0x7f22dc88d110 ""
(gdb) print preferred_realm
$2 = 0x0
Ok, looks like a fairly straightforward bug. "preferred_realm" is NULL, most likely because default_realm in your krb5.conf is commented out. If you uncomment that and set it to something sane, then it will probably work around the bug.

To fix the segfault, I think that if statement should read:

	if (preferred_realm && strcmp (realm, preferred_realm) != 0)
(In reply to Brian J. Murrell from comment #33)
> OK. Here it is:
>
> (gdb) frame 1
> #1  0x00007f22dc34a36d in find_keytab_entry (context=0x7f22dc88ca80, kt=0x7f22dc885b30,
>     tgtname=tgtname@entry=0x7f22dc88a9d0 "linux.interlinx.bc.ca",
>     kte=kte@entry=0x7fff4b1b5b20, svcnames=svcnames@entry=0x7fff4b1b5af0)
>     at krb5_util.c:866
> 866			if (strcmp (realm, preferred_realm) != 0) {
> (gdb) print realm
> $1 = 0x7f22dc88d110 ""
> (gdb) print preferred_realm
> $2 = 0x0

Yep... that is the needed info... thanks!
(In reply to Jeff Layton from comment #34)
> If you uncomment that and set it to something sane, then it will probably
> work around the bug.

But which is preferred, the krb5.conf default_realm or what's in my DNS SRV records? Ultimately I don't want to usurp what my network operators are telling me.
AIUI, the SRV records just tell you which host serves the given service. That doesn't tell you anything about the realm.
Oh and fwiw, I sent steved a patch for this yesterday (I meant to cc Brian, but dropped the ball):

http://article.gmane.org/gmane.linux.nfs/66217

...if you can test that patch then that would be nice too.
Sorry, not SRV records but realm TXT record mappings, such as when you have:

[libdefaults]
 dns_lookup_realm = true

and have a _kerberos.$domain TXT record such as:

_kerberos.example.com. 60 IN TXT "EXAMPLE.COM"

So to be clear, I don't mind having a "default_realm = EXAMPLE.COM" in my krb5.conf so long as it's secondary to what DNS is telling kerberos.
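FWIW, with dns_lookup_realm = true that mapping can be checked the same way as the SRV records; for the example.com case above the query would look something like:

$ dig +short -t TXT _kerberos.example.com
"EXAMPLE.COM"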
(In reply to Jeff Layton from comment #38)
> Oh and fwiw, I sent steved a patch for this yesterday (I meant to cc Brian,
> but dropped the ball):
>
> http://article.gmane.org/gmane.linux.nfs/66217

Ahhh. Nice.

> ...if you can test that patch then that would be nice too.

Isn't there a way for people with access to do "scratch builds"? I have not looked into what I would need to be able to do one of those, but if you already knew and had the access and it was easy enough, I'd be happy to install a scratch build. :-)
Ahh ok. I see what you're saying now. TBH, my experience with multiple realms is pretty limited, so I can't really speak authoritatively here. My suggestion to add a default_realm was simply to work around the segfault -- not to provide a long-term solution to your problem. The real fix for that problem is to patch gssd. Based on how the gssd code is structured, I imagine you could put a bogus realm name in there and it should end up falling back to finding other realms, but I'd recommend testing that theory to make sure.
nfs-utils-1.3.0-2.2.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/nfs-utils-1.3.0-2.2.fc20
nfs-utils-1.3.0-7.0.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/nfs-utils-1.3.0-7.0.fc21
Package nfs-utils-1.3.0-7.0.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing nfs-utils-1.3.0-7.0.fc21'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-10842/nfs-utils-1.3.0-7.0.fc21
then log in and leave karma (feedback).
nfs-utils-1.3.0-2.2.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
nfs-utils-1.3.0-7.0.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
Rebooted this morning and got logged in without reproducing this issue. Thanks much!