Bug 812936 - RPC: AUTH_GSS upcall timed out messages
RPC: AUTH_GSS upcall timed out messages
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils (Show other bugs)
6.4
All Linux
unspecified Severity low
: rc
: ---
Assigned To: Steve Dickson
Red Hat Kernel QE team
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-16 11:43 EDT by Orion Poplawski
Modified: 2017-12-06 05:36 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-06 05:36:28 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Orion Poplawski 2012-04-16 11:43:28 EDT
Description of problem:

I'm seeing these periodically on our servers:

Apr 15 13:11:32 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 13:11:32 alexandria2 kernel: Please check user daemon is running.

gss daemons are running:

root     27128     1  0 09:36 ?        00:00:00 rpc.svcgssd -v
root     27170     1  0 09:36 ?        00:00:00 rpc.gssd -v

Version-Release number of selected component (if applicable):
2.6.32-220.7.1.el6.x86_64
nfs-utils-1.2.3-15.el6.x86_64

How reproducible:
Seeing about 6-14/day.

I've just bumped the verbosity a bit to see if I can see why it's timing out.
Comment 2 Orion Poplawski 2012-04-16 13:28:06 EDT
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname = nfs/pyramid.cora.nwra.com@CORA.NWRA.COM
Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695] nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: WARNING: Failed to create machine krb5 context with any credentials cache for server pyramid.cora.nwra.com
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: doing error downcall

Looks like the nslcd messages often occur with the timed out messages, but not always:

Apr 15 17:06:09 alexandria2 nslcd[2013]: [e3eefb] nslcd_passwd_byname(nfs/pueo.cora.nwra.com): invalid user name
Apr 15 17:06:09 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:06:09 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:07:13 alexandria2 rpc.mountd[3154]: authenticated mount request from hobbes.cora.nwra.com:956 for /export/cora6 (/export/cora6)
Apr 15 17:11:40 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:11:40 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:23:04 alexandria2 rpc.mountd[3154]: authenticated unmount request from hobbes.cora.nwra.com:947 for /export/cora6 (/export/cora6)
Apr 15 17:27:22 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:27:22 alexandria2 kernel: Please check user daemon is running.
Comment 3 RHEL Product and Program Management 2012-05-03 01:14:49 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 4 Steve Dickson 2012-05-21 10:03:57 EDT
(In reply to comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname =
> nfs/pyramid.cora.nwra.com@CORA.NWRA.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695]
> nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This is not good... why would rpc.gssd passing LDAP a principal...
Comment 5 Orion Poplawski 2012-05-31 14:04:29 EDT
Well, in svcgssd_proc.c:get_ids() it calls nfs4_gss_princ_to_ids().  Presumably many times this is an actual user principal rather than a machine principal?  Anyways, still seeing a lot of these messages.
Comment 6 Orion Poplawski 2012-05-31 14:07:23 EDT
Example:

May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: sname = nfs/hawk.cora.nwra.com@CORA.NWRA.COM
May 31 11:49:45 alexandria2 nslcd[2015]: [d70bbe] nslcd_passwd_byname(nfs/hawk.cora.nwra.com): invalid user name
May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: doing downcall

May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: sname = apache@CORA.NWRA.COM
May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: doing downcall

Although perhaps nfs4_gss_princ_to_ids() shouldn't be accessing LDAP?  Perhaps not for principals of the form */* ?
Comment 7 Orion Poplawski 2012-05-31 14:18:36 EDT
In nss_gss_princ_to_ids() there is this comment:

        /* XXX: this should call something like getgssauthnam instead? */
        pw = nss_getpwnam(princ, NULL, &err);

So since we use ldap in nsswitch.conf, it's going to call ldap.
Comment 8 J. Bruce Fields 2012-09-19 14:13:45 EDT
I believe that what svcgssd *should* be doing is responding with a credential that gives the user anonymous access, and includes the string principal name.
Comment 9 J. Bruce Fields 2012-09-19 14:32:14 EDT
Oh, right, see the comment in utils/gssd/svcgssd_proc.c:get_ids().  If libnfsidmapd returns ENOENT, svcgssd will do the right thing (pass down anonymous id's, etc.).  If you return any other error, it will error out the context initiation.

I think svcgssd is in the right here, and libnfsidmapd should be fixed?

Not sure what the correct fix is there....  What error is actually being returned?  Is nslcd_passwd_byname() returning EINVAL, and is that being passed back to svcgssd?  If nss forbids usernames with /'s, maybe we should catch that in the caller and return -ENOENT.  Or maybe we should turn EINVALs into ENOENTs.
Comment 10 Orion Poplawski 2013-08-13 18:14:46 EDT
Well, I've since switched to sssd but I'm still seeing these messages.  I'm afraid though that I'm not entirely sure what more information is needed from me.  I'm not sure how to trace the gssd processes further.
Comment 11 Yi Zhang 2013-10-24 11:38:58 EDT
this bug seems relate to another behave I discovered during ipa client test, please double check. If they are produce of same root cause, please close it as well.
https://bugzilla.redhat.com/show_bug.cgi?id=1023059
-- Yi Zhang (IPA QA)
Comment 12 Steve Dickson 2014-04-27 15:49:14 EDT
(In reply to Orion Poplawski from comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname =
> nfs/pyramid.cora.nwra.com@CORA.NWRA.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695]
> nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This looks to be the problem ^^^^

> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null
> request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
rpc.svcgssd answers the upcall with a NULL reply^^^^

> Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
> Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
So this is a loopback secure mount... hmm....

I'm thinking this is just bad error handling on a secure loopback
mount... Those are iffy at best... Very racy... and I don't mean
X-raided! ;-) 

Fix the  "invalid user name" problem. If the upcall time still happens
then its a problem... otherwise its not...
Comment 13 Jan Kurik 2017-12-06 05:36:28 EST
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Note You need to log in before you can comment on or make changes to this bug.