RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 812936 - RPC: AUTH_GSS upcall timed out messages
Summary: RPC: AUTH_GSS upcall timed out messages
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils
Version: 6.4
Hardware: All
OS: Linux
unspecified
low
Target Milestone: rc
: ---
Assignee: Steve Dickson
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-04-16 15:43 UTC by Orion Poplawski
Modified: 2017-12-06 10:36 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-06 10:36:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Orion Poplawski 2012-04-16 15:43:28 UTC
Description of problem:

I'm seeing these periodically on our servers:

Apr 15 13:11:32 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 13:11:32 alexandria2 kernel: Please check user daemon is running.

gss daemons are running:

root     27128     1  0 09:36 ?        00:00:00 rpc.svcgssd -v
root     27170     1  0 09:36 ?        00:00:00 rpc.gssd -v

Version-Release number of selected component (if applicable):
2.6.32-220.7.1.el6.x86_64
nfs-utils-1.2.3-15.el6.x86_64

How reproducible:
Seeing about 6-14/day.

I've just bumped the verbosity a bit to see if I can see why it's timing out.

Comment 2 Orion Poplawski 2012-04-16 17:28:06 UTC
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname = nfs/pyramid.cora.nwra.com.COM
Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695] nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: WARNING: Failed to create machine krb5 context with any credentials cache for server pyramid.cora.nwra.com
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: doing error downcall

Looks like the nslcd messages often occur with the timed out messages, but not always:

Apr 15 17:06:09 alexandria2 nslcd[2013]: [e3eefb] nslcd_passwd_byname(nfs/pueo.cora.nwra.com): invalid user name
Apr 15 17:06:09 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:06:09 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:07:13 alexandria2 rpc.mountd[3154]: authenticated mount request from hobbes.cora.nwra.com:956 for /export/cora6 (/export/cora6)
Apr 15 17:11:40 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:11:40 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:23:04 alexandria2 rpc.mountd[3154]: authenticated unmount request from hobbes.cora.nwra.com:947 for /export/cora6 (/export/cora6)
Apr 15 17:27:22 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:27:22 alexandria2 kernel: Please check user daemon is running.

Comment 3 RHEL Program Management 2012-05-03 05:14:49 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Steve Dickson 2012-05-21 14:03:57 UTC
(In reply to comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname =
> nfs/pyramid.cora.nwra.com.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695]
> nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This is not good... why would rpc.gssd passing LDAP a principal...

Comment 5 Orion Poplawski 2012-05-31 18:04:29 UTC
Well, in svcgssd_proc.c:get_ids() it calls nfs4_gss_princ_to_ids().  Presumably many times this is an actual user principal rather than a machine principal?  Anyways, still seeing a lot of these messages.

Comment 6 Orion Poplawski 2012-05-31 18:07:23 UTC
Example:

May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: sname = nfs/hawk.cora.nwra.com.COM
May 31 11:49:45 alexandria2 nslcd[2015]: [d70bbe] nslcd_passwd_byname(nfs/hawk.cora.nwra.com): invalid user name
May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: doing downcall

May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: sname = apache.COM
May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: doing downcall

Although perhaps nfs4_gss_princ_to_ids() shouldn't be accessing LDAP?  Perhaps not for principals of the form */* ?

Comment 7 Orion Poplawski 2012-05-31 18:18:36 UTC
In nss_gss_princ_to_ids() there is this comment:

        /* XXX: this should call something like getgssauthnam instead? */
        pw = nss_getpwnam(princ, NULL, &err);

So since we use ldap in nsswitch.conf, it's going to call ldap.

Comment 8 J. Bruce Fields 2012-09-19 18:13:45 UTC
I believe that what svcgssd *should* be doing is responding with a credential that gives the user anonymous access, and includes the string principal name.

Comment 9 J. Bruce Fields 2012-09-19 18:32:14 UTC
Oh, right, see the comment in utils/gssd/svcgssd_proc.c:get_ids().  If libnfsidmapd returns ENOENT, svcgssd will do the right thing (pass down anonymous id's, etc.).  If you return any other error, it will error out the context initiation.

I think svcgssd is in the right here, and libnfsidmapd should be fixed?

Not sure what the correct fix is there....  What error is actually being returned?  Is nslcd_passwd_byname() returning EINVAL, and is that being passed back to svcgssd?  If nss forbids usernames with /'s, maybe we should catch that in the caller and return -ENOENT.  Or maybe we should turn EINVALs into ENOENTs.

Comment 10 Orion Poplawski 2013-08-13 22:14:46 UTC
Well, I've since switched to sssd but I'm still seeing these messages.  I'm afraid though that I'm not entirely sure what more information is needed from me.  I'm not sure how to trace the gssd processes further.

Comment 11 Yi Zhang 2013-10-24 15:38:58 UTC
this bug seems relate to another behave I discovered during ipa client test, please double check. If they are produce of same root cause, please close it as well.
https://bugzilla.redhat.com/show_bug.cgi?id=1023059
-- Yi Zhang (IPA QA)

Comment 12 Steve Dickson 2014-04-27 19:49:14 UTC
(In reply to Orion Poplawski from comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname =
> nfs/pyramid.cora.nwra.com.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695]
> nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This looks to be the problem ^^^^

> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null
> request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
rpc.svcgssd answers the upcall with a NULL reply^^^^

> Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
> Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
So this is a loopback secure mount... hmm....

I'm thinking this is just bad error handling on a secure loopback
mount... Those are iffy at best... Very racy... and I don't mean
X-raided! ;-) 

Fix the  "invalid user name" problem. If the upcall time still happens
then its a problem... otherwise its not...

Comment 13 Jan Kurik 2017-12-06 10:36:28 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/


Note You need to log in before you can comment on or make changes to this bug.