Bug 812936
Summary: | RPC: AUTH_GSS upcall timed out messages | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Orion Poplawski <orion> |
Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | bfields, dpal, igeorgex, jgalipea, marianne, orion, redhat-bugs, sassmann |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-12-06 10:36:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Orion Poplawski
2012-04-16 15:43:28 UTC
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname = nfs/pyramid.cora.nwra.com.COM
Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695] nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null request
Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:28 alexandria2 rpc.gssd[27170]: handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfsd4_cb/clnt16c)
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: WARNING: Failed to create machine krb5 context with any credentials cache for server pyramid.cora.nwra.com
Apr 16 10:50:30 alexandria2 rpc.gssd[27170]: doing error downcall

Looks like the nslcd messages often occur with the timed out messages, but not always:

Apr 15 17:06:09 alexandria2 nslcd[2013]: [e3eefb] nslcd_passwd_byname(nfs/pueo.cora.nwra.com): invalid user name
Apr 15 17:06:09 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:06:09 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:07:13 alexandria2 rpc.mountd[3154]: authenticated mount request from hobbes.cora.nwra.com:956 for /export/cora6 (/export/cora6)
Apr 15 17:11:40 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:11:40 alexandria2 kernel: Please check user daemon is running.
Apr 15 17:23:04 alexandria2 rpc.mountd[3154]: authenticated unmount request from hobbes.cora.nwra.com:947 for /export/cora6 (/export/cora6)
Apr 15 17:27:22 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
Apr 15 17:27:22 alexandria2 kernel: Please check user daemon is running.

Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

(In reply to comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname = nfs/pyramid.cora.nwra.com.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695] nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This is not good... why would rpc.gssd be passing LDAP a principal...

Well, in svcgssd_proc.c:get_ids() it calls nfs4_gss_princ_to_ids(). Presumably many times this is an actual user principal rather than a machine principal?

Anyways, still seeing a lot of these messages. Example:

May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: sname = nfs/hawk.cora.nwra.com.COM
May 31 11:49:45 alexandria2 nslcd[2015]: [d70bbe] nslcd_passwd_byname(nfs/hawk.cora.nwra.com): invalid user name
May 31 11:49:45 alexandria2 rpc.svcgssd[18388]: doing downcall
May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: sname = apache.COM
May 31 11:58:18 alexandria2 rpc.svcgssd[18388]: doing downcall

Although perhaps nfs4_gss_princ_to_ids() shouldn't be accessing LDAP? Perhaps not for principals of the form */* ?

In nss_gss_princ_to_ids() there is this comment:

/* XXX: this should call something like getgssauthnam instead? */
pw = nss_getpwnam(princ, NULL, &err);

So since we use ldap in nsswitch.conf, it's going to call ldap.
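For illustration only, the "not for principals of the form */*" idea could be sketched as a pre-check that runs before any NSS lookup. Nothing below is taken from nfs-utils or libnfsidmap: princ_has_instance() and princ_precheck() are hypothetical names, and returning -ENOENT for service-style principals is an assumption about what the caller would want to see, not established behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of nfs-utils or libnfsidmap): returns 1 if
 * the principal has an instance component, i.e. a '/' in the part before
 * any '@REALM' suffix, as in "nfs/host.example.com@EXAMPLE.COM".
 */
static int princ_has_instance(const char *princ)
{
    assert(princ != NULL);

    /* Only look at the name part; ignore anything after the first '@'. */
    const char *at = strchr(princ, '@');
    size_t name_len = at ? (size_t)(at - princ) : strlen(princ);

    return memchr(princ, '/', name_len) != NULL;
}

/*
 * Hypothetical pre-check to run before the passwd lookup. Returns -ENOENT
 * (an assumed convention for "no local account") for service-style
 * principals, so NSS/LDAP is never asked about a name containing '/'.
 * Returns 0 when the normal lookup should go ahead.
 */
static int princ_precheck(const char *princ)
{
    if (princ_has_instance(princ))
        return -ENOENT;
    return 0;
}
```

With a check like this in front of the lookup, a name such as nfs/hawk.cora.nwra.com would never reach nslcd, while a plain name such as apache would still be looked up as before. It does not handle escaped characters in principal names.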
I believe that what svcgssd *should* be doing is responding with a credential that gives the user anonymous access, and includes the string principal name.

Oh, right, see the comment in utils/gssd/svcgssd_proc.c:get_ids(). If libnfsidmap returns ENOENT, svcgssd will do the right thing (pass down anonymous IDs, etc.). If you return any other error, it will error out the context initiation. I think svcgssd is in the right here, and libnfsidmap should be fixed?

Not sure what the correct fix is there.... What error is actually being returned? Is nslcd_passwd_byname() returning EINVAL, and is that being passed back to svcgssd? If nss forbids usernames with /'s, maybe we should catch that in the caller and return -ENOENT. Or maybe we should turn EINVALs into ENOENTs.

Well, I've since switched to sssd but I'm still seeing these messages. I'm afraid though that I'm not entirely sure what more information is needed from me. I'm not sure how to trace the gssd processes further.

This bug seems related to another behavior I discovered during IPA client testing; please double check. If they are the product of the same root cause, please close it as well.
https://bugzilla.redhat.com/show_bug.cgi?id=1023059
-- Yi Zhang (IPA QA)

(In reply to Orion Poplawski from comment #2)
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: leaving poll
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sname = nfs/pyramid.cora.nwra.com.COM
> Apr 16 10:50:28 alexandria2 nslcd[2013]: [cb5695] nslcd_passwd_byname(nfs/pyramid.cora.nwra.com): invalid user name
This looks to be the problem ^^^^

> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: doing downcall
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: sending null reply
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: finished handling null request
> Apr 16 10:50:28 alexandria2 rpc.svcgssd[27128]: entering poll
rpc.svcgssd answers the upcall with a NULL reply ^^^^

> Apr 16 10:50:28 alexandria2 kernel: RPC: AUTH_GSS upcall timed out.
> Apr 16 10:50:28 alexandria2 kernel: Please check user daemon is running.
So this is a loopback secure mount... hmm.... I'm thinking this is just bad error handling on a secure loopback mount... Those are iffy at best... Very racy... and I don't mean X-rated! ;-)

Fix the "invalid user name" problem. If the upcall timeout still happens then it's a problem... otherwise it's not...

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation.
Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/