Bug 1048661
Summary: | kernel-3.12.6-300 nfs v4 fails to mount and hangs | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Andy Wang <dopey> |
Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 20 | CC: | bfields, dopey, gansalmon, itamar, jcmcken, jonathan, kernel-maint, madhu.chinakonda, nfs-maint, rmainz, steved |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | nfs-utils-1.3.0-2.2.fc20 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-10-31 14:01:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Andy Wang
2014-01-06 03:52:27 UTC
Stopping rpc.gssd will likely also work around the issue... There were some recent fixes to gssd that made it always respond to upcalls instead of hanging. What nfs-utils version are you running?

$ rpm -q nfs-utils
nfs-utils-1.2.8-6.0.fc20.x86_64

I don't actually need rpc.gssd for functionality. What I had come across was that, at some point, rpc.gssd needed to be running to avoid a 15-second delay and the "RPC: AUTH_GSS upcall timed out" message. I just stopped nfs-secure and you're right, the problem went away. That's a good enough solution for me, but I'm still curious what's going on, since two other systems I have do not have the same problem. It only occurs on systems where rpc.gssd starts during boot before the network is up (for example, my laptop, where Wi-Fi hasn't connected yet). On these systems, if I restart nfs-secure after connecting to the network, the problem also goes away.

OK, so that's a workaround for now... I think the problem is in rpc.gssd, and we just need to pull in some of the more recent fixes to it, since they affect this. Could you test the package here and see if it helps?
http://koji.fedoraproject.org/koji/buildinfo?buildID=479747

1.2.9-1.0 has slightly different behavior. The initial mount has a slight lag (a few seconds). The "unable to resolve" message appears once but doesn't repeat, and the mount doesn't hang. As before, this only happens after an initial boot. If I restart nfs-secure after establishing a Wi-Fi connection, it works without any delay.

That's pretty much expected behavior. The kernel code now tries to establish krb5 creds for NFSv4 sessions regardless of what sort of security the initial mount uses. If rpc.gssd isn't running, the kernel skips the upcall for those creds. In your case, however, rpc.gssd is running but is unable to reverse-resolve the server's address, so it gives up soon after.
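
The failing reverse lookup described above can be tried outside of rpc.gssd. The following is a minimal sketch, assuming only Python's standard `socket` module, whose `getnameinfo()` wraps the same C library function that the thread says rpc.gssd calls. The `reverse_lookup` helper is hypothetical and is not taken from rpc.gssd. It passes `NI_NAMEREQD` so that a failed lookup (for example, when no usable DNS server is configured yet) returns `None` rather than silently falling back to the numeric address. Whether rpc.gssd uses the same flags is not stated in this report.

```python
import socket
from typing import Optional


def reverse_lookup(ip: str, flags: int = socket.NI_NAMEREQD) -> Optional[str]:
    """Hypothetical helper: reverse-resolve a numeric IP address via getnameinfo().

    With the default NI_NAMEREQD flag, the call raises instead of returning
    the numeric address when no host name can be found, e.g. when the
    configured DNS server is unreachable. Returns the host name, or None
    if the lookup failed or `ip` is not a numeric address.
    """
    try:
        # Port 0 is a placeholder; NI_NUMERICSERV skips the service-name lookup.
        host, _port = socket.getnameinfo((ip, 0), flags | socket.NI_NUMERICSERV)
    except socket.gaierror:
        # Covers both "not a numeric address" and "no name found / resolver failed".
        return None
    return host
```

On an affected client, you could call `reverse_lookup()` with the NFS server's address once before the network comes up and once after. A `None` result in the first case and a host name in the second would point at the resolver configuration rather than at the kernel.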
If you aren't using krb5 and want to get rid of the delay, you can just disable rpc.gssd, since it's not buying you anything anyway:

# systemctl disable nfs-secure.service

I think at this point we just need to wait for Steve to push a more recent nfs-utils package to F20.

Yeah, I've disabled nfs-secure for now, and that's workable for me. I guess I'm more curious than anything else about why rpc.gssd tries to reverse-resolve using a DNS server on 127.0.0.1 when it's started without a network connection. If I simply restart rpc.gssd after networking is established, it seems to do the right thing. What is it about rpc.gssd that keeps it from adjusting to that without a restart? Oh, and why did a kernel downgrade "resolve" it too?

That I'm not sure of. rpc.gssd just does a getnameinfo(), so it may be that glibc tries to query localhost if resolv.conf isn't configured. As to why it would keep querying localhost even after resolv.conf is configured, I'm not sure. It probably bears some investigation if you're willing to do so.

As for why the downgrade helps, I'm again not clear. The only difference I see between the two kernels is some patches that add a dummy "info" rpc_pipefs file that gssd uses. That shouldn't really affect how this works, but maybe I'm overlooking some subtlety in how gssd behaves with a missing resolv.conf. In any case, I'm fairly sure the problem is in gssd, so I think the fix needs to be done there.

I'd be more than happy to provide whatever details would help investigate this. What additional data would be helpful?

I believe this has been fixed in nfs-utils-1.3.0-2.2.fc20.

I know the product is Fedora here, but would anyone be able to say whether this issue (or something similar) exists in EL7?

I'm encountering similar symptoms to those described here, i.e.:

* rpc.gssd is running (nfs-secure.service started)
* Executing a plain sec=sys nfs4 mount hangs indefinitely
* Stopping rpc.gssd / nfs-secure.service allows me to mount successfully

In my case, I am actually using sec=krb5p for certain mounts (which appear to work). However, I also have an "anonymous" read-only mount with sec=sys, and it's this mount that hangs. When this occurs, I also get the following in dmesg:

NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO

...and rpc.gssd errors in syslog, e.g.:

WARNING: Failed to create krb5 context for user with uid 0 for server nfs@<host>
WARNING: Failed to create machine krb5 context with credentials cache FILE:/tmp/krb5ccmachine_<REALM> for server <host>
WARNING: Failed to create machine krb5 context with any credentials cache for server <host>
doing error downcall
Failed to write error downcall!

Here's some additional info:

NFS client:
* kernel: kernel-3.10.0-229.el7.x86_64 (3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux)
* nfs-utils: nfs-utils-1.3.0-0.8.el7.x86_64

NFS server:
* kernel: kernel-2.6.18-407.el5 (2.6.18-407.el5 #1 SMP Wed Nov 11 08:39:12 CST 2015 x86_64 x86_64 x86_64 GNU/Linux)
* nfs-utils: nfs-utils-1.0.9-71.el5_11

(In reply to Jon McKenzie from comment #11)
> I know the product is Fedora here, but would anyone be able to say whether
> this issue (or something similar) exists in EL7?
>
> I'm encountering similar symptoms to those described here, i.e.:
>
> * rpc.gssd is running (nfs-secure.service started)
> * Executing a plain sec=sys nfs4 mount hangs indefinitely
> * Stopping rpc.gssd / nfs-secure.service allows me to mount successfully

Yes. As Jeff says in Comment 5, the RHEL7 kernels also try to establish some krb5 creds for NFSv4 sessions regardless of the security flavor being used. So if rpc.gssd is running, the kernel will make an upcall.

> WARNING: Failed to create krb5 context for user with uid 0 for server nfs@<host>
> WARNING: Failed to create machine krb5 context with credentials cache FILE:/tmp/krb5ccmachine_<REALM> for server <host>
> WARNING: Failed to create machine krb5 context with any credentials cache for server <host>
> doing error downcall
> Failed to write error downcall!

The mount should fail, since this upcall fails. It appears your machine is not known by the KDC or AD.