Description of problem:
I have a synology NAS configured for nfs v4 and when I try to mount (either via autofs or explicitly mounting it) mount.nfs hangs and fails to mount the file system.

Version-Release number of selected component (if applicable):
This occurs only with kernel-3.12.6-300.fc20.x86_64. Dropping back to kernel-3.12.5-302.fc20.x86_64 makes the problem go away as does disabling nfs v4 (revert to v3) on the synology NAS.

How reproducible:
Always

Steps to Reproduce:
1. Try to mount NFS v4 from synology NAS

Actual results:
mount.nfs hangs

Expected results:
mount succeeds

Additional info:
One thing I noticed when this occurs is:

Jan 04 22:35:47 host kernel: Key type dns_resolver registered
Jan 04 22:35:48 host kernel: NFS: Registering the id_resolver key type
Jan 04 22:35:48 host kernel: Key type id_resolver registered
Jan 04 22:35:48 host kernel: Key type id_legacy registered
Jan 04 22:35:48 host rpc.gssd[798]: ERROR: unable to resolve 10.x.x.x to hostname: Name or service not known
Jan 04 22:35:48 host rpc.gssd[798]: ERROR: failed to read service info

The rpc.gssd errors repeat a handful of times. wireshark captures show that the reverse DNS resolution is attempted to a DNS server on 127.0.0.1. I don't understand that at all considering I have no local DNS server configured in /etc/resolv.conf
Stopping rpc.gssd will likely also work around the issue... There were some recent fixes to gssd that made it always respond to upcalls instead of hanging. What nfs-utils version are you running?
$ rpm -q nfs-utils
nfs-utils-1.2.8-6.0.fc20.x86_64

I don't actually need rpc.gssd for functionality. What I had come across was at some point rpc.gssd needed to be running to avoid a 15 second delay and the RPC: AUTH_GSS upcall timed out message. I just stopped nfs-secure and you're right, the problem went away.

That's a good enough solution for me, but I'm still curious what's going on and noticed that 2 other systems I have do not have the same problem. It's only occurring on systems where rpc.gssd is starting during boot before the network is up (for example my laptop where wifi hasn't connected yet). On these systems if I restart nfs-secure after connecting to the network the problem also goes away.
Ok, so that's a workaround for now... The problem I think is in rpc.gssd. I think we just need to pull in some of the more recent fixes to it, since they affect this behavior. Could you test the package here and see if it helps?

http://koji.fedoraproject.org/koji/buildinfo?buildID=479747
1.2.9-1.0 has slightly different behavior. Initial mount has a slight lag (a few seconds) and the unable to resolve message occurs once but doesn't repeat and it doesn't hang. Same thing though, only occurs after an initial boot. If I restart nfs-secure after establishing a wifi connection it works without any delay.
That's pretty much expected behavior. The kernel code now tries to establish krb5 creds for NFSv4 sessions regardless of what sort of security the initial mount uses. If rpc.gssd isn't running, it'll skip trying to upcall for those creds. In your case however, rpc.gssd is running but it's unable to reverse-resolve the address, so it gives up soon after.

If you aren't using krb5 and want to get rid of the delay you can just disable rpc.gssd since it's not buying you anything anyway:

# systemctl disable nfs-secure.service

I think at this point we just need to wait for Steve to push a more recent nfs-utils package to F20.
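Before disabling rpc.gssd it may be worth confirming that no NFS mount actually relies on krb5. The following is a minimal Python sketch of that check, not part of nfs-utils: the function name `krb5_nfs_mounts` and the sample hosts and paths are hypothetical, and it assumes the mount table has the usual /proc/mounts layout (device, mount point, type, options).

```python
from typing import List


def krb5_nfs_mounts(mount_table: str) -> List[str]:
    """Return mount points of NFS mounts whose options include sec=krb5*."""
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue
        # sec=krb5, sec=krb5i and sec=krb5p all need rpc.gssd on the client
        if any(opt.startswith("sec=krb5") for opt in options.split(",")):
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts format (hosts and paths are made up).
SAMPLE = """\
nas:/volume1/data /mnt/data nfs4 rw,relatime,vers=4.0,sec=sys 0 0
kdc-host:/export/home /home nfs4 rw,relatime,vers=4.0,sec=krb5p 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

print(krb5_nfs_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts. It only sees what is currently mounted, so krb5 entries in /etc/fstab or autofs maps that are not mounted yet would have to be checked separately.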
Yeah, I've disabled nfs-secure for now and that's workable for me. Guess I'm more curious than anything else as to why rpc.gssd is trying to reverse resolve using a DNS server on 127.0.0.1 when it's started without a network connection. If I simply restart rpc.gssd after networking is established it seems to do the right thing. What is it about rpc.gssd that appears to not be able to adjust to that without a restart?
Oh, and why did a kernel downgrade "resolve" it too?
That I'm not sure of. rpc.gssd just does a getnameinfo(), so it may be that glibc tries to query localhost if resolv.conf isn't configured. As to why it would continue querying that even after it's configured, I'm not sure. Probably bears some investigation if you're willing to do so. As far as why the downgrade helps, I'm again not clear. The only difference I see between the two kernels is some patches that add a dummy "info" rpc_pipefs file that gssd uses. That shouldn't really affect how this works, but maybe I'm overlooking some subtlety in how gssd works with a missing resolv.conf. In any case, I'm fairly sure the problem is in gssd so I think the fix needs to be done there.
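For anyone who wants to investigate, here is a small Python sketch of the two things discussed above. It is only an approximation of what rpc.gssd does in C: the helper names are hypothetical, and the use of NI_NAMEREQD (fail instead of returning the numeric address) is an assumption about how gssd calls getnameinfo(). The first helper lists the `nameserver` entries in a resolv.conf; per resolv.conf(5), when there are none the resolver defaults to the name server on the local machine, which would be consistent with the queries to 127.0.0.1 in the capture, though that is not confirmed as the cause here.

```python
import socket


def configured_nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines of a resolv.conf."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def reverse_resolve(address):
    """Reverse-resolve a numeric address; return None if no name is found.

    NI_NAMEREQD makes getnameinfo() raise instead of falling back to the
    numeric address, mirroring the "unable to resolve ... to hostname"
    failure in the logs (assumed, not verified against gssd's source).
    """
    try:
        host, _service = socket.getnameinfo((address, 0), socket.NI_NAMEREQD)
    except socket.gaierror:
        return None
    return host
```

Running `configured_nameservers(open("/etc/resolv.conf").read())` and `reverse_resolve()` on the server's address once before the network is up and once after would show whether a freshly started process sees a different resolver configuration than the long-running rpc.gssd does.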
I'd be more than happy to provide whatever details to help investigate this. What additional data would be helpful?
I believe this has been fixed in nfs-utils-1.3.0-2.2.fc20
I know the product is Fedora here, but would anyone be able to say whether this issue (or something similar) exists in EL7?

I'm encountering similar symptoms to those described here, i.e.:

* rpc.gssd is running (nfs-secure.service started)
* Executing a plain sec=sys nfs4 mount hangs indefinitely
* Stopping rpc.gssd / nfs-secure.service allows me to mount successfully

In my case, I am actually using sec=krb5p for certain mounts (which appear to work). However I also have an "anonymous" read-only mount with sec=sys, and it's this mount that hangs.

When this occurs, I also get the following in dmesg:

NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO

...and rpc.gssd errors in syslog, e.g.:

WARNING: Failed to create krb5 context for user with uid 0 for server nfs@<host>
WARNING: Failed to create machine krb5 context with credentials cache FILE:/tmp/krb5ccmachine_<REALM> for server <host>
WARNING: Failed to create machine krb5 context with any credentials cache for server <host>
doing error downcall
Failed to write error downcall!

Here's some additional info:

NFS client:
kernel: kernel-3.10.0-229.el7.x86_64 (3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux)
nfs-utils: nfs-utils-1.3.0-0.8.el7.x86_64

NFS server:
kernel: kernel-2.6.18-407.el5 (2.6.18-407.el5 #1 SMP Wed Nov 11 08:39:12 CST 2015 x86_64 x86_64 x86_64 GNU/Linux)
nfs-utils: nfs-utils-1.0.9-71.el5_11
(In reply to Jon McKenzie from comment #11)
> I know the product is Fedora here, but would anyone be able to say whether
> this issue (or something similar) exists in EL7?
>
> I'm encountering similar symptoms to those described here, i.e.:
>
> * rpc.gssd is running (nfs-secure.service started)
> * Executing a plain sec=sys nfs4 mount hangs indefinitely
> * Stopping rpc.gssd / nfs-secure.service allows me to mount successfully

Yes, as Jeff says in Comment 5, the RHEL7 kernels also try to establish some krb5 creds for NFSv4 sessions regardless of the security flavor being used. So if rpc.gssd is running the kernel will make an upcall.

> WARNING: Failed to create krb5 context for user with uid 0 for server
> nfs@<host>
> WARNING: Failed to create machine krb5 context with credentials cache
> FILE:/tmp/krb5ccmachine_<REALM> for server <host>
> WARNING: Failed to create machine krb5 context with any credentials cache for
> server <host>
> doing error downcall
> Failed to write error downcall!

The mount should fail since this upcall fails. It appears your machine is not known by the KDC or AD.