Red Hat Bugzilla – Bug 964319
Upgrading to 3.9 kernel breaks NFS
Last modified: 2013-06-13 15:30:00 EDT
Description of problem:
This is a complex one. I noticed that I could not access the shares I had configured in openelec. Here is the exports file on the host:
I could mount the shares via command line:
# mount -t nfs 192.168.0.10:/media/realcrypt1/tv tvshows
Yet they did not work when accessed through the XBMC UI. The program's log said:
19:45:20 T:2941772896 DEBUG: SECTION:LoadDLL(libnfs.so.1)
19:45:20 T:2941772896 DEBUG: Loading: libnfs.so.1
19:45:28 T:2941772896 ERROR: NFS: Failed to mount nfs share: /media/realcrypt1/filmy (mount/mnt call failed with "RPC Packet not accepted by the server")
19:45:28 T:3041095680 ERROR: GetDirectory - Error getting nfs://192.168.0.10/media/realcrypt1/filmy/
19:45:28 T:3041095680 ERROR: CGUIDialogFileBrowser::GetDirectory(nfs://192.168.0.10/media/realcrypt1/filmy/) failed
On the server side, both looked the same:
maj 17 21:44:58 snowball2 rpc.mountd: authenticated mount request from 192.168.0.11:782 for /media/realcrypt1/tv (/m...t1/tv)
maj 17 21:45:28 snowball2 rpc.mountd: authenticated mount request from 192.168.0.11:720 for /media/realcrypt1/filmy ...filmy)
Eventually I have discovered that downgrading back to kernel-3.8.11-200.fc18 has solved the problem
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. configure NFS shares in openelec
2. upgrade the Fedora server
3. try to access shared files
Actual results:
shares stop working after kernel upgrade
Expected results:
shares keep working as before
I am happy to provide more logs if necessary, just tell me which ones.
Updating to 3.9.3-201.fc18 does not help.
I wonder what libnfs does.
Might be worth strace'ing xbmc to see what the mount system call looks like and how that compares with the one done by mount.
"Eventually I have discovered that downgrading back to kernel-3.8.11-200.fc18 has solved the problem"
Is it the client or server that you're upgrading and downgrading here?
Comparing network traces in the two cases might also be interesting.
(tcpdump -s0 -wtmp.pcap, then look at tmp.pcap in wireshark or attach it to this bug.)
It is the server that I'm downgrading. Keep in mind that the client is a minimalistic linux distribution (openelec), and thus it might be hard to debug the problem from that end. xbmc is booted by openelec automatically, so I'm not sure how to strace it.
When it comes to tcpdump, should I run it on the client or the server?
Created attachment 752837
Ok, there seems to be neither strace nor tcpdump on the client. On the server, the command you gave returns the following:
$ tcpdump -s0 -wtmp.pcap
tcpdump: no suitable device found
Specifying the interface explicitly, however, worked:
$ tcpdump -s0 -i wlan0 -wtmp.pcap
Let me know if there is any useful information in that file.
Thanks! I assume that trace was taken in the failing case?
The last call there is a LOOKUP which returns a badcred authentication error.
OK, I see. This is another consequence of the recent user namespace changes, which tend to treat -1 ids as invalid. And for some reason xbmc is sending that last lookup with a gid of 0xffff.
I'll see if I can come up with a patch....
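For readers following along, here is a minimal sketch of what "a -1 id in the credential" looks like at the byte level. It decodes an AUTH_SYS (AUTH_UNIX) credential body using the standard XDR layout (stamp, machine name, uid, gid, auxiliary gids) and flags any id equal to the -1 sentinel. It is only an illustration, not the kernel's code or the attached patch. The function names are hypothetical, and it assumes -1 means the all-ones 32-bit value 0xFFFFFFFF; the thread writes 0xffff and does not make the width clear.

```python
import struct

# Assumption: "-1" is the all-ones 32-bit value on the wire.
INVALID_ID = 0xFFFFFFFF


def pack_auth_sys(stamp, machine, uid, gid, gids=()):
    """Build an AUTH_SYS credential body (hypothetical helper for experiments)."""
    name = machine.encode("ascii")
    pad = (-len(name)) % 4  # XDR pads variable-length data to a 4-byte boundary
    body = struct.pack(">II", stamp, len(name)) + name + b"\x00" * pad
    body += struct.pack(">III", uid, gid, len(gids))
    for g in gids:
        body += struct.pack(">I", g)
    return body


def parse_auth_sys(body):
    """Decode stamp, machine name, uid, gid and auxiliary gids."""
    stamp, name_len = struct.unpack_from(">II", body, 0)
    off = 8
    machine = body[off:off + name_len].decode("ascii", "replace")
    off += name_len + (-name_len) % 4  # skip the name and its padding
    uid, gid, ngids = struct.unpack_from(">III", body, off)
    off += 12
    gids = [struct.unpack_from(">I", body, off + 4 * i)[0] for i in range(ngids)]
    return {"stamp": stamp, "machine": machine, "uid": uid, "gid": gid, "gids": gids}


def invalid_ids(cred):
    """Return the names of the fields that carry the -1 sentinel."""
    bad = [k for k in ("uid", "gid") if cred[k] == INVALID_ID]
    if INVALID_ID in cred["gids"]:
        bad.append("gids")
    return bad


if __name__ == "__main__":
    # Hypothetical credential: root uid with a -1 gid, as described above.
    cred = parse_auth_sys(pack_auth_sys(0, "htpc1", 0, INVALID_ID))
    print(cred, invalid_ids(cred))
```

Feeding the credential bytes of the failing LOOKUP from the capture into `parse_auth_sys` would, under the assumptions above, show which field carries the sentinel.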
Created attachment 752881
[PATCH] svcrpc: fix failures to handle -1 uid's and gid's
Could you see whether this patch helps?
Hi, I hit this issue with openelec too: it sends gid 0xffff, and the server returns bad cred after moving from 3.8 to 3.9. I tried your patch on top of gentoo-sources 3.9.3, but it still returns bad cred.
The patch seems to work for me, thank you. Aidan, why don't you try it with the Fedora kernel:
I am uploading the files as I write this comment; there is also a source rpm if you prefer to rebuild the kernel yourself.
Sorry, my mistake: yes, the patch does work. Please commit it to the master and 3.9 branches.
I have now uploaded a fixed 3.9.4-200.fc18 kernel.
Thanks for the testing!
Should be going upstream as well in the next few days.
BTW this should also be reported to xbmc (or whoever maintains their nfs library); -1 is an extremely poor choice of gid and it wouldn't be surprising to see it cause problems for other NFS servers as well.
I think it might be fixed already: