964319 – Upgrading to 3.9 kernel breaks NFS

Summary: Upgrading to 3.9 kernel breaks NFS

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	18
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	nfs-maint
QA Contact:	Fedora Extras Quality Assurance
URL:	https://github.com/sahlberg/libnfs/is...

Reported:	2013-05-17 20:50 UTC by Julian Sikorski
Modified:	2013-06-13 19:30 UTC
CC List:	7 users
Fixed In Version:	3.9.5-201.fc18.x86_64
Last Closed:	2013-06-13 19:30:00 UTC
Type:	Bug

Attachments
tcpdump file (11.55 KB, application/vnd.tcpdump.pcap) 2013-05-24 20:38 UTC, Julian Sikorski	no flags
[PATCH] svcrpc: fix failures to handle -1 uid's and gid's (2.01 KB, patch) 2013-05-24 22:02 UTC, J. Bruce Fields	no flags

Description Julian Sikorski 2013-05-17 20:50:34 UTC

Description of problem:
This is a complex one. I noticed that I could not access the shares I had configured in openelec. Here is the exports file on the host:

/media/realcrypt1/tv 192.168.0.11(ro,sync)
/media/realcrypt1/stand-up 192.168.0.11(ro,sync)
/media/realcrypt1/filmy 192.168.0.11(ro,sync)
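For illustration only, here is a minimal Python sketch (not part of the original report) that splits exports lines of this simple "path client(options)" shape into their parts, which makes it easy to see that the three exports differ only in their path. It handles just the single-client form used here; exports(5) allows much more (several clients per line, quoting, line continuations). The names Export and parse_exports are made up for this example.

```python
# Minimal sketch: split /etc/exports lines of the form
#   <path> <client>(<opt>,<opt>,...)
# into (path, client, options). Only the single-client form used in this
# report is handled; exports(5) supports much more than this.
import re
from typing import List, NamedTuple


class Export(NamedTuple):
    path: str
    client: str
    options: List[str]


_LINE = re.compile(r"^(?P<path>\S+)\s+(?P<client>[^\s(]+)\((?P<opts>[^)]*)\)\s*$")


def parse_exports(text: str) -> List[Export]:
    exports = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported exports line: {raw!r}")
        opts = [o for o in m.group("opts").split(",") if o]
        exports.append(Export(m.group("path"), m.group("client"), opts))
    return exports


EXAMPLE = """\
/media/realcrypt1/tv 192.168.0.11(ro,sync)
/media/realcrypt1/stand-up 192.168.0.11(ro,sync)
/media/realcrypt1/filmy 192.168.0.11(ro,sync)
"""

for e in parse_exports(EXAMPLE):
    print(e.path, e.client, e.options)
```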

I could mount the shares from the command line:

# mount -t nfs 192.168.0.10:/media/realcrypt1/tv tvshows

Yet they would not work when accessed through the XBMC UI. The program's log said:

19:45:20 T:2941772896 DEBUG: SECTION:LoadDLL(libnfs.so.1)
19:45:20 T:2941772896 DEBUG: Loading: libnfs.so.1
19:45:28 T:2941772896 ERROR: NFS: Failed to mount nfs share: /media/realcrypt1/filmy (mount/mnt call failed with "RPC Packet not accepted by the server")
19:45:28 T:3041095680 ERROR: GetDirectory - Error getting nfs://192.168.0.10/media/realcrypt1/filmy/
19:45:28 T:3041095680 ERROR: CGUIDialogFileBrowser::GetDirectory(nfs://192.168.0.10/media/realcrypt1/filmy/) failed 

On the server side, both mount requests looked the same:

maj 17 21:44:58 snowball2 rpc.mountd[18256]: authenticated mount request from 192.168.0.11:782 for /media/realcrypt1/tv (/m...t1/tv)
maj 17 21:45:28 snowball2 rpc.mountd[18256]: authenticated mount request from 192.168.0.11:720 for /media/realcrypt1/filmy ...filmy)

Eventually I have discovered that downgrading back to kernel-3.8.11-200.fc18 has solved the problem

Version-Release number of selected component (if applicable):
kernel-3.9.2-200.fc18

How reproducible:
always

Steps to Reproduce:
1. configure NFS shares in openelec
2. upgrade the Fedora server to a 3.9 kernel
3. try to access shared files
  
Actual results:
shares stop working after kernel upgrade

Expected results:
shares keep working as before

Additional info:
I am happy to provide more logs if necessary, just tell me which ones.

Comment 1 Julian Sikorski 2013-05-24 16:26:40 UTC

Updating to 3.9.3-201.fc18 does not help.

Comment 2 J. Bruce Fields 2013-05-24 19:57:22 UTC

I wonder what libnfs does.

Might be worth strace'ing xbmc to see what the mount system call looks like and how that compares with the one done by mount.

"Eventually I have discovered that downgrading back to kernel-3.8.11-200.fc18 has solved the problem"

Is it the client or server that you're upgrading and downgrading here?

Comparing network traces in the two cases might also be interesting.

(tcpdump -s0 -wtmp.pcap, then look at tmp.pcap in wireshark or attach it to this bug.)

Comment 3 Julian Sikorski 2013-05-24 20:16:37 UTC

It is the server that I'm downgrading. Keep in mind that the client is a minimalist Linux distribution (openelec), so it might be hard to debug the problem from that end. xbmc is started by openelec automatically, so I'm not sure how to strace it.
When it comes to tcpdump, should I run it on the client or the server?

Comment 4 Julian Sikorski 2013-05-24 20:38:58 UTC

Created attachment 752837
tcpdump file

OK, there seems to be neither strace nor tcpdump on the client. On the server, the command you gave fails:
$ tcpdump -s0 -wtmp.pcap
tcpdump: no suitable device found
but naming the interface explicitly
$ tcpdump -s0 -i wlan0 -wtmp.pcap
worked. Let me know if there is any useful information in that file.
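The capture can also be inspected without Wireshark. Below is a minimal Python sketch, using only the standard library, that walks the records of a classic libpcap file such as tmp.pcap (the format tcpdump -w writes). It yields raw link-layer frames and does not decode Ethernet, IP or RPC; read_pcap is a name made up for this example, and newer pcapng files are deliberately rejected.

```python
# Minimal sketch: iterate over the records of a classic libpcap file
# using only the standard library. Frames are returned undecoded.
import struct
from typing import BinaryIO, Iterator, Tuple

# Magic number 0xa1b2c3d4 (microsecond) or 0xa1b23c4d (nanosecond),
# stored in the writer's byte order.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def read_pcap(f: BinaryIO) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (ts_sec, ts_fraction, frame_bytes) for each captured packet."""
    header = f.read(24)
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file")
    endian = _MAGICS[header[:4]]
    # version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    _vmaj, _vmin, _zone, _sigfigs, _snaplen, _linktype = struct.unpack(
        endian + "HHiIII", header[4:])
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return  # end of file (or truncated record header)
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet at the end of the capture
        yield ts_sec, ts_frac, data
```

For example, looping over read_pcap(open("tmp.pcap", "rb")) and printing the length of each frame lists the packets in capture order.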

Comment 5 J. Bruce Fields 2013-05-24 21:07:45 UTC

Thanks!  I assume that trace was taken in the failing case?

The last call there is a LOOKUP which returns a badcred authentication error.
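A badcred error is a rejection at the RPC layer, presumably what the XBMC log above reports as "RPC Packet not accepted by the server". In ONC RPC (RFC 5531) the server answers with reply_stat MSG_DENIED, reject_stat AUTH_ERROR and auth_stat AUTH_BADCRED. A minimal Python sketch that classifies a reply header this way follows; it assumes the 4-byte record marker used over TCP has already been stripped, and classify_reply is a hypothetical helper name.

```python
# Minimal sketch: classify an ONC RPC reply header (RFC 5531) to see
# whether the server denied the call and why. Accepted replies are not
# decoded further (their verifier and accept_stat are skipped).
import struct

REPLY = 1                          # msg_type
MSG_ACCEPTED, MSG_DENIED = 0, 1    # reply_stat
RPC_MISMATCH, AUTH_ERROR = 0, 1    # reject_stat
AUTH_STAT = {
    0: "AUTH_OK",
    1: "AUTH_BADCRED",
    2: "AUTH_REJECTEDCRED",
    3: "AUTH_BADVERF",
    4: "AUTH_REJECTEDVERF",
    5: "AUTH_TOOWEAK",
    6: "AUTH_INVALIDRESP",
    7: "AUTH_FAILED",
}


def classify_reply(msg: bytes) -> str:
    xid, mtype, reply_stat = struct.unpack_from(">III", msg, 0)
    if mtype != REPLY:
        raise ValueError("not an RPC reply")
    if reply_stat == MSG_ACCEPTED:
        return f"xid {xid:#x}: accepted"
    if reply_stat != MSG_DENIED:
        raise ValueError(f"unknown reply_stat {reply_stat}")
    # A denied reply carries the rejected_reply union right after reply_stat.
    (reject_stat,) = struct.unpack_from(">I", msg, 12)
    if reject_stat == RPC_MISMATCH:
        low, high = struct.unpack_from(">II", msg, 16)
        return f"xid {xid:#x}: denied, RPC version mismatch ({low}-{high})"
    if reject_stat == AUTH_ERROR:
        (stat,) = struct.unpack_from(">I", msg, 16)
        name = AUTH_STAT.get(stat, f"auth_stat {stat}")
        return f"xid {xid:#x}: denied, {name}"
    raise ValueError(f"unknown reject_stat {reject_stat}")
```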

Comment 6 J. Bruce Fields 2013-05-24 21:10:02 UTC

OK, I see: this is another consequence of recent user namespace changes, which tend to treat -1 ids as invalid. For some reason xbmc is sending that last LOOKUP with a gid of 0xffff.

I'll see if I can come up with a patch....
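On the wire, the ids in question come from the AUTH_SYS (AUTH_UNIX) credential, whose body RFC 5531 defines as a stamp, a machine name, an unsigned 32-bit uid and gid, and up to 16 supplementary gids. A client that means -1 therefore sends 0xffffffff. The minimal Python sketch below encodes and decodes such a body and flags the fields carrying that value; AuthSys, encode_authsys, decode_authsys and suspicious_ids are illustrative names, not from any library.

```python
# Minimal sketch: XDR-encode and decode the body of an AUTH_SYS credential
# (RFC 5531):
#   unsigned int stamp; string machinename<255>;
#   unsigned int uid; unsigned int gid; unsigned int gids<16>;
import struct
from typing import List, NamedTuple


class AuthSys(NamedTuple):
    stamp: int
    machinename: str
    uid: int
    gid: int
    gids: List[int]


def _pad(n: int) -> int:
    return (4 - n % 4) % 4  # XDR pads opaque data to a 4-byte boundary


def encode_authsys(cred: AuthSys) -> bytes:
    name = cred.machinename.encode("ascii")
    out = struct.pack(">II", cred.stamp, len(name)) + name + b"\0" * _pad(len(name))
    # "& 0xFFFFFFFF" turns -1 into its unsigned 32-bit wire form 0xffffffff
    out += struct.pack(">II", cred.uid & 0xFFFFFFFF, cred.gid & 0xFFFFFFFF)
    out += struct.pack(">I", len(cred.gids))
    out += b"".join(struct.pack(">I", g & 0xFFFFFFFF) for g in cred.gids)
    return out


def decode_authsys(body: bytes) -> AuthSys:
    stamp, name_len = struct.unpack_from(">II", body, 0)
    if name_len > 255:
        raise ValueError("machinename too long")
    off = 8
    name = body[off:off + name_len].decode("ascii")
    off += name_len + _pad(name_len)
    uid, gid, ngids = struct.unpack_from(">III", body, off)
    off += 12
    if ngids > 16:
        raise ValueError("too many supplementary gids")
    gids = list(struct.unpack_from(f">{ngids}I", body, off))
    return AuthSys(stamp, name, uid, gid, gids)


NO_ID = 0xFFFFFFFF  # what -1 looks like as an XDR unsigned int


def suspicious_ids(cred: AuthSys) -> List[str]:
    """Name the credential fields that carry the -1 sentinel."""
    found = [f for f in ("uid", "gid") if getattr(cred, f) == NO_ID]
    found += [f"gids[{i}]" for i, g in enumerate(cred.gids) if g == NO_ID]
    return found
```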

Comment 7 J. Bruce Fields 2013-05-24 22:02:50 UTC

Created attachment 752881
[PATCH] svcrpc: fix failures to handle -1 uid's and gid's

Could you see whether this patch helps?
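The patch title says -1 ids should be handled rather than rejected. The toy Python model below contrasts the two behaviours; it is not kernel code. The lenient branch assumes, as a simplification, that the server substitutes the export's anonymous id (taken here as 65534) for -1; root squashing and all other export options are ignored, and the function names are hypothetical.

```python
# Toy model (not kernel code) of how an NFS server might treat the uid/gid
# from an AUTH_SYS credential before and after a fix like this one.
# Assumptions, for illustration only:
#   * 0xffffffff (-1) is the only id the server considers invalid;
#   * the fixed server substitutes an anonymous id instead of failing.
from typing import Tuple

INVALID_ID = 0xFFFFFFFF
ANON_ID = 65534  # assumed anonymous uid/gid


class BadCred(Exception):
    """Stands in for an RPC reply of MSG_DENIED / AUTH_ERROR / AUTH_BADCRED."""


def map_ids_strict(uid: int, gid: int) -> Tuple[int, int]:
    # Behaviour reported with 3.9: a -1 id makes the whole call fail.
    if uid == INVALID_ID or gid == INVALID_ID:
        raise BadCred(f"uid={uid:#x} gid={gid:#x}")
    return uid, gid


def map_ids_lenient(uid: int, gid: int, anon_uid: int = ANON_ID,
                    anon_gid: int = ANON_ID) -> Tuple[int, int]:
    # Behaviour after the fix, as modelled here: -1 becomes the anonymous id.
    return (anon_uid if uid == INVALID_ID else uid,
            anon_gid if gid == INVALID_ID else gid)
```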

Comment 8 Aidan Marks 2013-05-25 02:20:48 UTC

Hi, I hit this issue with openelec too (it sends gid 0xffff): the server returns bad cred after moving from 3.8 to 3.9. I tried your patch on top of gentoo-sources 3.9.3, but it still returns bad cred.

Comment 9 Julian Sikorski 2013-05-25 07:27:30 UTC

The patch seems to work for me, thank you. Aidan, why don't you try it with the Fedora kernel:
http://belegdol.fedorapeople.org/nfs-xbmc-fix/
I am uploading the files as I write this comment; there is also a source RPM if you prefer to rebuild the kernel yourself.

Comment 10 Aidan Marks 2013-05-25 21:58:27 UTC

Sorry, my mistake: yes, the patch does work. Please commit it to the master and 3.9 branches.

Comment 11 Julian Sikorski 2013-05-26 09:25:39 UTC

I have now uploaded a fixed 3.9.4-200.fc18 kernel.

Comment 12 J. Bruce Fields 2013-05-29 14:41:07 UTC

Thanks for the testing!

Should be going upstream as well in the next few days.

Comment 13 J. Bruce Fields 2013-05-30 20:37:34 UTC

BTW this should also be reported to xbmc (or whoever maintains their NFS library); -1 is an extremely poor choice of gid, and it wouldn't be surprising to see it cause problems for other NFS servers as well.
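On the client side, the straightforward remedy is never to put the -1 sentinel in a credential. A hypothetical guard is sketched below in Python; it is not taken from libnfs, and the fallback of 65534 is an assumption (a client should use whatever id the server expects).

```python
# Hypothetical client-side guard (not libnfs code): before building an
# AUTH_SYS credential, never send the -1 sentinel; fall back to an
# explicit, assumed "nobody" id instead.
NOBODY = 65534  # assumed fallback id


def wire_id(value: int, fallback: int = NOBODY) -> int:
    """Return a 32-bit id that is safe to put in an AUTH_SYS credential."""
    if value < 0 or value >= 0xFFFFFFFF:
        return fallback  # -1, 0xffffffff or anything out of range is replaced
    return value
```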

Comment 14 Julian Sikorski 2013-05-30 20:46:15 UTC

I think it might be fixed already:
https://github.com/sahlberg/libnfs/commit/43e0e7a7e6cbec9ba55db89eac368d42e969ad55
