Bug 1228272 - NFS: nfs4_discover_server_trunking unhandled error -22. Exiting with error EIO
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: nfs-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-06-04 14:07 UTC by Couret Charles-Antoine
Modified: 2017-03-08 14:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-26 16:15:55 UTC
Type: Bug
Embargoed:


Attachments
TCPDump result after mount -a (87.53 KB, application/octet-stream)
2015-06-05 07:40 UTC, Couret Charles-Antoine
Patch submitted to upstream (866 bytes, patch)
2015-06-18 08:08 UTC, Anders Blomdell

Description Couret Charles-Antoine 2015-06-04 14:07:01 UTC
Description of problem:
In my office, I have an NFS server with shared files. With Fedora 21 I had no problems connecting to it and mounting the directory.

After a fedup upgrade from 21 to 22, without any change in configuration, I can no longer do that. During boot (and after each "mount -a") I get:

juin 04 09:23:34 Ducky kernel: FS-Cache: Loaded
juin 04 09:23:34 Ducky kernel: FS-Cache: Netfs 'nfs' registered for caching
juin 04 09:23:34 Ducky kernel: Key type dns_resolver registered
juin 04 09:23:34 Ducky kernel: NFS: Registering the id_resolver key type
juin 04 09:23:34 Ducky kernel: Key type id_resolver registered
juin 04 09:23:34 Ducky kernel: Key type id_legacy registered
juin 04 09:23:34 Ducky kernel: NFS: nfs4_discover_server_trunking unhandled error -22. Exiting with error EIO


I don't know whether the kernel is really responsible.

My /etc/fstab:

UUID=d2109ad8-3066-4d40-bd94-f174a51897e5 /                       ext4    defaults        1 1
UUID=6b12424b-6b71-4f55-9ef1-29b0cd8d7110 /home                   ext4    defaults        1 2
UUID=eb6cad5e-a06e-434d-bf1e-b49ed5d93566 swap                    swap    defaults        0 0
192.168.1.250:/home/nexvision /nexvs   nfs  auto


Thank you in advance.

PS:
[16:05:53] root@Ducky:~# uname -a
Linux Ducky 4.0.4-303.fc22.x86_64 #1 SMP Thu May 28 12:37:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 J. Bruce Fields 2015-06-04 20:17:57 UTC
-22 is -EINVAL.  A network trace might be helpful (run tcpdump -s0 -wtmp.pcap, then try the mount, then kill tcpdump and attach tmp.pcap to this bug and/or take a look at it in Wireshark).

"In my office, I have an NFS server with shared files. With Fedora 21 I had no problems connecting to it and mounting the directory.

After a fedup upgrade from 21 to 22, without any change in configuration, I can no longer do that."

I'm having trouble sorting out client and server here: is it the NFS server you're upgrading, or the NFS client?  If it's the client, what is your NFS server?  (Fedora, or something else?)
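
The errno mapping mentioned at the top of this comment can be checked from userspace. A minimal Python sketch, assuming a Linux host (where errno 22 is EINVAL and errno 5 is EIO); the helper name `describe_kernel_error` is made up for illustration and is not part of any NFS tool:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Turn a kernel-style return code such as -22 into 'NAME (description)'."""
    number = abs(code)  # kernel log messages report errors as negative numbers
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


# The two codes that appear in the kernel message from this bug.
print(describe_kernel_error(-22))
print(describe_kernel_error(-5))
```

This only decodes the numbers; it says nothing about which side of the connection produced the error, which is why the network trace is still needed.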

Comment 2 Couret Charles-Antoine 2015-06-04 23:15:19 UTC
> -22 is -EINVAL.  A network trace might be helpful (run tcpdump -s0 -wtmp.pcap, then try the mount, then kill tcpdump and attach tmp.pcap to this bug and/or take a look at it in Wireshark).

Ok, I will test tomorrow.

> I'm having trouble sorting out client and server here: is it the NFS server you're upgrading, or the NFS client?  If it's the client, what is your NFS server?  (Fedora, or something else?)

Only my workstation, the client side, was upgraded. The server is based on Debian (probably the previous stable release, I'm not sure).

Comment 3 Couret Charles-Antoine 2015-06-04 23:28:06 UTC
Apparently, I am not alone: http://forums.fedoraforum.org/showthread.php?p=1734382

Comment 4 Couret Charles-Antoine 2015-06-05 07:40:00 UTC
Created attachment 1035052 [details]
TCPDump result after mount -a

Comment 5 Steve Dickson 2015-06-10 11:46:54 UTC
(In reply to Couret Charles-Antoine from comment #4)
> Created attachment 1035052 [details]
> TCPDump result after mount -a

V4 NULL Call (Reply In 222)
V4 NULL Reply (Call In 220)
V4 Call (Reply In 225) EXCHANGE_ID
V4 Reply (Call In 224) Status: NFS4ERR_MINOR_VERS_MISMATCH
V4 NULL Call (Reply In 234)
V4 NULL Reply (Call In 232)
V4 Call (Reply In 237) EXCHANGE_ID
V4 Reply (Call In 236) EXCHANGE_ID Status: NFS4ERR_INVAL

I wonder if the VERS_MISMATCH is causing the INVAL?

Comment 6 J. Bruce Fields 2015-06-15 20:48:40 UTC
(In reply to Steve Dickson from comment #5)
> (In reply to Couret Charles-Antoine from comment #4)
> > Created attachment 1035052 [details]
> > TCPDump result after mount -a
> 
> V4 NULL Call (Reply In 222)
> V4 NULL Reply (Call In 220)
> V4 Call (Reply In 225) EXCHANGE_ID
> V4 Reply (Call In 224) Status: NFS4ERR_MINOR_VERS_MISMATCH
> V4 NULL Call (Reply In 234)
> V4 NULL Reply (Call In 232)
> V4 Call (Reply In 237) EXCHANGE_ID
> V4 Reply (Call In 236) EXCHANGE_ID Status: NFS4ERR_INVAL
> 
> I wonder if the VERS_MISMATCH is causing the INVAL?

Note the first EXCHANGE_ID is with minorversion 2, and the second with minorversion 1.  The EXCHANGE_ID's otherwise look the same.

Almost certainly a server bug (is this a Solaris server?), probably a dup of bug 1226387.  Please report to the server vendor and see 1226387 for workarounds (basically, mount with vers=4.0 to work around the server's broken version negotiation).

Comment 7 Anders Blomdell 2015-06-17 14:15:43 UTC
I see the same problem here:

# mount -v myserver:/some/dir /mnt/
mount.nfs: mount(2): Protocol not supported
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed
mount.nfs: timeout set for Wed Jun 17 16:15:44 2015
mount.nfs: trying text-based options 'vers=4.2,addr=10.0.0.189,clientaddr=10.0.0.143'
mount.nfs: trying text-based options 'vers=4.1,addr=10.0.0.189,clientaddr=10.0.0.143'


It works with this in /etc/nfsmount.conf:

# Workaround for nfs mount negotiation stopping at v4.1
[ Server "myserver" ]
Defaultvers=4

Comment 8 Anders Blomdell 2015-06-17 14:47:44 UTC
BTW, my server that gives problems is an old Fedora :-(
This is my blind and stupid fix to nfs-utils:

--- utils/mount/stropts.c~	2015-06-17 16:28:16.413539978 +0200
+++ utils/mount/stropts.c	2015-06-17 16:45:22.446536335 +0200
@@ -838,6 +838,7 @@
 		return result;
 
 	switch (errno) {
+	case EIO:
 	case EPROTONOSUPPORT:
 		/* A clear indication that the server or our
 		 * client does not support NFS version 4 and minor */
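
The effect of the one-line patch above can be illustrated with a toy model of the version fallback. This is a simplified Python sketch, not the actual nfs-utils logic: the `negotiate` function and the `BROKEN_SERVER` table are hypothetical, with the per-version results taken from the mount output and workarounds reported in this bug (4.2 fails with "Protocol not supported", 4.1 with "Input/output error", and 4.0 is assumed to mount).

```python
import errno

# Hypothetical server model based on the traces in this bug:
# vers=4.2 -> EPROTONOSUPPORT, vers=4.1 -> EIO, vers=4.0 -> success (0).
BROKEN_SERVER = {"4.2": errno.EPROTONOSUPPORT, "4.1": errno.EIO, "4.0": 0}


def negotiate(server, retry_on, versions=("4.2", "4.1", "4.0")):
    """Try each version in order; return the first that mounts, or None.

    `retry_on` is the set of errno values that trigger a downgrade to the
    next version; any other error aborts the negotiation.
    """
    for vers in versions:
        err = server[vers]
        if err == 0:
            return vers
        if err not in retry_on:
            return None
    return None


# Without the patch only EPROTONOSUPPORT causes a downgrade.
unpatched = negotiate(BROKEN_SERVER, {errno.EPROTONOSUPPORT})
# With the patch EIO is treated the same way.
patched = negotiate(BROKEN_SERVER, {errno.EPROTONOSUPPORT, errno.EIO})
print(unpatched, patched)
```

In this model the unpatched negotiation stops at 4.1, while the patched one continues down to 4.0.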

Comment 9 Steve Dickson 2015-06-17 15:51:23 UTC
(In reply to Anders Blomdell from comment #8)
> BTW, my server that gives problems is an old Fedora :-(
hmm... this is getting a bit worrisome.. 
 
> This is my blind and stupid fix to nfs-utils:
> 
> --- utils/mount/stropts.c~	2015-06-17 16:28:16.413539978 +0200
> +++ utils/mount/stropts.c	2015-06-17 16:45:22.446536335 +0200
> @@ -838,6 +838,7 @@
>  		return result;
>  
>  	switch (errno) {
> +	case EIO:
>  	case EPROTONOSUPPORT:
>  		/* A clear indication that the server or our
>  		 * client does not support NFS version 4 and minor */

Let's use this to start the upstream discussion. Would you mind
posting this to linux-nfs.org using the patch
guidelines under 
   https://www.kernel.org/doc/Documentation/SubmittingPatches

I have no idea where or if it will go, but this new "feature"
is definitely exercising untested code in legacy servers.

Comment 10 J. Bruce Fields 2015-06-17 15:58:05 UTC
(In reply to Anders Blomdell from comment #8)
> BTW, my server that gives problems is an old Fedora :-(

Ugh.  What kernel version is that Fedora server running?

Comment 11 Anders Blomdell 2015-06-18 07:32:41 UTC
(In reply to J. Bruce Fields from comment #10)
> (In reply to Anders Blomdell from comment #8)
> > BTW, my server that gives problems is an old Fedora :-(
> 
> Ugh.  What kernel version is that Fedora server running?
2.6.35.14-106.fc14.i686.PAE

Comment 12 Anders Blomdell 2015-06-18 08:07:40 UTC
(In reply to Steve Dickson from comment #9)
> Let's use this to start the upstream discussion. Would you mind
> posting this to linux-nfs.org using the patch
> guidelines under 
>    https://www.kernel.org/doc/Documentation/SubmittingPatches
> 
> I have no idea where or if it will go, but this new "feature"
> is definitely exercising untested code in legacy servers.
Hopefully done.

Comment 13 Anders Blomdell 2015-06-18 08:08:38 UTC
Created attachment 1040329 [details]
Patch submitted to upstream

Comment 14 Steve Dickson 2015-06-18 11:44:49 UTC
(In reply to Anders Blomdell from comment #13)
> Created attachment 1040329 [details]
> Patch submitted to upstream

Thank you!

Comment 15 Anders Blomdell 2015-06-18 13:04:29 UTC
On 2015-06-18 14:53, Trond Myklebust wrote:
> On Thu, Jun 18, 2015 at 8:28 AM, Anders Blomdell
> <anders.blomdell.se> wrote:
>> On 2015-06-18 13:49, Trond Myklebust wrote:
>>> On Thu, Jun 18, 2015 at 4:04 AM, Anders Blomdell
>>> <anders.blomdell.se> wrote:
>>>>
>>>> I have a problem with a 4.0.4 client refusing to mount from a 2.6.35 server
>>>> due to NFS4ERR_INVAL returned during nfs4_discover_server_trunking. See
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1228272.
>>>
>>>
>>> Why should we change the clients if the server is in clear and obvious
>>> violation of the spec?
>> In order to make clients work with servers that worked well with previous versions
>> of nfs-utils. The culprit is probably commit f9802988, which bumped the default
>> autonegotiation version to 4.2. What the patch does is only negotiate a lower version
>> in case of errors, hence making 1.3.2 work with servers that worked with
>> 1.3.1 (which only tried version 4[.0]).
>>
>> Will probably save some people some time.
> 
> This is what /etc/nfsmount.conf is for. We don't fix clients that are
> working correctly according to the protocol spec.
> 
> Trond

So this is obviously NOTABUG (sadly enough)

Comment 16 J. Bruce Fields 2015-06-19 18:32:40 UTC
(In reply to Anders Blomdell from comment #11)
> (In reply to J. Bruce Fields from comment #10)
> > (In reply to Anders Blomdell from comment #8)
> > > BTW, my server that gives problems is an old Fedora :-(
> > 
> > Ugh.  What kernel version is that Fedora server running?
> 2.6.35.14-106.fc14.i686.PAE

I took a look at the code for that kernel and can't see an obvious explanation for the difference.  Judging from the network trace it's correctly handling the case of minorversions that are totally unsupported by the code (minorversion >=2), but not a minorversion that's supported by the code but currently runtime-disabled (minorversion=1).  There's probably some obvious logic error I'm not seeing there....  In any case the bug's no longer present upstream so I'm not inclined to investigate further for the sake of an old Fedora version, but if someone wants to it's probably something easy to fix.

Anyway, the fact that this affects both knfsd and Solaris servers inclines me a bit more towards doing a workaround, but I'm also sympathetic to Trond's point of view that we shouldn't accumulate workarounds.

Comment 17 William Rueth 2015-06-25 23:21:53 UTC
(In reply to Anders Blomdell from comment #11)
> (In reply to J. Bruce Fields from comment #10)
> > (In reply to Anders Blomdell from comment #8)
> > > BTW, my server that gives problems is an old Fedora :-(
> > 
> > Ugh.  What kernel version is that Fedora server running?
> 2.6.35.14-106.fc14.i686.PAE

I have two old Fedora servers, both running 2.6.35.14-106.fc14.i686, serving two clients, one running Fedora 21 and one running Fedora 22. Fedora 21 mounts the NFS shares fine; Fedora 22 does not.

Fedora 22 (4.0.5-300.fc22.x86_64) fails to mount:
mount.nfs: timeout set for Thu Jun 25 18:16:58 2015
mount.nfs: trying text-based options 'vers=4.2,addr=192.168.12.102,clientaddr=192.168.12.104'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'vers=4.1,addr=192.168.12.102,clientaddr=192.168.12.104'
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed

Fedora 21 (4.0.5-200.fc21.x86_64) mounts fine.

Comment 18 Benjamin Coddington 2015-06-26 11:45:55 UTC
Looks very nfsd kernel version specific.. I can probably reproduce that.

Jun 26 06:55:53 olfedorapants kernel: [  220.354879] nfsd_dispatch: vers 4 proc 1
Jun 26 06:55:53 olfedorapants kernel: [  220.354880] nfsv4 compound op #1/1: 42 (OP_EXCHANGE_ID)
Jun 26 06:55:53 olfedorapants kernel: [  220.354882] nfsd4_exchange_id rqstp=ffff880071860000 exid=ffff88007c05a080 clname.len=33 clname.data=ffff88007184b080 ip_addr=10.0.1.82 flags 101, spa_how 0
Jun 26 06:55:53 olfedorapants kernel: [  220.354884] nfsv4 compound op ffff88007c05a078 opcnt 1 #1: 42: status 22
Jun 26 06:55:53 olfedorapants kernel: [  220.354885] nfsv4 compound returned 22

A lack of the return code dprintk for nfsd4_exchange_id makes it seem like we're doing an early return, and there's only one place that can happen - in the exchange_id name/flags check:
fs/nfsd/nfs4state.c
__be32
nfsd4_exchange_id(struct svc_rqst *rqstp,
          struct nfsd4_compound_state *cstate,
          struct nfsd4_exchange_id *exid)
...
    if (!check_name(exid->clname) || (exid->flags & ~EXCHGID4_FLAG_MASK_A))
        return nfserr_inval;


and in this knfsd kernel:
#define EXCHGID4_FLAG_MASK_A                   0x40070003

And my flags are (client is 4.0.5-200.fc21 w/ upstream nfs-utils):
flags: 0x00000101
    0... .... .... .... .... .... .... .... = EXCHGID4_FLAG_CONFIRMED_R: Not set
    .0.. .... .... .... .... .... .... .... = EXCHGID4_FLAG_UPD_CONFIRMED_REC_A: Not set
    .... .... .... .0.. .... .... .... .... = EXCHGID4_FLAG_USE_PNFS_DS: Not set
    .... .... .... ..0. .... .... .... .... = EXCHGID4_FLAG_USE_PNFS_MDS: Not set
    .... .... .... ...0 .... .... .... .... = EXCHGID4_FLAG_USE_NON_PNFS: Not set
    .... .... .... .... .... ...1 .... .... = EXCHGID4_FLAG_BIND_PRINC_STATEID: Set
    .... .... .... .... .... .... .... ..0. = EXCHGID4_FLAG_SUPP_MOVED_MIGR: Not set
    .... .... .... .... .... .... .... ...1 = EXCHGID4_FLAG_SUPP_MOVED_REFER: Set

The NFS client started setting EXCHGID4_FLAG_BIND_PRINC_STATEID here:
4f0b429 NFSv4.1: Enable state protection
with the earliest tag in v3.11

The NFS server updated EXCHGID4_FLAG_MASK_A to 0x40070103 here:
357f54d NFS fix the setting of exchange id flag
with the earliest tag in v2.6.38

So clients later than 2.6.38 mounting nfsv4.1 on servers older than 2.6.38 will run into this problem.
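
The mask check described in this comment can be reproduced with plain integer arithmetic. A minimal Python sketch using only the three constants quoted above; the function name `rejected_bits` is made up for illustration and simply mirrors the `exid->flags & ~EXCHGID4_FLAG_MASK_A` expression:

```python
OLD_MASK_A = 0x40070003    # EXCHGID4_FLAG_MASK_A in the old knfsd kernel
NEW_MASK_A = 0x40070103    # value after commit 357f54d
CLIENT_FLAGS = 0x00000101  # flags sent by the client in EXCHANGE_ID


def rejected_bits(flags: int, mask: int) -> int:
    """Return the flag bits that fall outside the mask (non-zero means nfserr_inval)."""
    return flags & ~mask


# Old server: the BIND_PRINC_STATEID bit (0x100) is not in the mask.
print(hex(rejected_bits(CLIENT_FLAGS, OLD_MASK_A)))
# Newer server: every bit the client sets is allowed.
print(hex(rejected_bits(CLIENT_FLAGS, NEW_MASK_A)))
```

The only bit left over with the old mask is 0x100, which is the EXCHGID4_FLAG_BIND_PRINC_STATEID bit shown as set in the decoded flags.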

Comment 19 Benjamin Coddington 2015-06-26 11:51:26 UTC
(In reply to Benjamin Coddington from comment #18)
> So clients later than 2.6.38 mounting nfsv4.1 on servers older than 2.6.38
> will run into this problem.

Wrong! this should have been:

Clients later than 3.11 mounting nfsv4.1 on servers older than 2.6.38 will run into this problem.

Comment 20 Benjamin Coddington 2015-06-26 12:00:33 UTC
Fixes?

Make the client's stateid/cred binding optional/tunable?
Make mount.nfs downgrade, retry on this particular error?

The workaround should be to disable v4.1 on the old knfsd servers, and the nfs clients will properly, gracefully negotiate down to v4.0 as before.

Comment 21 Steve Dickson 2015-06-26 12:31:33 UTC
(In reply to Benjamin Coddington from comment #20)
> Fixes?
> 
> Make the client's stateid/cred binding optional/tunable?
> Make mount.nfs downgrade, retry on this particular error?
The error would be EIO.... Hmm... The change would not go upstream
but it might make Fedora users a bit happier.... which would be a good thing.. 

> 
> The workaround should be to disable v4.1 on the old knfsd servers, and the
> nfs clients will properly, gracefully negotiate down to v4.0 as before.
I'm assuming most people don't like to touch old servers because
they just work...

Comment 22 J. Bruce Fields 2015-06-26 14:59:55 UTC
(In reply to Benjamin Coddington from comment #20)
> The workaround should be to disable v4.1 on the old knfsd servers, and the
> nfs clients will properly, gracefully negotiate down to v4.0 as before.

The kernel was defaulting 4.1 to off until 3.11 (and for reasons, I doubt this is the only case where an old 4.1 server could break on client upgrade).

Was something in nfs-utils overriding that, or are the bug reports all from folks who explicitly turned on 4.1 on their server?

Or is the older minorversion-checking logic somehow letting us get through to the exchange_id code even when 4.1 support is off?  (What's the content of /proc/fs/nfsd/versions?)
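
To make the versions question easier to answer, the file content can be turned into a simple enabled/disabled table. A small Python sketch, assuming the usual knfsd format of space-separated tokens prefixed with `+` (enabled) or `-` (disabled); the `SAMPLE` string is hypothetical and not taken from any server in this bug, and `parse_nfsd_versions` is an illustrative name:

```python
def parse_nfsd_versions(text: str) -> dict:
    """Map each version token ('+4.1', '-4.1', ...) to True (enabled) or False."""
    enabled = {}
    for token in text.split():
        if token[0] not in "+-":
            continue  # skip anything that is not a +/- prefixed version
        enabled[token[1:]] = token[0] == "+"
    return enabled


# Hypothetical content; on a real server read the versions file instead.
SAMPLE = "+2 +3 +4 +4.1"
state = parse_nfsd_versions(SAMPLE)
print("4.1 enabled:", state.get("4.1", False))
```

A server where 4.1 shows up as enabled would match the scenario analysed in comment 18.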

Comment 23 Benjamin Coddington 2015-06-26 15:10:58 UTC
(In reply to J. Bruce Fields from comment #22)
> (In reply to Benjamin Coddington from comment #20)
> > The workaround should be to disable v4.1 on the old knfsd servers, and the
> > nfs clients will properly, gracefully negotiate down to v4.0 as before.
> 
> The kernel was defaulting 4.1 to off until 3.11 (and for reasons, I doubt
> this is the only case where an old 4.1 server could break on client upgrade).
> 
> Was something in nfs-utils overriding that, or are the bug reports all from
> folks who explicitly turned on 4.1 on their server?
> 
> Or is the older minorversion-checking logic somehow letting us get through
> to the exchange_id code even when 4.1 support is off?  (What's the content
> of /proc/fs/nfsd/versions?)

I don't think so.  When I disabled 4.1 the mount would fall down to 4.0 properly.  These must be reports from people who have enabled 4.1.

Comment 24 J. Bruce Fields 2015-06-26 16:15:55 UTC
(In reply to Benjamin Coddington from comment #23)
> I don't think so.  When I disabled 4.1 the mount would fall down to 4.0
> properly.  These must be reports from people who have enabled 4.1.

OK, so it sounds like 1) these servers are running long-unsupported Fedora versions, and 2) their administrators turned on experimental code.

I have some sympathy, but we can't fix the old server now, and I don't think it's worth a workaround; closing as NOTABUG.

Thanks to everyone who helped investigate.

To help document the workarounds:

- if using a Fedora server: it should be enough just to turn off 4.1 on the server (probably just a matter of fixing the RPCNFSDARGS line in /etc/sysconfig/nfs).

- if using another server: mount with -overs=4.0, or use nfsmount.conf to set the default version for that server to 4.0 (see "man nfsmount.conf")
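
For the client-side workaround, the same vers=4.0 option can also go into the options field of an /etc/fstab entry. A small Python sketch that rewrites one fstab line accordingly; the helper `pin_nfs_version` is hypothetical, and it assumes whitespace-separated fields with the filesystem type in the third column and the options in the fourth:

```python
def pin_nfs_version(fstab_line: str, vers: str = "4.0") -> str:
    """Return the fstab line with vers=<vers> set in the options of an NFS entry."""
    fields = fstab_line.split()
    # Leave comments, short lines and non-NFS filesystems untouched.
    if len(fields) < 4 or not fields[2].startswith("nfs"):
        return fstab_line
    # Drop any existing version option before adding the pinned one.
    options = [o for o in fields[3].split(",")
               if not o.startswith(("vers=", "nfsvers="))]
    options.append(f"vers={vers}")
    fields[3] = ",".join(options)
    return " ".join(fields)


# The NFS entry from the original report.
print(pin_nfs_version("192.168.1.250:/home/nexvision /nexvs   nfs  auto"))
```

Note that the sketch normalises the column spacing to single spaces, which fstab accepts.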

Comment 25 Anders Blomdell 2015-06-29 10:40:41 UTC
(In reply to J. Bruce Fields from comment #24)
> (In reply to Benjamin Coddington from comment #23)
> > I don't think so.  When I disabled 4.1 the mount would fall down to 4.0
> > properly.  These must be reports from people who have enabled 4.1.
> 
> OK, so it sounds like 1) these servers are running long-unsupported Fedora
> versions, and 2) their administrators turned on experimental code.
Assumption 2 is unfortunately false :-( (unless running Fedora is considered turning on experimental code ;-))

  # uname -a 
  Linux sperry-01 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 UTC 2011 i686 i686 i386 GNU/Linux
  # rpm -q -f /boot/config-$(uname -r) 
  kernel-PAE-2.6.35.14-106.fc14.i686
  # grep  'CONFIG_NFS_V4_1=y' /boot/config-$(uname -r)
  CONFIG_NFS_V4_1=y


> I have some sympathy, but we can't fix the old server now, and I don't think
> it's worth a workaround; closing as NOTABUG.
OK, duly noted...

> 
> Thanks to everyone who helped investigate.
> 
> To help document the workarounds:
> 
> - if using a Fedora server: it should be enough just to turn off 4.1 on the
> server (probably just a matter of fixing the RPCNFSDARGS line in
> /etc/sysconfig/nfs).
RPCNFSDARGS+="-N 4.1"

> 
> - if using another server: mount with -overs=4.0, or use nfsmount.conf to
> set the default version for that server to 4.0 (see "man nfsmount.conf")

Comment 26 William Rueth 2015-06-30 00:19:28 UTC
(In reply to Anders Blomdell from comment #25)
> 
> > I have some sympathy, but we can't fix the old server now, and I don't think
> > it's worth a workaround; closing as NOTABUG.
> OK, duly noted...
> 
> > 
> > Thanks to everyone who helped investigate.
> > 
> > To help document the workarounds:
> > 
> > - if using a Fedora server: it should be enough just to turn off 4.1 on the
> > server (probably just a matter of fixing the RPCNFSDARGS line in
> > /etc/sysconfig/nfs).
> RPCNFSDARGS+="-N 4.1"
> 
> > 
> > - if using another server: mount with -overs=4.0, or use nfsmount.conf to
> > set the default version for that server to 4.0 (see "man nfsmount.conf")

RPCNFSDARGS+="-N 4.1" works very well on both FC21 & FC22 machines.
Thank you.

