Bug 1468044

Summary: kernel: NFS: nfs4_discover_server_trunking unhandled error -22. Exiting with error EIO
Product: [Fedora] Fedora
Reporter: Tom Bouwman <tombouwman>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 25
CC: bcodding, bfields, guipaivanz, jlayton, nilsborrmann, smayhew, steved, tombouwman
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2017-10-30 14:02:11 UTC
Type: Bug

Description Tom Bouwman 2017-07-05 21:14:41 UTC
Description of problem:
kernel: NFS: nfs4_discover_server_trunking unhandled error -22. Exiting with error EIO
when trying to mount an NFS device with Defaultvers=4 in the [ Server "hostname" ] section of /etc/nfsmount.conf

Version-Release number of selected component (if applicable):
nfs-utils-1:2.1.1-5.rc4.fc25.x86_64

How reproducible:
nfs-utils was upgraded from nfs-utils-1:2.1.1-5.rc3.fc25.x86_64 to nfs-utils-1:2.1.1-5.rc4.fc25.x86_64 on June 30 2017

Steps to Reproduce:
1. Upgrade nfs-utils-1:2.1.1-5.rc3.fc25.x86_64 to nfs-utils-1:2.1.1-5.rc4.fc25.x86_64.
2. Try to mount an NFS device with Defaultvers=4 set.

Actual results:
kernel: NFS: nfs4_discover_server_trunking unhandled error -22. Exiting with error EIO

Expected results:
A mounted NFS-device

Additional info:
Changing Defaultvers=4 in the [ Server "hostname" ] section of /etc/nfsmount.conf to Defaultvers=3 lets the mount succeed, but that results in errors when trying to open LibreOffice documents.

Comment 1 Tom Bouwman 2017-07-05 21:48:08 UTC
I downgraded to nfs-utils 1.3.4-1.rc2.fc25.
I downloaded that version from the base release of fc25.

This appears to be working for the moment.
I used dnf versionlock to make sure that it does not get updated to rc4.

Comment 2 J. Bruce Fields 2017-07-06 20:21:38 UTC
(In reply to Tom Bouwman from comment #1)
> I downgraded to nfs-utils 1.3.4-1.rc2.fc25.
> I downloaded that version from the base release of fc25.
> 
> This appears to be working for the moment.
> I used dnf versionlock to make sure that it does not get updated to rc4.

I assume you're talking about the NFS client here?  (What kind of NFS server are you using?)

Watching network traffic between client and server in wireshark might be enlightening.

> But that results in errors when trying to open LibreOffice documents.

Weird, what errors?  Offhand I can't think why an application would work over NFSv4 but not over NFSv3.  Maybe it's trying to do file locking and that's not working in the v3 case for some reason?

Comment 3 Tom Bouwman 2017-07-07 07:21:17 UTC
Yes, I downgraded the nfs-utils on the NFS-client. My NFS-server is Fedora 14.

LibreOffice will open the documents only as read-only, saying it is locked by me.

Comment 4 Nils 2017-07-11 08:26:28 UTC
Had the same issue with version nfs-utils-1:2.1.1-5.rc4.fc25.x86_64.

In fstab I had "vers=4.0" and got the same error. NFS mounting didn't work on reboot. Changing it to "vers=4.1" fixed this issue for me.

Comment 5 Steve Dickson 2017-07-11 15:35:41 UTC
(In reply to Tom Bouwman from comment #3)
> Yes, I downgraded the nfs-utils on the NFS-client. My NFS-server is Fedora
> 14.
'Defaultvers=4' means use v4 with the default minor version.
So what's happening is that 4.2 is tried first. When the server does not
support that, which is the case with F14, it should return
NFS4ERR_MINOR_VERS_MISMATCH. Maybe F14 is mishandling that negotiation.

Defaultvers=4.0 should work. 
 
> 
> LibreOffice will open the documents only as read-only, saying it is locked
> by me.
I have no idea what's going on here... Is the filesystem mounted rw?

Comment 6 Steve Dickson 2017-07-11 15:43:13 UTC
(In reply to Nils from comment #4)
> Had the same issue with version nfs-utils-1:2.1.1-5.rc4.fc25.x86_64.
> 
> In fstab I had "vers=4.0" and got the same error. NFS mounting didn't work on
> reboot. Changing it to "vers=4.1" fixed this issue for me.

What is the server on this one?

Comment 7 J. Bruce Fields 2017-07-11 17:20:11 UTC
(In reply to Steve Dickson from comment #5)
> (In reply to Tom Bouwman from comment #3)
> > LibreOffice will open the documents only as read-only, saying it is locked
> > by me.
> I have no idea what's going on here... Is the filesystem mounted rw?

My guess is it tried to get a file lock and fcntl returned ENOLCK--an strace could help confirm that.  I think there was an rpcbind bug that could cause that?
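
If strace is not handy, a minimal Python sketch along these lines could check the same thing: it asks for a non-blocking exclusive fcntl lock on a file and reports the errno name if the request fails. The helper name try_lock is made up for illustration, and it is only an assumption that LibreOffice trips over this same kind of lock request.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try a non-blocking exclusive POSIX (fcntl) lock on an existing file.

    Returns "ok" if the lock was granted, otherwise the symbolic errno
    name of the failure (for example "ENOLCK" or "EAGAIN").
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf() issues an fcntl() byte-range lock covering the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)
```

Calling try_lock() on a writable file under the NFSv3 mount and getting "ENOLCK" back would support the guess above; "ok" would point somewhere else.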

Comment 8 Nils 2017-07-12 07:23:34 UTC
(In reply to Nils from comment #4)
> Had the same issue with version nfs-utils-1:2.1.1-5.rc4.fc25.x86_64.
> 
> In fstab I had "vers=4.0" and got the same error. NFS mounting didn't work on
> reboot. Changing it to "vers=4.1" fixed this issue for me.

Sorry, it only worked for the next reboot. Now I get the same message again.

Comment 9 Tom Bouwman 2017-07-12 07:55:09 UTC
(In reply to Steve Dickson from comment #5)
> (In reply to Tom Bouwman from comment #3)
> > Yes, I downgraded the nfs-utils on the NFS-client. My NFS-server is Fedora
> > 14.
> 'Defaultvers=4' means use v4 with the default minor version.
> So what's happening is that 4.2 is tried first. When the server does not
> support that, which is the case with F14, it should return
> NFS4ERR_MINOR_VERS_MISMATCH. Maybe F14 is mishandling that negotiation.
> 
> Defaultvers=4.0 should work. 
>  
> > 
> > LibreOffice will open the documents only as read-only, saying it is locked
> > by me.
> I have no idea what's going on here... Is the filesystem mounted rw?

During one of my upgrades fc21-->fc22 or fc22-->fc23 or fc23-->fc24 I had to add Defaultvers=4 in /etc/nfsmount.conf, because my NFS-server is fc14.

The filesystem is mounted rw.

Output of the mount command:
jupiter:/data/home on /mnt/jupiter/data/home type nfs4 (rw,relatime,vers=4.0,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.34,local_lock=none,addr=192.168.1.31)

Comment 10 Steve Dickson 2017-10-16 12:26:40 UTC
(In reply to Tom Bouwman from comment #9)
> (In reply to Steve Dickson from comment #5)
> > (In reply to Tom Bouwman from comment #3)
> > > Yes, I downgraded the nfs-utils on the NFS-client. My NFS-server is Fedora
> > > 14.
> > 'Defaultvers=4' means use v4 with the default minor version.
> > So what's happening is that 4.2 is tried first. When the server does not
> > support that, which is the case with F14, it should return
> > NFS4ERR_MINOR_VERS_MISMATCH. Maybe F14 is mishandling that negotiation.
> > 
> > Defaultvers=4.0 should work. 
> >  
> > > 
> > > LibreOffice will open the documents only as read-only, saying it is locked
> > > by me.
> > I have no idea what's going on here... Is the filesystem mounted rw?
> 
> During one of my upgrades fc21-->fc22 or fc22-->fc23 or fc23-->fc24 I had to
> add Defaultvers=4 in /etc/nfsmount.conf, because my NFS-server is fc14.
> 
> The filesystem is mounted rw.
> 
> Output of the mount command:
> jupiter:/data/home on /mnt/jupiter/data/home type nfs4
> (rw,relatime,vers=4.0,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,
> port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.34,local_lock=none,
> addr=192.168.1.31)

After thinking about this... with an F14 server... I would set Defaultvers=3 to get better performance...

Comment 11 Tom Bouwman 2017-10-16 15:33:28 UTC
Hi Steve,
Defaultvers=3 might be better for performance, but it results in LibreOffice documents opening read-only.
Not a solution.

Defaultvers=4 with nfs-utils 1.3.4-1.rc2.fc25 works for me.

I do not agree that this is not a bug.

Comment 12 Steve Dickson 2017-10-16 21:49:24 UTC
(In reply to Tom Bouwman from comment #11)
> Hi Steve,
> Defaultvers=3 might be better for performance, but it results in
> LibreOffice documents opening read-only.
> Not a solution.
> 
> Defaultvers=4 with nfs-utils 1.3.4-1.rc2.fc25 works for me.
> 
> I do not agree that this is not a bug.

Ok... Fair enough... Here are the differences between rc3 and rc4

commit 90790d3129cf6f5fe96cf01e37d2d0a89d8dbec
Author: Steve Dickson <steved>
Date:   Mon Jun 19 11:19:55 2017 -0400

    mount.nfs: Use default minor version when -o v4 is specified
    
    When v4 is specified on the command line the
    default minor version needs to be used.
    
    Signed-off-by: Steve Dickson <steved>

commit 62a4d95854e5cda4b772fa132cbd16c4429412c8
Author: Steve Dickson <steved>
Date:   Tue Jun 13 12:00:39 2017 -0400

    mount.nfs: Use default minor version when -t nfs4 is specified
    
    When the nfs4 filesystem specified, the default major
    and minor versions should be used.
    
    Signed-off-by: Steve Dickson <steved>

commit 9569237ba50c5857e04bc36c9b3250c570bfbef2
Author: Justin Mitchell <jumitche>
Date:   Wed Jun 21 12:01:01 2017 -0400

    Reimplement include functionality in nfs.conf
    
    Re-implement include file functionality as documented.
    
    Existing implementation had various issues, one of which was it allowed
    a subordinate file to inadvertently change which section the subsequent
    tags back in the master config file got assigned to.
    
    Acked-by: NeilBrown <neilb>
    Signed-off-by: Justin Mitchell <jumitche>
    Signed-off-by: Steve Dickson <steved>

commit 34c73d82ed02209bd8933da2f1f4761bb464d1d7
Author: NeilBrown <neilb>
Date:   Tue Jun 13 08:39:08 2017 -0400

    umount.nfs: assume path name is canonical.
    
    /usr/bin/umount will always pass a canonical name
    to umount.nfs, so it is safe to disable canonicalization.
    
    When umounting an NFS filesystem, it is generally safest to
    not "stat" the mountpoint at all as that can block
    indefinitely.  umount() will not block, but lstat() etc can.
    By disabling canonicalization in libmount, we discourage it
    from ever calling 'stat' family operations, and thus reduce
    the chance of a hang.
    
    Note that to be fully effective, this requires changes to
    util-linux which have not yet been accepted.
    When both that change and this are in effect, automounters
    can use "umount -c $PATH" to safely unmount a filesystem
    without blocking.
    
    Signed-off-by: NeilBrown <neilb>
    Signed-off-by: Steve Dickson <steved>

commit 1c2c18806800198bf3f2336939a5b5c348f46b92
Author: Justin Mitchell <jumitche>
Date:   Thu Jun 1 11:20:36 2017 -0400

    nfs.conf: Removed buffer overruns
    
    Remove the line length parameter and associated code which
    led to buffer overruns in the line parsing code.
    Also drops the undocumented 'include' directive.
    
    Signed-off-by: Justin Mitchell <jumitche>
    Signed-off-by: Steve Dickson <steved>

commit bc89ecb3146539e8a0afbd24ea6529aa8c4df175
Author: Justin Mitchell <jumitche>
Date:   Fri Jun 2 10:48:24 2017 -0400

    nfs.conf: Add function to cleanup and free the loaded config
    
    Signed-off-by: Justin Mitchell <jumitche>
    Signed-off-by: Steve Dickson <steved>

commit 0276228a6f0ac390d3c3ed3502bb0d3ad73d093b
Author: Justin Mitchell <jumitche>
Date:   Thu Jun 1 10:37:57 2017 -0400

    nfs.conf: Remove static variables in parsing routines
    
    Part of a sequence of attempts to tidy up the nfs.conf code and prepare
    it for use as part of a configuration API.
    
    Remove static vars that prevented memory cleanup and potentially lead to
    parsing errors if conf_init was called again.
    
    Signed-off-by: Justin Mitchell <jumitche>
    Signed-off-by: Steve Dickson <steved>

commit db61c1a59b2b29e35a608c798159088a39347dea
Author: Steve Dickson <steved>
Date:   Thu Jun 1 13:29:20 2017 -0400

    nfsd: added 'u' to argument list.
    
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451568
    Signed-off-by: Steve Dickson <steved>

There were some changes to the nfs.conf processing...

Comment 13 Steve Dickson 2017-10-17 13:35:47 UTC
Would it be possible to run the mount with -vvv, which turns on verbose output?

That output will show how the protocol negotiation goes...

Comment 14 Tom Bouwman 2017-10-18 13:21:57 UTC
I created a VM with F25 Workstation.
- kernel: 4.8.6-300.fc25.x86_64
- nfs-utils: 1.3.4-1.rc2.fc25

/etc/nfsmount.conf
[ Server "jupiter" ]
Defaultvers=4

/etc/fstab
jupiter:/data /mnt/jupiter/data nfs noauto,rsize=32768,wsize=32768 0 0

mount -vvv /mnt/jupiter/data
mount.nfs: timeout set for Wed Oct 18 14:36:24 2017
mount.nfs: trying text-based options 'rsize=32768,wsize=32768,vers=4.0,addr=192.168.1.31,clientaddr=192.168.1.47'

This all works fine.
===================
After a dnf update:
- kernel: 4.13.5-100.fc25.x86_64
- nfs-utils: 2.1.1-5.rc4.fc25

All other settings as above.

mount -vvv /mnt/jupiter/data
mount.nfs: timeout set for Wed Oct 18 15:09:35 2017
mount.nfs: trying text-based options 'rsize=32768,wsize=32768,vers=4.2,addr=192.168.1.31,clientaddr=192.168.1.47'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'rsize=32768,wsize=32768,vers=4.1,addr=192.168.1.31,clientaddr=192.168.1.47'
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed
===================
After dnf downgrade nfs-utils (back to 1.3.4-1.rc2.fc25.x86_64), everything works fine again.
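
To compare runs like the two above without reading the raw output each time, a small sketch such as the following could reduce a mount -vvv transcript to the sequence of vers= values tried and the mount(2) result for each. The function name summarize_attempts and the two regular expressions are made up for illustration; they only assume the line shapes shown in this comment.

```python
import re

# Line shapes taken from the mount.nfs output pasted above.
OPTIONS_RE = re.compile(r"trying text-based options '([^']*)'")
RESULT_RE = re.compile(r"mount\(2\): (.+)")


def summarize_attempts(log_text):
    """Return a list of (vers, error) tuples, one per mount attempt.

    error is None when no "mount(2): ..." line follows the attempt,
    which is what a successful attempt looks like in the output above.
    """
    attempts = []
    for line in log_text.splitlines():
        match = OPTIONS_RE.search(line)
        if match:
            # Split "key=value,key=value" into a dict of mount options.
            options = dict(
                item.partition("=")[::2] for item in match.group(1).split(",")
            )
            attempts.append([options.get("vers"), None])
            continue
        match = RESULT_RE.search(line)
        if match and attempts:
            attempts[-1][1] = match.group(1).strip()
    return [tuple(attempt) for attempt in attempts]
```

Fed the failing transcript above, this should list a 4.2 attempt followed by a 4.1 attempt, and no 4.0 attempt at all.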

Comment 15 Steve Dickson 2017-10-18 14:50:35 UTC
(In reply to Tom Bouwman from comment #14)
> After a dnf update:
> - kernel: 4.13.5-100.fc25.x86_64
> - nfs-utils: 2.1.1-5.rc4.fc25
> 
> All other settings as above.
> 
> mount -vvv /mnt/jupiter/data
> mount.nfs: timeout set for Wed Oct 18 15:09:35 2017
> mount.nfs: trying text-based options
> 'rsize=32768,wsize=32768,vers=4.2,addr=192.168.1.31,clientaddr=192.168.1.47'
> mount.nfs: mount(2): Protocol not supported
> mount.nfs: trying text-based options
> 'rsize=32768,wsize=32768,vers=4.1,addr=192.168.1.31,clientaddr=192.168.1.47'
> mount.nfs: mount(2): Input/output error
> mount.nfs: mount system call failed
> ===================
> After dnf downgrade nfs-utils (back to 1.3.4-1.rc2.fc25.x86_64), everything
> works fine again.
I see what's happening here... Two things are going on:

1) The server is not returning the correct error on the v4.1 negotiation.
On the 4.2 negotiation the correct error (EPROTONOSUPPORT) is returned,
which triggers the v4.1 negotiation. On that negotiation EIO is returned,
which causes the mount to fail. If EPROTONOSUPPORT had been returned, the
negotiation would have moved on to v4.0.

2) The reason this negotiation is happening is that the meaning of "Defaultvers=4"
has changed (a change that I did not realize went into f25).
Defaultvers=4 used to mean v4.0, but it now means negotiate the minor version,
since none was specified. So to specify v4.0 from nfsmount.conf the minor
version needs to be given explicitly (i.e. Defaultvers=4.0), which is the
workaround for this case.

Question: whose server is this? So I can add it to the list of broken servers or servers we break... ;-)
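
The two points above can be modelled in a few lines. This is only a sketch of the behaviour as described in this comment, not the mount.nfs source: the names versions_for, negotiate, f14_like_server and well_behaved_server are made up, and the servers are stand-in functions that return 0 or an errno.

```python
import errno

# Minor versions in the order the failing mount output tried them.
MINOR_VERSIONS = ["4.2", "4.1", "4.0"]


def versions_for(defaultvers):
    """Hypothetical mapping from a Defaultvers value to versions to try."""
    if defaultvers == "4":
        return list(MINOR_VERSIONS)  # no minor given: negotiate downwards
    return [defaultvers]             # explicit minor: try only that one


def negotiate(try_mount, versions=MINOR_VERSIONS):
    """Try each version in turn; return (chosen version or None, attempts)."""
    attempts = []
    for vers in versions:
        err = try_mount(vers)
        attempts.append((vers, err))
        if err == 0:
            return vers, attempts
        if err != errno.EPROTONOSUPPORT:
            # Only EPROTONOSUPPORT moves on to the next lower minor
            # version; anything else (such as EIO) fails the mount.
            return None, attempts
    return None, attempts


def f14_like_server(vers):
    """Stand-in for the server in this bug: EIO instead of EPROTONOSUPPORT on 4.1."""
    return {"4.2": errno.EPROTONOSUPPORT, "4.1": errno.EIO, "4.0": 0}[vers]


def well_behaved_server(vers):
    """Stand-in for a v4.0-only server that rejects newer minors correctly."""
    return 0 if vers == "4.0" else errno.EPROTONOSUPPORT
```

With these stand-ins, negotiate(f14_like_server, versions_for("4")) stops at 4.1 and never reaches 4.0, while versions_for("4.0") skips the negotiation entirely, which is why Defaultvers=4.0 works around the problem.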

Comment 16 Tom Bouwman 2017-10-18 19:18:52 UTC
Hi Steve,

Tried with your workaround on the test VM with F25 Workstation. --> works
Tried with your workaround on my production laptop, also with F25 Workstation --> works too.

Not sure what you mean by "whose server is this?".
If you are referring to jupiter (192.168.1.31), that is my personal server, running Fedora 14.

Thanks, Tom

Comment 17 J. Bruce Fields 2017-10-18 19:33:51 UTC
(In reply to Tom Bouwman from comment #16)
> Not sure what you mean by "whose server is this?".
> If you are referring to jupiter (192.168.1.31), that is my personal server,
> running Fedora 14.

Is it reproducible on a more recent server?  From Steve's description in comment 15 it does sound like a server bug.

Comment 18 Steve Dickson 2017-10-18 20:27:13 UTC
(In reply to Tom Bouwman from comment #16)
> Hi Steve,
> 
> Tried with your workaround on the test VM with F25 Workstation. --> works
> Tried with your workaround on my production laptop, also with F25
> Workstation --> works too.
Good.. that is the work-around then... 

> 
> Not sure what you mean by "whose server is this?".
> If you are referring to jupiter (192.168.1.31), that is my personal server,
> running Fedora 14.
That's right... it's F14... maybe you could use a server from this century?? ;-)

Comment 19 Guilherme Paiva 2017-10-30 03:53:01 UTC
Hey Guys,

I was having the same problem, but it turns out that my NFS server was outdated.
After upgrading the nfs-utils package on the NFS server, it started working again.

Comment 20 Steve Dickson 2017-10-30 14:02:11 UTC
(In reply to Guilherme Paiva from comment #19)
> Hey Guys,
> 
> I was having the same problem, but it turns out that my NFS server was outdated.
> After upgrading the nfs-utils package on the NFS server, it started working again.

Good to hear...

Comment 21 Guilherme Paiva 2017-10-30 23:37:06 UTC
Also, to add a bit more info on my issue: after upgrading nfs-utils it started working, but later I hit another problem that gave the same error. This time, the timezone had somehow changed after the OS upgrade.

I set the timezone on my NFS server to UTC (my chosen timezone for my servers) and made sure the servers connecting to it were also configured for UTC.

Without having to restart the server, I simply ran mount -a and it worked like a charm.