Bug 1392133

Summary: nfs client mount fails to fall back to NFSv3 when talking to Ganesha server
Product: [Fedora] Fedora
Reporter: Valdis Kletnieks <valdis.kletnieks>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: bcodding, bfields, ffilz, jlayton, jthottan, kkeithle, skoduri, smayhew, steved, valdis.kletnieks
Last Closed: 2016-11-07 19:37:47 UTC
Type: Bug
Attachments: tcpdump of failing NFS mount.

Description Valdis Kletnieks 2016-11-04 23:43:59 UTC
Created attachment 1217534: tcpdump of failing NFS mount.

Description of problem:
(This may be an upstream problem, or a server-end problem. We're seeing the same problem on multiple client machines)

We are in the process of deploying a Ganesha-based NFS server (RHEL 7.1, nfs-ganesha 2.3.2).  Due to internal issues that mean multiple auth servers would be involved, we're *not* doing NFSv4, only NFSv3 (fortunately we can isolate the communities involved to different exports, so there's no issue with UID/GID namespace overlap).  As a result, although the NFS server is NFSv4 capable, there are no v4 exports, and the server is configured for 'default to nfsv3'.

If I do a 'mount server.name:/export/path/here /mnt/nfs', it goes astray:

Client end defaults to trying NFS 4.

Client sends nfsv4 NULL call with AUTH_NULL.
Server replies with 'accepted' and Accept State: program can't support procedure (3).

Client waits 4 seconds, then retries the exact same thing rather than dropping back to NFSv3.  Lather, rinse, repeat until the mount command times out.

Specifying 'mount -o nfsvers=3' makes the mount work.


Version-Release number of selected component (if applicable):
nfs-utils-1.3.4-1.rc2.fc26

tcpdump attached.



Comment 1 J. Bruce Fields 2016-11-07 13:23:00 UTC
"Client sends nfsv4 NULL call with AUTH_NULL. Server replies with 'accepted' and Accept State: program can't support procedure (3)."

That response makes no sense (the client sent procedure 0, not procedure 3), so there's almost certainly some simple Ganesha bug here.  The client could possibly handle this better (e.g., erroring out instead of retrying), but I think for such a clear server bug it's not worth it.

By the way: "As a result, although the NFS server is NFSv4 capable, there are no v4 exports, and the server is configured for 'default to nfsv3'."

A 'default to nfsv3' server configuration doesn't really make sense: There is no way in the protocol for a server to indicate that it supports v4 but prefers v3, so if you want clients to negotiate down then v4 needs to be completely off.

Comment 2 Soumya Koduri 2016-11-07 14:23:20 UTC
Which version of nfs-ganesha are you using? I checked the behaviour using the latest nfs-ganesha-2.4 branch. The client seems to be falling back to NFSv3.

Comment 3 Valdis Kletnieks 2016-11-07 18:29:25 UTC
The client didn't request procedure 3.

The server sent back a result code 3 which is "program can't support procedure".

Or so says wireshark when looking at the data stream.
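
For readers following along without the capture, here is a minimal sketch of where that result code sits in an accepted ONC RPC reply. The field layout and the accept_stat names follow RFC 5531; the function name and the sample bytes are hypothetical, built by hand for illustration and not taken from the attached tcpdump. It assumes the 4-byte TCP record marker has already been stripped.

```python
import struct

# accept_stat values as defined by RFC 5531 (ONC RPC version 2).
ACCEPT_STAT_NAMES = {
    0: "SUCCESS",
    1: "PROG_UNAVAIL",
    2: "PROG_MISMATCH",
    3: "PROC_UNAVAIL",
    4: "GARBAGE_ARGS",
    5: "SYSTEM_ERR",
}


def parse_accept_stat(reply: bytes) -> int:
    """Return the accept_stat of an accepted RPC reply.

    Assumes `reply` starts at the xid (record marker already removed).
    """
    xid, msg_type, reply_stat = struct.unpack_from(">III", reply, 0)
    if msg_type != 1:          # 1 = REPLY
        raise ValueError("not an RPC reply")
    if reply_stat != 0:        # 0 = MSG_ACCEPTED
        raise ValueError("reply was denied, not accepted")
    # The verifier is a flavor plus a variable-length opaque body,
    # padded to a multiple of four bytes.
    _verf_flavor, verf_len = struct.unpack_from(">II", reply, 12)
    padded_len = (verf_len + 3) & ~3
    (accept_stat,) = struct.unpack_from(">I", reply, 20 + padded_len)
    return accept_stat


# Hypothetical reply: xid, REPLY, MSG_ACCEPTED, AUTH_NULL verifier
# (flavor 0, length 0), accept_stat 3.
sample = struct.pack(">IIIIII", 0x12345678, 1, 0, 0, 0, 3)
print(ACCEPT_STAT_NAMES[parse_accept_stat(sample)])  # prints "PROC_UNAVAIL"
```

The 3 is a status code in the reply, independent of the procedure number in the call, which is the distinction being made above.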

Comment 4 J. Bruce Fields 2016-11-07 18:59:14 UTC
(In reply to Valdis Kletnieks from comment #3)
> The client didn't request procedure 3.
> 
> The server sent back a result code 3 which is "program can't support
> procedure".
> 
> Or so says wireshark when looking at the data stream.

D'oh, you're right, I don't know what I was thinking when I said that.

Anyway, that's still a nonsensical response: support for procedure 0 is mandatory, so it should be PROG_MISMATCH if it doesn't support NFS version 4.
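
To illustrate why PROG_MISMATCH matters here, the following is a rough sketch of a version fallback policy. It is a hypothetical policy written for this thread, not the actual logic of the Linux client or mount.nfs; the function name is made up. The one protocol fact it relies on is from RFC 5531: a PROG_MISMATCH reply carries the lowest and highest program versions the server supports, while PROC_UNAVAIL carries no version information at all.

```python
from enum import IntEnum
from typing import Optional


class AcceptStat(IntEnum):
    """accept_stat values from RFC 5531."""
    SUCCESS = 0
    PROG_UNAVAIL = 1
    PROG_MISMATCH = 2
    PROC_UNAVAIL = 3
    GARBAGE_ARGS = 4
    SYSTEM_ERR = 5


def next_version_to_try(requested: int,
                        stat: AcceptStat,
                        low: Optional[int] = None,
                        high: Optional[int] = None) -> Optional[int]:
    """Hypothetical policy: which NFS version to try after a NULL call.

    Returns the version to use, or None if the reply gives no basis
    for negotiating down.
    """
    if stat == AcceptStat.SUCCESS:
        return requested
    if stat == AcceptStat.PROG_MISMATCH and low is not None and high is not None:
        # The server told us its supported range; pick the highest
        # version in that range that is below what we asked for.
        candidate = min(requested - 1, high)
        return candidate if candidate >= low else None
    # PROC_UNAVAIL and the other errors say nothing about versions.
    return None


print(next_version_to_try(4, AcceptStat.PROG_MISMATCH, low=3, high=3))  # prints 3
print(next_version_to_try(4, AcceptStat.PROC_UNAVAIL))                  # prints None
```

Under this sketch, a server that answers the v4 NULL call with PROG_MISMATCH and a range of 3 to 3 gives the client something to act on, whereas PROC_UNAVAIL does not.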

Comment 5 Valdis Kletnieks 2016-11-07 19:23:38 UTC
(In reply to J. Bruce Fields from comment #4)
> Anyway, that's still a nonsensical response, support for procedure 0 is
> mandatory, so it should be PROG_MISMATCH if it doesn't support NFS version 4.

Aha.  That's probably the smoking gun.

And deeper digging indicates that our copy of nfs-ganesha came from IBM, which presents the possibility that they broke it somehow.  I'll take it up with them.

Comment 6 J. Bruce Fields 2016-11-07 19:37:47 UTC
Thanks, in that case I think we should close this; reopen if it turns out there's some problem on our side after all.

Comment 7 Valdis Kletnieks 2016-11-07 19:49:03 UTC
OK, that works for me, will re-open if needed.

Comment 8 Frank Filz 2016-11-08 00:05:51 UTC
Hmm, bugs COULD have crept into Ganesha that keep it from working right when NFS v4 is disabled. It SHOULD NOT register with rpcbind for NFS v4 if only Protocols=3 is defined in NFS_CORE_PARAM.

If that isn't the case, but all exports are Protocols=3 only, I think Ganesha will still build a PseudoFS with only a root directory, but in that case, it should "work", just not be very interesting...
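
One way to check the rpcbind registration mentioned above is to look at the output of `rpcinfo -p` on the server and see which versions of the NFS program (RPC program number 100003) are listed. The sketch below parses that output; the function name is hypothetical and the sample text is an invented illustration of the usual column layout, not output from the server in this report.

```python
from typing import Set

NFS_PROGRAM = 100003  # RPC program number assigned to NFS


def registered_nfs_versions(rpcinfo_output: str) -> Set[int]:
    """Collect the NFS versions listed in `rpcinfo -p` style output."""
    versions: Set[int] = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a data row.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        if int(fields[0]) == NFS_PROGRAM:
            versions.add(int(fields[1]))
    return versions


# Invented sample in the usual "program vers proto port service" layout.
sample = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100003    3   tcp   2049  nfs
    100003    3   udp   2049  nfs
    100005    3   tcp  20048  mountd
"""
print(sorted(registered_nfs_versions(sample)))  # prints [3]
```

If 4 shows up in the result on a server meant to be v3-only, that would point at the registration behaviour described above.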