Bug 606108

Summary: NFS to NFSv3 server is broken in F-13
Product: [Fedora] Fedora
Component: nfs-utils
Version: 13
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: low
Reporter: Tom Lane <tgl>
Assignee: Steve Dickson <steved>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dan, gmrandazzo, hhorak, jlayton, steved, zkabelac
Doc Type: Bug Fix
Last Closed: 2010-10-15 13:47:03 EDT

Description Tom Lane 2010-06-20 13:10:17 EDT
Description of problem:
I just updated to F-13 and find myself once again unable to access my old UDP-only NFSv3 server.
Experimentation suggests that NFSv3 support is completely broken now, because if I try to do the mount manually rather than via automount, I get this:

$ sudo mount -t nfs -s -v -o nosuid,nodev,intr sss2:/ /tmp/zzz
mount.nfs: timeout set for Sun Jun 20 13:06:09 2010
mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=192.168.168.3,clientaddr=192.168.168.8'
mount.nfs: mount(2): Connection refused
mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=192.168.168.3,clientaddr=192.168.168.8'
mount.nfs: mount(2): Connection refused
... (repeat till timeout) ...

The vers=4 bit looks like a smoking gun :-( ...

Version-Release number of selected component (if applicable):
nfs-utils-1.2.2-2.fc13.x86_64
nfs-utils-lib-1.1.5-1.fc13.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to mount remote NFS volume from NFSv3, UDP-only server.

Actual results:
Times out.

Expected results:
Success.

Additional info:
See bug #528776 for details of the configuration involved here.
Comment 1 Steve Dickson 2010-06-22 07:02:08 EDT
What is the output of the following two commands?
    showmount -e sss2
    rpcinfo -p sss2
Comment 2 Tom Lane 2010-06-23 00:33:50 EDT
$ showmount -e sss2
Export list for sss2:
/     sss,rh1,rh2,hp715,g3,g42,pro
/home sss,rh1,rh2,hp715,g3,g42,pro
/opt  sss,rh1,rh2,hp715,g3,g42,pro
/tmp  sss,rh1,rh2,hp715,g3,g42,pro
/usr  sss,rh1,rh2,hp715,g3,g42,pro
/var  sss,rh1,rh2,hp715,g3,g42,pro

(btw, rh2 is the F-13 machine that's failing here)

[tgl@rh2 ~]$ rpcinfo -p sss2
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    855  status
    100024    1   tcp    857  status
    100021    1   tcp    861  nlockmgr
    100021    1   udp   1030  nlockmgr
    100021    3   tcp    865  nlockmgr
    100021    3   udp   1031  nlockmgr
    100021    4   tcp    869  nlockmgr
    100021    4   udp   1032  nlockmgr
    100020    1   udp   4045  llockmgr
    100020    1   tcp   4045  llockmgr
    100021    2   tcp    876  nlockmgr
    100099    1   udp   2155
    100068    2   udp   1036
    100068    3   udp   1036
    100068    4   udp   1036
    100068    5   udp   1036
    100083    1   tcp   1036
    351456    1   udp    847
    351456    1   tcp    849
    100005    1   udp    889  mountd
    100005    3   udp    889  mountd
    100005    1   tcp    892  mountd
    100005    3   tcp    892  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
1342177279    4   tcp   1056
1342177279    1   tcp   1056
Comment 3 Giuseppe Marco Randazzo 2010-07-06 12:49:22 EDT
Hi! Edit /etc/nfsmount.conf and, in the section

 [ NFSMount_Global_Options ]

set:

 Defaultvers=3

That works for me.
Comment 4 Tom Lane 2010-07-08 12:19:42 EDT
I can confirm that the workaround in comment #3 gets things going again for me.
Comment 5 Steve Dickson 2010-10-14 11:11:32 EDT
My apologies for disappearing on this...

Comment #3 is the workaround... but I would like to figure
out the problem...

I see your server is only advertising v2 and v3.
What OS is your server running?

Also, could you post a bzip2-compressed binary network trace,
captured with something similar to:
    yum install wireshark
    tshark -w /tmp/data.pcap host <server>
    bzip2 /tmp/data.pcap

tia....
Comment 6 Tom Lane 2010-10-14 11:21:44 EDT
It's HPUX 10.20, probably fifteen years old at this point.  I'm not sure whether anybody still cares about compatibility with that --- I'm perfectly willing to use the Defaultvers workaround.
Comment 7 Steve Dickson 2010-10-14 11:29:06 EDT
Understood... but if we figure out why mounts are not dialling
back from v4 to v3 automatically, I could fix it in an upcoming
release. Meaning things would just work out of the box...

Unfortunately I don't have access to an HPUX machine, so if possible,
could you please post a network trace as described in comment 5?

It would be much appreciated!
Comment 9 Tom Lane 2010-10-14 17:46:12 EDT
OK, there you go.  This is a tshark trace of the following interaction:

[tgl@rh3 ~]$ ls /net/sss2
home/  opt/  tmp/  usr/  var/
[tgl@rh3 ~]$ ls /net/sss2/home
ls: cannot open directory /net/sss2/home: No such file or directory
[tgl@rh3 ~]$ 

Each command sat for a minute or two before responding.  Note that what the first command is reporting is the names of exported volumes on the server, but not any of the loose files that are in the server's root directory.  Other than the delay, the symptoms look very very much like bug #528776, which you might want to consult for additional details about my setup and the expected results from these commands.  You commented there that the critical point was lack of TCP support in this server, not so much the NFS protocol version.

Again, setting Defaultvers=3 in /etc/nfsmount.conf solves the problem immediately.
Comment 10 Steve Dickson 2010-10-14 18:47:05 EDT
Thank you for taking the time... I'll need to digest this but I 
do appreciate you making the effort!
Comment 11 Giuseppe Marco Randazzo 2010-10-15 04:09:35 EDT
Alternatively, if you do not want to set Defaultvers=3 in nfsmount.conf, you can edit /etc/fstab and specify the NFS version per mount, as in this line:

1.2.3.4:/home/pippa /mnt/remote_pippa    nfs     defaults,nfsvers=3        0 0

That way you can mount different NFS shares with different versions \o/ ;)
Comment 12 Steve Dickson 2010-10-15 09:58:05 EDT
When doing a verbose mount (i.e. mount -v) without specifying
v3, do the messages displayed contain the following?

mount.nfs: mount(2): Connection refused
Comment 13 Tom Lane 2010-10-15 10:41:29 EDT
Hm, sorry, I don't usually do any explicit mounts in this setup.  What command do you want me to try, exactly?
Comment 14 Steve Dickson 2010-10-15 11:55:44 EDT
Sorry... I got my answer from looking at your opening  description... 
I see the problem...  

The F-13 client now, by default, first tries 'NFS v4 over TCP' when
initiating the mount to the server. In the past, 'NFS v3 over TCP'
was tried first.

The idea is for the server to return a "NO SUPPORT VERSION" error,
causing the client to dial back to 'NFS v3 over TCP'. If that combination
does not work, the client again dials back, to 'NFS v3 over UDP'. This
type of negotiation continues until 'NFS v2 over UDP' fails. Then
the mount fails.

In your case the server only supports 'NFS v3 over UDP' and 'NFS v2
over UDP'. So what should happen is that the 'NFS v4 over TCP' mount
attempt fails, as does the 'NFS v3 over TCP' attempt.
Then the 'NFS v3 over UDP' attempt should succeed...

Here is the problem: when the F-13 client sends the 'NFS v4 over TCP'
request, your server fails the mount with "Connection refused". Unfortunately
"Connection refused" can have multiple meanings. It can mean
there is no TCP listener (which is true in this case), but it
can also mean the server is down and could be on its way up...
So we must keep trying (using the same 'NFS v4 over TCP' combo)
on the assumption that the server is on its way up.
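
As a rough illustration of the negotiation described above, here is a toy Python model. It is only a sketch and not the mount.nfs code: the names LADDER, negotiate, udp_only_server and polite_server, the result strings, and the max_tries cap (standing in for the mount timeout) are all made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[int, str]  # (NFS version, transport)

# Fallback order as described in this comment (hypothetical representation).
LADDER: List[Attempt] = [(4, "tcp"), (3, "tcp"), (3, "udp"), (2, "udp")]


def negotiate(probe: Callable[[Attempt], str],
              max_tries: int = 5) -> Tuple[Optional[Attempt], list]:
    """Walk the ladder; max_tries plays the role of the mount timeout."""
    log = []
    rung = 0
    tries = 0
    while rung < len(LADDER) and tries < max_tries:
        attempt = LADDER[rung]
        result = probe(attempt)
        tries += 1
        log.append((attempt, result))
        if result == "ok":
            return attempt, log
        if result == "refused":
            # Ambiguous: the server might just be booting, so the same
            # combination is retried instead of stepping down.
            continue
        rung += 1  # explicit "unsupported" answer: dial back one step
    return None, log


def udp_only_server(attempt: Attempt) -> str:
    """Toy server with no TCP listener at all (like the one in this bug)."""
    version, proto = attempt
    if proto == "tcp":
        return "refused"
    return "ok" if version in (2, 3) else "unsupported"


def polite_server(attempt: Attempt) -> str:
    """Toy server that answers every request and rejects what it lacks."""
    return "ok" if attempt in {(3, "udp"), (2, "udp")} else "unsupported"


if __name__ == "__main__":
    print(negotiate(udp_only_server))
    print(negotiate(polite_server))
```

With the toy UDP-only server, every try is the same 'NFS v4 over TCP' attempt until the cap is reached, which mirrors the repeated "Connection refused" lines in the opening description. With the toy server that explicitly rejects unsupported combinations, the model steps down and settles on 'NFS v3 over UDP'.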

Unfortunately I don't have an answer for this problem... Actually I
think this is a long-standing problem, because if the server only
supported 'NFS v2 over UDP' (which is very uncommon), the same
scenario would occur when the legacy mount sends 'NFS v3 over TCP'.

My suggestion is to use the '[ Server "Server_Name" ]' section
in /etc/nfsmount.conf to make all mounts to that server
default to NFS v3. See nfsmount.conf(5) for details.

I do thank you for your time... it was much appreciated.
Comment 15 Tom Lane 2010-10-15 12:09:37 EDT
Hmm ... so I guess the remaining question is what about that behavior changed in F-13?  Because it used to work fine without any configuration hacking.
Comment 16 Steve Dickson 2010-10-15 13:24:09 EDT
> Hmm ... so I guess the remaining question is what about that behavior changed
> in F-13?  Because it used to work fine without any configuration hacking.
Good point... 

It was decided that v4 mounts would not do any pre-checking with the
remote portmapper to see if the server supports that version
and protocol. The reason has to do with firewalls.

With v4, only port 2049 has to be open for the mount to succeed,
since the mounting protocol is built into v4. That is unlike
legacy NFS versions, where mounting was a separate protocol
needing a separate daemon (rpc.mountd) listening on dynamic
ports... Very firewall unfriendly... Especially when it
means you also have to open up the portmapper port so
the client can get the port of rpc.mountd.
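
To illustrate the kind of portmapper pre-check that is skipped for v4, here is a small Python sketch that reads `rpcinfo -p` style output and lists which NFS version/transport pairs are advertised. It is not what mount.nfs does internally; advertised_nfs and RPCINFO_SAMPLE are hypothetical names, and the sample is a subset of the lines posted in comment 2.

```python
# Subset of the rpcinfo -p listing from comment 2.
RPCINFO_SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100005    3   udp    889  mountd
    100005    3   tcp    892  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
"""

NFS_PROGRAM = 100003  # RPC program number shown for "nfs" in the listing


def advertised_nfs(text):
    """Return the set of (version, transport) pairs registered for NFS."""
    combos = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a registration row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, vers, proto = int(fields[0]), int(fields[1]), fields[2]
        if program == NFS_PROGRAM:
            combos.add((vers, proto))
    return combos


if __name__ == "__main__":
    print(sorted(advertised_nfs(RPCINFO_SAMPLE)))
```

For the sample, only UDP entries come back for NFS, so a client that consulted the portmapper first would know not to bother with TCP. The price is that the portmapper port must be reachable through the firewall.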

But I do agree, having to do "configuration hacking" is a
pain...

Question, why doesn't your server support NFS over TCP?
TCP support has been around for many years, and TCP is
by far superior to UDP as a transport in a number of ways...

Actually that's another workaround... turn on TCP support
on the server and the "configuration hacking" will
not be needed.
Comment 17 Tom Lane 2010-10-15 13:47:03 EDT
(In reply to comment #16)
> It was decided that v4 mounts would not do any pre-checking with the
> remote portmapper to see if the server supports that version
> and protocol. The reason has to do with firewalls.

I see.  That's a pretty fair reason.

> Question, why doesn't your server support NFS over TCP? 

AFAICT it's just too old.  There's no indication in the nfsd docs that it has any ability to do TCP.  Sooner or later I'll get around to replacing it.

Anyway, thanks for your time.  Since there's a defensible reason for changing this behavior, it's clearly NOTABUG.