Bug 669112

Summary:

autofs does not seem to work with nfsv3 either

Product:

[Fedora] Fedora

Reporter:

Bill C. Riemers <briemers>

Component:

autofs

Assignee:

Ian Kent <ikent>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

medium

Docs Contact:

Priority:

low

Version:

CC:

amcnabb, ikent, jmoyer, steved

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Clone Of:

659887

Environment:

Last Closed:

2012-03-13 18:41:55 UTC

Bug Depends On:

659887

Bug Blocks:

Attachments:

Description	Flags
tshark -w /tmp/bz669112.pcap host 172.31.253.11	none

Description Bill C. Riemers 2011-01-12 17:28:52 UTC

I have a network server with a set of exports.   In previous versions of Fedora I could simply type ls /net/172.31.253.11/share and see a directory list.
In the current version, this command hangs.  If in another window I type it again, I simply get an empty directory.

After a bit of debugging it looks like the command being used is:

mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/xxxx

where xxxx seems to be an auto-generated path name.

If I try the same command interactively, it hangs.  e.g.:

# mkdir /tmp/foo
# mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo

I think my server is using nfs version 3.  So I try again explicitly specifying the nfs version.

# mkdir /tmp/bar
# mount -t nfs -s -o nosuid,nodev,vers=3,intr 172.31.253.11:/share /tmp/bar

This succeeds.  So it looks like currently autofs does not work by default with nfs version 3 mounts.  According to Bug #659887 it also does not work with version 4 mounts.

The workaround was to explicitly specify vers=3 in the auto master.  However, there really needs to be a way of auto detecting which version to use for each mount.   I am particularly concerned that right now autofs is not working with either version.


+++ This bug was initially created as a clone of Bug #659887 +++

I have a machine (prodigy) that exports /local with fsid=0.  On another machine with autofs, if I ls /net/prodigy, I see a single directory called local (NFSv3 style) rather than the contents of local (NFSv4 style).  If I set "/net    -hosts -fstype=nfs4" in /etc/auto.master, then an ls of /net/prodigy still shows a directory called local, but an ls of /net/prodigy/local gives the error "ls: cannot open directory /net/prodigy/local: No such file or directory".  I would like autofs to automatically mount prodigy:/ on /net/prodigy, but I haven't been able to succeed at getting this to work.

If I am doing something wrong, please let me know, but I had assumed that autofs would use nfsv4 by default starting in Fedora 14.  Am I missing something?  Thanks.

--- Additional comment from ikent on 2010-12-05 19:26:41 EST ---

(In reply to comment #0)
> I have a machine (prodigy) that exports /local with fsid=0.  On another machine
> with autofs, if I ls /net/prodigy, I see a single directory called local (NFSv3
> style) rather than the contents of local (NFSv4 style).  If I set "/net   
> -hosts -fstype=nfs4" in /etc/auto.master, then an ls of /net/prodigy still
> shows a directory called local, but an ls of /net/prodigy/local gives the error
> "ls: cannot open directory /net/prodigy/local: No such file or directory".  I
> would like autofs to automatically mount prodigy:/ on /net/prodigy, but I
> haven't been able to succeed at getting this to work.
> 
> If I am doing something wrong, please let me know, but I had assumed that
> autofs would use nfsv4 by default starting in Fedora 14.  Am I missing
> something?  Thanks.

The autofs internal hosts map doesn't know that you want it
to mount prodigy:/ instead of prodigy:/local. There is no
way for autofs to discover if a global root is being used
or not from the export information provided by mountd on
the other machine. So the server needs to honour mount requests
both for the global root path and the export path.

What distribution is in use on the server?

--- Additional comment from fedora-admin-xmlrpc on 2010-12-06 10:13:16 EST ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from amcnabb on 2010-12-06 14:12:53 EST ---

I've tried with both Fedora 12 and Fedora 14 as servers.

If autofs is mounting with NFSv4, then prodigy:/local is not a valid path in the first place (this is only a valid mount with NFSv3).  So it must be using NFSv3 for discovery, right?

--- Additional comment from ikent on 2010-12-06 22:13:43 EST ---

(In reply to comment #3)
> I've tried with both Fedora 12 and Fedora 14 as servers.
> 
> If autofs is mounting with NFSv4, then prodigy:/local is not a valid path in
> the first place (this is only a valid mount with NFSv3).  So it must be using
> NFSv3 for discovery, right?

I believe that used to be the case but I thought the restriction
of mounting the NFS global root as / from the server had been
removed in around F14. It's true that this would introduce
conflicts between server versions but it had to be done. Other
industry NFS servers allow both forms I believe.

Can you post an autofs debug log?
Uncomment and set LOGGING="debug" in /etc/sysconfig/autofs and
ensure the syslog facility "daemon" is being captured by syslog.
You can do this by adding something like 
"daemon.*    /var/log/debug.log"
to /etc/rsyslog.conf, touching the log file if it doesn't exist
and restarting rsyslog.
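
For anyone following along, the steps above could be sketched as a small shell function. The name enable_autofs_debug and its optional prefix argument are assumptions made for illustration (the prefix lets it run against a scratch copy rather than the live /etc). Only the file names, the LOGGING setting and the daemon.* rule come from the comment.

```shell
# Sketch only. enable_autofs_debug and its prefix argument are hypothetical.
# Usage: enable_autofs_debug            (edits the real files, needs root)
#        enable_autofs_debug /tmp/root  (edits a scratch copy)
enable_autofs_debug() {
    prefix="${1:-}"
    conf="$prefix/etc/sysconfig/autofs"
    rsyslog="$prefix/etc/rsyslog.conf"
    log="$prefix/var/log/debug.log"

    # Uncomment (or replace) the LOGGING line so it reads LOGGING="debug".
    if grep -q '^#*[[:space:]]*LOGGING=' "$conf"; then
        sed 's/^#*[[:space:]]*LOGGING=.*/LOGGING="debug"/' "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        echo 'LOGGING="debug"' >> "$conf"
    fi

    # Capture the "daemon" facility unless a daemon.* rule already exists.
    grep -q '^daemon\.\*' "$rsyslog" ||
        printf 'daemon.*    /var/log/debug.log\n' >> "$rsyslog"

    # Create the log file if it does not exist yet.
    mkdir -p "$(dirname "$log")" && touch "$log"
}
```

rsyslog and autofs still have to be restarted by hand afterwards for the change to take effect.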

--- Additional comment from amcnabb on 2011-01-11 17:56:35 EST ---

Created attachment 472916 [details]
debug.log

I'm attaching debug.log from when I went to /net/0potato and /net/0potato/local.

--- Additional comment from amcnabb on 2011-01-11 18:10:24 EST ---

[sorry for not looking into this over the holidays]

It looks like it works as long as it uses NFSv3 (which seems to be the case in the debug.log I just attached).  Oddly, in /etc/sysconfig/autofs, it has MOUNT_NFS_DEFAULT_PROTOCOL=4, so I'm surprised that it's using NFSv3.

For NFSv4, I think the correct behavior would be to simply mount 0potato:/ as /net/0potato rather than looking at the export list.  I've found a few web pages that seem to indicate that showmount isn't used with NFSv4 (e.g., "For nfsv4, the old showmount command will no more be used, because '/' can always be mounted.").  Is there any easy way to configure this with autofs?  It seems that it shouldn't be too hard, but I admit I'm not very experienced with autofs.  In the meantime, I'll keep on working on it and see what I can figure out.

--- Additional comment from amcnabb on 2011-01-11 19:37:50 EST ---

Okay, I think I have something that works for nfs4, except for the lack of backwards compatibility with nfs3.  I changed the /net line in /etc/auto.master to:

/net    /etc/auto.nfs4

and then I set /etc/auto.nfs4 to the following:

*       -fstype=nfs4,rw,nosuid,soft     &:/

Since NFSv4 has a single unified root instead of multiple unconnected exports, doing this single mount per host seems to give the correct behavior (for NFSv4).  Ideally, there would be some way to automatically switch between the NFSv3 and NFSv4 behaviors.
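
As a rough illustration of how the wildcard entry above resolves: in an autofs map, "*" matches any lookup key and "&" in the location is replaced by that key. The helper below only mimics that substitution in shell. expand_entry is a hypothetical name, not part of autofs.

```shell
# Illustration only: mimic how the lookup key replaces "&" in the
# location field of the entry "*  -fstype=nfs4,rw,nosuid,soft  &:/".
# expand_entry is a hypothetical helper; it assumes the key is a plain
# host name (no "/" or "&" characters).
expand_entry() {
    key="$1"
    location='&:/'
    printf '%s\n' "$location" | sed "s/&/$key/g"
}

# Usage: expand_entry 0potato    prints 0potato:/
```

So a lookup of /net/0potato would amount to a single NFSv4 mount of 0potato:/ on that directory.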

Comment 1 Jeff Moyer 2011-01-12 17:46:20 UTC

Steve,

Should mount be gracefully falling back to v3 in this case?

Comment 2 Steve Dickson 2011-01-14 22:23:00 UTC

No, since the -fstype=nfs4 is specifying to use v4 and only v4

Comment 3 Bill C. Riemers 2011-01-14 22:41:16 UTC

(In reply to comment #2)
> No, since the -fstype=nfs4 is specifying to use v4 and only v4

I'm not passing an -fstype flag.  The -fstype flag was used by amcnabb as a workaround for Bug 659887.

Comment 4 Steve Dickson 2011-01-15 13:57:29 UTC

(In reply to comment #0)
> I have a net work server that with a set of exports.   In previous versions of
> Fedora I could simply type ls /net/172.31.253.11/share and see a directory
> list.
> In the current version, this command hangs.  If in another window I type it
> again, I simply get an empty directory.
> 
> After a bit of debugging it looks like the command being used is:
> 
> mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/xxxx
> 
> where xxxx seems to be an auto-generated path name.
> 
> If I try the same command interactively, it hangs.  e.g.:
> 
> # mkdir /tmp/foo
> # mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
> 
What is the output of mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo ?

Comment 5 Steve Dickson 2011-01-15 14:04:43 UTC

(In reply to comment #1)
> Steve,
> 
> Should mount be gracefully falling back to v3 in this case?
Yes it should, but it all depends on how the server fails the
mount. There is a list of failures that will trigger the
fallback: EPROTONOSUPPORT, ENOENT and EPERM.

If the server does not fail the mount with one of those, the
failure is interpreted as the server being temporarily
unavailable, and mount keeps retrying (i.e. it hangs).
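
The rule described in this comment can be sketched as a small decision function. This is only a model of the behaviour as stated here, not the mount.nfs source. should_fall_back is a hypothetical name and it takes the symbolic errno of the failed v4 attempt.

```shell
# Sketch of the fallback rule from this comment (not the real mount.nfs code).
# should_fall_back is a hypothetical helper.
should_fall_back() {
    case "$1" in
        # These three failures trigger a retry with v3.
        EPROTONOSUPPORT|ENOENT|EPERM) return 0 ;;
        # Anything else is treated as "server temporarily unavailable",
        # so the v4 attempt is simply retried.
        *) return 1 ;;
    esac
}

# Usage: should_fall_back ENOENT && echo "retry with vers=3"
```

A server that silently drops the v4 request produces none of the three errors, which is why the mount hangs instead of falling back.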

Comment 6 Bill C. Riemers 2011-01-15 16:46:03 UTC

[root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
mount.nfs: timeout set for Sat Jan 15 11:43:31 2011
mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'

After that it just hangs forever.   There is no timeout at 11:43:31.

Comment 7 Steve Dickson 2011-01-15 18:29:22 UTC

(In reply to comment #6)
> [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Sat Jan 15 11:43:31 2011
> mount.nfs: trying text-based options
> 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'
> 
> After that it just hangs forever.   There is no timeout at 11:43:31.
Ok... 
In another terminal please run the following commands:
    sudo yum install wireshark
    sudo tshark -w /tmp/bz669112.pcap host 172.31.253.11
Now do the mount command and let it sit for few seconds (~30)

    Ctrl-C the tshark trace
    bzip2 /tmp/bz669112.pcap

Then please post the /tmp/bz669112.pcap.bz2 file. This will show what is
happening over the network.

Also what OS is your server running?

Comment 8 Bill C. Riemers 2011-01-15 21:43:13 UTC

Created attachment 473672 [details]
tshark -w /tmp/bz669112.pcap host 172.31.253.11

[root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
mount.nfs: timeout set for Sat Jan 15 16:33:33 2011
mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'

Looks like tshark only captured 24 bytes.  I hope it is useful.   172.31.253.11 is an NSLU2 running Debian lenny (Linux).

[docbill@docbill-think ~]$ ssh 172.31.253.11 uname -a
Linux cisco1 2.6.26-1-ixp4xx #1 Tue Jan 13 13:23:31 GMT 2009 armv5tel GNU/Linux

The nfs packages used on the debian system are:

docbill@cisco1:~$ dpkg -l |grep nfs
ii  libnfsidmap2                         0.20-1                      An nfs idmapping library
ii  nfs-common                           1:1.1.2-6lenny1             NFS support files common to client and serve
ii  nfs-kernel-server                    1:1.1.2-6lenny1             support for NFS kernel server

Comment 9 Steve Dickson 2011-01-24 19:39:30 UTC

(In reply to comment #8)
> Created attachment 473672 [details]
> tshark -w /tmp/bz669112.pcap host 172.31.253.11
> 
> [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Sat Jan 15 16:33:33 2011
> mount.nfs: trying text-based options
> 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'
> 
> Looks like tshark only captured 24 bytes.  I hope it is useful.  
> 172.31.253.100 is an NSLU2 running debian lenny (Linux).
Hmm... I don't see any network traffic in the trace... 
Just to see if anything is going over the wire, please run
the tshark command without the '-w /tmp/bz669112.pcap' part.

Comment 10 Bill C. Riemers 2011-01-24 19:53:58 UTC

OK.  This is what I see without the -w option.  The same host is also a DNS server, so I see a few of those calls mixed in.

[docbill@docbill-think ~]$ sudo tshark host 172.31.253.11
[sudo] password for docbill: 
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
  0.000000 172.31.253.11 -> 172.31.255.255 BROWSER Local Master Announcement CISCO1, Workstation, Server, Print Queue Server, Xenix Server, NT Workstation, NT Server, Master Browser, DFS server
  0.000040 172.31.253.11 -> 172.31.255.255 BROWSER Domain/Workgroup Announcement WORKGROUP, NT Workstation, Domain Enum
 44.740174 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=27650844 TSER=0 WS=7
 44.740885 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70844355 TSER=27650844 WS=1
 44.740947 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27650845 TSER=70844355
 44.741086 172.31.253.98 -> 172.31.253.11 NFS V4 NULL Call
 44.741616 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=45 Win=5792 Len=0 TSV=70844355 TSER=27650845
 44.742145 172.31.253.11 -> 172.31.253.98 NFS V4 NULL Reply (Call In 6)
 44.742188 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=45 Ack=29 Win=5888 Len=0 TSV=27650846 TSER=70844355
 44.825164 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
 44.859271 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=29 Ack=189 Win=6864 Len=0 TSV=70844367 TSER=27650929
104.871761 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=189 Ack=29 Win=5888 Len=0 TSV=27710976 TSER=70844367
104.909001 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=29 Ack=190 Win=6864 Len=0 TSV=70850372 TSER=27710976
109.909485 Cisco-Li_6b:e7:4a -> WistronI_05:85:79 ARP Who has 172.31.253.98?  Tell 172.31.253.11
109.909506 WistronI_05:85:79 -> Cisco-Li_6b:e7:4a ARP 172.31.253.98 is at f0:de:f1:05:85:79
119.911751 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [RST, ACK] Seq=190 Ack=29 Win=5888 Len=0 TSV=27726016 TSER=70850372
122.919800 172.31.253.98 -> 172.31.253.11 TCP [TCP Port numbers reused] silc > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=27729024 TSER=0 WS=7
122.920426 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70852173 TSER=27729024 WS=1
122.920475 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729024 TSER=70852173
122.920538 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729024 TSER=70852173
122.921409 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [FIN, ACK] Seq=1 Ack=2 Win=5792 Len=0 TSV=70852173 TSER=27729024
122.921445 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=2 Ack=2 Win=5888 Len=0 TSV=27729025 TSER=70852173
122.921528 172.31.253.98 -> 172.31.253.11 TCP [TCP Port numbers reused] silc > nfs [SYN] Seq=0 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=27729025 TSER=70852173 WS=7
122.922010 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70852173 TSER=27729025 WS=1
122.922044 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729026 TSER=70852173
122.922111 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
122.922636 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=145 Win=6864 Len=0 TSV=70852173 TSER=27729026
127.711396 172.31.253.98 -> 172.31.253.11 DNS Standard query AAAA 0.156.channel.facebook.com
127.750772 172.31.253.11 -> 172.31.253.98 DNS Standard query response
128.793331 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=145 Ack=1 Win=5888 Len=0 TSV=27734897 TSER=70852173
128.829213 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=146 Win=6864 Len=0 TSV=70852764 TSER=27734897

Comment 11 Steve Dickson 2011-01-24 21:18:32 UTC

Hmm... 

Here are the four interesting packets from Comment 10

 44.741086 172.31.253.98 -> 172.31.253.11 NFS V4 NULL Call
 44.742145 172.31.253.11 -> 172.31.253.98 NFS V4 NULL Reply (Call In 6)
 44.825164 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
122.922111 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR

It appears the reason for the hang is the server simply stops responding...
The server seems to support v4, since it replies to the NULL ping, but
it never replies to the next two calls (which are trying to do the mount).

Maybe I missed it in the above comments but what OS is the server
running and also what does the output of 'rpcinfo -p 172.31.253.11'
look like?
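
For reference, the NFS-level packets can be picked out of a text capture like the one in Comment 10 with a one-line filter. This is a sketch that assumes the column layout shown there (time, source, "->", destination, protocol). nfs_packets is a hypothetical helper name.

```shell
# Keep only lines whose protocol column (5th whitespace-separated field)
# is NFS, dropping the TCP, ARP, DNS and BROWSER chatter.
# nfs_packets is a hypothetical helper; it reads tshark text output on stdin.
nfs_packets() {
    awk '$5 == "NFS"'
}

# Usage: sudo tshark host 172.31.253.11 | nfs_packets
```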

Comment 12 Bill C. Riemers 2011-01-24 21:36:45 UTC

Debian (Lenny) Linux

[docbill@docbill-think .git]$ rpcinfo -p 172.31.253.11
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  33846  status
    100024    1   tcp  47637  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  36760  nlockmgr
    100021    3   udp  36760  nlockmgr
    100021    4   udp  36760  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   tcp  37117  nlockmgr
    100021    3   tcp  37117  nlockmgr
    100021    4   tcp  37117  nlockmgr
    100005    1   udp   4002  mountd
    100005    1   tcp   4002  mountd
    100005    2   udp   4002  mountd
    100005    2   tcp   4002  mountd
    100005    3   udp   4002  mountd
    100005    3   tcp   4002  mountd

Comment 13 Steve Dickson 2011-01-25 12:19:10 UTC

hmm... everything looks normal... and the server does indeed support
NFSv4 (ala 100003    4   tcp   2049  nfs)... 

Is there a firewall on either the server or client? If so, does turning off the 
firewall(s) help?
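
The check made by eye above (program 100003 registered for version 4 over tcp) could be scripted along these lines. This is a sketch that assumes the five-column `rpcinfo -p` layout shown in Comment 12. has_nfs_version is a hypothetical helper name.

```shell
# Reads `rpcinfo -p` output on stdin and succeeds if program 100003 (nfs)
# is registered for the given version and protocol.
# has_nfs_version is a hypothetical helper.
has_nfs_version() {
    awk -v vers="$1" -v proto="$2" '
        $1 == 100003 && $2 == vers && $3 == proto { found = 1 }
        END { exit !found }
    '
}

# Usage: rpcinfo -p 172.31.253.11 | has_nfs_version 4 tcp && echo "v4/tcp registered"
```

Note that a registered version only shows that the server advertises NFSv4, not that a v4 mount of a given path will succeed.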

Comment 14 Bill C. Riemers 2011-01-25 13:48:35 UTC

(In reply to comment #13)
> hmm... everything looks normal... and the server does indeed support
> NFSv4 (ala 100003    4   tcp   2049  nfs)... 
> 
> Is there a firewall on either the server or client? If so, does turning off the 
> firewall(s) help?

The very first thing I tried.

I was assuming the reason autofs was failing is it was trying NFSv4 and not falling back to NFSv3 when my NSLU2 was reporting NFSv4 as unsupported.  But if NFSv4 is "supported" on Lenny, then this problem has nothing to do with NFSv3.

That now points to this being a problem on my server not my client.  So you can probably close this bug as NOTABUG.

Based on the idea this is a problem on my server, I took a look at the showmounts on the server.  Two of the directories listed:

cisco1:~# showmount -e
Export list for cisco1:
/diskC         172.31.0.0/16,192.168.2.0/24
/diskB         172.31.0.0/16,192.168.2.0/24
/diskA         172.31.0.0/16,192.168.2.0/24
/share         172.31.0.0/16,192.168.2.0/24
/share/.disk/E 172.31.0.0/16,192.168.2.0/24
/share/.disk/D 172.31.0.0/16,192.168.2.0/24
/share/.disk/C 172.31.0.0/16,192.168.2.0/24
/share/.disk/B 172.31.0.0/16,192.168.2.0/24
/share/.disk/A 172.31.0.0/16,192.168.2.0/24


I noted everything is a separate mount except /share/.disk/E and /share/.disk/D which are empty directories, since I have retired those disks.

After removing those two directories from my exports file, I found that on the client I still could not do a nfsv4 mount, but the mount does correctly downgrade to nfsv3 when there is a failure.

[root@docbill-think ~]# mount -v -v -t nfs -s -o nosuid,nolock,nodev,intr 172.31.253.11:/share /tmp/foo
mount.nfs: timeout set for Tue Jan 25 08:45:41 2011
mount.nfs: trying text-based options 'nolock,intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.98'
mount.nfs: mount(2): No such file or directory
mount.nfs: trying text-based options 'nolock,intr,sloppy,addr=172.31.253.11'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 172.31.253.11 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 172.31.253.11 prog 100005 vers 3 prot UDP port 4002
172.31.253.11:/share on /tmp/foo type nfs (rw,nosuid,nodev,nolock,intr)


I don't really follow why I get a "No such file or directory" error for NFSv4, but since it succeeds on the NFSv3 fallback, that is only a minor concern.

Comment 15 Bill C. Riemers 2011-01-25 14:23:44 UTC

I see.  It looks like one of the problems is that the nfsv4 mount hangs if any of the child exports does not have an fsid value assigned to it.   For the disk mounts, it picks up the UUID values as the fsid.   For the empty directories, it just hangs the whole process.  I'm not sure if the hang is caused on the server or on the client...

The second problem is it seems there is no way with nfsv4 to specify a crossmnt without also specifying fsid=0.  If crossmnt is specified without fsid=0, then there is simply a no such file or directory.  I'm not sure what in the world one would do if they want to specify multiple crossmnt directories with nfsv4...   Again I'm not certain if this problem is client side or server side.

For now, I commented out my /diskA, /diskB, and /diskC directories, and added the fsid=0 flag.  (That is problematic for Windows, which does not seem to support crossmnt, but I can probably fine tune my exports via IP addresses...)  I've added the fsid=0 flag to the /share mount.   Now:
 
mount -v -v -t nfs4 -s -o nosuid,nolock,nodev,intr 172.31.253.11:/ /tmp/foo

works.   It looks, however, like automount is still falling back to nfsv3 instead of using nfsv4, as I find the disk list at /net/172.31.253.11/share, not at /net/172.31.253.11/ as I would expect with the fsid=0 flag in play.

Comment 16 Steve Dickson 2011-01-25 15:35:33 UTC

(In reply to comment #14)
> (In reply to comment #13)
> > hmm... everything looks normal... and the server does indeed support
> > NFSv4 (ala 100003    4   tcp   2049  nfs)... 
> > 
> > Is there a firewall on either the server or client? If so, does turning off the 
> > firewall(s) help?
> 
> The very first thing I tried.
> 
> I was assuming the reason autofs was failing is it was trying NFSv4 and not
> falling back to NFSv3 when my NSLU2 was reporting NFSv4 as unsupported.  But if
> NFSv4 is "supported" on Lenny, then this problem has nothing to do with NFSv3.
What kernel version is the server running? 

> 
> That now points to this being a problem on my server not my client.  So you can
> probably close this bug as NOTABUG.
> 
> Based on the idea this is a problem on my server, I took a look at the
> showmounts on the server.  Two of the directories listed:
> 
> cisco1:~# showmount -e
> Export list for cisco1:
> /diskC         172.31.0.0/16,192.168.2.0/24
> /diskB         172.31.0.0/16,192.168.2.0/24
> /diskA         172.31.0.0/16,192.168.2.0/24
> /share         172.31.0.0/16,192.168.2.0/24
> /share/.disk/E 172.31.0.0/16,192.168.2.0/24
> /share/.disk/D 172.31.0.0/16,192.168.2.0/24
> /share/.disk/C 172.31.0.0/16,192.168.2.0/24
> /share/.disk/B 172.31.0.0/16,192.168.2.0/24
> /share/.disk/A 172.31.0.0/16,192.168.2.0/24
> 
> 
> I noted everything is a separate mount except /share/.disk/E and /share/.disk/D
> which are empty directories, since I have retired those disks.
> 
> After removing those two directories from my exports file, I found that on the
> client I still could not do a nfsv4 mount, but the mount does correctly
> downgrade to nfsv3 when there is a failure.
How bizarre...... Was there anything being logged to /var/log/messages?

> 
> [root@docbill-think ~]# mount -v -v -t nfs -s -o nosuid,nolock,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Tue Jan 25 08:45:41 2011
> mount.nfs: trying text-based options
> 'nolock,intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.98'
> mount.nfs: mount(2): No such file or directory
> mount.nfs: trying text-based options 'nolock,intr,sloppy,addr=172.31.253.11'
> mount.nfs: prog 100003, trying vers=3, prot=6
> mount.nfs: trying 172.31.253.11 prog 100003 vers 3 prot TCP port 2049
> mount.nfs: prog 100005, trying vers=3, prot=17
> mount.nfs: trying 172.31.253.11 prog 100005 vers 3 prot UDP port 4002
> 172.31.253.11:/share on /tmp/foo type nfs (rw,nosuid,nodev,nolock,intr)
> 
> 
> I don't really follow why I get a "No such file or directory" error for NFSv4,
> but since it succeeds on the NFSv3 fallback, that is only a minor concern.
It's because there is no pseudo root defined (via fsid=0).
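
A quick way to check for this on the server is to look for an active export carrying fsid=0 (exports(5) also accepts fsid=root as an equivalent spelling). The sketch below is a plain text scan of an exports file, so it will not see options applied by other means. has_pseudo_root is a hypothetical helper name.

```shell
# Succeeds if the given exports file has an uncommented entry whose option
# list contains fsid=0 (or the equivalent fsid=root).
# has_pseudo_root is a hypothetical helper.
has_pseudo_root() {
    grep -v '^[[:space:]]*#' "$1" | grep -Eq '[(,]fsid=(0|root)[,)]'
}

# Usage: has_pseudo_root /etc/exports || echo "no pseudo root defined"
```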

Comment 17 Steve Dickson 2011-01-25 15:52:35 UTC

(In reply to comment #15)
> I see.  It looks like one of the problems is the nfsv4 mount hangs if any of
> the child exports does not have an fsid value assigned to it.   For the disk
> mounts, it picks up the UUID values as the fsid.   For the empty directories,
> it just hangs the whole processes.  I'm not sure if the hang is caused on the
> server or on the client...
From what the trace said, it looked like the server was simply dropping
the request, so it was most likely a server issue.

> 
> The second problem is it seems there is no way with nfsv4 to specify a crossmnt
> without also specifying fsid=0.  If crossmnt is specified without fsid=0, then
> there is simply a no such file or directory.  I'm not sure what in the world
> one would do if they want to specify multiple crossmnt directories with
> nfsv4...   Again I'm not certain if this problem is client side or server side.

> 
> For now, I commented out my /diskA, /diskB, and /diskC directories, and added
> the fsid=0 flag.  (That is problematic for Windows, which does not seem to
> support crossmnt, but I can probably fine tune my exports via IP addresses...) 
Try setting the 'crossmnt,nohide' export flags to help with jumping
to different file systems.

> I've added the fsid=0 flag to the /share mount.   Now:
> 
> mount -v -v -t nfs4 -s -o nosuid,nolock,nodev,intr 172.31.253.11:/ /tmp/foo
> 
> works.   It looks however like automount still falling back to nfsv3 instead of
> nfsv4, as I find the disk list as /net/172.31.253.11/share not
> /net/172.31.253.11/ as I would expect with the fsid=0 flag in play.
In V4, there is this notion of a pseudo root, which means a server
can define which file system the client sees as the root. For example,
if the server has this export:
 /share  *(fsid=0,ro)

and the client mounted the root file system
   mount server:/ /mnt 

the client would only see the directories under '/share'.
For a v4 mount to work there has to be a pseudo root. With most
NFS server implementations, the pseudo root is simply '/'.
With earlier Linux implementations this was not the case. One
actually had to define a pseudo root with the fsid=0 export
flag. The most common practice was to have this export:
   / *(ro,fsid=0)

All this changed in the 2.6.32 kernel. The notion of a default pseudo root
was added, so when a pseudo root is not defined, '/' becomes the
pseudo root, which makes all the current v3 exports magically work under v4.
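
Applied to this report: the server's uname output in Comment 8 shows 2.6.26-1-ixp4xx, which is older than 2.6.32, so an explicit fsid=0 export would be expected to be necessary there. A rough version check could look like the sketch below. needs_explicit_pseudo_root is a hypothetical helper, and it only compares the upstream version number, so it cannot know about distribution backports.

```shell
# Succeeds when the given kernel release is older than 2.6.32, i.e. when
# no default pseudo root is expected and fsid=0 must be set explicitly.
# needs_explicit_pseudo_root is a hypothetical helper.
needs_explicit_pseudo_root() {
    release="${1%%-*}"    # "2.6.26-1-ixp4xx" -> "2.6.26"
    # Sort the release against 2.6.32 numerically, field by field.
    oldest=$(printf '%s\n' "$release" 2.6.32 |
        sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$release" != 2.6.32 ] && [ "$oldest" = "$release" ]
}

# Usage: needs_explicit_pseudo_root "$(ssh 172.31.253.11 uname -r)" &&
#            echo "export a pseudo root with fsid=0"
```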

Comment 18 Bill C. Riemers 2011-12-12 13:26:35 UTC

autofs is working great under Fedora 16.