Bug 669112
Summary: | autofs does not seem to work with nfsv3 either | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Bill C. Riemers <briemers>
Component: | autofs | Assignee: | Ian Kent <ikent>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Priority: | low
Version: | 14 | CC: | amcnabb, ikent, jmoyer, steved
Hardware: | x86_64 | OS: | Linux
Doc Type: | Bug Fix | Clone Of: | 659887
Last Closed: | 2012-03-13 18:41:55 UTC | Bug Depends On: | 659887

Description
Bill C. Riemers
2011-01-12 17:28:52 UTC
Steve,

Should mount be gracefully falling back to v3 in this case?

No, since the -fstype=nfs4 is specifying to use v4 and only v4.

(In reply to comment #2)
> No, since the -fstype=nfs4 is specifying to use v4 and only v4

I'm not passing an -fstype flag. The -fstype flag was used by amcnabb as a workaround for Bug 659887.

(In reply to comment #0)
> I have a network server with a set of exports. In previous versions of
> Fedora I could simply type ls /net/172.31.253.11/share and see a directory
> list.
> In the current version, this command hangs. If in another window I type it
> again, I simply get an empty directory.
>
> After a bit of debugging it looks like the command being used is:
>
> mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/xxxx
>
> where xxxx seems to be an auto-generated path name.
>
> If I try the same command interactively, it hangs. e.g.:
>
> # mkdir /tmp/foo
> # mount -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
>

What is the output of

    mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo

?

(In reply to comment #1)
> Steve,
>
> Should mount be gracefully falling back to v3 in this case?

Yes, it should, but it all depends on how the server fails the mount. There is a list of failures that will trigger the fallback: EPROTONOSUPPORT, ENOENT and EPERM. If the server does not fail the mount with one of those, the failure is interpreted as the server being temporarily unavailable, and mount keeps retrying (i.e. it hangs).

    [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
    mount.nfs: timeout set for Sat Jan 15 11:43:31 2011
    mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'

After that it just hangs forever. There is no timeout at 11:43:31.

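For readers following along, the fallback rule described in the reply above can be sketched in a few lines of Python. This is only an illustration of the decision as stated in this thread, not the mount.nfs source: the function name `next_action` and the returned strings are hypothetical, and the use of `ETIMEDOUT` to stand in for "the server never answered" is an assumption.

```python
import errno

# Errors that, according to the reply above, make mount retry with NFSv3.
FALLBACK_ERRNOS = {errno.EPROTONOSUPPORT, errno.ENOENT, errno.EPERM}


def next_action(v4_errno):
    """Hypothetical helper: decide what happens after an NFSv4 mount attempt.

    v4_errno is None when the v4 mount succeeded, otherwise the errno
    with which the attempt failed.
    """
    if v4_errno is None:
        return "mounted-v4"
    if v4_errno in FALLBACK_ERRNOS:
        return "fallback-v3"
    # Any other failure is treated as "server temporarily unavailable",
    # so the v4 attempt is repeated, which looks like a hang to the user.
    return "retry-v4"


print(next_action(errno.ENOENT))     # fallback-v3
print(next_action(errno.ETIMEDOUT))  # retry-v4 (assumed errno for an unanswered request)
```

Under this reading, a server that silently drops the request never produces one of the three listed errors, so the client stays in the retry branch.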
(In reply to comment #6)
> [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Sat Jan 15 11:43:31 2011
> mount.nfs: trying text-based options
> 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'
>
> After that it just hangs forever. There is no timeout at 11:43:31.

Ok... In another terminal, please run the following commands:

    sudo yum install wireshark
    sudo tshark -w /tmp/bz669112.pcap host 172.31.253.11

Now do the mount command and let it sit for a few seconds (~30), then:

    Ctrl-C the tshark trace
    bzip2 /tmp/bz669112.pcap

Then please post the /tmp/bz669112.pcap.bz2 file. This will show what is happening over the network.

Also, what OS is your server running?

Created attachment 473672
tshark -w /tmp/bz669112.pcap host 172.31.253.11

    [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr 172.31.253.11:/share /tmp/foo
    mount.nfs: timeout set for Sat Jan 15 16:33:33 2011
    mount.nfs: trying text-based options 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'

Looks like tshark only captured 24 bytes. I hope it is useful. 172.31.253.100 is an NSLU2 running debian lenny (Linux).

    [docbill@docbill-think ~]$ ssh 172.31.253.11 uname -a
    Linux cisco1 2.6.26-1-ixp4xx #1 Tue Jan 13 13:23:31 GMT 2009 armv5tel GNU/Linux

The nfs packages used on the debian system are:

    docbill@cisco1:~$ dpkg -l |grep nfs
    ii  libnfsidmap2       0.20-1           An nfs idmapping library
    ii  nfs-common         1:1.1.2-6lenny1  NFS support files common to client and serve
    ii  nfs-kernel-server  1:1.1.2-6lenny1  support for NFS kernel server

(In reply to comment #8)
> Created attachment 473672
> tshark -w /tmp/bz669112.pcap host 172.31.253.11
>
> [root@docbill-think ~]# mount -v -t nfs -s -o nosuid,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Sat Jan 15 16:33:33 2011
> mount.nfs: trying text-based options
> 'intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.100'
>
> Looks like tshark only captured 24 bytes. I hope it is useful.
> 172.31.253.100 is an NSLU2 running debian lenny (Linux).

Hmm... I don't see any network traffic in the trace... Just to see if anything is going over the wire, please run the tshark command without the '-w /tmp/bz669112.pcap' part.

OK. This is what I see without the -w option. The same host is also a DNS server, so I see a few of those calls mixed in.

    [docbill@docbill-think ~]$ sudo tshark host 172.31.253.11
    [sudo] password for docbill:
    Running as user "root" and group "root". This could be dangerous.
    Capturing on eth0
    0.000000 172.31.253.11 -> 172.31.255.255 BROWSER Local Master Announcement CISCO1, Workstation, Server, Print Queue Server, Xenix Server, NT Workstation, NT Server, Master Browser, DFS server
    0.000040 172.31.253.11 -> 172.31.255.255 BROWSER Domain/Workgroup Announcement WORKGROUP, NT Workstation, Domain Enum
    44.740174 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=27650844 TSER=0 WS=7
    44.740885 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70844355 TSER=27650844 WS=1
    44.740947 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27650845 TSER=70844355
    44.741086 172.31.253.98 -> 172.31.253.11 NFS V4 NULL Call
    44.741616 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=45 Win=5792 Len=0 TSV=70844355 TSER=27650845
    44.742145 172.31.253.11 -> 172.31.253.98 NFS V4 NULL Reply (Call In 6)
    44.742188 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=45 Ack=29 Win=5888 Len=0 TSV=27650846 TSER=70844355
    44.825164 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
    44.859271 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=29 Ack=189 Win=6864 Len=0 TSV=70844367 TSER=27650929
    104.871761 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=189 Ack=29 Win=5888 Len=0 TSV=27710976 TSER=70844367
    104.909001 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=29 Ack=190 Win=6864 Len=0 TSV=70850372 TSER=27710976
    109.909485 Cisco-Li_6b:e7:4a -> WistronI_05:85:79 ARP Who has 172.31.253.98? Tell 172.31.253.11
    109.909506 WistronI_05:85:79 -> Cisco-Li_6b:e7:4a ARP 172.31.253.98 is at f0:de:f1:05:85:79
    119.911751 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [RST, ACK] Seq=190 Ack=29 Win=5888 Len=0 TSV=27726016 TSER=70850372
    122.919800 172.31.253.98 -> 172.31.253.11 TCP [TCP Port numbers reused] silc > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=27729024 TSER=0 WS=7
    122.920426 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70852173 TSER=27729024 WS=1
    122.920475 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729024 TSER=70852173
    122.920538 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729024 TSER=70852173
    122.921409 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [FIN, ACK] Seq=1 Ack=2 Win=5792 Len=0 TSV=70852173 TSER=27729024
    122.921445 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=2 Ack=2 Win=5888 Len=0 TSV=27729025 TSER=70852173
    122.921528 172.31.253.98 -> 172.31.253.11 TCP [TCP Port numbers reused] silc > nfs [SYN] Seq=0 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=27729025 TSER=70852173 WS=7
    122.922010 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSV=70852173 TSER=27729025 WS=1
    122.922044 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=27729026 TSER=70852173
    122.922111 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
    122.922636 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=145 Win=6864 Len=0 TSV=70852173 TSER=27729026
    127.711396 172.31.253.98 -> 172.31.253.11 DNS Standard query AAAA 0.156.channel.facebook.com
    127.750772 172.31.253.11 -> 172.31.253.98 DNS Standard query response
    128.793331 172.31.253.98 -> 172.31.253.11 TCP silc > nfs [FIN, ACK] Seq=145 Ack=1 Win=5888 Len=0 TSV=27734897 TSER=70852173
    128.829213 172.31.253.11 -> 172.31.253.98 TCP nfs > silc [ACK] Seq=1 Ack=146 Win=6864 Len=0 TSV=70852764 TSER=27734897

Hmm... Here are the four interesting packets from Comment 10:

    44.741086 172.31.253.98 -> 172.31.253.11 NFS V4 NULL Call
    44.742145 172.31.253.11 -> 172.31.253.98 NFS V4 NULL Reply (Call In 6)
    44.825164 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
    122.922111 172.31.253.98 -> 172.31.253.11 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR

It appears the reason for the hang is that the server simply stops responding. The server seems to support v4, since it replies to the NULL ping, but it never replies to the next two COMPOUND calls (which are trying to do the mount).

Maybe I missed it in the above comments, but what OS is the server running, and what does the output of 'rpcinfo -p 172.31.253.11' look like?

Debian (Lenny) Linux

    [docbill@docbill-think .git]$ rpcinfo -p 172.31.253.11
       program vers proto   port  service
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100024    1   udp  33846  status
        100024    1   tcp  47637  status
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    4   udp   2049  nfs
        100021    1   udp  36760  nlockmgr
        100021    3   udp  36760  nlockmgr
        100021    4   udp  36760  nlockmgr
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100003    4   tcp   2049  nfs
        100021    1   tcp  37117  nlockmgr
        100021    3   tcp  37117  nlockmgr
        100021    4   tcp  37117  nlockmgr
        100005    1   udp   4002  mountd
        100005    1   tcp   4002  mountd
        100005    2   udp   4002  mountd
        100005    2   tcp   4002  mountd
        100005    3   udp   4002  mountd
        100005    3   tcp   4002  mountd

hmm... everything looks normal... and the server does indeed support NFSv4 (ala 100003 4 tcp 2049 nfs)...

Is there a firewall on either the server or client? If so, does turning off the firewall(s) help?

(In reply to comment #13)
> hmm... everything looks normal... and the server does indeed support
> NFSv4 (ala 100003 4 tcp 2049 nfs)...
>
> Is there a firewall on either the server or client? If so, does turning off the
> firewall(s) help?

The very first thing I tried.

I was assuming the reason autofs was failing is it was trying NFSv4 and not falling back to NFSv3 when my NSLU2 was reporting NFSv4 as unsupported. But if NFSv4 is "supported" on Lenny, then this problem has nothing to do with NFSv3.

That now points to this being a problem on my server, not my client. So you can probably close this bug as NOTABUG.

Based on the idea this is a problem on my server, I took a look at the showmounts on the server. Two of the directories listed:

    cisco1:~# showmount -e
    Export list for cisco1:
    /diskC         172.31.0.0/16,192.168.2.0/24
    /diskB         172.31.0.0/16,192.168.2.0/24
    /diskA         172.31.0.0/16,192.168.2.0/24
    /share         172.31.0.0/16,192.168.2.0/24
    /share/.disk/E 172.31.0.0/16,192.168.2.0/24
    /share/.disk/D 172.31.0.0/16,192.168.2.0/24
    /share/.disk/C 172.31.0.0/16,192.168.2.0/24
    /share/.disk/B 172.31.0.0/16,192.168.2.0/24
    /share/.disk/A 172.31.0.0/16,192.168.2.0/24

I noted everything is a separate mount except /share/.disk/E and /share/.disk/D, which are empty directories, since I have retired those disks.

After removing those two directories from my exports file, I found that on the client I still could not do an nfsv4 mount, but the mount does correctly downgrade to nfsv3 when there is a failure.

    [root@docbill-think ~]# mount -v -v -t nfs -s -o nosuid,nolock,nodev,intr 172.31.253.11:/share /tmp/foo
    mount.nfs: timeout set for Tue Jan 25 08:45:41 2011
    mount.nfs: trying text-based options 'nolock,intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.98'
    mount.nfs: mount(2): No such file or directory
    mount.nfs: trying text-based options 'nolock,intr,sloppy,addr=172.31.253.11'
    mount.nfs: prog 100003, trying vers=3, prot=6
    mount.nfs: trying 172.31.253.11 prog 100003 vers 3 prot TCP port 2049
    mount.nfs: prog 100005, trying vers=3, prot=17
    mount.nfs: trying 172.31.253.11 prog 100005 vers 3 prot UDP port 4002
    172.31.253.11:/share on /tmp/foo type nfs (rw,nosuid,nodev,nolock,intr)

I don't really follow why I get a "No such file or directory" error for NFSv4, but since it succeeds on the NFSv3 fallback, that is only a minor concern.

I see. It looks like one of the problems is that the nfsv4 mount hangs if any of the child exports does not have an fsid value assigned to it. For the disk mounts, it picks up the UUID values as the fsid. For the empty directories, it just hangs the whole process. I'm not sure if the hang is caused on the server or on the client...

The second problem is that there seems to be no way with nfsv4 to specify a crossmnt without also specifying fsid=0. If crossmnt is specified without fsid=0, the result is simply "No such file or directory". I'm not sure what in the world one would do if they want to specify multiple crossmnt directories with nfsv4... Again, I'm not certain if this problem is client side or server side.

For now, I commented out my /diskA, /diskB, and /diskC directories, and added the fsid=0 flag. (That is problematic for Windows, which does not seem to support crossmnt, but I can probably fine tune my exports via IP addresses...)

I've added the fsid=0 flag to the /share mount. Now:

    mount -v -v -t nfs4 -s -o nosuid,nolock,nodev,intr 172.31.253.11:/ /tmp/foo

works.
It looks, however, like automount is still falling back to nfsv3 instead of nfsv4, as I find the disk list as /net/172.31.253.11/share, not /net/172.31.253.11/ as I would expect with the fsid=0 flag in play.

(In reply to comment #14)
> (In reply to comment #13)
> > hmm... everything looks normal... and the server does indeed support
> > NFSv4 (ala 100003 4 tcp 2049 nfs)...
> >
> > Is there a firewall on either the server or client? If so, does turning off the
> > firewall(s) help?
>
> The very first thing I tried.
>
> I was assuming the reason autofs was failing is it was trying NFSv4 and not
> falling back to NFSv3 when my NSLU2 was reporting NFSv4 as unsupported. But if
> NFSv4 is "supported" on Lenny, then this problem has nothing to do with NFSv3.

What kernel version is the server running?

> That now points to this being a problem on my server not my client. So you can
> probably close this bug as NOTABUG.
>
> Based on the idea this is a problem on my server, I took a look at the
> showmounts on the server. Two of the directories listed:
>
> cisco1:~# showmount -e
> Export list for cisco1:
> /diskC 172.31.0.0/16,192.168.2.0/24
> /diskB 172.31.0.0/16,192.168.2.0/24
> /diskA 172.31.0.0/16,192.168.2.0/24
> /share 172.31.0.0/16,192.168.2.0/24
> /share/.disk/E 172.31.0.0/16,192.168.2.0/24
> /share/.disk/D 172.31.0.0/16,192.168.2.0/24
> /share/.disk/C 172.31.0.0/16,192.168.2.0/24
> /share/.disk/B 172.31.0.0/16,192.168.2.0/24
> /share/.disk/A 172.31.0.0/16,192.168.2.0/24
>
> I noted everything is a separate mount except /share/.disk/E and /share/.disk/D
> which are empty directories, since I have retired those disks.
>
> After removing those two directories from my exports file, I found that on the
> client I still could not do a nfsv4 mount, but the mount does correctly
> downgrade to nfsv3 when there is a failure.

How bizarre... Was there anything being logged to /var/log/messages?

> [root@docbill-think ~]# mount -v -v -t nfs -s -o nosuid,nolock,nodev,intr
> 172.31.253.11:/share /tmp/foo
> mount.nfs: timeout set for Tue Jan 25 08:45:41 2011
> mount.nfs: trying text-based options
> 'nolock,intr,sloppy,vers=4,addr=172.31.253.11,clientaddr=172.31.253.98'
> mount.nfs: mount(2): No such file or directory
> mount.nfs: trying text-based options 'nolock,intr,sloppy,addr=172.31.253.11'
> mount.nfs: prog 100003, trying vers=3, prot=6
> mount.nfs: trying 172.31.253.11 prog 100003 vers 3 prot TCP port 2049
> mount.nfs: prog 100005, trying vers=3, prot=17
> mount.nfs: trying 172.31.253.11 prog 100005 vers 3 prot UDP port 4002
> 172.31.253.11:/share on /tmp/foo type nfs (rw,nosuid,nodev,nolock,intr)
>
> I don't really follow why I get a "No such file or directory" error for NFSv4,
> but since it succeeds on the NFSv3 fallback, that is only a minor concern.

It's because there is no pseudo root defined (via fsid=0).

(In reply to comment #15)
> I see. It looks like one of the problems is the nfsv4 mount hangs if any of
> the child exports does not have an fsid value assigned to it. For the disk
> mounts, it picks up the UUID values as the fsid. For the empty directories,
> it just hangs the whole processes. I'm not sure if the hang is caused on the
> server or on the client...

From what the trace said, it looked like the server was simply dropping the request, so it was most likely a server issue.

> The second problem is it seems there is no way with nfsv4 to specify a crossmnt
> without also specifying fsid=0. If crossmnt is specified without fsid=0, then
> there is simply a no such file or directory. I'm not sure what in the world
> one would do if they want to specify multiple crossmnt directories with
> nfsv4... Again I'm not certain if this problem is client side or server side.
>
> For now, I commented out my /diskA, /diskB, and /diskC directories, and added
> the fsid=0 flag. (That is problematic for Windows, which does not seem to
> support crossmnt, but I can probably fine tune my exports via IP addresses...)

Try setting the 'crossmnt,nohide' export flags to help with jumping to different file systems.

> I've added the fsid=0 flag to the /share mount. Now:
>
> mount -v -v -t nfs4 -s -o nosuid,nolock,nodev,intr 172.31.253.11:/ /tmp/foo
>
> works. It looks however like automount still falling back to nfsv3 instead of
> nfsv4, as I find the disk list as /net/172.31.253.11/share not
> /net/172.31.253.11/ as I would expect with the fsid=0 flag in play.

In V4, there is this notion of a pseudo root, which means a server can define which file system the client sees as the root. For example, if the server has this export:

    /share *(fsid=0,ro)

and the client mounted the root file system

    mount server:/ /mnt

the client would only see the directories under '/share'.

For a v4 mount to work there has to be a pseudo root. With most NFS server implementations, the pseudo root is simply '/'. With earlier Linux implementations this was not the case. One actually had to define a pseudo root with the fsid=0 export flag. The most common practice was to have this export:

    / *(ro,fsid=0)

All this changed in the 2.6.32 kernel. The notion of a default pseudo root was added, so when a pseudo root is not defined, '/' becomes the pseudo root, which makes all the existing v3 exports work under v4 without changes.

autofs is working great under Fedora 16.
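As a rough illustration of the pseudo-root explanation given earlier in this thread, the path mapping can be modelled in Python. This is a toy model built only from the description above, not code from the kernel or nfs-utils: the function `server_path` and its `pseudo_root` parameter are hypothetical names, and `None` is used here to represent an older server with no fsid=0 export.

```python
import errno
import posixpath


def server_path(client_path, pseudo_root="/"):
    """Map the path an NFSv4 client mounts onto the server's real path.

    pseudo_root is the directory exported with fsid=0. The default "/"
    models the default pseudo root described for 2.6.32 and later kernels;
    None models an older server with no pseudo root defined.
    """
    if pseudo_root is None:
        # Matches the "No such file or directory" seen on the v4 attempt.
        raise FileNotFoundError(errno.ENOENT, "no pseudo root defined", client_path)
    relative = client_path.lstrip("/")
    return posixpath.normpath(posixpath.join(pseudo_root, relative))


print(server_path("/", pseudo_root="/share"))         # /share
print(server_path("/.disk/A", pseudo_root="/share"))  # /share/.disk/A
print(server_path("/share"))                          # /share
```

In this model, exporting /share with fsid=0 is why `mount server:/` shows the contents of /share, while a default pseudo root of '/' lets the v3-style path `server:/share` keep working under v4.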