Bug 859078
Description
Michael Young
2012-09-20 14:02:20 UTC
(In reply to comment #0)
> Our Netapp box is reporting warnings like
> Client x.x.x.x is violating the NFSv4 specification by sending a UDP/IP
> datagram to the NFSv4 server.
> I took a packet trace, and it looks like at the start of automounting an NFS
> share, the RHEL 6.3 client issues NFS NULL calls, first in TCP for V4, V3,
> and V2, and then in UDP for V4, V3, and V2, presumably to work out what is
> supported, but the NetApp seems to see the UDP NFS V4 packet as broken
> behaviour. If they are right about it breaking the specification, could
> automount or the kernel stop doing this.

I won't go into why the change that is causing this was done now.

There are two ways to avoid this: one is to set MOUNT_WAIT to a
sensible timeout value for your environment or ...

Second, I believe that the package at:
http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
will resolve your problem provided you specify the autofs
pseudo option "fstype=nfs4" to tell autofs you are an NFSv4
only environment and that you don't want to allow fallback
to NFSv3 etc.

Ian

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

(In reply to Ian Kent from comment #2)
>
> I won't go into why the change that is causing this was done now.
>
> There are two ways to avoid this: one is to set MOUNT_WAIT to a
> sensible timeout value for your environment or ...
>
> Second, I believe that the package at:
> http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
> will resolve your problem provided you specify the autofs
> pseudo option "fstype=nfs4" to tell autofs you are an NFSv4
> only environment and that you don't want to allow fallback
> to NFSv3 etc.

Looks like I'm still waiting for information.
Is this still a problem with autofs-5.0.5-55?
How about with autofs-5.0.5-73?
I haven't checked recently (NetApp and our support provider finally accepted that a different issue we were having was an unrelated bug and so stopped complaining about strange entries in our NetApp log files). Looking at our logs, I am still getting the warnings with autofs-5.0.5-74.el6_4.x86_64 without the workaround. I will test the "fstype=nfs4" workaround on a test box and see if it fixes it (I did test autofs-5.0.5-55 and I think it worked but I can't remember for sure).

(In reply to Michael Young from comment #7)
> I haven't checked recently (NetApp and our support provider finally accepted
> a different issue we were having was an unrelated bug and so stopped
> complaining about strange entries in our NetApp log files). Looking at our
> logs, I am still getting the warnings with autofs-5.0.5-74.el6_4.x86_64
> without the workaround. I will test the "fstype=nfs4" workaround on a test
> box and see if it fixes it (I did test autofs-5.0.5-55 and I think it worked
> but I can't remember for sure).

Thinking about this, I have a recent change that might affect this in Fedora 19. It should negate the need to use the fstype option for simple autofs map entries and NFSv4 IIRC, so there might be more we can do.

I will also need to look at the NFS RFC again and check if using TCP only is a MUST and not a SHOULD. I thought it was a SHOULD, so UDP can be used when probing availability, although I think at least TCP is used first now, which from my point of view is not good from a reserved socket consumption POV (umm ... not sure what releases ... I'll need to check).

Ian

(In reply to Michael Young from comment #7)
> I haven't checked recently (NetApp and our support provider finally accepted
> a different issue we were having was an unrelated bug and so stopped
> complaining about strange entries in our NetApp log files). Looking at our
> logs, I am still getting the warnings with autofs-5.0.5-74.el6_4.x86_64
> without the workaround. I will test the "fstype=nfs4" workaround on a test
> box and see if it fixes it (I did test autofs-5.0.5-55 and I think it worked
> but I can't remember for sure).

I have now tested this with fstype=nfs4 set and it does stop the NFSv4 warnings that NetApp is reporting for that host.

case test step:
1. config nfs: disable rpc.mountd over tcp; disable rpc.nfsd nfs4 and tcp;
   RPCMOUNTDOPTS="--no-tcp" # /etc/sysconfig/nfs
   RPCNFSDARGS="-N 4 --no-tcp"
   /tmp *(sync) # /etc/exports
   service nfs restart

2. config autofs
   /nfsmp /etc/auto.nfs # /etc/auto.master
   lsystmp -nobind 127.0.0.1:/tmp # /etc/auto.nfs
   service autofs restart

3. (tcpdump -nv -i lo -w lo.pcap &)
   sleep 5
   ls /nfsmp/lsystmp; sleep; pkill tcpdump
   tshark -tad -r lo.pcap | grep 'NFS V4 NULL '
   tshark -tad -V -r lo.pcap | awk 'BEGIN{RS=""}/Protocol: UDP.*Program: NFS .*Program Version: 4.*V4 Procedure: NULL/{print}'

Because rpc.mountd in RHEL 6.x cannot really be disabled over TCP (bz984824), this cannot be reproduced with a local mount.
Does the NFS server need to run on RHEL 5.9, which makes this a multi-host case? -_-

Created attachment 775659 [details]
Patch - mount_nfs.so to honor explicit NFSv4 requests
Created attachment 775660 [details]
Patch - mount_nfs.so fix port=0 option behavior v3
Created attachment 775661 [details]
Patch - check for protocol option
Created attachment 775662 [details]
Patch - probe each nfs version in turn for singleton mounts
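For anyone re-running the capture check from the test steps above without awk, the same filter can be sketched in Python. This is only an illustration: it assumes the output of `tshark -tad -V -r <capture>` has already been read into a string, and the function name `udp_nfs4_null_frames` is made up for this example and is not part of the regression test.

```python
import re

# Same pattern as the awk filter in the test steps: a UDP frame that
# carries an NFS v4 NULL call. re.DOTALL lets "." cross line breaks,
# which is how awk treats "." inside one multi-line record.
PATTERN = re.compile(
    r"Protocol: UDP.*Program: NFS .*Program Version: 4.*V4 Procedure: NULL",
    re.DOTALL,
)


def udp_nfs4_null_frames(tshark_verbose_text):
    """Return the frame records (separated by blank lines) matching PATTERN."""
    # Splitting on runs of blank lines mirrors awk's RS="" paragraph mode.
    records = re.split(r"\n{2,}", tshark_verbose_text.strip())
    return [record for record in records if PATTERN.search(record)]
```

An empty result means no NFSv4 NULL call over UDP was seen in the capture, which is the outcome expected for an NFSv4-only map entry.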
The above changes do a number of things that should improve NFSv4 usage in autofs.

The changes are:
- the port option handling should behave the same way as it is handled by mount.nfs(8). Not strictly related to this bug but sensible to have.
- the options vers=4 and nfsvers=4 should be respected and cause autofs to behave as though fstype=nfs4 has been specified.
- the options proto=tcp, tcp, proto=udp, udp should be respected and the availability probe should be done only for the given protocol when specified (the debug log should show this to be the case).
- if /etc/nfsmount.conf has Defaultvers=4 (not checked by autofs) and /etc/sysconfig/autofs has the setting MOUNT_NFS_DEFAULT_PROTOCOL=4 and the map entry is simple (ie. one host only) then autofs should probe only for NFSv4. This should eliminate the need for the explicit fstype=nfs4 or alternate options.

A package that can be used for testing is available at:
http://people.redhat.com/~ikent/autofs-5.0.5-77.el6

Please test and post results.

(In reply to Ian Kent from comment #18)
> ...
> A package that can be used for testing is available at:
> http://people.redhat.com/~ikent/autofs-5.0.5-77.el6
>
> Please test and post results.

That version also seems to stop the NFSv4 UDP warnings on the NetApp (none in 5 days and I would have expected to have seen some by now). This is with default NFSv4 settings, ie. MOUNT_NFS_DEFAULT_PROTOCOL=4 in /etc/sysconfig/autofs and a default copy of /etc/nfsmount.conf. In particular fstype=nfs4 is not set anywhere.

Hi Ian

Using the errata (RHBA-2013:15476-02) build (autofs-5.0.5-80.el6) on RHEL6.4 x86_64, the test FAILs, both with /CoreOS/autofs/Regression/bz859078 and manually:

build a RHEL5.9 for nfs server:
1. config nfs:
   rlFileBackup /etc/sysconfig/nfs /etc/exports
   echo 'RPCMOUNTDOPTS="--no-tcp"
   RPCNFSDARGS="-N 4 --no-tcp"' >/etc/sysconfig/nfs
   echo '/tmp *(sync)' >/etc/exports
2. restart nfs
   service nfs restart
3. (get ipaddr)

build a RHEL6.4 for test
1. install errata build:
   rpm -Uvh autofs-5.0.5-80.el6.${arch}.rpm
2. run the /CoreOS/autofs/Regression/bz859078/runtest.sh
   export nfsServ=$IP
   ./runtest.sh

some test screen log, and /var/log/messages info:
----------------------------------------------------
35 2013-08-14 18:08:03.446997 10.66.13.188 -> 10.66.13.16 NFS V4 NULL Call
36 2013-08-14 18:08:03.447148 10.66.13.16 -> 10.66.13.188 NFS V4 NULL Reply (Call In 35)
:: [ PASS ] :: Running 'tshark -tad -r nic.pcap | grep 'NFS V4 NULL ''
Running as user "root" and group "root". This could be dangerous.
Frame 35 (82 bytes on wire, 82 bytes captured)
    Arrival Time: Aug 14, 2013 18:08:03.446997000
    [Time delta from previous captured frame: 0.000605000 seconds]
    [Time delta from previous displayed frame: 0.000605000 seconds]
    [Time since reference or first frame: 0.004231000 seconds]
    Frame Number: 35
    Frame Length: 82 bytes
    Capture Length: 82 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:udp:rpc]
Ethernet II, Src: CadmusCo_94:11:fa (08:00:27:94:11:fa), Dst: CadmusCo_90:7c:2e (08:00:27:90:7c:2e)
    Destination: CadmusCo_90:7c:2e (08:00:27:90:7c:2e)
        Address: CadmusCo_90:7c:2e (08:00:27:90:7c:2e)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: CadmusCo_94:11:fa (08:00:27:94:11:fa)
        Address: CadmusCo_94:11:fa (08:00:27:94:11:fa)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 10.66.13.188 (10.66.13.188), Dst: 10.66.13.16 (10.66.13.16)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 68
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)
        0.. = Reserved bit: Not Set
        .1. = Don't fragment: Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 64
    Protocol: UDP (0x11)
    Header checksum: 0x0b5a [correct]
        [Good: True]
        [Bad : False]
    Source: 10.66.13.188 (10.66.13.188)
    Destination: 10.66.13.16 (10.66.13.16)
User Datagram Protocol, Src Port: 42849 (42849), Dst Port: nfs (2049)
    Source port: 42849 (42849)
    Destination port: nfs (2049)
    Length: 48
    Checksum: 0x2f91 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
Remote Procedure Call, Type:Call XID:0x520d8fb1
    XID: 0x520d8fb1 (1376620465)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 4
    Procedure: NULL (0)
    Credentials
        Flavor: AUTH_NULL (0)
        Length: 0
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Network File System
    [Program Version: 4]
    [V4 Procedure: NULL (0)]
:: [ 18:08:07 ] :: check the package:
    Protocol: UDP (0x11)
:: [ FAIL ] :: Running 'echo "$cap" | grep Protocol:\ UDP' (Expected 1, got 0)
:: [ 18:08:07 ] :: {ls}
gconfd-root
:: [ 18:08:07 ] :: {/var/log/messages}
Aug 14 18:08:03 dhcp-13-188 automount[2378]: handle_packet: type = 3
Aug 14 18:08:03 dhcp-13-188 automount[2378]: handle_packet_missing_indirect: token 131, name lsystmp, request pid 2416
Aug 14 18:08:03 dhcp-13-188 automount[2378]: attempting to mount entry /nfsmp/lsystmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: lookup_mount: lookup(file): looking up lsystmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: lookup_mount: lookup(file): lsystmp -> -nobind,vers=3 10.66.13.16:/tmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: parse_mount: parse(sun): expanded entry: -nobind,vers=3 10.66.13.16:/tmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: parse_mount: parse(sun): gathered options: nobind,vers=3
Aug 14 18:08:03 dhcp-13-188 automount[2378]: parse_mount: parse(sun): dequote("10.66.13.16:/tmp") -> 10.66.13.16:/tmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: parse_mount: parse(sun): core of entry: options=nobind,vers=3, loc=10.66.13.16:/tmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: sun_mount: parse(sun): mounting root /nfsmp, mountpoint lsystmp, what 10.66.13.16:/tmp, fstype nfs, options nobind,vers=3
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs): root=/nfsmp name=lsystmp what=10.66.13.16:/tmp, fstype=nfs, options=nobind,vers=3
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs): nfs options="vers=3", nobind=1, nosymlink=0, ro=0
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: called with host 10.66.13.16(10.66.13.16) proto 6 version 0x40
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: called with host 10.66.13.16(10.66.13.16) proto 6 version 0x70
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: called with host 10.66.13.16(10.66.13.16) proto 17 version 0x70
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: nfs v3 rpc ping time: 0.000192
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: nfs v2 rpc ping time: 0.000254
Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: host 10.66.13.16 cost 223 weight 0
Aug 14 18:08:03 dhcp-13-188 automount[2378]: prune_host_list: selected subset of hosts that support NFS3 over UDP
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs): calling mkdir_path /nfsmp/lsystmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs): calling mount -t nfs -s -o vers=3 10.66.13.16:/tmp /nfsmp/lsystmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount(nfs): mounted 10.66.13.16:/tmp on /nfsmp/lsystmp
Aug 14 18:08:03 dhcp-13-188 automount[2378]: dev_ioctl_send_ready: token = 131
Aug 14 18:08:03 dhcp-13-188 automount[2378]: mounted /nfsmp/lsystmp
Aug 14 18:08:06 dhcp-13-188 kernel: device eth0 left promiscuous mode

(In reply to Yin.JianHong from comment #13)

Sorry I didn't look closely enough at this test.
I don't think it's quite right.

> case test step:
> 1. config nfs: disable rpc.mountd over tcp; disable rpc.nfsd nfs4 and tcp;
> RPCMOUNTDOPTS="--no-tcp" # /etc/sysconfig/nfs
> RPCNFSDARGS="-N 4 --no-tcp"
> /tmp *(sync) # /etc/exports
> service nfs restart
>
> 2. config autofs
> /nfsmp /etc/auto.nfs # /etc/auto.master
> lsystmp -nobind 127.0.0.1:/tmp # /etc/auto.nfs

The test actually has:
echo "lsystmp -nobind,vers=3 $nfsServ:/tmp" >/etc/auto.nfs
which is requesting an NFSv3 mount which will issue UDP packets during the availability probe.

This bug is about issuing UDP packets when an NFSv4 mount is attempted.

The TCP protocol MUST be available for NFSv4 according to the NFS RFC so it isn't valid to disable it.

If the NFSv4 over TCP only availability probe fails, autofs will fall back to probing all versions and protocols, which includes probing NFSv4 over UDP. I believe this is valid because I don't remember a MUST NOT provide NFSv4 over UDP in the NFSv4 RFC, which also means the NetApp log message isn't strictly accurate.

At least I don't remember a MUST NOT provide NFSv4 over UDP in the RFC but I may be mistaken.

Ian

(In reply to Yin.JianHong from comment #22)
> Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs):
> root=/nfsmp name=lsystmp what=10.66.13.16:/tmp, fstype=nfs,
> options=nobind,vers=3
> Aug 14 18:08:03 dhcp-13-188 automount[2378]: mount_mount: mount(nfs): nfs
> options="vers=3", nobind=1, nosymlink=0, ro=0
> Aug 14 18:08:03 dhcp-13-188 automount[2378]: get_nfs_info: called with host
> 10.66.13.16(10.66.13.16) proto 6 version 0x40

But this is a problem, not because it is NFSv4 over UDP but because NFSv3 only has been given in the mount options.

I didn't account for that in the change here because I was focused on NFSv4 only. I probably should only probe the specific version present in the options for completeness.

So there is an opportunity to improve this.
I'll do that as soon as I can and update the bug.

Ian

Created attachment 787183 [details]
Patch - only probe specific nfs version when requested
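The probing rules discussed in the comments above (an explicit version, an explicit protocol, fstype=nfs4, and the NFSv4 default for single-host entries) can be summarised as a small Python model. This is a sketch for discussion only, not autofs code, and it does not claim to reproduce the exact behaviour of any autofs build. The function `probe_plan` and its parameters are hypothetical names; using TCP alone for an NFSv4-only probe is an assumption drawn from the "NFSv4 over TCP only availability probe" wording; and the fallback to probing everything when that probe fails is not modelled.

```python
def probe_plan(options, fstype="nfs", default_version=None, single_host=True):
    """Return the (version, protocol) pairs to probe, in order.

    A simplified model of the rules described in this bug, not autofs code.
    default_version stands in for MOUNT_NFS_DEFAULT_PROTOCOL (None = unset).
    """
    opts = [o.strip() for o in options.split(",") if o.strip()]

    version = None
    protocol = None
    for opt in opts:
        key, _, value = opt.partition("=")
        # vers=N / nfsvers=N restricts the probe to that version.
        if key in ("vers", "nfsvers") and value.isdigit():
            version = int(value)
        # tcp, udp, proto=tcp, proto=udp restrict the probe to that protocol.
        elif opt in ("tcp", "proto=tcp"):
            protocol = "tcp"
        elif opt in ("udp", "proto=udp"):
            protocol = "udp"

    if fstype == "nfs4":
        version = 4
    elif version is None and default_version == 4 and single_host:
        # Simple (one host) entry with an NFSv4 default: probe NFSv4 only.
        version = 4

    versions = [version] if version is not None else [4, 3, 2]
    if protocol is not None:
        protocols = [protocol]
    elif versions == [4]:
        protocols = ["tcp"]  # assumption: NFSv4-only probes use TCP
    else:
        protocols = ["tcp", "udp"]

    # TCP probes for every version come before any UDP probe.
    return [(v, p) for p in protocols for v in versions]
```

With no hints at all, `probe_plan("nobind")` yields TCP for V4, V3 and V2 followed by UDP for V4, V3 and V2, which is the sequence reported in the original description; the NFSv4-over-UDP entry in that list is the one the NetApp complains about.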
verified by /CoreOS/autofs/Regression/bz859078 manually :

[root@dell-pe1950-01 bz859078]# make run
test -x runtest.sh || chmod a+x runtest.sh
./runtest.sh
package beakerlib-plugins-qe-0.1.0-0.noarch is already installed
{INFO} platform:
RedHatEnterpriseServer 6.5
Linux dell-pe1950-01.rhts.englab.brq.redhat.com 2.6.32-412.el6.x86_64 #1 SMP Tue Aug 13 23:06:33 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
autofs-5.0.5-82.el6.x86_64
wireshark-1.8.8-5.el6.x86_64

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: /CoreOS/autofs/Regression/bz859078::Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 01:55:03 ] :: run (tcpdump -nv -i eth0 -w nic.pcap src 10.34.35.50 or dst 10.34.35.50 &)
:: [ 01:55:03 ] :: sleep 3
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
:: [ 01:55:06 ] :: run (ls /nfsmp/lsystmp >ls.result &)
55 packets captured
56 packets received by filter
0 packets dropped by kernel
:: [ PASS ] :: Running 'sleep 3 && pkill tcpdump' (Expected 0, got 0)
Running as user "root" and group "root". This could be dangerous.
17 2013-08-28 01:55:06.892399 10.34.35.50 -> 10.66.13.194 NFS 110 V4 NULL Call
19 2013-08-28 01:55:07.223742 10.66.13.194 -> 10.34.35.50 NFS 94 V4 NULL Reply (Call In 17)
:: [ PASS ] :: Running 'tshark -tad -r nic.pcap | egrep 'NFS ([0-9]+ )?V4 NULL '' (Expected 0, got 0)
Running as user "root" and group "root". This could be dangerous.
:: [ 01:55:10 ] :: check the package:
:: [ PASS ] :: Running 'echo "$cap" | grep Protocol:\ UDP' (Expected 1, got 1)
:: [ 01:55:10 ] :: {ls}
:: [ 01:55:11 ] :: {/var/log/messages}
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: handle_packet: type = 3
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: handle_packet_missing_indirect: token 11, name lsystmp, request pid 5753
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: attempting to mount entry /nfsmp/lsystmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: lookup_mount: lookup(file): looking up lsystmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: lookup_mount: lookup(file): lsystmp -> -nobind,vers=3 10.66.13.194:/tmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: parse_mount: parse(sun): expanded entry: -nobind,vers=3 10.66.13.194:/tmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: parse_mount: parse(sun): gathered options: nobind,vers=3
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: parse_mount: parse(sun): dequote("10.66.13.194:/tmp") -> 10.66.13.194:/tmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: parse_mount: parse(sun): core of entry: options=nobind,vers=3, loc=10.66.13.194:/tmp
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: sun_mount: parse(sun): mounting root /nfsmp, mountpoint lsystmp, what 10.66.13.194:/tmp, fstype nfs, options nobind,vers=3
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: mount_mount: mount(nfs): root=/nfsmp name=lsystmp what=10.66.13.194:/tmp, fstype=nfs, options=nobind,vers=3
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: mount_mount: mount(nfs): nfs options="vers=3", nobind=1, nosymlink=0, ro=0
Aug 28 01:55:06 dell-pe1950-01 automount[5715]: get_nfs_info: called with host 10.66.13.194(10.66.13.194) proto 6 version 0x40
Aug 28 01:55:07 dell-pe1950-01 automount[5715]: get_nfs_info: nfs v4 rpc ping time: 0.331407
Aug 28 01:55:07 dell-pe1950-01 automount[5715]: get_nfs_info: host 10.66.13.194 cost 331407 weight 0
Aug 28 01:55:07 dell-pe1950-01 automount[5715]: prune_host_list: selected subset of hosts that support NFS4 over TCP
Aug 28 01:55:07 dell-pe1950-01 automount[5715]: mount_mount: mount(nfs): calling mkdir_path /nfsmp/lsystmp
Aug 28 01:55:07 dell-pe1950-01 automount[5715]: mount_mount: mount(nfs): calling mount -t nfs -s -o vers=3 10.66.13.194:/tmp /nfsmp/lsystmp
Aug 28 01:55:09 dell-pe1950-01 kernel: device eth0 left promiscuous mode

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: /CoreOS/autofs/Regression/bz859078::Cleanup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 01:55:11 ] :: JOURNAL XML: /var/tmp/beakerlib-SO4oDGn/journal.xml
:: [ 01:55:11 ] :: JOURNAL TXT: /var/tmp/beakerlib-SO4oDGn/journal.txt

During QA autofs bugzilla regression tests bz239361 and bz239370 have been reported to fail.

Investigation shows that there is a mistake in the patches included for this bug for the special case when NFSv2 only is requested to be probed.

In order to include the correction this bug needs the flag exception+.

Can we get this approval please?

Ian

Created attachment 802763 [details]
Patch - fix get_nfs_info() probe
Resolve regression identified by bugzilla tests bz239361 and bz239370.

Created attachment 802776 [details]
Patch - fix get_nfs_info() probe (updated)
Make patch description reasonably readable.
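When reading the debug logs attached to this bug, the `get_nfs_info: called with host ... proto N version 0xNN` lines show which probes were actually issued. A minimal Python sketch for decoding them follows. The `proto` values are IP protocol numbers (6 is TCP, 17 is UDP). The version bit values are an assumption: 0x40 appearing next to an "nfs v4 rpc ping" in the logs above is consistent with it, but the v2 and v3 bits are inferred rather than confirmed by anything in this bug. `decode_probe` is a hypothetical helper name.

```python
import re

# IP protocol numbers as printed in the "proto" field.
PROTOCOLS = {6: "tcp", 17: "udp"}

# Assumed meaning of the bits in the "version" mask (not confirmed here).
VERSION_BITS = {0x10: 2, 0x20: 3, 0x40: 4}

LINE = re.compile(
    r"get_nfs_info: called with host (\S+) proto (\d+) version (0x[0-9a-fA-F]+)"
)


def decode_probe(line):
    """Return (host, protocol, [versions]) for a probe line, else None."""
    match = LINE.search(line)
    if match is None:
        return None
    host = match.group(1)
    proto = int(match.group(2))
    mask = int(match.group(3), 16)
    versions = sorted(v for bit, v in VERSION_BITS.items() if mask & bit)
    # Unknown protocol numbers are passed through as text.
    return host, PROTOCOLS.get(proto, str(proto)), versions
```

Under these assumptions, a line with `proto 17 version 0x70` decodes to a UDP probe covering versions 2, 3 and 4, which is the kind of probe that triggers the NetApp warning.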
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1690.html