Bug 208244 - unable to mount nfs with udp
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: nfs-utils
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Ben Levenson
Reported: 2006-09-27 07:12 EDT by Terje Rosten
Modified: 2007-11-30 17:07 EST

Fixed In Version: beta2
Doc Type: Bug Fix
Last Closed: 2006-12-22 19:26:23 EST

Attachments
tcpdump capture file udp as transport (2.61 KB, application/octet-stream)
2006-09-27 07:14 EDT, Terje Rosten
tcpdump capture file tcp as transport (6.43 KB, application/octet-stream)
2006-09-27 07:20 EDT, Terje Rosten
ethereal packet capture - UDP (8.47 KB, application/octet-stream)
2006-11-22 10:23 EST, Jeff Bastian
ethereal packet capture - TCP (9.67 KB, application/octet-stream)
2006-11-22 10:25 EST, Jeff Bastian
patch -- don't call connect() on UDP sockets (378 bytes, patch)
2006-11-27 10:01 EST, Jeff Layton

Description Terje Rosten 2006-09-27 07:12:49 EDT
Description of problem:

Starting with RHEL5 Beta 1, it's impossible to mount NFS over UDP:

$ mount -t nfs -o udp nfs-server:/global/export/home/user /mnt/nfs/udp
mount: mount to NFS server 'nfs-server' failed: timed out (retrying).
mount: mount to NFS server 'nfs-server' failed: timed out (retrying).

This has worked on everything from RHEL2.1 and RHL9 to RHEL4 and FC5.

TCP works fine; however, UDP seems to be the default, so automount with no
mount options will not mount.

(Can the default transport be modified without rebuilding the kernel?)

The server is Solaris 9 SPARC.

Logs from tcpdump captures of mounting over UDP (failing) and TCP (OK) are
attached.
Comment 1 Terje Rosten 2006-09-27 07:14:36 EDT
Created attachment 137209 [details]
tcpdump capture file udp as transport
Comment 2 Terje Rosten 2006-09-27 07:20:51 EDT
Created attachment 137210 [details]
tcpdump capture file tcp as transport
Comment 3 Steve Dickson 2006-09-27 07:31:42 EDT
Looking at the tcpdump in Comment #1, it appears the mountd
on the server is not answering the portmap query for the 
udp port that should be used.

Make sure the mountd on the server is listening for udp
requests by doing:

rpcinfo -p <server> | grep mountd | grep udp
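The same check can be scripted. Below is a minimal Python sketch, assuming the `rpcinfo -p` column order program, vers, proto, port, service. The function name is hypothetical, and the sample input is the output Terje posts in Comment 4. The sketch matches on program number 100005 rather than the name "mountd", since the name column depends on the client's RPC name database. Keep in mind that `rpcinfo -p` only lists what is registered with the portmapper. It does not by itself show that mountd replies, or from which address.

```python
from typing import List, Tuple

MOUNT_PROGRAM = "100005"  # ONC RPC program number registered by mountd


def mountd_udp_entries(rpcinfo_output: str) -> List[Tuple[int, int]]:
    """Return (version, port) pairs for mountd registered over UDP.

    Assumes the `rpcinfo -p` column order: program, vers, proto, port, service.
    """
    entries = []
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == MOUNT_PROGRAM and fields[2] == "udp":
            entries.append((int(fields[1]), int(fields[3])))
    return entries


sample = """\
    100005    1   udp  62790  mountd
    100005    2   udp  62790  mountd
    100005    3   udp  62790  mountd
"""
print(mountd_udp_entries(sample))  # [(1, 62790), (2, 62790), (3, 62790)]
```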
 
Comment 4 Terje Rosten 2006-09-28 03:07:19 EDT
> Looking at the tcpdump in Comment #1, it appears the mountd
> on the server is not answering the portmap query for the 
> udp port that should be used.

Yes, that's my view too, but why?

 
> Make sure the mountd on the server is listening for udp
> requests by doing:
> 
> rpcinfo -p <server> | grep mountd | grep udp
  

$ /usr/sbin/rpcinfo -p nfs-server | grep mountd | grep udp
    100005    1   udp  62790  mountd
    100005    2   udp  62790  mountd
    100005    3   udp  62790  mountd

Note: this server has about 400 clients running anything from
RHL9-FC5, RHEL2.1-5, SuSE 9, Solaris 8-10, and HP-UX 11.
I would be surprised if the bug is on the server side.

I also see the problem on two different hosts running RHEL5 Beta 1, one is
running i386 the other x86_64.
Comment 5 Steve Dickson 2006-09-28 06:25:25 EDT
Could this be a firewall problem? Is SELinux in the picture?
Comment 6 Terje Rosten 2006-09-28 10:05:58 EDT
(In reply to comment #5)
> Could this be a firewall problem? Is SELinux in the picture?

No, neither iptables nor SELinux is in use:

$ grep ^SELINUX= /etc/sysconfig/selinux 
SELINUX=disabled

$ service iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         

Is the mount call from the client valid/correct?

It's a pity I don't have access to the server.
Comment 7 Steve Dickson 2006-09-29 10:39:14 EDT
Yes... the arguments to the mount are correct...
I just used both the '-t nfs -o udp' args to successfully
mount a RHEL5 client on a Solaris 9 server...

Maybe it's the permissions on the exports... although I
would expect the server to fail the mount, not just
drop the request...  Does the server have multiple
interfaces? Maybe a response did come back,
but on a different interface?

Also, is there anything in /var/log/messages that gives
a clue as to what is happening?
Comment 8 Terje Rosten 2006-09-29 16:04:05 EDT
> Yes... the arguments to the mount are correct...

Yes, of course, but is the data in the network packet sent from the client really
correct, or could the server be confused by some wrong bits in the request?

> I just used both the '-t nfs -o udp' args to successfully
> mount a RHEL5 client on a Solaris 9 server...

Yes, this client mounts other Solaris servers and RHEL/FC servers just fine;
the install was done by kickstart over NFS from Solaris 10 :-)

> Maybe it's the permissions on the exports... although I
> would expect the server to fail the mount, not just
> drop the request...  Does the server have multiple
> interfaces? Maybe a response did come back,
> but on a different interface?

No, I don't think so. The server is, however, running Solaris 9 with the
Sun Cluster HA software.

I will try putting a rawhide kernel on the client and the RHEL5 Beta 1 kernel
on an FC client (must wait until Monday...)

Comment 9 Jeff Bastian 2006-11-21 13:39:54 EST
Steve,

I'm seeing the same problem with both Fedora Core 6 and RHEL5 Beta 2. Both the
firewall and SELinux are disabled. I've tried this with a couple of different
NetApp filers that are serving hundreds of RHEL3, RHEL4, and Solaris 8 clients.
Only RHEL5 & FC6 have a problem. rpcinfo shows the server is responding to UDP
requests.

$ /usr/sbin/rpcinfo -p server | grep mountd | grep udp
    100005    3   udp   4046  mountd
    100005    2   udp   4046  mountd
    100005    1   udp   4046  mountd

Comment 10 Jeff Bastian 2006-11-21 13:41:17 EST
One more comment: it doesn't fail consistently.  Trying to mount with UDP fails
*most* of the time, but occasionally I'm able to successfully make a UDP mount.
Comment 11 Terje Rosten 2006-11-21 13:52:53 EST
I just updated to RHEL 5 Beta 2 x86_64 (and the nfs server is now running
Solaris 10), the problem is still here. tcp works.
Comment 12 Jeff Bastian 2006-11-21 16:03:12 EST
I took a Solaris 10 workstation, shared the /export directory via NFS, and
tried mounting it over UDP with both RHEL5 Beta 2 and FC6; both worked fine.
The RHEL5 & FC6 boxes were the only two NFS clients mounting from this NFS
server, though. I wonder if somehow RHEL5 & FC6 are more sensitive to an NFS
server that's under a heavy load from hundreds or thousands of clients...?
Comment 13 Jeff Layton 2006-11-22 10:07:40 EST
I've not been able to reproduce this here so far. I've tried RHEL-4, Solaris
8, and NetApp servers, and they all mounted without issue. What might be most
helpful is a network capture of the traffic between the two hosts during a
mount attempt, to see if we can determine what's happening.
Comment 15 Jeff Bastian 2006-11-22 10:23:39 EST
Created attachment 141908 [details]
ethereal packet capture - UDP

This packet capture was taken with no mount options and the mount failed.
Comment 16 Jeff Bastian 2006-11-22 10:25:19 EST
Created attachment 141909 [details]
ethereal packet capture - TCP

This packet capture was taken during 'mount -t nfs -o vers=3,proto=tcp' and it
worked.
Comment 17 Jeff Bastian 2006-11-22 10:27:13 EST
In the two pcap files I just uploaded, the RHEL5 NFS client is 192.168.251.153.
The NetApp filer has three interfaces: 128.247.24.16, 128.247.26.16, and
128.247.28.16.
Comment 18 Jeff Layton 2006-11-22 15:23:27 EST
From looking at the captures, the issue seems to be that the UDP mountd packets
are being sent to 128.247.24.16, but the replies are coming back from
128.247.28.16. Some of the UDP networking behavior seems to have changed in
RHEL5/FC6 so that this is no longer kosher.

This seems to be the RHEL5 version of BZ 212471 (an fc6 BZ).
Comment 19 Jeff Layton 2006-11-22 16:37:16 EST
This appears to be a rather nasty problem, actually. mount.nfs just uses the
standard libc rpc calls to talk to mountd. In an strace, it ends up looking like
this:

26265 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
26265 bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
26265 connect(3, {sa_family=AF_INET, sin_port=htons(912), sin_addr=inet_addr("192.168.1.2")}, 16) = 0
26265 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 3000) = 1
26265 recvfrom(3, "8\203=\334\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 400, 0, {sa_family=AF_INET, sin_port=htons(912), sin_addr=inet_addr("192.168.1.2")}, [16]) = 24
26265 close(3)                          = 0
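To make the trace concrete, here is a minimal Python sketch that decodes the 24 bytes returned by recvfrom(). It assumes they form an ONC RPC reply laid out as in RFC 5531: XID, message type, reply status, verifier flavor and length, accept status. The function name is hypothetical. The decoded header is an accepted reply with no result data. That is consistent with a reply to a void procedure such as the NULL procedure, but the trace does not show the call, so this is an inference.

```python
import struct

# The 24 bytes returned by recvfrom() in the strace, as Python bytes.
REPLY = b"8\x83=\xdc" + b"\x00\x00\x00\x01" + b"\x00" * 16

ACCEPT_STAT = {0: "SUCCESS", 1: "PROG_UNAVAIL", 2: "PROG_MISMATCH",
               3: "PROC_UNAVAIL", 4: "GARBAGE_ARGS", 5: "SYSTEM_ERR"}


def decode_accepted_reply(data: bytes) -> dict:
    """Decode an ONC RPC reply header (RFC 5531), assuming MSG_ACCEPTED."""
    xid, msg_type, reply_stat = struct.unpack_from(">III", data, 0)
    if msg_type != 1 or reply_stat != 0:  # 1 = REPLY, 0 = MSG_ACCEPTED
        raise ValueError("not an accepted RPC reply")
    verf_flavor, verf_len = struct.unpack_from(">II", data, 12)
    offset = 20 + ((verf_len + 3) & ~3)  # verifier body is XDR opaque, padded to 4 bytes
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    return {
        "xid": xid,
        "verf_flavor": verf_flavor,  # 0 = AUTH_NONE
        "accept_stat": ACCEPT_STAT.get(accept_stat, accept_stat),
        "result_bytes": len(data) - offset - 4,
    }


info = decode_accepted_reply(REPLY)
print(hex(info["xid"]), info["accept_stat"], info["result_bytes"])  # 0x38833ddc SUCCESS 0
```

This reply came from 192.168.1.2:912, the same address the socket was connect()ed to, so the kernel delivered it. The failure reported in this bug is the case where the reply leaves the server from a different address.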

I think the issue here is likely that once the socket is "connected", more
recent versions of the kernel reject packets that come from addresses other than
the sin_addr that the socket is connected to.

This is actually correct behavior (we shouldn't accept packets from other hosts
on the socket), and the right fix would seem to be to have the server reply
using the same source address. We probably need to investigate this more closely
before declaring so, however...
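A hedged, minimal Python sketch of the socket rule described above follows. The function name is hypothetical. Two loopback sockets on different ports stand in for two interfaces of one multihomed server, because a connected UDP socket filters on the peer's port as well as its address. The client socket is bound and connected just as in the strace. The request goes to one "interface" and the reply is sent from the other.

```python
import socket


def connected_udp_accepts_foreign_reply(timeout: float = 0.5) -> bool:
    """Send from a connect()ed UDP socket, answer from a different source, report delivery.

    Two loopback ports stand in for two interfaces of one multihomed server
    (an illustrative assumption; on a real server the IP addresses differ).
    """
    request_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # address the client talks to
    reply_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # address the reply comes from
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        request_side.bind(("127.0.0.1", 0))
        reply_side.bind(("127.0.0.1", 0))
        request_side.settimeout(timeout)
        client.bind(("0.0.0.0", 0))                 # like bind(3, ... port 0 ...) in the strace
        client.connect(request_side.getsockname())  # like connect(3, ...) in the strace
        client.send(b"request")

        _, client_addr = request_side.recvfrom(64)
        reply_side.sendto(b"reply", client_addr)    # reply leaves from a different source

        client.settimeout(timeout)
        try:
            client.recv(64)
            return True                             # reply was delivered
        except socket.timeout:
            return False                            # discarded: source is not the connected peer
    finally:
        for s in (request_side, reply_side, client):
            s.close()


print(connected_udp_accepts_foreign_reply())  # expected: False
```

On Linux the reply is discarded before it reaches the application, which matches the timeouts in the original report. connect(2) documents this for datagram sockets: the connected address becomes the only address from which datagrams are received.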
Comment 20 Jeff Bastian 2006-11-22 16:40:33 EST
I took a look at BZ 212471 and it certainly sounds like the problem is with NFS
servers that have multiple network interfaces.

To test this, I added a virtual network interface to my Solaris 10 NFS server
system and added the 2nd IP address to the DNS server and immediately I started
having problems making UDP mounts from a RHEL5 client.

I ran tshark on the client and snoop on the server and I see the same behavior:
the client makes a mount request to one IP address, but the reply comes from the
other address and RHEL5 doesn't like this very much.

This also explains why once in a while mounting over UDP actually works: on
those occasions the reply comes back from the same IP address that the mount
request went to.
Comment 21 Jeff Bastian 2006-11-22 16:49:38 EST
It's true that a system shouldn't accept replies from a different host, however,
in this case, the reply came back from the same host, just using a different IP
address and network interface.

Furthermore, if Solaris, Irix, OSF1, and NetApp NFS servers are all behaving the
same "wrong" way, the Linux community is going to have a hard time arguing that
Linux is doing things the "correct" way here.  Even if it is the correct way, it
doesn't play nicely with other *nix systems so it makes it difficult to use in a
heterogeneous environment, especially when all previous versions of RHEL & FC
worked fine.
Comment 22 Jeff Layton 2006-11-27 08:24:08 EST
> It's true that a system shouldn't accept replies from a different host, however,
> in this case, the reply came back from the same host, just using a different IP
> address and network interface.

Yes, but the client has no way to know this.

> Furthermore, if Solaris, Irix, OSF1, and NetApp NFS servers are all behaving
> the same "wrong" way, the Linux community is going to have a hard time arguing
> that Linux is doing things the "correct" way here.  Even if it is the correct way, it
> doesn't play nicely with other *nix systems so it makes it difficult to use in a
> heterogeneous environment, especially when all previous versions of RHEL & FC
> worked fine.

Also true. The problem here is more fundamental than NFS though...

We have a socket that is connected to a definite IP address, and it receives a
UDP packet from a completely different IP address. Should it accept this packet,
even though it's quite possible that it comes from an impostor? Allowing this
could leave some applications vulnerable to subterfuge.
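Some context on the trade-off: on an unconnected socket, the main thing tying a UDP reply to its call is the RPC transaction ID (XID), the first 32-bit word of both the call and the reply in ONC RPC (RFC 5531). The minimal Python sketch below shows that check. The function names are hypothetical, and this is not the libc implementation. A random 32-bit XID only protects against blind spoofing, not against a sender who can observe the call.

```python
import os
import struct

RPC_REPLY = 1  # msg_type value for a REPLY in ONC RPC


def new_xid() -> int:
    """Pick a random 32-bit transaction ID for an outgoing call."""
    return struct.unpack(">I", os.urandom(4))[0]


def reply_matches_call(call_xid: int, reply: bytes) -> bool:
    """Accept a datagram only if it is an RPC reply carrying the call's XID."""
    if len(reply) < 8:
        return False
    xid, msg_type = struct.unpack_from(">II", reply, 0)
    return xid == call_xid and msg_type == RPC_REPLY


# The 24-byte reply from the strace in Comment 19 carries XID 0x38833ddc.
reply = b"8\x83=\xdc" + struct.pack(">I", RPC_REPLY) + bytes(16)
print(reply_matches_call(0x38833DDC, reply))  # True
print(reply_matches_call(0x12345678, reply))  # False
```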

Again though, I'm still looking at this problem and it's not yet clear to me
what the correct way to fix it is.
Comment 23 Jeff Layton 2006-11-27 10:01:52 EST
Created attachment 142173 [details]
patch -- don't call connect() on UDP sockets

It looks like we're calling connect() on all sockets, even UDP sockets. It's
not clear to me that this is necessary. This patch seemed to prevent the
connect() calls on my test box, but I don't have a good setup at the moment for
testing whether this fixes the issues with multihomed hosts.

I'll build and post some test packages in a bit for those who can test this.
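The attached patch is not reproduced here. The Python sketch below is a hedged illustration of the idea in its title only; the function name is hypothetical, and two loopback ports again stand in for two server interfaces. It uses the same request/reply layout as a connected-socket test but skips connect(), naming the destination on each sendto() instead. The reply from the "other interface" is then delivered, and recvfrom() reports where it came from.

```python
import socket
from typing import Tuple


def unconnected_udp_exchange(timeout: float = 0.5) -> Tuple[bytes, Tuple[str, int], Tuple[str, int]]:
    """Send with sendto() on an unconnected UDP socket and answer from a different source.

    Returns (reply data, reply source, address the request was sent to).
    Two loopback ports stand in for two interfaces of one server (illustrative assumption).
    """
    request_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    reply_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for s in (request_side, reply_side, client):
            s.bind(("127.0.0.1", 0))
            s.settimeout(timeout)
        server_addr = request_side.getsockname()

        client.sendto(b"request", server_addr)      # no connect(): destination given per call
        _, client_addr = request_side.recvfrom(64)
        reply_side.sendto(b"reply", client_addr)    # reply leaves from a different source

        data, source = client.recvfrom(64)          # delivered; source is reported, not enforced
        return data, source, server_addr
    finally:
        for s in (request_side, reply_side, client):
            s.close()


data, source, sent_to = unconnected_udp_exchange()
print(data, source != sent_to)  # b'reply' True
```

Without connect(), the kernel no longer filters replies by source. Deciding whether a reply belongs to an outstanding call is left to the RPC layer, for example by matching the reply's transaction ID against the call's.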
Comment 24 Jeff Layton 2006-11-27 10:28:23 EST
I've posted some test packages for i386 and x86_64 on my people page:

http://people.redhat.com/jlayton/bz208244/

Can people experiencing this problem (carefully) test them and let me know if
they resolve it?
Comment 25 Eric Hagberg 2006-11-27 12:36:44 EST
Doesn't seem to work for me. Mounting a share from a NetApp, I get a hanging
mount, but specifying "-o tcp" to the same mount command results in the mount
completing.
Comment 26 Steve Dickson 2006-11-27 13:00:57 EST
Eric,

Could you please post a bzipped tethereal network trace so
we can verify we are looking at the same problem?

tia...
Comment 27 Eric Hagberg 2006-11-27 13:23:38 EST
Just realized there was another possible culprit here - iptables. Looks like
there was something in the default iptables setup that was causing this to fail.
Sorry. I'm not seeing this problem anymore.

I did replace the test nfs-utils package with the original one and did see the
problem again, and then re-installing this test nfs-utils rpm did solve the
problem, so it wasn't just iptables causing this for me.
Comment 28 Jeff Bastian 2006-11-27 13:30:38 EST
The patch to nfs-utils seems to work for me.  I just mounted & unmounted a
directory a dozen times or so with UDP and it worked every time.  (I also tested
with TCP and it still works.)
Comment 29 RHEL Product and Program Management 2006-11-27 21:53:37 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 31 Terje Røsten 2006-11-28 05:08:24 EST
(In reply to comment #24)
> Can people experiencing this problem (carefully) test them and let me know if
> they resolve it?

Works great here, thanks!
 
Comment 32 Steve Dickson 2006-11-28 09:19:52 EST
Yes... Nice work Jeff!!!

fixed in nfs-utils-1.0.9-14.el5
Comment 33 Jay Turner 2006-12-01 15:38:15 EST
QE ack for RHEL5.
Comment 34 RHEL Product and Program Management 2006-12-22 19:26:23 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
