Bug 228727

Summary: NFS mount failing from clustered resource
Product: Red Hat Enterprise Linux 5
Version: 5.0
Component: nfs-utils
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Andy Elmer <andy.elmer>
Assignee: Steve Dickson <steved>
QA Contact: Ben Levenson <benl>
CC: dkovalsk, Matthew.Thyer
Doc Type: Bug Fix
Last Closed: 2009-12-14 20:49:55 UTC
Attachments:
- bz228728.pcap.bz2 --> "mount -v asdfssghome01:/export/Home01/ir/elmerar /mnt"
- bz228729.pcap.bz2 --> "mount -v -o noacl asdfssghome01:/export/Home01/ir/elmerar /mnt"
- bz228730.pcap.bz2 --> "mount -v -o tcp asdfssghome01:/export/Home01/ir/elmerar /mnt"

Description Andy Elmer 2007-02-14 17:19:08 UTC
Description of problem:
Unable to mount NFS filesystems from a VCS clustered host.  "mount" just
hangs/retries without success.  Mounting NFS filesystems from a non-clustered
system or using the *real* server name (not the cluster alias) works fine. 

How reproducible:
See below

Steps to Reproduce:
See below.
  
Actual results:
Example:
server1 = file server (192.168.0.10)
homedir1 = hostname of the clustered resource *currently* residing on server1
(homedir1 is effectively an alias for server1) (192.168.0.11)

*Note: while server1 and homedir1 have different names and IP addresses, they
refer to the same system.

mount -v server1:/nfs/point /mount/point
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.101 prog 100003 vers 3 prot tcp port 2049
mount: trying 10.66.61.101 prog 100005 vers 3 prot udp port 33966
(successful)

mount -v homedir1:/nfs/point /mount/point  
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.93 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgHome01' failed: timed out (retrying).
mount: trying 10.66.61.93 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgHome01' failed: timed out (retrying).
(unsuccessful)

Both commands above are mounting the same NFS point from the same server.


Expected results:
Filesystem should mount without problems.  Our RHEL3 & RHEL4 workstations have
no problems with this.

Additional info:
- The NFS file server is a Solaris 8 system running Veritas Cluster Server (VCS).
- If I explicitly specify TCP on the mount command (-o tcp), the mount works
every time, which doesn't make sense since mounting with TCP should already be
the default behavior.

TCPDUMP output:

From unsuccessful attempt:
<snip>
client1 -> homedir1       PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP
homedir1 -> client1       TCP D=34844 S=2049 Fin Ack=1160248412 Seq=935776432 Len=0 Win=33304 Options=<nop,nop,tstamp 17450967 60794813>
client1 -> homedir1       TCP D=2049 S=34844     Ack=935776433 Seq=1160248412 Len=0 Win=46 Options=<nop,nop,tstamp 60794814 17450967>
server1 -> client1        PORTMAP R GETPORT port=33966
client1 -> server1        ICMP Destination unreachable (UDP port 32771 unreachable)


From successful attempt:
<snip>
client1 -> server1        PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP
server1 -> client1        TCP D=56024 S=2049     Ack=1235741369 Seq=931535860 Len=0 Win=33304 Options=<nop,nop,tstamp 90305704 60868844>
server1 -> client1        TCP D=56024 S=2049 Fin Ack=1235741369 Seq=931535860 Len=0 Win=33304 Options=<nop,nop,tstamp 90305704 60868844>
server1 -> client1        PORTMAP R GETPORT port=32807
client1 -> server1        TCP D=2049 S=56024     Ack=931535861 Seq=1235741369 Len=0 Win=46 Options=<nop,nop,tstamp 60868844 90305704>
client1 -> server1        MOUNT3 C Null
server1 -> client1        MOUNT3 R Null
client1 -> server1        MOUNT3 C Mount /export/JS/cfengine/local
</snip>

Comment 1 Steve Dickson 2007-02-16 11:22:20 UTC
Make sure the server is listening for UDP mounts with
the 'rpcinfo -p <server>' command. You should see at least
one of the following lines in rpcinfo's output:

    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
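The check suggested above can also be scripted. The sketch below is a minimal, hypothetical helper (`registered_versions` is not part of rpcinfo or nfs-utils) that parses the text printed by `rpcinfo -p <server>` and reports which versions of one RPC program are registered for one transport. It assumes the usual column layout (program, vers, proto, port, service) and is fed the output quoted in Comment 2.

```python
from __future__ import annotations

NFS_PROGRAM = 100003    # RPC program number for NFS
MOUNT_PROGRAM = 100005  # RPC program number for the MOUNT protocol


def registered_versions(rpcinfo_output: str, program: int, proto: str) -> set[int]:
    """Hypothetical helper: versions of `program` that rpcbind lists for `proto`."""
    versions = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data lines look like "100003    3   udp   2049  nfs"; skip the
        # "program vers proto port service" header and blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        if int(fields[0]) == program and fields[2] == proto:
            versions.add(int(fields[1]))
    return versions


# Output quoted in Comment 2 (already filtered through `grep nfs`).
SAMPLE = """\
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100227    2   udp   2049  nfs_acl
    100227    3   udp   2049  nfs_acl
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100227    2   tcp   2049  nfs_acl
    100227    3   tcp   2049  nfs_acl
"""

print(sorted(registered_versions(SAMPLE, NFS_PROGRAM, "udp")))  # [2, 3]
```

Because that output was filtered through `grep nfs`, the MOUNT program (100005), which the failing trace asks for over UDP, does not appear in it; the helper would report it only when given unfiltered `rpcinfo -p` output.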


Comment 2 Andy Elmer 2007-02-19 15:42:48 UTC
The server is listening:
/usr/sbin/rpcinfo -p server1 | grep nfs
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100227    2   udp   2049  nfs_acl
    100227    3   udp   2049  nfs_acl
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100227    2   tcp   2049  nfs_acl
    100227    3   tcp   2049  nfs_acl

To me, it seems to be a client side problem.

Our current RHEL3, RHEL4, and the few FC6 machines are able to mount these NFS
locations just fine.  I'll try installing RHEL5-beta2 on a different piece of
hardware to see if I get the same or different result.

Comment 3 Steve Dickson 2007-02-22 17:03:59 UTC
Ok... try mounting with the '-o noacl' mount option

Comment 4 Andy Elmer 2007-02-23 21:02:39 UTC
Mounting with "-o noacl" still fails:

# mount -v -o noacl server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgcorp01.mpls.udlp.com' failed: timed out (retrying).

Mounting with no options fails:
# mount -v server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgcorp01.mpls.udlp.com' failed: timed out (retrying).

Mounting with -o tcp succeeds:
# mount -v -o tcp server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: trying 10.66.61.75 prog 100005 vers 3 prot tcp port 34206
# 
(success)

Now, I know I could just mount everything with "-o tcp" to get past this
problem; however, we use the automounter to mount these locations, and as a
result it just hangs.

As another note, I spoke with a friend who had the same problem with a
particular version of Ubuntu.  Perhaps this is isolated to a specific version of
"mount".  In any case, RHEL3, RHEL4, and FC6 all mount the above location just
fine without additional options.

Comment 5 Steve Dickson 2007-02-27 22:15:15 UTC
Could you please post bzip2-compressed binary tethereal traces of both
failures, using something similar to:
    tethereal -w /tmp/bz228727.pcap host <server> ; bzip2 /tmp/bz228727.pcap

tia..



Comment 6 Andy Elmer 2007-03-05 21:46:57 UTC
Created attachment 149295
bz228728.pcap.bz2  -->  "mount -v asdfssghome01:/export/Home01/ir/elmerar /mnt"

I attached 3 tethereal traces. These are the commands I ran with each trace:

bz228728.pcap.bz2  -->  "mount -v asdfssghome01:/export/Home01/ir/elmerar /mnt"

bz228729.pcap.bz2  -->  "mount -v -o noacl asdfssghome01:/export/Home01/ir/elmerar /mnt"

bz228730.pcap.bz2  -->  "mount -v -o tcp asdfssghome01:/export/Home01/ir/elmerar /mnt"

Comment 7 Andy Elmer 2007-03-05 21:48:42 UTC
Created attachment 149296
bz228729.pcap.bz2  -->  "mount -v -o noacl asdfssghome01:/export/Home01/ir/elmerar /mnt"

Comment 8 Andy Elmer 2007-03-05 21:49:32 UTC
Created attachment 149297
bz228730.pcap.bz2  -->  "mount -v -o tcp asdfssghome01:/export/Home01/ir/elmerar /mnt"

Comment 9 Matthew Thyer 2009-04-15 02:41:47 UTC
This is due to the way VCS on Solaris runs its NFS server: a single server per node instead of one per "service group".
I have confirmed that this is still a problem on Solaris 10 10/08 (u6), but it is expected to be fixed in the next release of Solaris 10 (u7). It is Sun bug ID 2159403.

The scenario is this:

The VCS cluster node has several IP addresses on the same network. The first is its actual IP address; the rest are the addresses of each "service group", which are implemented as aliases on the network interface (Sun would call these "floating logical addresses").

The problem is that rpcbind on the Solaris VCS server replies to the Fedora client from the IP address of the node, not from the IP address of the "service group" that the client sent its NFS mount request to.

The NFS client in Fedora 9 and later suspects a man-in-the-middle style attack and ignores the rpcbind server's responses.

Fedora 8 as a client did not mind, but Fedora 9 and above do.

On some Linux hosts a workaround is to use TCP mounts instead of UDP (Mandriva 2007 ?).
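The ICMP "port unreachable" in the failing trace fits one common client-side mechanism, offered here as an assumption rather than something confirmed in this thread: if the client sends its portmap GETPORT request from a UDP socket it has `connect()`ed to the queried address (homedir1), the kernel delivers to that socket only datagrams whose source address and port match that peer. A reply sourced from the node's own address (server1) then matches no socket, so it is dropped and answered with ICMP port unreachable. The minimal Python sketch below reproduces that behaviour on loopback. Because only 127.0.0.1 is guaranteed to exist, the "wrong" replier differs by port rather than by IP address, which the kernel's peer check treats the same way. The function name and payloads are illustrative only.

```python
import socket


def connected_udp_filter_demo(timeout: float = 0.5):
    """Return (reply seen from a different source, reply seen from the queried peer)."""
    # Stands in for the service-group address the client actually queried (homedir1).
    queried = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Stands in for the node address the reply really came from (server1).
    other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        queried.bind(("127.0.0.1", 0))
        other.bind(("127.0.0.1", 0))
        queried.settimeout(timeout)
        client.settimeout(timeout)

        # connect() fixes the peer: from now on the kernel only hands this
        # socket datagrams whose source address *and* port equal the peer's.
        client.connect(queried.getsockname())
        client.send(b"GETPORT prog=100005 proto=UDP")
        _, client_addr = queried.recvfrom(1024)

        # Reply from a different source: no socket matches, so it is dropped.
        other.sendto(b"port=33966", client_addr)
        try:
            from_other = client.recv(1024)
        except socket.timeout:
            from_other = None

        # The same reply sent from the queried peer is delivered normally.
        queried.sendto(b"port=33966", client_addr)
        from_peer = client.recv(1024)
        return from_other, from_peer
    finally:
        for sock in (queried, other, client):
            sock.close()


print(connected_udp_filter_demo())  # (None, b'port=33966')
```

By contrast, a TCP reply can only arrive on the connection the client opened to the service-group address, which would be consistent with the `-o tcp` workaround.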

Comment 13 Steve Dickson 2009-12-14 20:49:55 UTC
This is a Sun bug (Sun bug ID 2159403).

Comment 14 Matthew Thyer 2009-12-15 00:28:22 UTC
Sun produced a 'T' patch for me (for SPARC) that fixed this problem.
It has been released as full patch 140917.