Bug 228727
| Summary: | NFS mount failing from clustered resource | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Andy Elmer <andy.elmer> |
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED NOTABUG | QA Contact: | Ben Levenson <benl> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.0 | CC: | dkovalsk, Matthew.Thyer |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2009-12-14 20:49:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Description
Andy Elmer
2007-02-14 17:19:08 UTC
Make sure the server is listening for udp mounts with
the 'rpcinfo -p <server>' command. You should see at least
one of the following lines in rpcinfo's output
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
The server is listening:
/usr/sbin/rpcinfo -p server1 | grep nfs
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl
To me, it seems to be a client side problem.
Our current RHEL3, RHEL4, and the few FC6 machines are able to mount these NFS
locations just fine. I'll try installing RHEL5-beta2 on a different piece of
hardware to see if I get the same or different result.
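The rpcinfo check quoted earlier in this comment can be scripted. This is only a minimal sketch: it assumes the five-column `rpcinfo -p` layout shown in this report (program, version, protocol, port, service), and the function name `has_udp_nfs` is hypothetical.

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and succeeds
# (exit status 0) if NFS (RPC program 100003) is registered over UDP.
has_udp_nfs() {
    awk '$1 == "100003" && $3 == "udp" { found = 1 } END { exit !found }'
}

# Usage (server1 is the placeholder host name used in this report):
#   /usr/sbin/rpcinfo -p server1 | has_udp_nfs && echo "NFS over UDP is registered"
```

A passing check only shows that the server advertises NFS over UDP. As the rest of this report shows, the mount can still time out for other reasons.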
Ok... try mounting with the '-o noacl' mount option

Mounting with "-o noacl" still fails:
# mount -v -o noacl server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgcorp01.mpls.udlp.com' failed: timed out (retrying).

Mounting with no options fails:
# mount -v server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'asdfssgcorp01.mpls.udlp.com' failed: timed out (retrying).

Mounting with -o tcp succeeds:
# mount -v -o tcp server1:/mount/point /mnt/
mount: no type was given - I'll assume nfs because of the colon
mount: trying 10.66.61.75 prog 100003 vers 3 prot tcp port 2049
mount: trying 10.66.61.75 prog 100005 vers 3 prot tcp port 34206
# (success)

Now, I know I could just mount everything with "-o tcp" to get past this problem. However, we use the automounter to mount these locations and, as a result, the mount just hangs.
As another note, I spoke with a friend who had the same problem with a particular version of Ubuntu. Perhaps this is isolated to a specific version of "mount". In any case, RHEL3, RHEL4, and FC6 all mount the above location just fine without additional options.

Could you please post a bzip2 binary tethereal trace of both
failures... something similar to
tethereal -w /tmp/bz228727.pcap host <server> ; bzip2 /tmp/bz228727.pcap
tia..
Created attachment 149295
bz228728.pcap.bz2 --> "mount -v asdfssghome01:/export/Home01/ir/elmerar /mnt"

I attached 3 tethereal traces. These are the commands I ran with each trace:
bz228728.pcap.bz2 --> "mount -v asdfssghome01:/export/Home01/ir/elmerar /mnt"
bz228729.pcap.bz2 --> "mount -v -o noacl asdfssghome01:/export/Home01/ir/elmerar /mnt"
bz228730.pcap.bz2 --> "mount -v -o tcp asdfssghome01:/export/Home01/ir/elmerar /mnt"

Created attachment 149296
bz228729.pcap.bz2 --> "mount -v -o noacl asdfssghome01:/export/Home01/ir/elmerar /mnt"

Created attachment 149297
bz228730.pcap.bz2 --> "mount -v -o tcp asdfssghome01:/export/Home01/ir/elmerar /mnt"

This is due to the way VCS on Solaris runs its NFS server as a single server per node instead of one per "service group". I have confirmed that this is still a problem on Solaris 10 10/08 (u6) but is expected to be fixed with the next release of Solaris 10 (u7). It's Sun bug ID: 2159403.

The scenario is this: The VCS cluster node has several IP addresses on the same network. The first is its actual IP address and the rest are the addresses of each "service group", which are implemented as aliases on the network interface (Sun would call these "floating logical addresses").

The problem is that rpcbind on the Solaris VCS server replies to the Fedora client using the IP address of the node and not the IP address of the "service group" that the client sent its NFS mount request to. The Fedora 9 and above NFS client suspects a man-in-the-middle type of attack and ignores the rpcbind server responses. Fedora 8 as a client did not mind, but Fedora 9 and above do. On some Linux hosts a workaround is to use TCP mounts instead of UDP (Mandriva 2007 ?).

This is a Sun bug (Sun bug ID: 2159403). Sun produced a 'T' patch for me (for SPARC) that fixed this problem. It has been released as a full patch and is: 140917.
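The behaviour described in this comment can be modelled in a few lines. This is not the actual client code, only a sketch of the rule as described here: a reply is accepted only if its source address is the address the request was sent to. The function name `reply_accepted` and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```shell
# Hypothetical model of the client-side check described in this report:
# accept a reply only if it comes from the address the request went to.
reply_accepted() {
    request_dst=$1   # address the client sent its request to
    reply_src=$2     # source address seen on the reply
    [ "$request_dst" = "$reply_src" ]
}

# Assumed addresses for illustration only:
#   192.0.2.10 = the cluster node's own address
#   192.0.2.20 = the "service group" alias the client mounts from
#
#   reply_accepted 192.0.2.20 192.0.2.20   # reply from the alias: accepted
#   reply_accepted 192.0.2.20 192.0.2.10   # reply from the node address: ignored
```

Under this model, the reply from the node address is dropped, and the client retries until it times out. That is consistent with the "timed out (retrying)" messages quoted earlier in the report.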