Bug 130467

Summary: autofs4 replication feature does not work
Product: Red Hat Enterprise Linux 3 Reporter: Van Okamura <van.okamura>
Component: autofs    Assignee: Chris Feist <cfeist>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0    CC: dlehman, jeremy, jmoyer, k.georgiou, linux_admin, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-12-21 14:54:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Attachments:
Description | Flags
Patch to fix replicated host mounting comparison logic bug, made against autofs-4.1.3-12 SRPM | none
Patch to correctly choose a replicated server based on ping and rpc_ping times | none

Description Van Okamura 2004-08-20 17:46:49 UTC
Description of problem:
RHEL 3 U3 beta on x86

*** SLQWONG  08/19/04 05:10 pm ***
The autofs4 replication feature does not work.

This is how to reproduce the case:
Test automount map file (e.g. auto.maptest):
test      server1(3),server2(1):/test_dir

Both server1 and server2 have the directory /test_dir NFS shared.
Both showmount -e server1 and showmount -e server2 show the same file system exported.

The two problems encountered are:
1) server1 is used before server2, even though server2 has a lower weight.
2) If we stop the NFS service and/or run ifdown eth0 on server1 while
the FS is in use, any window that does a cd into the directory (i.e.
/maptest/test), df -k, ls, or anything else in /maptest/test will hang
until NFS and eth0 on server1 are up and running again.
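For problem 1, the expected behavior is that the lower-weighted server2 is preferred. The following is a minimal Python sketch of that rule only, not autofs's own code: parse_replicated and pick_by_weight are hypothetical helpers, the parser assumes the simple host(weight),host(weight):/path form shown above, and it assumes an unweighted host counts as weight 0.

```python
import re
from typing import List, Tuple

# host, optionally followed by "(weight)", e.g. "server1(3)" or "server2"
_HOST_RE = re.compile(r"^(?P<host>[^()\s]+)(?:\((?P<weight>\d+)\))?$")


def parse_replicated(location: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split "h1(w1),h2(w2):/path" into [(h1, w1), (h2, w2)] and "/path"."""
    hosts_part, sep, path = location.rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' in {location!r}")
    hosts: List[Tuple[str, int]] = []
    for item in hosts_part.split(","):
        m = _HOST_RE.match(item.strip())
        if m is None:
            raise ValueError(f"bad host entry: {item!r}")
        # Assumption: a host without an explicit weight gets weight 0.
        weight = int(m.group("weight")) if m.group("weight") else 0
        hosts.append((m.group("host"), weight))
    return hosts, path


def pick_by_weight(hosts: List[Tuple[str, int]]) -> str:
    """Return the host with the lowest weight; ties go to the first listed."""
    return min(hosts, key=lambda hw: hw[1])[0]
```

With the test map above, parse_replicated("server1(3),server2(1):/test_dir") yields server1 with weight 3 and server2 with weight 1, and pick_by_weight returns "server2". On a tie, min() returns the first-listed host.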

Based on other information gathered, it seems that automount has
hardcoded server1's IP address into the mount, as shown by running
mount (which lists everything that is mounted) or by reading /proc/mounts:

server1:/test on /maptest/test type nfs
(rw,ghost,bg,nointr,hard,timeo=600,wsize=16384,rsize=16384,nfsvers=3,tcp,nolock,addr=10.10.2.84)

However, if you umount /maptest/test, stop the NFS service on
server1, and then cd to /maptest/test, the mount goes to server2
instead, and server2's IP address then appears in /proc/mounts:
(rw,ghost,bg,nointr,hard,timeo=600,wsize=16384,rsize=16384,nfsvers=3,tcp,nolock,addr=10.10.2.90)
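To check which replica a mount actually went to, you can read the addr= option from /proc/mounts, as done by hand above. A minimal Python sketch, assuming the usual /proc/mounts layout (device, mount point, type, comma-separated options, dump, pass); nfs_server_addr and current_server are hypothetical names.

```python
from typing import Optional


def nfs_server_addr(mount_line: str, mountpoint: str) -> Optional[str]:
    """Return the addr= value of the NFS mount at `mountpoint`, or None.

    `mount_line` is one line in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    fields = mount_line.split()
    if len(fields) < 4 or fields[1] != mountpoint:
        return None
    if not fields[2].startswith("nfs"):  # matches "nfs" and "nfs4"
        return None
    for opt in fields[3].split(","):
        if opt.startswith("addr="):
            return opt[len("addr="):]
    return None


def current_server(mountpoint: str, mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Scan a mounts file and return the server address for `mountpoint`."""
    with open(mounts_file) as f:
        for line in f:
            addr = nfs_server_addr(line, mountpoint)
            if addr is not None:
                return addr
    return None
```

For example, current_server("/maptest/test") would return "10.10.2.84" while server1 holds the mount and "10.10.2.90" once it has been remounted from the other server.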

If this is the case, restarting the NFS service or the server makes
no difference: we will still get stale NFS file handles, or commands
and applications hanging.

Ideally, if server1 goes down, the client should fail over to the
next server in the replication list (possibly taking weight into
account as well).

Version-Release number of selected component (if applicable):


How reproducible:
see above

Steps to Reproduce:
1. See above.
  
Actual results:


Expected results:


Additional info:

Comment 1 Jeff Moyer 2004-08-20 17:54:25 UTC
Autofs takes care of mounting file systems.  Once they are mounted, it
is out of the loop.  If the mount expires and a umount hangs, this is
not an autofs problem.  In your case, this is an NFS implementation issue.

I'll look into issue 1 above.  In the future, please include only one
bug per Bugzilla report.

Comment 2 Van Okamura 2004-08-20 18:05:31 UTC
Okay, I'll let the folks here know to isolate the 2nd problem as an
NFS issue in another bug.

Comment 3 Jeremy Rosengren 2004-09-10 21:43:27 UTC
I just wanted to add that issue 1 from the bug report also occurs for
us when not doing any weighting at all.  The client always seems to
pick the first-listed server.  The client in our case was a fully
updated U3 box.



Comment 4 Jeff Moyer 2004-09-10 21:54:23 UTC
Jeremy,

I fail to see where this is a problem, unless the distance to the
first server is greater than the distance to the second.  I can't find
any documentation which says the automounter will perform mounts in a
round-robin fashion.

That's not to say that you can't make a case for choosing a server at
random if they are all of equal weight.  But, that is an enhancement
request, not a bug.

Comment 5 Jeremy Rosengren 2004-09-11 01:08:49 UTC
Jeff,

Doh.  I left out the fact that the first server list is farther away
from the client than the first server.  If we take the example from
above, which is exactly what we're doing, with different mounts and
server names:

test      server1,server2:/test_dir

In our environment where we're not seeing the proper behavior, server1
lives at the other end of a T1 line across the country, and server2 is
on the same LAN as the client.  The client will always mount server1
since it's listed first, when it should be mounting server2.

Sorry for leaving that bit off.

Comment 6 Jeremy Rosengren 2004-09-11 01:10:37 UTC
Typoed the first line of my response:

The first server listed is farther away from the client than the
second server.

Comment 7 Jeremy Rosengren 2004-09-20 18:08:45 UTC
Since we use a custom kickstart install for our machines, I installed
a test machine with RHEL3U3 from CDs.  After pointing it at our NIS
server, changing /etc/auto.master to point to our NIS auto.master and
attempting to mount the same replicated host mount I was previously
having trouble with, I'm still seeing the same incorrect behavior:

test     server1,server2:/testdir

server1 is farther away from the client than server2, yet server1 is
always mounted.

Comment 8 Jeremy Rosengren 2004-10-20 16:36:20 UTC
Created attachment 105520 [details]
Patch to fix replicated host mounting comparison logic bug, made against autofs-4.1.3-12 SRPM

Comment 9 Jeremy Rosengren 2004-10-20 16:37:00 UTC
Ian Kent looked at the mount_nfs.c code and found a bug in the logic
that compares the rpc_ping responses of the servers in a replicated
map.  I've attached a patch that seems to address the problem for me.

As a side note, while working with Ian to get the problem fixed so
that we could roll out RHEL3 WS with autofs4, I've also been working
with RedHat Global Support.  An engineer there also came up with a
patch that he says addresses the issue, which I have not yet tested.

Comment 10 David Lehman 2004-10-26 20:52:59 UTC
Created attachment 105814 [details]
Patch to correctly choose a replicated server based on ping and rpc_ping times

This patch is from Eric Paris (eparis). I've verified it causes
autofs to choose correctly with two and three replicated servers in various
orderings.
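Comments 9 and 10 describe choosing among replicated servers by measured response time rather than by map order. The sketch below shows only that selection rule, in Python; it is not the code from either attached patch or from mount_nfs.c, and it assumes the ping/rpc_ping times have already been measured (in seconds, None meaning no reply). pick_fastest is a hypothetical name.

```python
from typing import List, Optional, Tuple


def pick_fastest(candidates: List[Tuple[str, Optional[float]]]) -> Optional[str]:
    """Return the host with the smallest response time, or None.

    `candidates` is a list of (host, seconds) pairs in map order;
    seconds is None when the host did not answer.
    """
    best_host: Optional[str] = None
    best_time: Optional[float] = None
    for host, t in candidates:
        if t is None:
            continue  # an unreachable host is never chosen
        # Update host and time together; strict "<" keeps the
        # earlier-listed host when two times are equal.
        if best_time is None or t < best_time:
            best_host, best_time = host, t
    return best_host
```

In the setup from comment 5, where server1 sits at the far end of a T1 line and server2 is on the local LAN, this returns server2 whenever server2 answers faster, regardless of which one is listed first.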

Comment 11 Jeff Moyer 2004-10-28 20:27:41 UTC

*** This bug has been marked as a duplicate of 129052 ***

Comment 12 Chris Feist 2004-11-08 17:46:31 UTC
I've grabbed the patch from the upstream autofs site to fix the
problem.  The fix is in the autofs-4.1.3-43 (RHEL-3), autofs-4.1.3-44
(RHEL-4) & autofs-4.1.3-45 (FC-3).

Comment 13 Chris Feist 2004-11-12 20:38:40 UTC
I was unable to replicate the problem described above with weights using
autofs-4.1.3-9 or -12.

Comment 14 Chris Feist 2004-11-18 20:37:00 UTC
Just a clarification of my comments.  I was able to fix the problem
with the automounter incorrectly checking ping times.  The fixes in
comment #12 will allow the automounter to decide on a host based on
its ping time.

I was unable to replicate the problem of not selecting the proper host
based on its weight.

Comment 15 Jeremy Rosengren 2004-11-22 22:13:59 UTC
Does anyone currently on the CC list for this bug have an idea of how
long it's going to take for autofs-4.1.3-43 (from comment #12) to get
into RHN as an update for RHEL3?

Thanks,

-- jeremy

Comment 16 Jeff Moyer 2004-11-22 22:29:18 UTC
Jeremy,

It will be released in Update 4.  Precisely when that will be released, I do not
know.  It should be real soon, now.


Comment 17 Jeremy Rosengren 2004-11-22 22:32:14 UTC
Thanks Jeff.  I just happened to check my open RedHat Global support
case regarding this issue and was told the same thing there.

For the sake of completeness, I can confirm that replicated host maps
appear to be working properly for me now.

Comment 18 James Collins 2004-12-16 00:06:03 UTC
I have issues with the replicated host maps in the following format:

random_app \
filer1:/vol/vol1/random_app \
filer2:/vol/vol2/random_app \
filer3:/vol/vol3/random_app

In this case, the only thing that gets mounted is the last entry. 
From what I can tell, the first two entries do not get rpc_ping'ed,
which leads me to believe the map parser is simply skipping over the
first two entries.
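For comparison, a parser that handles this entry should produce one candidate per host:path location, each with its own path. A minimal Python sketch, assuming backslash-newline continuations and whitespace-separated host:path locations with no mount options or weights; parse_multi_location is a hypothetical name and this is not the autofs map parser.

```python
from typing import List, Tuple


def parse_multi_location(entry: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a map entry into its key and a list of (host, path) locations."""
    # Join backslash-newline continuation lines into one logical line.
    logical = entry.replace("\\\n", " ")
    key, *locations = logical.split()
    pairs: List[Tuple[str, str]] = []
    for loc in locations:
        host, sep, path = loc.partition(":")
        if not sep or not host or not path:
            raise ValueError(f"not a host:path location: {loc!r}")
        pairs.append((host, path))
    return key, pairs
```

For the entry above it returns the key random_app and three (host, path) pairs, so all three filers would be candidates for rpc_ping.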

Comment 19 James Collins 2004-12-16 00:13:48 UTC
To clarify, this is using autofs-4.1.3-44.


Comment 20 John Flanagan 2004-12-21 14:54:29 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-520.html