Description of problem:
RHEL 3 U3 beta on x86

*** SLQWONG 08/19/04 05:10 pm ***
The autofs4 replication feature does not work. This is how to reproduce the case.

Test automap file (say, auto.maptest):

test server1(3),server2(1):/test_dir

Both server1 and server2 export /test_dir over NFS. showmount -e server1 and showmount -e server2 show the same file system exported.

The two problems encountered are:

1) server1 is used before server2, although server2 has a lower weight.

2) If we stop the NFS service and/or run ifdown eth0 on server1 while the file system is in use, anything run in the mounted directory (cd /maptest/test, df -k, ls, etc.) hangs until NFS and eth0 on server1 are up and running again.

Based on other information gathered, it seems that automount hard-coded server1's IP address into the mount. Running mount, or reading /proc/mounts, shows:

server1:/test on /maptest/test type nfs (rw,ghost,bg,nointr,hard,timeo=600,wsize=16384,rsize=16384,nfsvers=3,tcp,nolock,addr=10.10.2.84)

However, if you unmount /maptest/test, stop the NFS service on server1, and then cd to /maptest/test, the mount goes to server2, and server2's IP address appears in /proc/mounts:

(rw,ghost,bg,nointr,hard,timeo=600,wsize=16384,rsize=16384,nfsvers=3,tcp,nolock,addr=10.10.2.90)

If this is the case, it makes no difference whether we restart the NFS service or the whole server. Either way, we still get NFS stale file handles, or commands and applications hang. Ideally, if server1 goes down, the client should move to the next server in the replication list (perhaps also based on weight).

Version-Release number of selected component (if applicable):
How reproducible: see above
Steps to Reproduce:
1. see above
2.
3.
Actual results:
Expected results:
Additional info:
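For illustration, here is a minimal Python sketch of the selection that problem 1 expects from the weighted entry above. This is not the autofs parser, which autofs 4.1.3 implements in C. The helper names are hypothetical. The sketch assumes, as the report does, that the lower weight is preferred, and it also assumes that a host without an explicit weight counts as weight 0. With no weights at all, it falls back to map order.

```python
import re
from typing import List, Tuple

# Hypothetical helper (not autofs code): matches "host" or "host(weight)".
_HOST_RE = re.compile(r"^([^()]+?)(?:\((\d+)\))?$")


def parse_weighted_location(location: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split "host1(w1),host2(w2):/path" into [(host, weight), ...] and the path."""
    hosts_part, path = location.split(":", 1)
    hosts = []
    for item in hosts_part.split(","):
        m = _HOST_RE.match(item.strip())
        if m is None:
            raise ValueError(f"bad host entry: {item!r}")
        # Assumption: a host with no explicit weight gets weight 0.
        weight = int(m.group(2)) if m.group(2) else 0
        hosts.append((m.group(1), weight))
    return hosts, path


def pick_by_weight(hosts: List[Tuple[str, int]]) -> str:
    """Return the host with the lowest weight (assumed to be the preferred one).

    min() returns the first of several equal minima, so unweighted or
    equally weighted hosts are chosen in map order.
    """
    return min(hosts, key=lambda hw: hw[1])[0]


hosts, path = parse_weighted_location("server1(3),server2(1):/test_dir")
print(hosts, path)            # [('server1', 3), ('server2', 1)] /test_dir
print(pick_by_weight(hosts))  # server2
```

Weight alone says nothing about network distance. That is why later comments in this thread move to choosing by ping time.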
Autofs takes care of mounting file systems. Once they are mounted, it is out of the loop. If the mount expires and a umount hangs, this is not an autofs problem. In your case, this is an NFS implementation issue. I'll look into issue 1 above. In the future, please include only one bug per Bugzilla report.
Okay, I'll let the folks here know to isolate the 2nd problem as an NFS issue in another bug.
I just wanted to add that issue 1 from the bug report also occurs for us when not doing any weighting at all. The client always seems to pick the first-listed server. The client in our case was a fully updated U3 box.
Jeremy, I fail to see how this is a problem, unless the distance to the first server is greater than the distance to the second. I can't find any documentation which says the automounter will perform mounts in a round-robin fashion. That's not to say that you can't make a case for choosing a server at random if they are all of equal weight. But that is an enhancement request, not a bug.
Jeff, Doh. I left out the fact that the first server list is farther away from the client than the first server. If we take the example from above, which is exactly what we're doing, with different mounts and server names: test server1,server2:/test_dir In our environment where we're not seeing the proper behavior, server1 lives at the other end of a T1 line across the country, and server2 is on the same LAN as the client. The client will always mount server1 since it's listed first, when it should be mounting server2. Sorry for leaving that bit off.
Typoed the first line of my response: The first server listed is farther away from the client than the second server.
Since we use a custom kickstart install for our machines, I installed a test machine with RHEL3U3 from CDs. After pointing it at our NIS server, changing /etc/auto.master to point to our NIS auto.master, and attempting to mount the same replicated host mount I was previously having trouble with, I'm still seeing the same incorrect behavior:

test server1,server2:/testdir

server1 is farther away from the client than server2, yet server1 is always mounted.
Created attachment 105520: Patch to fix replicated host mounting comparison logic bug, made against autofs-4.1.3-12 SRPM
Ian Kent looked at the mount_nfs.c code and found a bug in the logic that compares the rpc_ping responses of the servers in a replicated map. I've attached a patch that seems to address the problem for me. As a side note, while working with Ian to get the problem fixed so that we could roll out RHEL3 WS with autofs4, I've also been working with Red Hat Global Support. An engineer there also came up with a patch that he says addresses the issue. I have not yet tested it.
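The kind of comparison described here can be sketched as follows. Probe each replicated host, skip any host that did not answer, and keep the one with the smallest response time. This is a hypothetical Python illustration, not the code in mount_nfs.c or in either attached patch. The response times are passed in rather than measured with rpc_ping. The values in the usage line are illustrative only and model the T1-versus-LAN case described above.

```python
from typing import Optional, Sequence, Tuple


def choose_fastest(probes: Sequence[Tuple[str, Optional[float]]]) -> Optional[str]:
    """Return the reachable host with the smallest response time.

    probes: (host, seconds) pairs in map order; seconds is None when the
    host did not answer. Hypothetical helper, not autofs code.
    """
    best_host: Optional[str] = None
    best_time: Optional[float] = None
    for host, rtt in probes:
        if rtt is None:
            continue  # unreachable hosts are never chosen
        # Strict '<' keeps the earlier-listed host on a tie, so map order
        # only decides between hosts with equal response times.
        if best_time is None or rtt < best_time:
            best_host, best_time = host, rtt
    return best_host


# server1 is across a T1 link, server2 is on the local LAN (illustrative times).
print(choose_fastest([("server1", 0.080), ("server2", 0.002)]))  # server2
```

Map order is used only as a tie-breaker. This matches the behavior comment 5 describes as expected when servers are equally close.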
Created attachment 105814: Patch to correctly choose a replicated server based on ping and rpc_ping times This patch is from Eric Paris (eparis). I've verified it causes autofs to choose correctly with two and three replicated servers in various orderings.
*** This bug has been marked as a duplicate of 129052 ***
I've grabbed the patch from the upstream autofs site to fix the problem. The fix is in autofs-4.1.3-43 (RHEL-3), autofs-4.1.3-44 (RHEL-4), and autofs-4.1.3-45 (FC-3).
I was unable to replicate the problem described above with weights using autofs-4.1.3-9 or -12.
Just a clarification of my comments. I was able to fix the problem with the automounter incorrectly checking ping times. The fixes in comment #12 will allow the automounter to decide on a host based on its ping time. I was unable to replicate the problem of not selecting the proper host based on its weight.
Does anyone currently on the CC list for this bug have an idea of how long it's going to take for autofs-4.1.3-43 (from comment #12) to get into RHN as an update for RHEL3? Thanks, -- jeremy
Jeremy, It will be released in Update 4. Precisely when that will be released, I do not know. It should be real soon, now.
Thanks, Jeff. I just happened to check my open Red Hat Global Support case regarding this issue and was told the same thing there. For the sake of completeness, I can confirm that replicated host maps appear to be working properly for me now.
I have issues with replicated host maps in the following format:

random_app \
    filer1:/vol/vol1/random_app \
    filer2:/vol/vol2/random_app \
    filer3:/vol/vol3/random_app

In this case, the only thing that gets mounted is the last entry. From what I can tell, the first two entries do not get rpc_ping'ed, which leads me to believe the map parser is simply skipping over them.
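What the reporter expects can be sketched as follows. Join the backslash continuations into one logical line, take the first token as the key, and treat every remaining host:path token as a replicated location that should be probed. This is a hypothetical Python illustration, not the autofs 4.1.3 map parser. It ignores mount options and weights.

```python
from typing import List, Tuple


def parse_multi_location(entry: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a map entry into its key and a list of (host, path) locations.

    Hypothetical helper, not autofs code. Mount options and weights are
    not handled.
    """
    # Join backslash-newline continuations into one logical line.
    logical = entry.replace("\\\n", " ")
    key, *locations = logical.split()
    result = []
    for loc in locations:
        host, path = loc.split(":", 1)
        result.append((host, path))
    return key, result


# Raw string: each line-ending backslash stays in the text, as in a map file.
MAP_ENTRY = r"""random_app \
    filer1:/vol/vol1/random_app \
    filer2:/vol/vol2/random_app \
    filer3:/vol/vol3/random_app"""

key, locs = parse_multi_location(MAP_ENTRY)
print(key)        # random_app
print(len(locs))  # 3
```

With this parse, all three filers would become candidates for rpc_ping, not only filer3.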
To clarify, this is with autofs-4.1.3-44.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-520.html