Bug 150631

Summary: automount daemon doesn't time all hosts within replicated maps
Product: Red Hat Enterprise Linux 4 Reporter: Andy Jaquysh <andy.jaquysh>
Component: autofs Assignee: Chris Feist <cfeist>
Status: CLOSED ERRATA QA Contact: Brock Organ <borgan>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: cfeist, jmoyer
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-03-10 13:09:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andy Jaquysh 2005-03-09 01:31:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1)
Gecko/20020920 Netscape/7.0

Description of problem:
All of the versions of automount I've tested have a bug in the
code for timing the "network distance" to the NFS servers in a 
replicated automount map.

An automount map with the following entry:

sunrise -intr  aserv:/sunrise1 bserv:/sunrise2

will almost always mount the first entry (sunrise from aserv),
even if aserv is on a wide area network and bserv is on the
local area network.  This happens because the function that
times the NFS remote procedure call is never called against
the second server (bserv).

In the source code to autofs-4.1.3-13 the modules/mount_nfs.c file
has the following lines:

216             /* compare RPC times if there are no weighted hosts */
217             else if (winner_weight == INT_MAX) {
218                     double resp_time;
219
220                     /* did we time the first winner? */
221                     if (winner_time == 0) {
222                             if (rpc_time(winner, sec, micros, &resp_time))
223                                     winner_time = resp_time;
224                             else
225                                     winner_time = 6;
226                     }
227
228                     if (rpc_time(winner, sec, micros, &resp_time)) {
229                             if (resp_time < winner_time) {
230                                     winner = p;
231                                     winner_time = resp_time;
232                             }
233                     }
234             }

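This code results in rpc_time being called on "winner" both times, so
the candidate host "p" is never timed.  "winner" is initially set to the
first host in the replicated map.  Because both measurements are of the
same host, "p" replaces "winner" only when the second measurement
happens to come back faster than the first.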

The fix is to change the code in the second if statement (line 228) from:

228                     if (rpc_time(winner, sec, micros, &resp_time)) {
to:
228                     if (rpc_time(p, sec, micros, &resp_time)) {
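
As a rough illustration of the corrected comparison, here is a minimal
standalone sketch.  It is not the autofs source: struct host,
fake_rpc_time() and pick_fastest() are hypothetical names, the latencies
are simulated rather than measured, and only the fallback value 6 and the
overall shape of the comparison are taken from the lines quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host record; mount_nfs.c itself walks a parsed host string. */
struct host {
    const char *name;
    double latency;    /* simulated response time in seconds */
    int reachable;     /* 0 simulates a failed RPC ping */
};

/* Stand-in for rpc_time(): returns 1 and fills *resp_time on success. */
int fake_rpc_time(const struct host *h, double *resp_time)
{
    if (!h->reachable)
        return 0;
    *resp_time = h->latency;
    return 1;
}

/* Return the index of the fastest host, timing every candidate. */
size_t pick_fastest(const struct host *hosts, size_t n)
{
    size_t winner = 0;
    double winner_time = 0;
    double resp_time;

    assert(n > 0);

    for (size_t p = 1; p < n; p++) {
        /* did we time the first winner? */
        if (winner_time == 0) {
            if (fake_rpc_time(&hosts[winner], &resp_time))
                winner_time = resp_time;
            else
                winner_time = 6;   /* same fallback as the quoted code */
        }

        /* the fix: time the candidate p, not the current winner */
        if (fake_rpc_time(&hosts[p], &resp_time)) {
            if (resp_time < winner_time) {
                winner = p;
                winner_time = resp_time;
            }
        }
    }
    return winner;
}
```

With a simulated aserv at 80 ms and bserv at 2 ms, pick_fastest() selects
bserv; changing &hosts[p] back to &hosts[winner] in the second call makes
it return the first host regardless of the candidate's latency.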

Later versions of autofs have this bug "optimized" into the code.
Someone apparently noticed that the same function call with identical
parameters appeared in both if statements, so the rpc_time call was
moved outside the if statements and made once.
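
To make that concrete, the sketch below is a hypothetical reconstruction
of what such a hoisted version might look like.  It is not taken from any
later autofs release: sim_rpc_time() and pick_hoisted_buggy() are made-up
names, and the latency table stands in for real RPC timing.  It only
shows that merging the two calls into one keeps the wrong argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for rpc_time(): simulated latency by host index. */
int sim_rpc_time(const double *latency, size_t host, double *resp_time)
{
    if (latency[host] < 0)     /* negative value simulates a failed ping */
        return 0;
    *resp_time = latency[host];
    return 1;
}

/* Hoisted form of the buggy logic: one call per candidate, still on winner. */
size_t pick_hoisted_buggy(const double *latency, size_t n)
{
    size_t winner = 0;
    double winner_time = 0;

    for (size_t p = 1; p < n; p++) {
        double resp_time;
        /* the single merged call; it should time p, but times winner */
        int ok = sim_rpc_time(latency, winner, &resp_time);

        if (winner_time == 0) {
            winner_time = ok ? resp_time : 6;
        } else if (ok && resp_time < winner_time) {
            winner = p;
            winner_time = resp_time;
        }
    }
    return winner;
}
```

With fixed simulated latencies the function never leaves the first host,
because the only host ever measured is the current winner.  Under that
assumption, the one-word change of the argument to p would still apply.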

I imagine the fix is similar for later versions of automount, but I
haven't tested it.


Version-Release number of selected component (if applicable):
autofs-3.1.7-13 autofs-3.1.7-42

How reproducible:
Always

Steps to Reproduce:
1. Create a replicated automount map with two different servers and
file system paths.  An ideal scenario would be one local NFS server
and one wide area network NFS server.  Do not assign weights to the
replicated systems.
2. Allow automount to mount the NFS share
3. The mounted share will almost always be from the first entry in the
replicated map despite any network speed differences between the servers.
    

Actual Results:  The autofs mounted file system will almost always be
the first replicated entry in the automount map.  This problem is most
noticeable when using replicated automount entries in a wide area
network.

Expected Results:  The client should always mount from the server
which responds most quickly to the initial RPC call.

Additional info:

Comment 1 Jeff Moyer 2005-03-09 16:24:09 UTC
> Version-Release number of selected component (if applicable):
> autofs-3.1.7-13 autofs-3.1.7-42

Surely, you mean autofs-4.1.3-??, right?  Please provide the exact version of
the package.

Chris, could you check to see if this problem has been addressed in the latest
package?

Comment 2 Andy Jaquysh 2005-03-09 18:42:05 UTC
This bug exists in autofs-4.1.3-67 and in earlier versions. 


Comment 4 Chris Feist 2005-03-10 13:09:33 UTC
This bug has been fixed in RHEL4U1 (& RHEL3U5) and submitter has
verified the fix.