Bug 187951

Summary: Replication failover fails if the NFS permissions are incorrect on one of the servers...
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jeff Moyer <jmoyer>
Assignee: Jeff Moyer <jmoyer>
QA Contact: Brock Organ <borgan>
CC: bakerg3, gregory.baker, ikent, jeremy, jlayton, jmoyer, kanderso, k.georgiou, paulwaterman, strobert, tao, van.okamura
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 23:03:03 UTC
Bug Blocks: 181409

Attachments:
  Perform another cache lookup when d_unhashed returns true (flags: none)

Comment 1 Jeff Moyer 2006-04-06 00:14:42 UTC
I followed the steps to reproduce the bug, found in comment #62.
Basically, you have a replicated server entry.  In that entry, you add a
bogus localhost mount, since that mount will be selected first every
time.  Then, you add another mount or two that will succeed.

When you do an ls of the mountpoint, you will get "No such file or
directory," but the file system actually gets mounted!

What's happening is that we call mount_nfs.c:mount_mount.  That function
determines that we want to do a local mount, and so calls into the
mount_bind module's mount_mount.  That module will do a mkdir_path for the
directory.  But, when the mount fails, it does an rmdir!  Since the caller is
the automount daemon, the rmdir is allowed to succeed (meaning that the process
that triggered the mount is now waiting on an unhashed dentry!).
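
A rough userspace analogy of that sequence, as a sketch only: it does not
touch autofs or the kernel dentry cache.  A handle opened on a directory
stands in for the waiting process, and rmdir followed by mkdir stands in for
the failed bind mount and the retry.  The function name and the idea of
comparing inode numbers are illustrative assumptions, not part of the
automount code.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a handle opened before rmdir()+mkdir() refers to a different
 * inode than the path does afterwards, 0 if it is the same one, -1 on error.
 */
int handle_is_stale_after_recreate(const char *path)
{
    struct stat held, fresh;
    int fd, result = -1;

    if (mkdir(path, 0700) != 0)       /* first attempt creates the directory */
        return -1;

    fd = open(path, O_RDONLY);        /* the "waiter" pins that directory */
    if (fd < 0) {
        rmdir(path);
        return -1;
    }

    if (rmdir(path) != 0) {           /* failed attempt cleans up after itself */
        close(fd);
        return -1;
    }

    if (mkdir(path, 0700) == 0) {     /* retry creates a brand new directory */
        if (fstat(fd, &held) == 0 && stat(path, &fresh) == 0)
            result = (held.st_dev != fresh.st_dev ||
                      held.st_ino != fresh.st_ino);
        rmdir(path);                  /* remove the demo directory again */
    }

    close(fd);
    return result;
}
```

Called with a path that does not exist yet, this should report that the held
handle and the path no longer name the same object, which is the userspace
counterpart of waiting on a dentry that has since been replaced.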

Now we return to the mount_nfs.c:mount_mount code, which decides to try the
next server in the list.  This code will perform another mkdir_path, which
then gets a new dentry and inode.  The mount ultimately succeeds, but when
the process that triggered the mount gets woken up, it fails this test at
the end of autofs4_lookup:

	/*
	 * If this dentry is unhashed, then we shouldn't honour this
	 * lookup even if the dentry is positive.  Returning ENOENT here
	 * doesn't do the right thing for all system calls, but it should
	 * be OK for the operations we permit from an autofs.
	 */
	if ( dentry->d_inode && d_unhashed(dentry) )
		return ERR_PTR(-ENOENT);

So, even though the mount succeeded, we return a failure to the caller.
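
To make the failing condition concrete, here is a toy model of that test
alone.  The struct and function are hypothetical stand-ins for the kernel's
dentry and for the tail of autofs4_lookup; only the shape of the condition is
taken from the code quoted above.

```c
#include <errno.h>

/* Hypothetical stand-in for the two dentry properties the check reads. */
struct waiter_dentry {
    int has_inode;   /* nonzero if the dentry is positive */
    int hashed;      /* nonzero while the dentry is still in the cache */
};

/* Mirrors the quoted check: a positive but unhashed dentry yields -ENOENT. */
int toy_lookup_status(const struct waiter_dentry *d)
{
    if (d->has_inode && !d->hashed)
        return -ENOENT;
    return 0;
}
```

In this model the waiter from the scenario above holds a dentry that is
positive and unhashed, so it gets -ENOENT regardless of whether a later mount
attempt succeeded on a different dentry.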

I have one potential solution to the problem, but I'm running it by the upstream
maintainer first to ensure it's the right approach.  I'll hopefully have a patch
ready for testing in the near future.

Thanks!

Comment 2 Paul Waterman 2006-04-10 15:43:56 UTC
Good to hear that we at least understand what's going on! Thanks for all your
hard work on this.

Comment 3 Jeff Moyer 2006-04-18 15:38:07 UTC
Created attachment 127923 [details]
Perform another cache lookup when d_unhashed returns true

This patch resolves the problem in my environment.  It simply performs another
d_lookup for the name when the autofs4_lookup routine finds that the dentry is
unhashed.

(the patch was created on a rhel3 tree, but applies cleanly to rhel4)
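
The idea can be sketched in userspace roughly as follows.  This is not the
attached patch: the toy_dentry type, the array standing in for the dentry
cache, and both function names are assumptions made for illustration.  It
only shows the "look the name up again when the dentry is unhashed" step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry in a tiny name cache. */
struct toy_dentry {
    const char *name;   /* path component being looked up */
    int ino;            /* 0 means negative (no inode attached) */
    int hashed;         /* 1 while findable by name, 0 after rmdir */
};

/* Analogue of a by-name cache lookup: only hashed entries are visible. */
static const struct toy_dentry *toy_d_lookup(const struct toy_dentry *cache,
                                             size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (cache[i].hashed && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/*
 * If the dentry we waited on is positive but unhashed, look the name up
 * again and use whatever is hashed under it now.  NULL means no entry.
 */
const struct toy_dentry *toy_lookup_patched(const struct toy_dentry *cache,
                                            size_t n,
                                            const struct toy_dentry *waited_on)
{
    if (waited_on->ino != 0 && !waited_on->hashed)
        return toy_d_lookup(cache, n, waited_on->name);
    return waited_on;
}
```

With a cache holding an unhashed entry and a newer hashed entry under the
same name, the patched lookup hands back the newer one instead of failing.
If nothing is hashed under the name, the caller still sees a failure.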

Comment 7 Jason Baron 2006-04-27 15:08:51 UTC
Committed in stream U4, build 34.25.  A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Jeff Moyer 2006-05-01 22:55:10 UTC
I built an updated version of autofs which addresses the problems reported in
this bug.  Please test the package found here:

  http://people.redhat.com/jmoyer/rhel4u4-autofs/

There are builds for every architecture.  The changes to this package are fairly
invasive and, as such, I'd like to get testing feedback as soon as possible.  If
there is any unexpected change in behaviour, please be sure to report it.

Thanks in advance!

Jeff

Comment 11 Red Hat Bugzilla 2006-08-10 23:03:03 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html