Bug 187951 - Replication failover fails if the NFS permissions are incorrect on one of the servers...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Jeffrey Moyer
QA Contact: Brock Organ
Depends On:
Blocks: 181409
Reported: 2006-04-04 16:13 EDT by Jeffrey Moyer
Modified: 2010-10-22 00:48 EDT (History)
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 19:03:03 EDT
Attachments
Perform another cache lookup when d_unhashed returns true (1.10 KB, text/x-patch)
2006-04-18 11:38 EDT, Jeffrey Moyer
Comment 1 Jeffrey Moyer 2006-04-05 20:14:42 EDT
I followed the steps to reproduce the bug, found in comment #62.
Basically, you have a replicated server entry.  In that entry, you add a
bogus localhost mount, as you know that that mount will be selected first
every time.  Then, you have another mount or two that will succeed.

When you do an ls of the mountpoint, you will get "No such file or
directory," but the file system actually gets mounted!

What's happening is that we call mount_nfs.c:mount_mount.  That function
determines that we want to do a local mount, and so calls into the
mount_bind module's mount_mount.  That module will do a mkdir_path for the
directory.  But, when the mount fails, it does an rmdir!  Since it's the
automount daemon, the rmdir is allowed to succeed (meaning that the process
that triggered the mount is now waiting on an unhashed dentry!).

Now we return to the mount_nfs.c:mount_mount code, which decides to try the
next server in the list.  This code will perform another mkdir_path, which
then gets a new dentry and inode.  The mount ultimately succeeds, but when
the process that triggered the mount gets woken up, it fails this test at
the end of autofs4_lookup:

	/*
	 * If this dentry is unhashed, then we shouldn't honour this
	 * lookup even if the dentry is positive.  Returning ENOENT here
	 * doesn't do the right thing for all system calls, but it should
	 * be OK for the operations we permit from an autofs.
	 */
	if ( dentry->d_inode && d_unhashed(dentry) )
		return ERR_PTR(-ENOENT);

So, even though the mount succeeded, we return a failure to the caller.
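
The mkdir/rmdir/mkdir sequence can be approximated from user space. The following is only a sketch, not autofs or kernel code: it assumes Linux semantics for removed directories, and the scratch path under /tmp and the function name stale_handle_errno are made up for the illustration. A file descriptor opened before the rmdir plays the role of the process waiting on the original dentry.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkdtemp, openat, O_DIRECTORY */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns the errno seen when creating a file through a handle to the
 * ORIGINAL directory after an rmdir/mkdir cycle on the same path,
 * 0 if the create unexpectedly succeeded, or -1 on a setup error.
 */
int stale_handle_errno(void)
{
    char base[] = "/tmp/stale-dir-XXXXXX";   /* hypothetical scratch area */
    char path[64];
    int oldfd, fd, err = 0;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/mnt", base);

    /* First mkdir: the directory the waiting process holds on to. */
    if (mkdir(path, 0755) != 0) {
        rmdir(base);
        return -1;
    }
    oldfd = open(path, O_RDONLY | O_DIRECTORY);
    if (oldfd < 0) {
        rmdir(path);
        rmdir(base);
        return -1;
    }

    /* Failed mount: rmdir.  Next server: mkdir again under the same name. */
    if (rmdir(path) != 0 || mkdir(path, 0755) != 0) {
        close(oldfd);
        rmdir(path);
        rmdir(base);
        return -1;
    }

    /* The path exists again, but the old handle refers to the dead directory. */
    fd = openat(oldfd, "probe", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        err = errno;
    } else {
        close(fd);
        unlinkat(oldfd, "probe", 0);
    }

    close(oldfd);
    rmdir(path);
    rmdir(base);
    return err;
}
```

On Linux I would expect the call to report ENOENT: the name resolves to the new directory, yet the handle taken before the rmdir is still attached to the removed one. That mirrors the situation above, where the mount succeeds on a new dentry while the waiter is still holding the old, unhashed one.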

I have one potential solution to the problem, but I'm running it by the upstream
maintainer first to ensure it's the right approach.  I'll hopefully have a patch
ready for testing in the near future.

Thanks!
Comment 2 Paul Waterman 2006-04-10 11:43:56 EDT
Good to hear that we at least understand what's going on! Thanks for all your
hard work on this.
Comment 3 Jeffrey Moyer 2006-04-18 11:38:07 EDT
Created attachment 127923
Perform another cache lookup when d_unhashed returns true

This patch resolves the problem in my environment.  It simply performs another
d_lookup for a name when the autofs4_lookup routine finds that the dentry is
unhashed.

(The patch was created on a RHEL 3 tree, but applies cleanly to RHEL 4.)
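
For readers without the attachment, the idea can be modelled in user space roughly as follows. This is a toy sketch, not the attached patch: struct toy_dentry, toy_d_lookup, lookup_original, lookup_with_retry and failover_scenario are hypothetical names, and the "cache" is a plain array rather than the kernel dcache.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a kernel dentry; not the real struct dentry. */
struct toy_dentry {
    const char *name;
    int has_inode;   /* non-zero: positive dentry */
    int hashed;      /* non-zero: still reachable by name */
};

/* Find a hashed entry by name, loosely analogous to d_lookup(). */
struct toy_dentry *toy_d_lookup(struct toy_dentry *cache, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (cache[i].hashed && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Original behaviour: a positive but unhashed dentry always fails. */
int lookup_original(const struct toy_dentry *waiting)
{
    if (waiting->has_inode && !waiting->hashed)
        return -ENOENT;
    return 0;
}

/* Sketch of the fix: on an unhashed dentry, look the name up once more. */
int lookup_with_retry(struct toy_dentry *cache, size_t n,
                      struct toy_dentry *waiting,
                      struct toy_dentry **result)
{
    *result = waiting;
    if (waiting->has_inode && !waiting->hashed) {
        struct toy_dentry *fresh = toy_d_lookup(cache, n, waiting->name);
        if (fresh == NULL || !fresh->has_inode) {
            *result = NULL;
            return -ENOENT;
        }
        *result = fresh;   /* hand back the dentry the retry created */
    }
    return 0;
}

/* The failover case: slot 0 was rmdir'ed, slot 1 came from the retry. */
int failover_scenario(int use_retry)
{
    struct toy_dentry cache[2] = {
        { "mnt", 1, 0 },   /* old dentry: positive, unhashed */
        { "mnt", 1, 1 },   /* new dentry: positive, hashed   */
    };
    struct toy_dentry *result = NULL;

    if (!use_retry)
        return lookup_original(&cache[0]);
    return lookup_with_retry(cache, 2, &cache[0], &result);
}
```

In this model failover_scenario(0) returns -ENOENT, matching the reported failure, while failover_scenario(1) returns 0 because the second lookup finds the dentry created for the next server in the list.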
Comment 7 Jason Baron 2006-04-27 11:08:51 EDT
Committed in stream U4, build 34.25. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 8 Jeffrey Moyer 2006-05-01 18:55:10 EDT
I built an updated version of autofs which addresses the problems reported in
this bug.  Please test the package found here:

  http://people.redhat.com/jmoyer/rhel4u4-autofs/

There are builds for every architecture.  The changes to this package are fairly
invasive and, as such, I'd like to get testing feedback as soon as possible.  If
there is any unexpected change in behaviour, please be sure to report it.

Thanks in advance!

Jeff
Comment 11 Red Hat Bugzilla 2006-08-10 19:03:03 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
