Red Hat Bugzilla – Bug 187951
Replication failover fails if the NFS permissions are incorrect on one of the servers...
Last modified: 2010-10-22 00:48:43 EDT
I followed the steps to reproduce the bug given in comment #62.
Basically, you have a replicated server entry. In that entry, you add a
bogus localhost mount, because that mount will be selected first
every time. Then, you add another mount or two that will succeed.
When you do an ls of the mountpoint, you will get "No such file or
directory," but the file system actually gets mounted!
What's happening is that we call mount_nfs.c:mount_mount. That function
determines that we want to do a local mount, and so calls into the
mount_bind module's mount_mount. That module will do a mkdir_path for the
directory. But, when the mount fails, it does an rmdir! Since it's the
automount daemon, the rmdir is allowed to succeed (meaning that the process
that triggered the mount is now waiting on an unhashed dentry!).
Now we return to the mount_nfs.c:mount_mount code, which decides to try the
next server in the list. This code will perform another mkdir_path, which
then gets a new dentry and inode. The mount ultimately succeeds, but when
the process that triggered the mount gets woken up, it fails this test at
the end of autofs4_lookup:
/*
 * If this dentry is unhashed, then we shouldn't honour this
 * lookup even if the dentry is positive. Returning ENOENT here
 * doesn't do the right thing for all system calls, but it should
 * be OK for the operations we permit from an autofs.
 */
if ( dentry->d_inode && d_unhashed(dentry) )
So, even though the mount succeeded, we return a failure to the caller.
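To make the sequence easier to follow, here is a rough user-space model of it in Python. It is not kernel code: the Dentry and Dcache classes, lookup_result, and the "mnt" name are all made up for illustration, and the model ignores locking and reference counting entirely. It only shows why the waiter ends up holding a positive but unhashed dentry while a fresh dentry carries the successful mount.

```python
ENOENT = 2  # errno value for "No such file or directory"


class Dentry:
    """Toy stand-in for a kernel dentry (hypothetical, for illustration only)."""

    def __init__(self, name, inode):
        self.name = name
        self.inode = inode   # None would model a negative dentry
        self.hashed = True   # True while reachable through the name cache


class Dcache:
    """Toy name cache: maps a name to the currently hashed dentry."""

    def __init__(self):
        self.by_name = {}
        self.next_inode = 1

    def mkdir(self, name):
        # A new directory gets a new dentry and a new inode number.
        dentry = Dentry(name, self.next_inode)
        self.next_inode += 1
        self.by_name[name] = dentry
        return dentry

    def rmdir(self, name):
        # Removing the directory unhashes the dentry, but anyone still
        # holding a reference keeps seeing the old (positive) object.
        dentry = self.by_name.pop(name)
        dentry.hashed = False


def lookup_result(dentry):
    """Model of the quoted test: positive but unhashed means -ENOENT."""
    if dentry.inode is not None and not dentry.hashed:
        return -ENOENT
    return 0


cache = Dcache()
waiting = cache.mkdir("mnt")  # mkdir_path for the bogus localhost (bind) attempt
cache.rmdir("mnt")            # the bind mount fails, so the directory is removed
fresh = cache.mkdir("mnt")    # retry with the next server: new dentry and inode
result = lookup_result(waiting)  # the triggering process wakes up on the old dentry
print(result, fresh.hashed)
```

In this model the mount would be sitting on `fresh`, yet the process that triggered it is judged by `waiting`, which is why it sees a failure.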
I have one potential solution to the problem, but I'm running it by the upstream
maintainer first to ensure it's the right approach. I'll hopefully have a patch
ready for testing in the near future.
Good to hear that we at least understand what's going on! Thanks for all your
hard work on this.
Created attachment 127923 [details]
Perform another cache lookup when d_unhashed returns true
This patch resolves the problem in my environment. It simply performs another
d_lookup for a name when the autofs4_lookup routine finds that the dentry is
unhashed. (The patch was created on a rhel3 tree, but applies cleanly to rhel4.)
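As a sketch of the idea only (the attachment itself is not reproduced here), the following toy Python model shows what "look the name up again" means. The Dentry class, the dictionary used as a cache, and the finish_lookup helper are hypothetical names chosen for illustration; the toy d_lookup merely borrows the name of the kernel function and does not reflect its real signature, locking, or reference counting.

```python
ENOENT = 2  # errno value for "No such file or directory"


class Dentry:
    """Toy stand-in for a kernel dentry (hypothetical, for illustration only)."""

    def __init__(self, name, inode, hashed=True):
        self.name = name
        self.inode = inode    # None would model a negative dentry
        self.hashed = hashed  # True while reachable through the name cache


def d_lookup(cache, name):
    """Toy analogue of a cache lookup: the hashed dentry for name, or None."""
    dentry = cache.get(name)
    if dentry is not None and dentry.hashed:
        return dentry
    return None


def finish_lookup(cache, dentry):
    """Return (status, dentry). On an unhashed dentry, retry by name first."""
    if dentry.inode is not None and not dentry.hashed:
        replacement = d_lookup(cache, dentry.name)
        if replacement is None:
            return -ENOENT, None  # nothing took its place: still a failure
        return 0, replacement     # a newer dentry exists under the same name
    return 0, dentry


# The stale dentry left behind by the failed first attempt...
stale = Dentry("mnt", inode=1, hashed=False)
# ...and the one created when the next server was tried.
fresh = Dentry("mnt", inode=2)
cache = {"mnt": fresh}

status, found = finish_lookup(cache, stale)
print(status, found is fresh)

# With no replacement in the cache, the lookup still fails as before.
status_missing, found_missing = finish_lookup({}, stale)
print(status_missing, found_missing)
```

The second call is there to show that the extra lookup only changes the outcome when a newer dentry really exists under the same name.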
committed in stream U4 build 34.25. A test kernel with this patch is available
I built an updated version of autofs which addresses the problems reported in
this bug. Please test the package found here:
There are builds for every architecture. The changes to this package are fairly
invasive and, as such, I'd like to get testing feedback as soon as possible. If
there is any unexpected change in behaviour, please be sure to report it.
Thanks in advance!
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.