I followed the steps to reproduce the bug, found in comment #62. Basically, you have a replicated server entry. In that entry, you add a bogus localhost mount, as you know that that mount will be selected first every time. Then, you have another mount or two that will succeed. When you do an ls of the mountpoint, you will get "No such file or directory," but the file system actually gets mounted!

What's happening is that we call mount_nfs.c:mount_mount. That function determines that we want to do a local mount, and so calls into the mount_bind module's mount_mount. That module will do a mkdir_path for the directory. But, when the mount fails, it does an rmdir! Since it's the automount daemon, the rmdir is allowed to succeed (meaning that the process that triggered the mount is now waiting on an unhashed dentry!).

Now we return to the mount_nfs.c:mount_mount code, which decides to try the next server in the list. This code will perform another mkdir_path, which then gets a new dentry and inode. The mount ultimately succeeds, but when the process that triggered the mount gets woken up, it fails this test at the end of autofs4_lookup:

    /*
     * If this dentry is unhashed, then we shouldn't honour this
     * lookup even if the dentry is positive.  Returning ENOENT here
     * doesn't do the right thing for all system calls, but it should
     * be OK for the operations we permit from an autofs.
     */
    if ( dentry->d_inode && d_unhashed(dentry) )
        return ERR_PTR(-ENOENT);

So, even though the mount succeeded, we return a failure to the caller.

I have one potential solution to the problem, but I'm running it by the upstream maintainer first to ensure it's the right approach. I'll hopefully have a patch ready for testing in the near future.

Thanks!
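For readers who want to see the mkdir/rmdir/mkdir effect outside the kernel, here is a rough userspace analogue. It is not autofs code: the function name and the scratch path are made up for illustration, and an open file descriptor stands in for the reference the waiting process holds on the old dentry. The assumption is that, on an ordinary local POSIX filesystem, an inode that is still held open cannot be handed out again, so the recreated directory must be a different object.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Creates `path`, keeps a handle on it, then removes and recreates the
 * directory, mimicking the failed bind mount's rmdir followed by the
 * second mkdir_path.
 * Returns 1 if the held handle no longer refers to what `path` names,
 * 0 if it still does, and -1 on any error.
 */
int held_dir_is_stale_after_recreate(const char *path)
{
    struct stat held, fresh;
    int result = -1;
    int fd;

    if (mkdir(path, 0700) != 0)
        return -1;

    /* The "waiter" takes its reference before the directory goes away. */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        rmdir(path);
        return -1;
    }

    /* Cleanup after the failed first attempt, then the retry's mkdir. */
    if (rmdir(path) != 0 || mkdir(path, 0700) != 0) {
        close(fd);
        return -1;
    }

    /* Compare the object we held with the one the path names now. */
    if (fstat(fd, &held) == 0 && stat(path, &fresh) == 0)
        result = (held.st_dev != fresh.st_dev ||
                  held.st_ino != fresh.st_ino);

    close(fd);
    rmdir(path);
    return result;
}
```

Calling this with an unused scratch path (for example under /tmp) should return 1 on typical local filesystems: the path resolves fine, but the handle taken earlier points at a removed directory, which is roughly the position the process waiting in autofs4_lookup ends up in.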
Good to hear that we at least understand what's going on! Thanks for all your hard work on this.
Created attachment 127923
Perform another cache lookup when d_unhashed returns true

This patch resolves the problem in my environment. It simply performs another d_lookup for a name when the autofs4_lookup routine finds that the dentry is unhashed. (The patch was created on a rhel3 tree, but applies cleanly to rhel4.)
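The idea behind the patch can be sketched with a toy model. This is not the attached patch and not kernel code: struct toy_dentry, toy_d_lookup and toy_finish_lookup are hypothetical stand-ins for the kernel's dentry, d_lookup and the check at the end of autofs4_lookup, and a NULL return stands in for ERR_PTR(-ENOENT).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry; not the kernel's struct dentry. */
struct toy_dentry {
    const char *name;
    int has_inode;  /* non-zero: positive dentry */
    int hashed;     /* non-zero: still reachable by name */
};

/* Hypothetical stand-in for d_lookup(): find a hashed entry by name. */
struct toy_dentry *toy_d_lookup(struct toy_dentry *cache, size_t n,
                                const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (cache[i].hashed && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/*
 * Check made once the waiter wakes up.
 * retry == 0: old behaviour, a positive but unhashed dentry fails.
 * retry != 0: sketched fix, look the name up again and use the
 *             replacement dentry if one exists and is positive.
 */
struct toy_dentry *toy_finish_lookup(struct toy_dentry *cache, size_t n,
                                     struct toy_dentry *waited_on,
                                     int retry)
{
    if (waited_on->has_inode && !waited_on->hashed) {
        if (retry) {
            struct toy_dentry *fresh =
                toy_d_lookup(cache, n, waited_on->name);
            if (fresh && fresh->has_inode)
                return fresh;
        }
        return NULL;  /* stands in for ERR_PTR(-ENOENT) */
    }
    return waited_on;
}
```

With a cache holding an unhashed entry and a hashed entry of the same name (the state after the rmdir and the second mkdir_path), the old behaviour returns the failure while the retry variant hands back the hashed replacement. Reference counting and locking, which any real change would have to deal with, are left out of this model.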
Committed in stream U4 build 34.25. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
I built an updated version of autofs which addresses the problems reported in this bug. Please test the package found here:

http://people.redhat.com/jmoyer/rhel4u4-autofs/

There are builds for every architecture. The changes to this package are fairly invasive and, as such, I'd like to get testing feedback as soon as possible. If there is any unexpected change in behaviour, please be sure to report it.

Thanks in advance!

Jeff
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html