Description of problem:

Assume automountd version 4 is running and normally working properly, using the kernel module autofs4 (in kernel-unsupported). Please don't stop reading here, because it might actually be an NFS client problem (with the dcache handling?). I reported this problem to the kernel mailing list a long time ago and got some response from other people experiencing it, but no real solution yet.

With hierarchical mounts, access to a currently unmounted directory sometimes fails with "Permission denied" or "No such file or directory". The next access to the same filesystem entry works properly, but by then the running program has already failed.

The test scenario unfortunately requires a bit of effort. Assume the following command line for automountd:

/usr/sbin/automount --timeout 1 /home_test file /etc/auto_test rw,grpid,hard,intr

and auto_test containing several entries (say 100) of the following kind:

nfsserver1 /subdir nfsserver1:/subdir

and so on. Then under /home_test/nfsserver1/subdir the directory /subdir on host nfsserver1 should be accessible. The timeout of 1 second is only chosen so that the problem shows up quickly.

The following script accesses 30 such automounted directories, waits a random interval between about 980 and 1111 milliseconds (shell processing time not counted), then accesses the entries again, and so on. After a while the script stops, with typically 5 - 10 'Permission denied' or 'No such file or directory' error messages. Here's the script:

#!/bin/sh
I=1
while [ $I -le 30 ] ; do
    FILES="$FILES /home_test/nfsserver$I/subdir/affile"
    I=`expr $I + 1`
done
while true ; do
    echo 'trying to touch and ls'
    touch $FILES
    if [ $? -ne 0 ] ; then
        date
        exit 1
    fi
    ls -ld $FILES
    date
    /bin/rm -f /tmp/bla
    head -4c /dev/random > /tmp/bla
    # sum command results in 0 - 65535, more likely around 32768
    RANDNUM=`sum /tmp/bla | awk '{print $1 * 2}'`  # 0 - 131070
    WAITTIME=`expr $RANDNUM + 980000`              # 980000 - 1111070
    ls -ld $FILES > /dev/null
    usleep $WAITTIME
done

On RedHat-9 or Enterprise-Beta 3.0 the problem takes much more time to occur than on RedHat-7.2 with the 2.4.18-27.7.xbigmem kernel. Something seems to have been fixed since then, but not completely. It does not matter whether it's an x86, x86_64 or ia64 machine; it happens on all of them. It furthermore does not matter whether the map is really hierarchical (having a / entry before the entries shown above) or only has the subentry as above. If there is a path component that exists only in memory (which is in fact the autofs), the problem can happen. We have never seen it on non-hierarchical maps. Extensive testing has definitely shown that the problem happens during mounting, NOT when the directory is accessed while it is being expired. It can even happen on the very first access to the directory.

Here's a patch I received from someone in Germany, who made the 2.4.9 and later kernels behave like 2.4.5. I did not try this patch, but he said it solved the problem for his site.
But we have much newer kernels here; nonetheless this might give a hint:

diff -u -r vanilla-linux/linux/fs/autofs4/expire.c linux/fs/autofs4/expire.c
--- vanilla-linux/linux/fs/autofs4/expire.c	Tue Jun 12 04:15:27 2001
+++ linux/fs/autofs4/expire.c	Wed Apr 24 13:40:23 2002
@@ -66,11 +66,19 @@
    non-busy mounts */
 static int check_vfsmnt(struct vfsmount *mnt, struct dentry *dentry)
 {
-	int ret = dentry->d_mounted;
-	struct vfsmount *vfs = lookup_mnt(mnt, dentry);
+	int ret = 0;
+	struct list_head *tmp;
+
+	list_for_each(tmp, &dentry->d_vfsmnt) {
+		struct vfsmount *vfs = list_entry(tmp, struct vfsmount,
+						 mnt_clash);
+		DPRINTK(("check_vfsmnt: mnt=%p, dentry=%p, tmp=%p, vfs=%p\n",
+			 mnt, dentry, tmp, vfs));
+		if (vfs->mnt_parent != mnt ||	/* don't care about busy-ness of other namespaces */
+		    !is_vfsmnt_tree_busy(vfs))
+			ret++;
+	}
-	if (vfs && is_vfsmnt_tree_busy(vfs))
-		ret--;
 	DPRINTK(("check_vfsmnt: ret=%d\n", ret));
 	return ret;
 }
diff -u -r vanilla-linux/linux/fs/dcache.c linux/fs/dcache.c
--- vanilla-linux/linux/fs/dcache.c	Mon Feb 25 20:38:08 2002
+++ linux/fs/dcache.c	Fri Apr 26 09:34:57 2002
@@ -616,6 +616,7 @@
 	dentry->d_name.hash = name->hash;
 	dentry->d_op = NULL;
 	dentry->d_fsdata = NULL;
+	INIT_LIST_HEAD(&dentry->d_vfsmnt);
 	dentry->d_mounted = 0;
 	INIT_LIST_HEAD(&dentry->d_hash);
 	INIT_LIST_HEAD(&dentry->d_lru);
diff -u -r vanilla-linux/linux/fs/namei.c linux/fs/namei.c
--- vanilla-linux/linux/fs/namei.c	Mon Feb 25 20:38:09 2002
+++ linux/fs/namei.c	Fri Apr 26 09:38:05 2002
@@ -381,9 +381,25 @@
 static inline int __follow_down(struct vfsmount **mnt, struct dentry **dentry)
 {
+	struct list_head *p;
 	struct vfsmount *mounted;

 	spin_lock(&dcache_lock);
+	p = (*dentry)->d_vfsmnt.next;
+	while (p != &(*dentry)->d_vfsmnt) {
+		struct vfsmount *tmp;
+		tmp = list_entry(p, struct vfsmount, mnt_clash);
+		if (tmp->mnt_parent == *mnt) {
+			*mnt = mntget(tmp);
+			spin_unlock(&dcache_lock);
+			mntput(tmp->mnt_parent);
+			/* tmp holds the mountpoint, so... */
+			dput(*dentry);
+			*dentry = dget(tmp->mnt_root);
+			return 1;
+		}
+		p = p->next;
+	}
 	mounted = lookup_mnt(*mnt, *dentry);
 	if (mounted) {
 		*mnt = mntget(mounted);
diff -u -r vanilla-linux/linux/fs/namespace.c linux/fs/namespace.c
--- vanilla-linux/linux/fs/namespace.c	Mon Feb 25 20:38:09 2002
+++ linux/fs/namespace.c	Fri Apr 26 09:32:25 2002
@@ -50,6 +50,7 @@
 	memset(mnt, 0, sizeof(struct vfsmount));
 	atomic_set(&mnt->mnt_count,1);
 	INIT_LIST_HEAD(&mnt->mnt_hash);
+	INIT_LIST_HEAD(&mnt->mnt_clash);
 	INIT_LIST_HEAD(&mnt->mnt_child);
 	INIT_LIST_HEAD(&mnt->mnt_mounts);
 	INIT_LIST_HEAD(&mnt->mnt_list);
@@ -111,6 +112,7 @@
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	list_del_init(&mnt->mnt_child);
 	list_del_init(&mnt->mnt_hash);
+	list_del_init(&mnt->mnt_clash);
 	old_nd->dentry->d_mounted--;
 }
@@ -118,6 +120,7 @@
 {
 	mnt->mnt_parent = mntget(nd->mnt);
 	mnt->mnt_mountpoint = dget(nd->dentry);
+	list_add(&mnt->mnt_clash, &nd->dentry->d_vfsmnt);
 	list_add(&mnt->mnt_hash, mount_hashtable+hash(nd->mnt, nd->dentry));
 	list_add(&mnt->mnt_child, &nd->mnt->mnt_mounts);
 	nd->dentry->d_mounted++;
diff -u -r vanilla-linux/linux/include/linux/dcache.h linux/include/linux/dcache.h
--- vanilla-linux/linux/include/linux/dcache.h	Thu Nov 22 20:46:18 2001
+++ linux/include/linux/dcache.h	Sat Apr 27 10:04:50 2002
@@ -68,6 +68,7 @@
 	unsigned int d_flags;
 	struct inode * d_inode;	/* Where the name belongs to - NULL is negative */
 	struct dentry * d_parent;	/* parent directory */
+	struct list_head d_vfsmnt;
 	struct list_head d_hash;	/* lookup hash list */
 	struct list_head d_lru;	/* d_count = 0 LRU list */
 	struct list_head d_child;	/* child of parent list */
@@ -268,7 +269,7 @@
 static __inline__ int d_mountpoint(struct dentry *dentry)
 {
-	return dentry->d_mounted;
+	return !list_empty(&dentry->d_vfsmnt);
 }

 extern struct vfsmount *lookup_mnt(struct vfsmount *, struct dentry *);
diff -u -r vanilla-linux/linux/include/linux/mount.h linux/include/linux/mount.h
--- vanilla-linux/linux/include/linux/mount.h	Fri Oct 5 22:05:55 2001
+++ linux/include/linux/mount.h	Fri Apr 26 09:23:21 2002
@@ -20,6 +20,7 @@
 {
 	struct list_head mnt_hash;
 	struct vfsmount *mnt_parent;	/* fs we are mounted on */
+	struct list_head mnt_clash;
 	struct dentry *mnt_mountpoint;	/* dentry of mountpoint */
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */

Version-Release number of selected component (if applicable):
any kernel/autofs4 on RedHat since 7.2, probably earlier, but we didn't have earlier releases in production here.

How reproducible:
see above

Steps to Reproduce:
1. create a hierarchical map, start automountd
2. access the autofs directories
3. wait for expiry, access again

Actual results:
With a chance of maybe 0.1 %, the access fails with EPERM or ENOENT.

Expected results:
The automounted directory is accessed as normal.

Additional info:
see above
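As an aid to reproducing this, the 100-entry test map described above can be generated with a short loop. This is only a sketch: the hostnames nfsserver1..nfsserver100 are placeholders for real NFS servers, and the map is written to /tmp/auto_test here for illustration, whereas the map referenced by the automount command line is /etc/auto_test.

```shell
# Generate a hierarchical test map with 100 entries of the form
#   nfsserverN /subdir nfsserverN:/subdir
# Hostnames and the output path are placeholders for illustration.
MAP=/tmp/auto_test
: > "$MAP"
I=1
while [ $I -le 100 ] ; do
    echo "nfsserver$I /subdir nfsserver$I:/subdir" >> "$MAP"
    I=`expr $I + 1`
done
```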
It's surely not a server problem. When the access fails, no corresponding NFS network packets are received by the server or sent by the client, so the problem must be local to the NFS client.
With the following values the problem appears much more often (!?!):

RANDNUM=`sum /tmp/bla | awk '{print $1 * 20}'`   # 0 - 1310700
WAITTIME=`expr '(' $RANDNUM / 2 ')' + 980000`    # 980000 - 1635350 us

(that is: more unmounts and mounts increase the likelihood)

Additional info: It does not matter whether it's a single- or multiprocessor machine.
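For reference, the modified wait-time arithmetic can be checked in isolation by feeding in the two extreme 16-bit checksum values instead of reading /dev/random (a sketch; POSIX $(( )) arithmetic is used here in place of expr for brevity):

```shell
# Check the wait-time range for the modified values, using the extreme
# sum(1) checksums 0 and 65535 instead of random input.
for SUM in 0 65535 ; do
    RANDNUM=$(($SUM * 20))                # 0 - 1310700
    WAITTIME=$(($RANDNUM / 2 + 980000))   # 980000 - 1635350 us
    echo "sum=$SUM waittime=$WAITTIME"
done
```

This confirms the commented range: a checksum of 0 yields the minimum wait of 980000 us, and 65535 yields the maximum of 1635350 us.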
Is this still a problem with current releases?
Still testing with 2.4.21-20 and autofs-4.1.3. Please leave the case open. Thanks!
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
Seems to be fixed with 4.1.3-12 (or possibly later; it doesn't matter). We have a different problem now: the latest 4.1.3 autofs versions use unprivileged ports for NFS RPC test accesses, which is unacceptable from a security point of view. Anyway, I don't mind closing this particular case now.
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
This issue has been resolved. If you are still experiencing automount issues, please file another bug. Include only one bug per bugzilla, please. Thanks.