Bug 106044 - access failures on hierarchical mounts
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-10-02 08:46 UTC by Albert Fluegel
Modified: 2007-11-30 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-03 13:34:59 UTC
Type: ---
Embargoed:



Description Albert Fluegel 2003-10-02 08:46:41 UTC
Description of problem:

Assume automountd version 4 is running and normally working
properly with the autofs4 kernel module (in kernel-unsupported).
Please don't stop reading here, because this might actually be an
NFS client problem (in the dcache handling?). I reported the
problem to the kernel mailing list a long time ago and got
responses from other people who experience it, but no real
solution yet.
With hierarchical mounts, access to a directory that is not
currently mounted sometimes fails with "Permission denied" or
"No such file or directory". The next access to the same filesystem
entry works properly, but by then the running program has
already failed. The test scenario unfortunately requires a bit of
effort.
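To check the claim that a second access succeeds, a small wrapper can repeat a failed command once and report the first failure. This is only a diagnostic sketch: retry_once is a hypothetical helper name, not part of autofs, and it does not address the underlying problem.

```shell
#!/bin/sh
# retry_once CMD [ARGS...]: run CMD; if it fails, report and run it once more.
# Hypothetical helper for diagnosis only, not part of autofs.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying: $*" >&2
  "$@"
}

# Example against the first path used in this report:
# retry_once touch /home_test/nfsserver1/subdir/affile
```

If the message appears on stderr and the command still exits with 0, the first access failed and the immediate retry worked, which matches the behaviour described above.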
Assume the following command line for automountd:
/usr/sbin/automount --timeout 1 /home_test file /etc/auto_test rw,grpid,hard,intr
and auto_test containing several entries (say 100) of the following
kind:
nfsserver1 /subdir nfsserver1:/subdir
and so on. Then under /home_test/nfsserver1/subdir the directory
/subdir on host nfsserver1 should be accessible. The timeout of 1
second is chosen only to shorten the time until the problem can
be seen.
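A map with that many entries is tedious to write by hand. The following sketch generates entries of the kind shown above, assuming the hosts are named nfsserver1 to nfsserverN as in the test script. MAP and N are illustrative variables, and the output goes to a scratch file rather than to /etc/auto_test.

```shell
#!/bin/sh
# Generate N map entries of the form "key /subdir host:/subdir".
# MAP and N are illustrative defaults; adjust hostnames to the real servers.
MAP=${MAP:-./auto_test.sample}
N=${N:-100}

: > "$MAP"          # create or truncate the scratch map file
I=1
while [ "$I" -le "$N" ] ; do
  echo "nfsserver$I /subdir nfsserver$I:/subdir" >> "$MAP"
  I=`expr $I + 1`
done
```

After reviewing the result, it can be copied to the map file named on the automount command line.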
Now follows a script that accesses 30 such automounted
directories, waits for a random interval between about 980 and
1111 milliseconds (shell processing time not counted), then accesses
the entries again, and so on. After a while the script stops with
typically 5 - 10 'Permission denied' or 'No such file or directory'
error messages. Here's the script:

#!/bin/sh

I=1
while [ $I -le 30 ] ; do
  FILES="$FILES /home_test/nfsserver$I/subdir/affile"
  I=`expr $I + 1`
done

while true ; do
  echo 'trying to touch and ls'
  touch $FILES
  if [ $? -ne 0 ] ; then
    date
    exit 1
  fi
  ls -ld $FILES

  date

  /bin/rm -f /tmp/bla
  # sum command results in 0 - 65535, more likely around 32768
  head -c 4 /dev/random > /tmp/bla
  RANDNUM=`sum /tmp/bla | awk '{print $1 * 2}'`		# 0 - 131070
  WAITTIME=`expr $RANDNUM + 980000`	# 980000 - 1111070

  ls -ld $FILES>/dev/null
  usleep $WAITTIME
done

On RedHat-9 or Enterprise-Beta 3.0 the problem takes much longer
to occur than on RedHat-7.2 with the 2.4.18-27.7.xbigmem kernel. It
seems that something has been fixed since then, but not completely. It
does not matter whether it's an x86, x86_64 or ia64 machine; it happens
on all of them. Furthermore, it does not matter whether the map is really
hierarchical (having a / entry before the things shown above) or only
has the subentry like above. If there is a path component that
exists only in memory (which is in fact the autofs), the problem can
happen. We have never seen it on non-hierarchical maps.
Extensive testing has definitely shown that the problem happens during
mounting, NOT when accessing the directory while it's being expired.
It can even happen on the very first access to the directory.

Here's a patch I received from another guy in Germany, which makes kernel
2.4.9 or higher behave like 2.4.5. I did not try this patch, but he said
it solved the problem at his site. We have much newer kernels here, but
nonetheless this might give a hint:

diff -u -r vanilla-linux/linux/fs/autofs4/expire.c linux/fs/autofs4/expire.c
--- vanilla-linux/linux/fs/autofs4/expire.c	Tue Jun 12 04:15:27 2001
+++ linux/fs/autofs4/expire.c	Wed Apr 24 13:40:23 2002
@@ -66,11 +66,19 @@
    non-busy mounts */
 static int check_vfsmnt(struct vfsmount *mnt, struct dentry *dentry)
 {
-	int ret = dentry->d_mounted;
-	struct vfsmount *vfs = lookup_mnt(mnt, dentry);
+	int ret = 0;
+	struct list_head *tmp;
+
+	list_for_each(tmp, &dentry->d_vfsmnt) {
+		struct vfsmount *vfs = list_entry(tmp, struct vfsmount, 
+						  mnt_clash);
+		DPRINTK(("check_vfsmnt: mnt=%p, dentry=%p, tmp=%p, vfs=%p\n",
+			 mnt, dentry, tmp, vfs));
+		if (vfs->mnt_parent != mnt || /* don't care about busy-ness of other namespaces */
+		    !is_vfsmnt_tree_busy(vfs))
+			ret++;
+	}
 
-	if (vfs && is_vfsmnt_tree_busy(vfs))
-		ret--;
 	DPRINTK(("check_vfsmnt: ret=%d\n", ret));
 	return ret;
 }
diff -u -r vanilla-linux/linux/fs/dcache.c linux/fs/dcache.c
--- vanilla-linux/linux/fs/dcache.c	Mon Feb 25 20:38:08 2002
+++ linux/fs/dcache.c	Fri Apr 26 09:34:57 2002
@@ -616,6 +616,7 @@
 	dentry->d_name.hash = name->hash;
 	dentry->d_op = NULL;
 	dentry->d_fsdata = NULL;
+	INIT_LIST_HEAD(&dentry->d_vfsmnt);
 	dentry->d_mounted = 0;
 	INIT_LIST_HEAD(&dentry->d_hash);
 	INIT_LIST_HEAD(&dentry->d_lru);
diff -u -r vanilla-linux/linux/fs/namei.c linux/fs/namei.c
--- vanilla-linux/linux/fs/namei.c	Mon Feb 25 20:38:09 2002
+++ linux/fs/namei.c	Fri Apr 26 09:38:05 2002
@@ -381,9 +381,25 @@
 
 static inline int __follow_down(struct vfsmount **mnt, struct dentry **dentry)
 {
+	struct list_head *p;
 	struct vfsmount *mounted;
 
 	spin_lock(&dcache_lock);
+	p = (*dentry)->d_vfsmnt.next;
+	while (p != &(*dentry)->d_vfsmnt) {
+		struct vfsmount *tmp;
+		tmp = list_entry(p, struct vfsmount, mnt_clash);
+		if (tmp->mnt_parent == *mnt) {
+			*mnt = mntget(tmp);
+			spin_unlock(&dcache_lock);
+			mntput(tmp->mnt_parent);
+			/* tmp holds the mountpoint, so... */
+			dput(*dentry);
+			*dentry = dget(tmp->mnt_root);
+			return 1;
+		}
+		p = p->next;
+	}
 	mounted = lookup_mnt(*mnt, *dentry);
 	if (mounted) {
 		*mnt = mntget(mounted);
diff -u -r vanilla-linux/linux/fs/namespace.c linux/fs/namespace.c
--- vanilla-linux/linux/fs/namespace.c	Mon Feb 25 20:38:09 2002
+++ linux/fs/namespace.c	Fri Apr 26 09:32:25 2002
@@ -50,6 +50,7 @@
 		memset(mnt, 0, sizeof(struct vfsmount));
 		atomic_set(&mnt->mnt_count,1);
 		INIT_LIST_HEAD(&mnt->mnt_hash);
+		INIT_LIST_HEAD(&mnt->mnt_clash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
 		INIT_LIST_HEAD(&mnt->mnt_list);
@@ -111,6 +112,7 @@
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	list_del_init(&mnt->mnt_child);
 	list_del_init(&mnt->mnt_hash);
+	list_del_init(&mnt->mnt_clash);
 	old_nd->dentry->d_mounted--;
 }
 
@@ -118,6 +120,7 @@
 {
 	mnt->mnt_parent = mntget(nd->mnt);
 	mnt->mnt_mountpoint = dget(nd->dentry);
+	list_add(&mnt->mnt_clash, &nd->dentry->d_vfsmnt);
 	list_add(&mnt->mnt_hash, mount_hashtable+hash(nd->mnt, nd->dentry));
 	list_add(&mnt->mnt_child, &nd->mnt->mnt_mounts);
 	nd->dentry->d_mounted++;
diff -u -r vanilla-linux/linux/include/linux/dcache.h linux/include/linux/dcache.h
--- vanilla-linux/linux/include/linux/dcache.h	Thu Nov 22 20:46:18 2001
+++ linux/include/linux/dcache.h	Sat Apr 27 10:04:50 2002
@@ -68,6 +68,7 @@
 	unsigned int d_flags;
 	struct inode  * d_inode;	/* Where the name belongs to - NULL is negative */
 	struct dentry * d_parent;	/* parent directory */
+	struct list_head d_vfsmnt;
 	struct list_head d_hash;	/* lookup hash list */
 	struct list_head d_lru;		/* d_count = 0 LRU list */
 	struct list_head d_child;	/* child of parent list */
@@ -268,7 +269,7 @@
 
 static __inline__ int d_mountpoint(struct dentry *dentry)
 {
-	return dentry->d_mounted;
+	return !list_empty(&dentry->d_vfsmnt);
 }
 
 extern struct vfsmount *lookup_mnt(struct vfsmount *, struct dentry *);
diff -u -r vanilla-linux/linux/include/linux/mount.h linux/include/linux/mount.h
--- vanilla-linux/linux/include/linux/mount.h	Fri Oct  5 22:05:55 2001
+++ linux/include/linux/mount.h	Fri Apr 26 09:23:21 2002
@@ -20,6 +20,7 @@
 {
 	struct list_head mnt_hash;
 	struct vfsmount *mnt_parent;	/* fs we are mounted on */
+	struct list_head mnt_clash;
 	struct dentry *mnt_mountpoint;	/* dentry of mountpoint */
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */

Version-Release number of selected component (if applicable):
any kernel/autofs4 on RedHat since 7.2, probably before, but
we didn't have earlier releases in production here.

How reproducible:
see above

Steps to Reproduce:
1. create hierarchical map, start automountd
2. access autofs directories
3. wait for expiry, access again
    
Actual results:
with a chance of maybe 0.1 % the access fails with EACCES
("Permission denied") or ENOENT ("No such file or directory")


Expected results:
Automounted directory gets accessed as normal


Additional info:
see above

Comment 1 Albert Fluegel 2003-10-02 08:55:56 UTC
It's surely not a server problem. When the access fails, no
corresponding NFS packets are received by the server
or sent by the client, so the problem must be local to the
NFS client.


Comment 2 Albert Fluegel 2003-10-02 09:27:29 UTC
With the following values the problem appears much more often (!?!):

  RANDNUM=`sum /tmp/bla | awk '{print $1 * 20}'`		# 0 - 1310700
  WAITTIME=`expr '(' $RANDNUM / 2 ')' + 980000`	# 980000 - 1635350 us

(that is: more unmounts and mounts increase the likelihood)

Additional Info: It does not matter whether it's a single- or multiprocessor
machine.
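
For comparison, the upper bounds of the original and the modified wait-time formulas can be recomputed as follows. This is a sketch of the arithmetic only, assuming (as the script comments do) that sum yields values from 0 to 65535. MAXSUM, BASE, ORIG_MAX and NEW_MAX are illustrative names.

```shell
#!/bin/sh
# Upper bounds of the two wait-time formulas, assuming sum yields 0 - 65535.
MAXSUM=65535
BASE=980000        # fixed part of the wait, in microseconds

# Original script: RANDNUM = sum * 2, WAITTIME = RANDNUM + 980000
ORIG_MAX=`expr $MAXSUM \* 2 + $BASE`
# Modified values: RANDNUM = sum * 20, WAITTIME = (RANDNUM / 2) + 980000
NEW_MAX=`expr '(' $MAXSUM \* 20 / 2 ')' + $BASE`

echo "original: $BASE - $ORIG_MAX us"
echo "modified: $BASE - $NEW_MAX us"
```

The range widens from about 131 ms to about 655 ms, and its upper end moves further past the 1-second automount timeout.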


Comment 3 Dave Jones 2004-11-20 02:04:29 UTC
Is this still a problem with current releases?

Comment 4 Albert Fluegel 2004-11-22 08:28:19 UTC
Still testing with 2.4.21-20 and autofs-4.1.3.
Please leave case open. Thanks !

Comment 5 Dave Jones 2005-07-15 20:31:56 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 6 Albert Fluegel 2005-07-18 06:47:49 UTC
Seems to be fixed with 4.1.3-12 (or probably later? Doesn't matter).
I have a different problem now: the latest 4.1.3 autofs versions
use unprivileged ports for NFS RPC test accesses, which is
unacceptable from a security point of view. Anyway, I
don't mind closing this particular case now.

Comment 7 Dave Jones 2006-01-16 22:22:29 UTC
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.


Comment 8 Dave Jones 2006-02-03 07:19:29 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 9 Jeff Moyer 2006-02-03 13:34:59 UTC
This issue has been resolved.  If you are still experiencing automount issues,
please file another bug.  Include only one bug per bugzilla, please.

Thanks.

