On my main workstation at home, I have autofs set up to mount the pseudoroot of my main fileserver (details on this to follow). When I mount that root and do even a small bit of I/O to it, and then let the mount expire, I get a BUG() pop on umount:

[ 523.953284] BUG: Dentry ffff8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
[ 523.953306] ------------[ cut here ]------------
[ 523.954013] kernel BUG at fs/dcache.c:925!
[ 523.954013] invalid opcode: 0000 [#1] SMP
[ 523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
[ 523.954013] CPU 1
[ 523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi snd_seq_device snd_pcm snd_timer snd uvcvideo ppdev videodev soundcore media sp5100_tco v4l2_compat_ioctl32 edac_core parport_pc snd_page_alloc i2c_piix4 edac_mce_amd k10temp parport wmi r8169 microcode mii virtio_net kvm_amd kvm ipv6 ata_generic pata_acpi pata_atiixp radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 523.954013]
[ 523.954013] Pid: 8387, comm: umount.nfs4 Not tainted 2.6.39-1.fc16.x86_64 #1 BIOSTAR Group TA790GX 128M/TA790GX 128M
[ 523.954013] RIP: 0010:[<ffffffff81131788>] [<ffffffff81131788>] shrink_dcache_for_umount_subtree+0x104/0x1c9
[ 523.954013] RSP: 0018:ffff8801f2385d98 EFLAGS: 00010296
[ 523.954013] RAX: 0000000000000062 RBX: ffff8801f3084180 RCX: 00000000000027a7
[ 523.954013] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
[ 523.954013] RBP: ffff8801f2385dc8 R08: 0000000000000002 R09: 0000ffff00066c0a
[ 523.954013] R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff8801f3084180
[ 523.954013] R13: ffff8801f31305f0 R14: ffff8801f30ec5f0 R15: 0000000000000000
[ 523.954013] FS: 00007f08e468d720(0000) GS:ffff88022fc80000(0000) knlGS:00000000f770f870
[ 523.954013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 523.954013] CR2: 00007fff8fc29ff8 CR3: 00000001f7ac4000 CR4: 00000000000006e0
[ 523.954013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 523.954013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 523.954013] Process umount.nfs4 (pid: 8387, threadinfo ffff8801f2384000, task ffff8801f7a61710)
[ 523.954013] Stack:
[ 523.954013]  ffff8801fdd74a58 0000000000000000 ffff8801fdd74800 ffff8801f3084240
[ 523.954013]  ffff8801f2385f28 ffff8801f30ec540 ffff8801f2385de8 ffffffff81132063
[ 523.954013]  ffff8801fdd74800 ffffffffa0542300 ffff8801f2385e18 ffffffff81121f15
[ 523.954013] Call Trace:
[ 523.954013]  [<ffffffff81132063>] shrink_dcache_for_umount+0x4d/0x5f
[ 523.954013]  [<ffffffff81121f15>] generic_shutdown_super+0x23/0xde
[ 523.954013]  [<ffffffff8112204d>] kill_anon_super+0x13/0x4c
[ 523.954013]  [<ffffffffa051c436>] nfs4_kill_super+0x44/0x77 [nfs]
[ 523.954013]  [<ffffffff81122271>] deactivate_locked_super+0x26/0x4b
[ 523.954013]  [<ffffffff81122a83>] deactivate_super+0x37/0x3b
[ 523.954013]  [<ffffffff811375b3>] mntput_no_expire+0xcc/0xd1
[ 523.954013]  [<ffffffff811375de>] mntput+0x26/0x28
[ 523.954013]  [<ffffffff81137bef>] release_mounts+0x59/0x73
[ 523.954013]  [<ffffffff81138155>] sys_umount+0x28d/0x2da
[ 523.954013]  [<ffffffff814828c2>] system_call_fastpath+0x16/0x1b
[ 523.954013] Code: 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 f9 ed 7a 81 48 89 de 31 c0 e8 4e 17 34 00 <0f> 0b 4c 8b 63 18 4c 39 e3 75 11 48 8d bb 90 00 00 00 45 31 e4
[ 523.954013] RIP  [<ffffffff81131788>] shrink_dcache_for_umount_subtree+0x104/0x1c9
[ 523.954013]  RSP <ffff8801f2385d98>
Ok, here's my reproducer for this so far. This can probably be simplified some and I'm still working on doing that. I have the following in /etc/auto.misc:

salusa          -fstype=nfs4,fsc        salusa.poochiereds.net:/

...and then this is auto.master (along with a few other entries):

/misc   /etc/auto.misc

...salusa exports a /git filesystem, and I have a symlink on the client that looks like this:

  File: `/git' -> `/misc/salusa/git'

...with all of that set up, I then use claws-mail to save an email to a file under /git/linux-2.6. Then I don't touch the mount again. Once the mount expires, autofs goes to unmount it and then the BUG() pops.
After I do the initial steps in the reproducer, this is what the contents of /proc/fs/nfsfs/volumes looks like:

NV SERVER                           PORT DEV  FSID              FSC
v4 2001047000080d63020e0cfffec64b16 801  0:44 0:0               yes
v4 2001047000080d63020e0cfffec64b16 801  0:45 28e2782bbc8a82b3  yes

...the DEV field here is in decimal, while the BUG message in this instance referenced device 0:2c in hex -- 0x2c is 44, which corresponds to 0:44 above. So the BUG pop is occurring when unmounting the server's pseudoroot.
I set the timeout on /misc to a shorter value and was able to reproduce this more quickly:

/misc   /etc/auto.misc  --timeout=60

...the /git symlink is unneeded as well. It reproduces just fine if I save a file to /misc/salusa/scratch (a scratch export that I use for testing).
I can't reproduce it using the procedure here. I installed f15 (into a kvm vm) and tried with the default kernel (2.6.38.6-27) to ensure it worked for that kernel. Then I installed the rawhide kernel 2.6.39-1 expecting it to fail but it didn't. I noticed that for claws-mail to see the mount it needs to be already mounted (no surprise) and that it appears to hold something open after saving a mail since I had to close it before the mount would expire.
There is some breakage using the rawhide kernel. I think I was just lucky it booted. I'm rebuilding the kernel locally to see if that will help.
(In reply to comment #4)
> I can't reproduce it using the procedure here.
>
> I installed f15 (into a kvm vm) and tried with the default
> kernel (2.6.38.6-27) to ensure it worked for that kernel.
>
> Then I installed the rawhide kernel 2.6.39-1 expecting it
> to fail but it didn't.
>
> I noticed that for claws-mail to see the mount it needs to
> be already mounted (no surprise) and that it appears to hold
> something open after saving a mail since I had to close it
> before the mount would expire.

Interesting... I did a bit more factor elimination. The fsc option isn't needed to reproduce this. I also tried to reproduce this with several different command-line operations and it didn't occur.

I should note that when I'm saving the file here, I'm not navigating to the destination via point and click. I'm just prepending the filename with the path in the save box, and that allows claws to trigger the mount. Perhaps that also keeps claws from pinning the directory so that it can expire? I'm at a loss, though, as to what claws is doing that allows it to reliably trigger this...
Oh, and one other strange thing: it seems like I can only reproduce this on the first mount. If I trigger the automount via other means, let it expire, and then try my reproducer steps, it doesn't seem to happen reliably on subsequent attempts. I need to go back and confirm that, though.
Yes, confirmed -- did:

$ ls -l /misc/salusa/scratch

...and then let the mount expire. Then I went into claws, saved a file to the same directory using my reproducer method here, and the subsequent expire didn't cause an oops.
For the record, I'm currently testing with:

autofs-5.0.5-35.fc15.x86_64
kernel-2.6.39-1.fc16.x86_64
(In reply to comment #9)
> For the record, I'm currently testing with:
>
> autofs-5.0.5-35.fc15.x86_64
> kernel-2.6.39-1.fc16.x86_64

I was only able to get my vm to boot once with 2.6.39-1; it just spews backtrace messages when I try to boot.
Created attachment 501306 [details]
info on root dentry, post-oops

Poking around with crash gives me this info on the root dentry, post-oops. The d_flags are:

d_flags = 0xc004 = DCACHE_DISCONNECTED|DCACHE_OP_REVALIDATE|DCACHE_OP_DELETE

...nothing terribly exciting there. d_parent points to itself (as expected), and all of the list_heads seem to be empty.
I've also found that I can manually unmount the filesystem and trigger the bug, so I don't need to rely on autofs expiring the mount (that's good, since that seemed to be broken in some early pre-2.6.38-rc1 kernels).

Since I have a fairly reliable reproducer, I decided to try bisecting this down. The log so far:

$ git bisect log
git bisect start '--' 'fs/'
# bad: [1bae4ce27c9c90344f23c65ea6966c50ffeae2f5] Linux 2.6.38-rc2
git bisect bad 1bae4ce27c9c90344f23c65ea6966c50ffeae2f5
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# good: [b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1] Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1
# good: [b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1] Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1
# good: [6ab82196492a0b6968a654a06aae923b28afef0d] Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
git bisect good 6ab82196492a0b6968a654a06aae923b28afef0d
# bad: [9e8a462a0141b12e22c4a2f0c12e0542770401f0] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
git bisect bad 9e8a462a0141b12e22c4a2f0c12e0542770401f0

...looks like I have about 5 more iterations to go.
Created attachment 501391 [details]
git bisect log and slightly different oops

Getting closer. Here's the log so far. Note that with the last revision I couldn't reproduce the exact same bug, as autofs failed to mount the fs. Suspecting that automountd was hosed, I did a "service autofs restart" and then it panicked with a very similar backtrace. So I'm marking this revision bad on the assumption that it's a related problem...
And the winner (loser?) of the bisect is...

commit 36d43a43761b004ad1879ac21471d8fc5f3157ec
Author: David Howells <dhowells>
Date:   Fri Jan 14 18:45:42 2011 +0000

    NFS: Use d_automount() rather than abusing follow_link()

    Make NFS use the new d_automount() dentry operation rather than
    abusing follow_link() on directories.

    Signed-off-by: David Howells <dhowells>
    Acked-by: Trond Myklebust <Trond.Myklebust>
    Acked-by: Ian Kent <raven>
    Signed-off-by: Al Viro <viro.org.uk>

...which unfortunately doesn't narrow things down that much :-/
Created attachment 501403 [details]
final git bisect log

Here's the final git bisect log in case anyone is interested. Anyone have thoughts as to other possible avenues for investigation?
(In reply to comment #11)
> Created attachment 501306 [details]
> info on root dentry, post-oops
>
> Poking around with crash gives me this info on the root dentry, post-oops.
> The d_flags are:
>
> d_flags = 0xc004 = DCACHE_DISCONNECTED|DCACHE_OP_REVALIDATE|DCACHE_OP_DELETE
>
> ...nothing terribly exciting there. d_parent points to itself (as expected),
> and all of the list_heads seem to be empty.

It's about time for a complete stab in the dark!

It's odd that the dentry above hasn't been __d_drop()ed....

Which call of shrink_dcache_for_umount_subtree() does this oops on, the first call or the second in shrink_dcache_for_umount()? The second call is the one that walks the anonymous dentry list, so that should be the one, and the first thing shrink_dcache_for_umount_subtree() does is __d_drop() the dentry.

If it is the first call, then how does the anonymous dentry come to be under s_root? And why does shrink_dcache_for_umount() decrement the count for the s_root dentry but not for dentries found on the sb->s_anon list, David?
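For reference, here is a rough paraphrase (from memory, not the verbatim source; locking details and BUG checks elided) of what shrink_dcache_for_umount() looked like in the 2.6.38/2.6.39 era. It shows the two calls being discussed: the first handles s_root with an explicit d_count decrement, while the loop over sb->s_anon makes no corresponding decrement:

/* Approximate sketch of fs/dcache.c:shrink_dcache_for_umount(), ~2.6.39.
 * Paraphrased from memory; locking and sanity checks elided. */
void shrink_dcache_for_umount(struct super_block *sb)
{
	struct dentry *dentry;

	/* First call: tear down the tree rooted at s_root. */
	dentry = sb->s_root;
	sb->s_root = NULL;
	spin_lock(&dentry->d_lock);
	dentry->d_count--;		/* drop the ref held via sb->s_root */
	spin_unlock(&dentry->d_lock);
	shrink_dcache_for_umount_subtree(dentry);

	/* Second call(s): tear down each disconnected root hashed on
	 * sb->s_anon. Note: no d_count decrement here, hence the
	 * question above. */
	while (!hlist_bl_empty(&sb->s_anon)) {
		dentry = hlist_bl_entry(hlist_bl_first(&sb->s_anon),
					struct dentry, d_hash);
		shrink_dcache_for_umount_subtree(dentry);
	}
}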
FWIW, I did some more testing yesterday. I did the reproducer on a kernel both with and without commit 36d43a437, and used crash to look at the d_count after triggering the mount and before the unmount. Those counts were the same in both cases -- the mnt_root had a d_count of 2, and s_root a d_count of 1. Perhaps I should do it again and have a look at the d_flags, or just get the entire dentry structs in both cases so we can compare them.
(In reply to comment #17)
> FWIW, I did some more testing yesterday. I did the reproducer on a kernel
> both with and without commit 36d43a437, and used crash to look at the
> d_count after triggering the mount and before the unmount. Those counts were
> the same in both cases -- the mnt_root had a d_count of 2, and s_root a
> d_count of 1.

That makes sense, since s_root won't be used in path lookups, and mnt_root will have one ref from when it was d_alloc()ed and one from the umount code (I think).

> Perhaps I should do it again and have a look at the d_flags, or just get the
> entire dentry structs in both cases so we can compare them.

I think we are most interested in the mnt_root dentry because it is the one assigned to the vfsmount struct and so will be the one involved in path walks. I think s_root is tucked away in the super block and is basically inert in the NFS case.

What I don't get is whether the oops occurs from the first call to shrink_dcache_for_umount_subtree() or not. If it does, then I'd be wondering how the anonymous dentry got into the tree under s_root at all (although that itself deserves more looking around in the code).

The other thing that bugs me is in __d_drop(), where we have:

    if dentry is hashed then
        if dentry is disconnected
            remove from appropriate list
        else
            remove from other appropriate list
            maybe block on rcuwalk barrier
        done
    done

So we don't boot active rcu-walk walks when dropping the anonymous dentry -- if it even gets to the drop, that is.
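For readers following the pseudocode above, an approximate paraphrase (again from memory, with the list-locking elided) of the 2.6.38-era __d_drop() shows the asymmetry being pointed at -- the rcu-walk barrier only fires on the non-disconnected branch:

/* Approximate sketch of fs/dcache.c:__d_drop(), ~2.6.38/39 era.
 * Paraphrased from memory; list-lock acquire/release elided. */
void __d_drop(struct dentry *dentry)
{
	if (!d_unhashed(dentry)) {
		if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) {
			/* anonymous dentry: unhash from sb->s_anon --
			 * note there is no dentry_rcuwalk_barrier() here */
			hlist_bl_del_init(&dentry->d_hash);
		} else {
			/* ordinary dentry: unhash from the dentry hashtable */
			hlist_bl_del_rcu(&dentry->d_hash);
			dentry_rcuwalk_barrier(dentry);	/* boot rcu-walkers */
		}
	}
}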
Created attachment 502245 [details]
pre and post bad patch dentry info

Here is the dentry info for the vfsmount->mnt_root and super_block->s_root, both without the "bad" patch (pre) and with the patch (post). I forgot to "set hex" beforehand, so a lot of the interesting fields (e.g. d_flags) are in decimal. The d_flags seem to be identical in both cases as well. I don't see much in the way of discrepancies...

Also, I should note that the busy dentry in the oops message is the vfsmount->mnt_root, and *not* the super_block->s_root.
I'd be very surprised if you saw the bug appearing in relation to the dentry attached to s_root. NFS's s_root dentry is a placeholder to stop the VFS blowing up, no more. It isn't actually used for anything. As NFS may share superblocks between several disjoint mounts from the same remote filesystem, it keeps its roots in the s_anon list instead. Should the root of a mount be detected in the tree of another mount, then the two trees are spliced together at that point by d_materialise_unique(). The resulting tree will still have an anonymous root. It is done this way as we may not have access to the actual root directory of the remote filesystem, and so may not be able to put that into our tree. Thus the root of the mount is attached directly to the vfsmount struct without going anywhere near the core tree in the superblock.
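For context on where these disconnected roots come from: the NFS client builds an inode for the mounted-on filehandle and then asks the dcache for an anonymous alias. The sketch below is a from-memory approximation of the era's fs/nfs/getroot.c with the signature simplified (the function name here is illustrative; nfs_fhget() and d_obtain_alias() are the real interfaces):

/* Sketch (approximate, based on the 2.6.38-era fs/nfs/getroot.c) of how
 * an NFS mount root ends up disconnected; signature simplified. */
static struct dentry *nfs_sketch_get_root(struct super_block *sb,
					  struct nfs_fh *mntfh,
					  struct nfs_fattr *fattr)
{
	struct inode *inode = nfs_fhget(sb, mntfh, fattr);

	if (IS_ERR(inode))
		return ERR_CAST(inode);

	/* d_obtain_alias() returns an existing alias if one is already in
	 * the dcache; otherwise it creates a DCACHE_DISCONNECTED dentry
	 * and hashes it on sb->s_anon -- which is exactly where
	 * shrink_dcache_for_umount() later finds it. */
	return d_obtain_alias(inode);
}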
I was also able to reproduce this using a direct automount map as well (Ian asked if it was possible), so direct vs. indirect map doesn't seem to make any difference.
Also, just for giggles I tried Ian's suggestion to make __d_drop always call dentry_rcuwalk_barrier() and it didn't prevent the oops on unmount.
Here's what I don't quite get... I've bisected the problem down to the patch that switched NFS over to use d_automount. That doesn't really affect unmounting afaict and should really only be affecting things when the automount is triggered. So, it seems likely that that change is causing some sort of difference in the mounted filesystem. The question is what? I'll be happy to collect info from kernels with and without that patch to try and discern where the difference is -- any thoughts about what I should be collecting info on?
(In reply to comment #23)
> Here's what I don't quite get...
>
> I've bisected the problem down to the patch that switched NFS over to use
> d_automount. That doesn't really affect unmounting afaict and should really
> only be affecting things when the automount is triggered.
>
> So, it seems likely that that change is causing some sort of difference in
> the mounted filesystem. The question is what? I'll be happy to collect info
> from kernels with and without that patch to try and discern where the
> difference is -- any thoughts about what I should be collecting info on?

We should have also checked /proc/mounts in each case.
(In reply to comment #24)
> We should have also checked /proc/mounts in each case.

I got some clarification from Ian -- he was just concerned that it might have been mounted more than once. I verified that it wasn't.
I decided to strace claws-mail. The first thing it does to trigger an automount is an open on the directory:

18672 13:17:53.843597 open("/misc/salusa/scratch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>

...while that's crunching away, a different thread does an lstat of a file in that directory (the one it intends to save):

18682 13:17:53.975947 lstat("/misc/salusa/scratch/[CapitalCityCigars] And so it begins!!", <unfinished ...>

...that comes back first:

18682 13:17:54.301130 <... lstat resumed> 0x7f7baea2a910) = -1 ENOENT (No such file or directory) <0.325145>

...then it does an lstat of the parent:

18682 13:17:54.303366 lstat("/misc/salusa/scratch", {st_dev=makedev(0, 42), st_ino=2, st_mode=S_IFDIR|S_ISVTX|0777, st_nlink=8, st_uid=99, st_gid=99, st_blksize=524288, st_blocks=8, st_size=4096, st_atime=2011/06/07-12:13:09, st_mtime=2011/06/07-12:13:09, st_ctime=2011/06/07-12:13:09}) = 0 <0.000390>

...a little later, the opendir returns:

18672 13:17:54.340699 <... open resumed> ) = 23 <0.497070>

...then it sets an inotify watch (probably via some gtk open dialog goop):

18668 13:17:54.341626 inotify_add_watch(21, "/misc/salusa/scratch", IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF|IN_UNMOUNT|IN_ONLYDIR) = 3 <0.000043>

...and then starts a readdir:

18682 13:17:54.342309 getdents(23, <unfinished ...>

...that comes back (I won't paste it as it's pretty big), and then it starts lstat'ing each dentry:

18682 13:17:54.347897 lstat("/misc/salusa/scratch/testfile", {st_dev=makedev(0, 42), st_ino=13, st_mode=S_IFREG|0644, st_nlink=1, st_uid=99, st_gid=99, st_blksize=524288, st_blocks=2097160, st_size=1073741824, st_atime=2011/05/19-10:56:52, st_mtime=2011/05/19-10:59:30, st_ctime=2011/05/19-10:59:30}) = 0 <0.000033>

...most of this is pretty straightforward stuff, but I wonder if the inotify_add_watch is important somehow, as that's somewhat unusual compared to command-line apps.
Created attachment 504121 [details]
debug patch -- instrument nfs_d_automount

Taking Ian's suggestion on IRC into account, I decided to instrument nfs_d_automount. I turned all of the dprintk's into printk's, had them print current->pid. I also added a printk of the string representation of the struct path.

It looks like we do have 2 concurrent nfs_d_automount requests going on here. They both return different vfsmounts, though. The second vfsmount does not show up in the mount table (as shown by crash's mount command).

Here's the snippet of dmesg when I did the reproducer to trigger a mount. Unfortunately, the dump_stack() got a little muddled, but it gives you an idea of what's going on...

[ 214.444300] 1670: --> nfs_d_automount()
[ 214.445701] Pid: 1670, comm: claws-mail Not tainted 2.6.39-1.fc16.x86_64.debug #1
[ 214.448353] Call Trace:
[ 214.449301]  [<ffffffffa01dc897>] nfs_d_automount+0x60/0x579 [nfs]
[ 214.451247] 1677: --> nfs_d_automount()
[ 214.451251] Pid: 1677, comm: claws-mail Not tainted 2.6.39-1.fc16.x86_64.debug #1
[ 214.451253] Call Trace:
[ 214.451279]  [<ffffffffa01dc897>] nfs_d_automount+0x60/0x579 [nfs]
[ 214.451334]  [<ffffffff814b9a15>] ? _raw_spin_unlock+0x28/0x2c
[ 214.451362]  [<ffffffff811455b4>] ? dput+0xdc/0xfc
[ 214.451373]  [<ffffffffa01cc347>] ? nfs_lookup_revalidate+0x227/0x366 [nfs]
[ 214.451377]  [<ffffffff8113c463>] follow_managed+0x12a/0x1ce
[ 214.451381]  [<ffffffff8113d7b8>] walk_component+0x22c/0x335
[ 214.451402]  [<ffffffff81207715>] ? security_inode_exec_permission+0x25/0x27
[ 214.451406]  [<ffffffff8113da60>] link_path_walk+0x19f/0x439
[ 214.451409]  [<ffffffff8113de1e>] path_lookupat+0x5a/0x2e1
[ 214.451432]  [<ffffffff81101170>] ? might_fault+0xa5/0xac
[ 214.451435]  [<ffffffff81101127>] ? might_fault+0x5c/0xac
[ 214.451438]  [<ffffffff8113c8ee>] ? getname_flags+0x31/0x1ca
[ 214.451441]  [<ffffffff8113ef4f>] do_path_lookup+0x28/0x96
[ 214.451444]  [<ffffffff8113f374>] user_path_at+0x59/0x96
[ 214.451462]  [<ffffffff810774b6>] ? up_read+0x28/0x2c
[ 214.451471]  [<ffffffff814bcf30>] ? do_page_fault+0x360/0x3b4
[ 214.451475]  [<ffffffff8113703f>] vfs_fstatat+0x44/0x6e
[ 214.451478]  [<ffffffff81101127>] ? might_fault+0x5c/0xac
[ 214.451481]  [<ffffffff81137087>] vfs_lstat+0x1e/0x20
[ 214.451484]  [<ffffffff811371d6>] sys_newlstat+0x1a/0x33
[ 214.451493]  [<ffffffff810870f3>] ? trace_hardirqs_on_caller+0x10b/0x12f
[ 214.451502]  [<ffffffff810a9e5f>] ? audit_syscall_entry+0x11c/0x148
[ 214.451523]  [<ffffffff81256e1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 214.451526]  [<ffffffff814c05c2>] system_call_fastpath+0x16/0x1b
[ 214.451532] nfs_d_automount(1677): enter
[ 214.451548] 1677: dentry=ffff88001005c540 path=/misc/salusa/scratch
[ 214.471100] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 214.471587] nfs_d_automount(1677): done, success
[ 214.471607] 1677: <-- nfs_d_automount() = ffff88001194d140
[ 214.511636]  [<ffffffff81078a0d>] ? sched_clock_local+0x12/0x75
[ 214.514001]  [<ffffffff81083764>] ? trace_hardirqs_off+0xd/0xf
[ 214.515589]  [<ffffffff81078bda>] ? local_clock+0x36/0x4d
[ 214.517049]  [<ffffffff81083e91>] ? lock_release_holdtime.part.8+0x6b/0x72
[ 214.518857]  [<ffffffff814b849f>] ? __mutex_unlock_slowpath+0x112/0x122
[ 214.521946]  [<ffffffff8113c463>] follow_managed+0x12a/0x1ce
[ 214.523395]  [<ffffffff8113d7b8>] walk_component+0x22c/0x335
[ 214.524896]  [<ffffffff8113d956>] ? link_path_walk+0x95/0x439
[ 214.526307]  [<ffffffff8113e4f0>] do_last+0x13f/0x5ad
[ 214.527571]  [<ffffffff8113f4fe>] path_openat+0xcb/0x31f
[ 214.528964]  [<ffffffff81078a0d>] ? sched_clock_local+0x12/0x75
[ 214.530742]  [<ffffffff8113f78a>] do_filp_open+0x38/0x86
[ 214.532133]  [<ffffffff814b9a15>] ? _raw_spin_unlock+0x28/0x2c
[ 214.533579]  [<ffffffff8114a164>] ? alloc_fd+0x17f/0x191
[ 214.534965]  [<ffffffff811327a0>] do_sys_open+0x6e/0x100
[ 214.536294]  [<ffffffff810a9e5f>] ? audit_syscall_entry+0x11c/0x148
[ 214.537889]  [<ffffffff81132852>] sys_open+0x20/0x22
[ 214.539129]  [<ffffffff814c05c2>] system_call_fastpath+0x16/0x1b
[ 214.540601] nfs_d_automount(1670): enter
[ 214.541958] 1670: dentry=ffff88001005c540 path=/misc/salusa/scratch
[ 214.553643] nfs_d_automount(1670): done, success
[ 214.555077] 1670: <-- nfs_d_automount() = ffff88001194d280
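For anyone wanting to recreate the instrumentation, a minimal sketch of the kind of debug printk used here might look like the following. This is hypothetical (the actual patch is in the attachment above); the helper name is made up, while current, d_path(), and dump_stack() are the real kernel interfaces:

/* Hypothetical sketch of the instrumentation described above; the real
 * debug patch is attached to this bug. Called at the top of
 * nfs_d_automount(), it prints the pid, the dentry, and the string
 * form of the struct path. */
static void nfs_automount_debug(struct path *path)
{
	char *page, *s;

	printk(KERN_INFO "%d: --> nfs_d_automount()\n", current->pid);
	dump_stack();

	page = (char *)__get_free_page(GFP_KERNEL);
	if (!page)
		return;
	s = d_path(path, page, PAGE_SIZE);
	if (!IS_ERR(s))
		printk(KERN_INFO "%d: dentry=%p path=%s\n",
		       current->pid, path->dentry, s);
	free_page((unsigned long)page);
}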
I've limited the reproducer down a little... I can reproduce this as long as I use a gtk/gnome dialog to trigger a mount. I don't actually need to save anything in the directory. I can just pull up a "browse" dialog, type in "/misc/salusa/scratch/" and hit return. That opens the directory and triggers the mount.
I hacked up a quick patch to make nfs_d_automount return NULL after the first automount attempt. The first time nfs_d_automount is called, it works normally; every subsequent time it just returns NULL (which follow_automount() treats as a mount collision, so the walk simply continues). This seems to work around the bug, lending some weight to the idea that the concurrent d_automount activity is somehow causing the issue.
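A sketch of what such a hack might look like (hypothetical -- this illustrates the idea, it is not the actual patch; nfs_do_automount here is a made-up name standing in for the original mount-setup logic):

/* Hypothetical sketch of the workaround hack: only the first call
 * performs the mount; later callers return NULL, which
 * follow_automount() treats as a mount collision and ignores. */
struct vfsmount *nfs_d_automount(struct path *path)
{
	static atomic_t automount_done = ATOMIC_INIT(0);

	if (atomic_xchg(&automount_done, 1))
		return NULL;	/* pretend a racing automount already won */

	return nfs_do_automount(path);	/* stand-in for the original logic */
}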
A little more printk debugging:

[ 450.447517] 1507: --> nfs_d_automount()
[ 450.449513] nfs_d_automount(1507): enter
[ 450.452785] 1512: --> nfs_d_automount()
[ 450.455700] 1507: dentry=ffff880012520a80 path=/misc/salusa/scratch
[ 450.471407] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 450.471963] nfs_d_automount(1507): done, success
[ 450.474253] 1507: <-- nfs_d_automount() = ffff880022765640
[ 450.475581] nfs_d_automount(1512): enter
[ 450.476725] 1512: dentry=ffff880012520a80 path=/misc/salusa/scratch
[ 450.480416] follow_automount(1507): finish_automount returned 0
[ 450.490699] nfs_d_automount(1512): done, success
[ 450.493240] 1512: <-- nfs_d_automount() = ffff8800114c2000
[ 450.496763] follow_automount(1512): finish_automount returned -16
[ 472.333182] BUG: Dentry ffff880012520930{i=2,n=} still in use (1) [unmount of nfs4 0:2a]

...so finish_automount is returning -EBUSY (-16) for the second automount. It seems somewhat likely that the cleanup of the second vfsmount is not completely correct in this situation.
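That -EBUSY is consistent with the "refuse the same filesystem on the same mount point" check in the mount code. Roughly -- a from-memory paraphrase of fs/namespace.c circa 2.6.39, not the verbatim source -- after lock_mount() transits to the vfsmount the first automount just attached, the second automount's do_add_mount() sees its own superblock already mounted there:

/* Approximate paraphrase of fs/namespace.c:do_add_mount(), ~2.6.39.
 * Flag sanitization and the check_mnt() test are elided. */
static int do_add_mount(struct vfsmount *newmnt, struct path *path,
			int mnt_flags)
{
	int err;

	err = lock_mount(path);	/* may transit path to an existing mount */
	if (err)
		return err;

	/* Refuse the same filesystem on the same mount point: this is
	 * the -EBUSY that the second racing automount gets. */
	err = -EBUSY;
	if (path->mnt->mnt_sb == newmnt->mnt_sb &&
	    path->mnt->mnt_root == path->dentry)
		goto unlock;

	err = graft_tree(newmnt, path);	/* splice the mount in */
unlock:
	unlock_mount(path);
	return err;
}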
...and a little more -- show the mnt_count. It's 2 for both mounts prior to calling finish_automount:

[ 486.108984] SELinux: initialized (dev 0:2a, type nfs4), uses genfs_contexts
[ 486.117197] mount.nfs4 used greatest stack depth: 2304 bytes left
[ 486.127215] 1494: --> nfs_d_automount()
[ 486.128641] nfs_d_automount(1494): enter
[ 486.130170] 1494: dentry=ffff8800125dc3f0 path=/misc/salusa/scratch
[ 486.139573] 1499: --> nfs_d_automount()
[ 486.143287] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 486.143841] nfs_d_automount(1494): done, success
[ 486.145655] 1494: <-- nfs_d_automount() = ffff880011941140
[ 486.147213] follow_automount(1494): mnt->mnt_count = 2
[ 486.148720] nfs_d_automount(1499): enter
[ 486.149828] 1499: dentry=ffff8800125dc3f0 path=/misc/salusa/scratch
[ 486.154517] follow_automount(1494): finish_automount returned 0
[ 486.168236] nfs_d_automount(1499): done, success
[ 486.170616] 1499: <-- nfs_d_automount() = ffff880011941280
[ 486.174350] follow_automount(1499): mnt->mnt_count = 2
[ 486.180294] follow_automount(1499): finish_automount returned -16
I can't for the life of me find where the extra reference is being picked up. I agree that the fsnotify system probably has a lot to do with this problem, but I can't see why.

The most obvious way to take fsnotify out of this is to not go through the vfsmount setup for a mount that essentially already exists. That avoids mntput() calling fsnotify_unmount_inodes(). Alternately, we need to consult someone with in-depth knowledge of the fsnotify system. Any ideas who that might be?

Anyway, this amounts to using a mutex in ->d_automount() to serialize calls to it, calling d_mountpoint() on the incoming path dentry after acquiring the mutex, and returning NULL if it is true. This should be all that's needed, since the mount would be complete by the time we exit ->d_automount().

I also believe that the CIFS d_automount patch suffers the same fault.
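A minimal sketch of the serialization proposed above (an illustration of the idea under the assumption a single per-module mutex is acceptable, not the eventual fix; nfs_do_automount is again a made-up stand-in for the existing mount setup):

/* Sketch of the proposed serialization; helper name is illustrative. */
static DEFINE_MUTEX(nfs_automount_mutex);

struct vfsmount *nfs_d_automount(struct path *path)
{
	struct vfsmount *mnt;

	mutex_lock(&nfs_automount_mutex);
	if (d_mountpoint(path->dentry)) {
		/* a racing automount beat us to it; NULL makes
		 * follow_automount() just cross the existing mount */
		mnt = NULL;
		goto out;
	}
	mnt = nfs_do_automount(path);	/* stand-in for the mount setup */
out:
	mutex_unlock(&nfs_automount_mutex);
	return mnt;
}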
(In reply to comment #32)
> I also believe that the CIFS d_automount patch suffers the same
> fault.

And quite probably nfs referral mounts as well.
I rolled up a small test program that can reproduce this now without a gui:

-----------------[snip]------------------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct stat buf;

	/* parent and child race to stat the same path, so two tasks
	   trigger the automount concurrently */
	fork();

	stat(argv[1], &buf);

	return 0;
}
-----------------[snip]------------------

...when I run this with /misc/salusa/scratch/foo as the first argument, then unmounting /misc/salusa triggers the panic. Ian, were you trying to reproduce this on a single-cpu box before? If so, then that might explain why you couldn't get it to happen.

With this, I think we can rule out inotify as the problem. The issue is something with simultaneous d_automounts running.
(In reply to comment #34)
> I rolled up a small test program that can reproduce this now without a gui:
>
> -----------------[snip]------------------
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
>         struct stat buf;
>
>         fork();
>
>         stat(argv[1], &buf);
>
>         return 0;
> }
> -----------------[snip]------------------

OK, I'll try this in a VM. I only have an F15 with a 2.6.38-based kernel, I think, but that shouldn't make a difference.

> ...when I run this with /misc/salusa/scratch/foo as the first argument, then
> unmounting /misc/salusa triggers the panic. Ian, were you trying to
> reproduce this on a single-cpu box before? If so, then that might explain
> why you couldn't get it to happen.

I use at least 2 cpus in vms, and I think I had 4 in the one I used to try and reproduce this.

> With this, I think we can rule out inotify as the problem. The issue is
> something with simultaneous d_automounts running.

It does look that way, yes. At least that will save me the time of trying to write an inotify-based reproducer.

Ian
...and with the above reproducer, I can reproduce this without autofs in the mix now. Here are the details:

The server (an f15 box named salusa) has a separate filesystem mounted on /scratch that is exported with (rw,insecure,no_root_squash). It has a file in it called "foo". The server does not declare an explicit fsid=0 export, so it fakes up a pseudoroot.

On the client, I mount the filesystem:

# mount -t nfs4 salusa:/scratch /mnt/salusa

...I'm then careful not to touch anything under /mnt/salusa. I then run the reproducer above with the following path:

# ./forkstat /mnt/salusa/scratch/foo

...and then:

# umount /mnt/salusa

...at that point, the BUG pops.
*** Bug 708099 has been marked as a duplicate of this bug. ***
...sorry, I made a mistake in comment #36. The mount command should read as:

# mount -t nfs4 salusa:/ /mnt/salusa

...everything else should be correct.
(In reply to comment #38)
> ...sorry, I made a mistake in comment #36. The mount command should read as:
>
> # mount -t nfs4 salusa:/ /mnt/salusa
>
> ...everything else should be correct.

Ha, I'm able to reproduce this now, and with only 2 cpus.
I tracked this down to lock_mount() releasing the caller's ref on path->mnt when it finds it has to transit to a new mountpoint returned by lookup_mnt(). The problem is that pathwalk may not be holding a ref on path->mnt if it's the same as nd->path.mnt, as it tries to save on mntget()/mntput() calls.
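To make the mechanism concrete, here is an approximate paraphrase (from memory, not the verbatim source; the cant_mount() check and error handling are elided) of fs/namespace.c:lock_mount() as it stood around 2.6.39. When a mount already exists on the mountpoint, it drops a reference on path->mnt via path_put() and retries on top of the existing mount -- and that is precisely the reference the racing pathwalk caller may never have taken. The patch attached in the following comments ("VFS: Fix vfsmount overput on simultaneous automount") addresses this overput:

/* Approximate paraphrase of fs/namespace.c:lock_mount(), ~2.6.39. */
static int lock_mount(struct path *path)
{
	struct vfsmount *mnt;
retry:
	mutex_lock(&path->dentry->d_inode->i_mutex);
	down_write(&namespace_sem);
	mnt = lookup_mnt(path);		/* something already mounted here? */
	if (likely(!mnt))
		return 0;		/* no: caller mounts on this path */

	/* Yes: transit to the existing mount and retry on top of it. */
	up_write(&namespace_sem);
	mutex_unlock(&path->dentry->d_inode->i_mutex);
	path_put(path);			/* <-- drops a ref on path->mnt that
					       the caller may not actually own */
	path->mnt = mnt;
	path->dentry = dget(mnt->mnt_root);
	goto retry;
}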
Created attachment 505077 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount

We believe this patch resolves the problem seen in this bug. Since Jeff has put so much effort into this already, I'll set about creating a test kernel package.
Created attachment 505085 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount (F15)

Oops, the original patch didn't apply cleanly. All the more reason to test it yourselves, please.
A kernel with the above patch has been built. Please give it a try and report back with results. It can be found at:

http://people.redhat.com/~ikent/kernel-2.6.38.8-32.bz708039.1.fc15
I have a similar (maybe?) bug that started with FC15. I run the CrashPlan client on an nfs automounted file system. The kernel above didn't fix it. I only have 1 CPU.

2.6.38.8-32.bz708039.1.fc15.i686 #1 SMP Thu Jun 16 17:51:40 UTC 2011 i686 i686 i386 GNU/Linux

[138413.271428] BUG: Dentry f02ad480{i=13f2d0,n=/} still in use (1) [unmount of autofs autofs]
[138413.274358] ------------[ cut here ]------------
[138413.275316] kernel BUG at fs/dcache.c:947!
[138413.275316] invalid opcode: 0000 [#1] SMP
[138413.275316] last sysfs file: /sys/devices/virtual/bdi/0:81/uevent
[138413.275316] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc p4_clockmod vfat fat snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm ppdev iTCO_wdt e100 iTCO_vendor_support parport_pc mii snd_timer snd microcode soundcore parport i2c_i801 snd_page_alloc serio_raw i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: mperf]
[138413.275316]
[138413.275316] Pid: 1885, comm: automount Not tainted 2.6.38.8-32.bz708039.1.fc15.i686 #1 Compaq Evo D510 CMT/07E8h
[138413.275316] EIP: 0060:[<c04f382d>] EFLAGS: 00010296 CPU: 0
[138413.275316] EIP is at shrink_dcache_for_umount_subtree+0xd0/0x162
[138413.275316] EAX: 00000065 EBX: f02ad480 ECX: 00000046 EDX: 00000000
[138413.275316] ESI: c093985f EDI: f282115c EBP: f43a9edc ESP: f43a9eac
[138413.275316]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[138413.275316] Process automount (pid: 1885, ti=f43a8000 task=f2bbd7f0 task.ti=f43a8000)
[138413.275316] Stack:
[138413.275316]  c092d11a f02ad480 0013f2d0 f02ad4a4 00000001 c093985f f282115c 00000000
[138413.275316]  f02ad4a4 f02ad480 f2821000 c07f9e64 f43a9eec c04f401b f2821000 c67ba780
[138413.275316]  f43a9f08 c04e5e3d f5b9a740 c67ba780 00000029 c67ba780 f409fa00 f43a9f14
[138413.275316] Call Trace:
[138413.275316]  [<c04f401b>] shrink_dcache_for_umount+0x3e/0x4a
[138413.275316]  [<c04e5e3d>] generic_shutdown_super+0x1e/0xb8
[138413.275316]  [<c04e5f48>] kill_anon_super+0x11/0x44
[138413.275316]  [<c04e5f99>] kill_litter_super+0x1e/0x21
[138413.275316]  [<c05812fe>] autofs4_kill_sb+0x35/0x39
[138413.275316]  [<c04e610a>] deactivate_locked_super+0x1f/0x40
[138413.275316]  [<c04e6a9b>] deactivate_super+0x2e/0x31
[138413.275316]  [<c04f8694>] mntput_no_expire+0xb5/0xb9
[138413.275316]  [<c04f8f32>] sys_umount+0x270/0x297
[138413.275316]  [<c04f8f77>] sys_oldumount+0x1e/0x20
[138413.275316]  [<c07d691c>] syscall_call+0x7/0xb
[138413.275316]  [<c07d0000>] ? audit_log_exit+0xbca/0xdc3
[138413.275316] Code: 03 8b 41 60 89 54 24 10 8b 55 f0 89 7c 24 18 89 74 24 14 89 5c 24 04 89 54 24 0c 89 44 24 08 c7 04 24 1a d1 92 c0 e8 2c b5 2d 00 <0f> 0b 8b 73 10 39 f3 75 0c 8d 43 68 31 f6 e8 24 2b 0e 00 eb 16
[138413.275316] EIP: [<c04f382d>] shrink_dcache_for_umount_subtree+0xd0/0x162 SS:ESP 0068:f43a9eac
[138413.418764] ---[ end trace 073c5e8debef6d67 ]---
(In reply to comment #44)
> I have a similar (maybe?) bug that started with FC15. I run the CrashPlan
> client on an nfs automounted file system. The kernel above didn't fix it.
> I only have 1 CPU.

That's not the same bug; see bug 698806.
Same issue here. Once the last open file under the mountpoint is closed, umount causes a kernel oops. It looks to me like something has been freed when it shouldn't have been. Note that umount reports "/cluster/gslap01: not mounted" when it most certainly was mounted:

[root@gslap02 ~]# killall -TERM lock_cleaner
[root@gslap02 ~]# lsof | grep /cluster/gslap01
lock_clea 7592 survey1 txt REG 0,23 1332384 95191157 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner
lock_clea 7592 survey1 1w  REG 0,23 3886    95191163 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner.errors
lock_clea 7592 survey1 2w  REG 0,23 3886    95191163 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner.errors
lock_clea 7592 survey1 3u  REG 0,23 16018   94405062 /cluster/gslap01/gs/database/11_7med/mission_plan/db_lock_table.dat
lock_clea 7592 survey1 4u  REG 0,23 260     95191135 /cluster/gslap01/gs/bin/a.1.1.0/common_lock_table.dat
[root@gslap02 ~]#
[root@gslap02 ~]# lsof | grep /cluster/gslap01
[root@gslap02 ~]# umount /cluster/gslap01
umount: /cluster/gslap01: not mounted
[root@gslap02 ~]#

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: ------------[ cut here ]------------

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: invalid opcode: 0000 [#1] PREEMPT SMP

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:09:00.0/net/eth0/statistics/collisions

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Process umount.nfs4 (pid: 8895, ti=e0834000 task=e08b7540 task.ti=e0834000)

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Stack:

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Call Trace:

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Code: 46 20 c7 45 ec 00 00 00 00 85 c0 74 06 8b 40 60 89 45 ec 8d 82 5c 01 00 00 50 51 53 57 ff 75 ec 56 68 18 26 7f c0 e8 fe d2 f3 ff <0f> 0b 83 c4 1c eb fe 8b 7e 10 8d 46 68 89 45 e4 39 fe 75 1f 8b

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: EIP: [<c050360a>] shrink_dcache_for_umount_subtree+0xd5/0x1a0 SS:ESP 0068:e0835e8c
I should add that the above bug occurs with kernel-ml 2.6.39-4.el5.elrepo.x86_64.
I'm also seeing another nfs4 issue in the logs, stuff like:

bad fileid, expected foo got bar

But it didn't look too bad -- the fileid/inode concerned was the root of the NFS4 mount, and I figured some non-critical code inside nfs somewhere was just seeing the inode of the mountpoint instead of the directory mounted on the mountpoint, or something along those lines. It didn't seem to immediately cause any problems, but it did puzzle me a bit.

Now, however, I think it's possible the bad fileid issue may be related to this more serious oops-on-umount bug. I am just speculating, as I have no knowledge of the source code, but the bad fileid errors I see ALWAYS involve the mountpoint where the nfs share is being mounted. So perhaps the code to handle bad fileids is a trigger for the dentry leak, and therefore the cause of the oops on umount?

In any case, please fix these dentry cache errors soon. It is disappointing when the NFSv4 lottery crashes servers and clients year after year, kernel after kernel. Do we really need to sacrifice stability and lose data just to add new features to nfsv4? Can we not fix 1 bug without creating 10 more? Is it time to declare nfs4 fatally flawed and impossible to implement reliably -- and then move it out of the kernel and into fuse?
It's all a bit too hard, isn't it.
The patch from comment #42 has been included in the 3.0 kernel release. F15 is using 2.6.40 now, which is 3.0 renamed. If this issue is still present with the latest F15 kernel update, please reopen the bug.
*** Bug 715374 has been marked as a duplicate of this bug. ***