On my main workstation at home, I have autofs set up to mount the pseudoroot of my main fileserver (details on this to follow). When I mount that root and do even a small bit of I/O to it, and then let the mount expire, I get a BUG() pop on umount:

[ 523.953284] BUG: Dentry ffff8801f3084180{i=2,n=} still in use (1) [unmount of nfs4 0:2c]
[ 523.953306] ------------[ cut here ]------------
[ 523.954013] kernel BUG at fs/dcache.c:925!
[ 523.954013] invalid opcode: 0000 [#1] SMP
[ 523.954013] last sysfs file: /sys/devices/virtual/bdi/0:45/uevent
[ 523.954013] CPU 1
[ 523.954013] Modules linked in: nfs lockd auth_rpcgss nfs_acl tun fuse ip6table_filter ip6_tables ebtable_nat ebtables sunrpc cachefiles fscache cpufreq_ondemand powernow_k8 freq_table mperf it87 adt7475 hwmon_vid xfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel raid1 snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi snd_seq_device snd_pcm snd_timer snd uvcvideo ppdev videodev soundcore media sp5100_tco v4l2_compat_ioctl32 edac_core parport_pc snd_page_alloc i2c_piix4 edac_mce_amd k10temp parport wmi r8169 microcode mii virtio_net kvm_amd kvm ipv6 ata_generic pata_acpi pata_atiixp radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 523.954013]
[ 523.954013] Pid: 8387, comm: umount.nfs4 Not tainted 2.6.39-1.fc16.x86_64 #1 BIOSTAR Group TA790GX 128M/TA790GX 128M
[ 523.954013] RIP: 0010:[<ffffffff81131788>] [<ffffffff81131788>] shrink_dcache_for_umount_subtree+0x104/0x1c9
[ 523.954013] RSP: 0018:ffff8801f2385d98 EFLAGS: 00010296
[ 523.954013] RAX: 0000000000000062 RBX: ffff8801f3084180 RCX: 00000000000027a7
[ 523.954013] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
[ 523.954013] RBP: ffff8801f2385dc8 R08: 0000000000000002 R09: 0000ffff00066c0a
[ 523.954013] R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff8801f3084180
[ 523.954013] R13: ffff8801f31305f0 R14: ffff8801f30ec5f0 R15: 0000000000000000
[ 523.954013] FS: 00007f08e468d720(0000) GS:ffff88022fc80000(0000) knlGS:00000000f770f870
[ 523.954013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 523.954013] CR2: 00007fff8fc29ff8 CR3: 00000001f7ac4000 CR4: 00000000000006e0
[ 523.954013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 523.954013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 523.954013] Process umount.nfs4 (pid: 8387, threadinfo ffff8801f2384000, task ffff8801f7a61710)
[ 523.954013] Stack:
[ 523.954013]  ffff8801fdd74a58 0000000000000000 ffff8801fdd74800 ffff8801f3084240
[ 523.954013]  ffff8801f2385f28 ffff8801f30ec540 ffff8801f2385de8 ffffffff81132063
[ 523.954013]  ffff8801fdd74800 ffffffffa0542300 ffff8801f2385e18 ffffffff81121f15
[ 523.954013] Call Trace:
[ 523.954013]  [<ffffffff81132063>] shrink_dcache_for_umount+0x4d/0x5f
[ 523.954013]  [<ffffffff81121f15>] generic_shutdown_super+0x23/0xde
[ 523.954013]  [<ffffffff8112204d>] kill_anon_super+0x13/0x4c
[ 523.954013]  [<ffffffffa051c436>] nfs4_kill_super+0x44/0x77 [nfs]
[ 523.954013]  [<ffffffff81122271>] deactivate_locked_super+0x26/0x4b
[ 523.954013]  [<ffffffff81122a83>] deactivate_super+0x37/0x3b
[ 523.954013]  [<ffffffff811375b3>] mntput_no_expire+0xcc/0xd1
[ 523.954013]  [<ffffffff811375de>] mntput+0x26/0x28
[ 523.954013]  [<ffffffff81137bef>] release_mounts+0x59/0x73
[ 523.954013]  [<ffffffff81138155>] sys_umount+0x28d/0x2da
[ 523.954013]  [<ffffffff814828c2>] system_call_fastpath+0x16/0x1b
[ 523.954013] Code: 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 f9 ed 7a 81 48 89 de 31 c0 e8 4e 17 34 00 <0f> 0b 4c 8b 63 18 4c 39 e3 75 11 48 8d bb 90 00 00 00 45 31 e4
[ 523.954013] RIP  [<ffffffff81131788>] shrink_dcache_for_umount_subtree+0x104/0x1c9
[ 523.954013]  RSP <ffff8801f2385d98>
Ok, here's my reproducer for this so far. This can probably be simplified some and I'm still working on doing that. I have the following in /etc/auto.misc:

salusa          -fstype=nfs4,fsc        salusa.poochiereds.net:/

...and then this is auto.master (along with a few other entries):

/misc   /etc/auto.misc

...salusa exports a /git filesystem, and I have a symlink on the client that looks like this:

  File: `/git' -> `/misc/salusa/git'

...with all of that set up, I then use claws-mail to save an email to a file under /git/linux-2.6. Then I don't touch the mount again. Once the mount expires, autofs goes to unmount it and then the BUG() pops.
After I do the initial steps in the reproducer, this is what the contents of /proc/fs/nfsfs/volumes looks like:

NV SERVER                           PORT DEV  FSID              FSC
v4 2001047000080d63020e0cfffec64b16 801  0:44 0:0               yes
v4 2001047000080d63020e0cfffec64b16 801  0:45 28e2782bbc8a82b3  yes

...the DEV field here is in decimal, while the BUG message in this instance referenced device 0:2c in hex -- 0x2c is 44, which corresponds to 0:44 above. So the BUG pop is occurring when unmounting the server's pseudoroot.
I set the timeout on /misc to a shorter value and was able to reproduce this more quickly:

/misc   /etc/auto.misc  --timeout=60

...the /git symlink is unneeded as well. It reproduces just fine if I save a file to /misc/salusa/scratch (a scratch export that I use for testing).
I can't reproduce it using the procedure here. I installed f15 (into a kvm vm) and tried with the default kernel (2.6.38.6-27) to ensure it worked for that kernel. Then I installed the rawhide kernel 2.6.39-1 expecting it to fail but it didn't. I noticed that for claws-mail to see the mount it needs to be already mounted (no surprise) and that it appears to hold something open after saving a mail since I had to close it before the mount would expire.
There is some breakage using the rawhide kernel. I think I was just lucky it booted. I'm rebuilding the kernel locally to see if that will help.
(In reply to comment #4)
> I can't reproduce it using the procedure here.
>
> I installed f15 (into a kvm vm) and tried with the default
> kernel (2.6.38.6-27) to ensure it worked for that kernel.
>
> Then I installed the rawhide kernel 2.6.39-1 expecting it
> to fail but it didn't.
>
> I noticed that for claws-mail to see the mount it needs to
> be already mounted (no surprise) and that it appears to hold
> something open after saving a mail since I had to close it
> before the mount would expire.

Interesting... I did a bit more factor elimination. The fsc option isn't needed to reproduce this. I also tried to reproduce this with several different command-line operations and it didn't occur.

I should note that when I'm saving the file here, I'm not navigating to the destination via point and click. I'm just prepending the filename with the path in the save box, and that allows claws to trigger the mount. Perhaps that also keeps claws from pinning the directory so that it can expire? I'm at a loss, though, as to what claws is doing that allows it to reliably trigger this...
Oh, and one other strange thing: it seems like I can only reproduce this on the first mount. If I trigger the automount via other means, let it expire, and then try my reproducer steps, it doesn't seem to happen reliably on subsequent attempts. I need to go back and confirm that, though.
Yes, confirmed -- did:

$ ls -l /misc/salusa/scratch

...and then let the mount expire. Then I went into claws, saved a file to the same directory using my reproducer method here, and the subsequent expire didn't cause an oops.
For the record, I'm currently testing with:

autofs-5.0.5-35.fc15.x86_64
kernel-2.6.39-1.fc16.x86_64
(In reply to comment #9)
> For the record, I'm currently testing with:
>
> autofs-5.0.5-35.fc15.x86_64
> kernel-2.6.39-1.fc16.x86_64

I was only able to get my vm to boot once with 2.6.39-1; it just spews backtrace messages when I try to boot.
Created attachment 501306 [details]
info on root dentry, post-oops

Poking around with crash gives me this info on the root dentry, post-oops. The d_flags are:

d_flags = 0xc004 = DCACHE_DISCONNECTED|DCACHE_OP_REVALIDATE|DCACHE_OP_DELETE

...nothing terribly exciting there. d_parent points to itself (as expected), and all of the list_heads seem to be empty.
I've also found that I can manually unmount the filesystem and trigger the bug, so I don't need to rely on autofs expiring the mount (that's good, since that seemed to be broken in some early pre-2.6.38-rc1 kernels).

Since I have a fairly reliable reproducer, I decided to try bisecting this down. The log so far:

$ git bisect log
git bisect start '--' 'fs/'
# bad: [1bae4ce27c9c90344f23c65ea6966c50ffeae2f5] Linux 2.6.38-rc2
git bisect bad 1bae4ce27c9c90344f23c65ea6966c50ffeae2f5
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# good: [b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1] Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1
# good: [b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1] Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good b9d919a4ac6cf031b8e065f82ad8f1b0c9ed74b1
# good: [6ab82196492a0b6968a654a06aae923b28afef0d] Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
git bisect good 6ab82196492a0b6968a654a06aae923b28afef0d
# bad: [9e8a462a0141b12e22c4a2f0c12e0542770401f0] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
git bisect bad 9e8a462a0141b12e22c4a2f0c12e0542770401f0

...looks like I have about 5 more iterations to go.
Created attachment 501391 [details]
git bisect log and slightly different oops

Getting closer. Here's the log so far. Note that with the last revision I couldn't reproduce the exact same bug, as autofs failed to mount the fs. Suspecting that automountd was hosed, I did a "service autofs restart" and then it panicked with a very similar backtrace. So I'm marking this revision bad on the assumption that it's a related problem...
And the winner (loser?) of the bisect is...

commit 36d43a43761b004ad1879ac21471d8fc5f3157ec
Author: David Howells <dhowells>
Date:   Fri Jan 14 18:45:42 2011 +0000

    NFS: Use d_automount() rather than abusing follow_link()

    Make NFS use the new d_automount() dentry operation rather than
    abusing follow_link() on directories.

    Signed-off-by: David Howells <dhowells>
    Acked-by: Trond Myklebust <Trond.Myklebust>
    Acked-by: Ian Kent <raven>
    Signed-off-by: Al Viro <viro.org.uk>

...which unfortunately doesn't narrow things down that much :-/
Created attachment 501403 [details]
final git bisect log

Here's the final git bisect log in case anyone is interested. Anyone have thoughts as to other possible avenues for investigation?
(In reply to comment #11)
> Created attachment 501306 [details]
> info on root dentry, post-oops
>
> Poking around with crash gives me this info on the root dentry, post-oops.
> The d_flags are:
>
> d_flags = 0xc004 = DCACHE_DISCONNECTED|DCACHE_OP_REVALIDATE|DCACHE_OP_DELETE
>
> ...nothing terribly exciting there. d_parent points to itself (as expected),
> and all of the list_heads seem to be empty.

It's about time for a complete stab in the dark!

It's odd that the dentry above hasn't been __d_drop()ed....

Which call of shrink_dcache_for_umount_subtree() does this oops on, the first call or the second in shrink_dcache_for_umount()? The second call is the one that walks the anonymous dentry list, so that should be the one, and the first thing shrink_dcache_for_umount_subtree() does is __d_drop() the dentry.

If it is the first call, then how does the anonymous dentry come to be under s_root? And why does shrink_dcache_for_umount() decrement the count for the s_root dentry but not for dentries found on the sb->s_anon list, David?
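For reference, here is a rough paraphrase (from memory, not the verbatim source; locking details and BUG checks elided) of what shrink_dcache_for_umount() looked like in the 2.6.38/2.6.39 era. It shows the two calls being discussed: the first handles s_root with an explicit d_count decrement, while the loop over sb->s_anon makes no corresponding decrement:

/* Approximate sketch of fs/dcache.c:shrink_dcache_for_umount(), ~2.6.39.
 * Paraphrased from memory; locking and sanity checks elided. */
void shrink_dcache_for_umount(struct super_block *sb)
{
	struct dentry *dentry;

	/* First call: tear down the tree rooted at s_root. */
	dentry = sb->s_root;
	sb->s_root = NULL;
	spin_lock(&dentry->d_lock);
	dentry->d_count--;		/* drop the ref held via sb->s_root */
	spin_unlock(&dentry->d_lock);
	shrink_dcache_for_umount_subtree(dentry);

	/* Second call(s): tear down each disconnected root hashed on
	 * sb->s_anon. Note: no d_count decrement here, hence the
	 * question above. */
	while (!hlist_bl_empty(&sb->s_anon)) {
		dentry = hlist_bl_entry(hlist_bl_first(&sb->s_anon),
					struct dentry, d_hash);
		shrink_dcache_for_umount_subtree(dentry);
	}
}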
FWIW, I did some more testing yesterday. I did the reproducer on a kernel both with and without commit 36d43a437, and used crash to look at the d_count after triggering the mount and before the unmount. Those counts were the same in both cases -- the mnt_root had a d_count of 2, and s_root a d_count of 1. Perhaps I should do it again and have a look at the d_flags, or just get the entire dentry structs in both cases so we can compare them.
(In reply to comment #17)
> FWIW, I did some more testing yesterday. I did the reproducer on a kernel
> both with and without commit 36d43a437, and used crash to look at the
> d_count after triggering the mount and before the unmount. Those counts were
> the same in both cases -- the mnt_root had a d_count of 2, and s_root a
> d_count of 1.

That makes sense, since s_root won't be used in path lookups, and mnt_root will have one ref from when it was d_alloc()ed and one from the umount code (I think).

> Perhaps I should do it again and have a look at the d_flags, or just get the
> entire dentry structs in both cases so we can compare them.

I think we are most interested in the mnt_root dentry because it is the one assigned to the vfsmount struct and so will be the one involved in path walks. I think s_root is tucked away in the super block and is basically inert in the NFS case.

What I don't get is whether the oops occurs from the first call to shrink_dcache_for_umount_subtree() or not. If it does, then I'd be wondering how the anonymous dentry got into the tree under s_root at all (although that itself deserves more looking around in the code).

The other thing that bugs me is in __d_drop(), where we have:

    if dentry is hashed then
        if dentry is disconnected
            remove from appropriate list
        else
            remove from other appropriate list
            maybe block on rcuwalk barrier
        done
    done

So we don't boot active rcu-walk walks when dropping the anonymous dentry -- if it even gets to the drop, that is.
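For readers following the pseudocode above, an approximate paraphrase (again from memory, with the list-locking elided) of the 2.6.38-era __d_drop() shows the asymmetry being pointed at -- the rcu-walk barrier only fires on the non-disconnected branch:

/* Approximate sketch of fs/dcache.c:__d_drop(), ~2.6.38/39 era.
 * Paraphrased from memory; list-lock acquire/release elided. */
void __d_drop(struct dentry *dentry)
{
	if (!d_unhashed(dentry)) {
		if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) {
			/* anonymous dentry: unhash from sb->s_anon --
			 * note there is no dentry_rcuwalk_barrier() here */
			hlist_bl_del_init(&dentry->d_hash);
		} else {
			/* ordinary dentry: unhash from the dentry hashtable */
			hlist_bl_del_rcu(&dentry->d_hash);
			dentry_rcuwalk_barrier(dentry);	/* boot rcu-walkers */
		}
	}
}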
Created attachment 502245 [details]
pre and post bad patch dentry info

Here is the dentry info for the vfsmount->mnt_root and super_block->s_root, both without the "bad" patch (pre) and with the patch (post). I forgot to "set hex" beforehand, so a lot of the interesting fields (e.g. d_flags) are in decimal. The d_flags seem to be identical in both cases as well. I don't see much in the way of discrepancies...

Also, I should note that the busy dentry in the oops message is the vfsmount->mnt_root, and *not* the super_block->s_root.
I'd be very surprised if you saw the bug appearing in relation to the dentry attached to s_root. NFS's s_root dentry is a placeholder to stop the VFS blowing up, no more. It isn't actually used for anything. As NFS may share superblocks between several disjoint mounts from the same remote filesystem, it keeps its roots in the s_anon list instead. Should the root of a mount be detected in the tree of another mount, then the two trees are spliced together at that point by d_materialise_unique(). The resulting tree will still have an anonymous root. It is done this way as we may not have access to the actual root directory of the remote filesystem, and so may not be able to put that into our tree. Thus the root of the mount is attached directly to the vfsmount struct without going anywhere near the core tree in the superblock.
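For context on where these disconnected roots come from: the NFS client builds an inode for the mounted-on filehandle and then asks the dcache for an anonymous alias. The sketch below is a from-memory approximation of the era's fs/nfs/getroot.c with the signature simplified (the function name here is illustrative; nfs_fhget() and d_obtain_alias() are the real interfaces):

/* Sketch (approximate, based on the 2.6.38-era fs/nfs/getroot.c) of how
 * an NFS mount root ends up disconnected; signature simplified. */
static struct dentry *nfs_sketch_get_root(struct super_block *sb,
					  struct nfs_fh *mntfh,
					  struct nfs_fattr *fattr)
{
	struct inode *inode = nfs_fhget(sb, mntfh, fattr);

	if (IS_ERR(inode))
		return ERR_CAST(inode);

	/* d_obtain_alias() returns an existing alias if one is already in
	 * the dcache; otherwise it creates a DCACHE_DISCONNECTED dentry
	 * and hashes it on sb->s_anon -- which is exactly where
	 * shrink_dcache_for_umount() later finds it. */
	return d_obtain_alias(inode);
}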
I was also able to reproduce this using a direct automount map as well (Ian asked if it was possible), so direct vs. indirect map doesn't seem to make any difference.
Also, just for giggles I tried Ian's suggestion to make __d_drop always call dentry_rcuwalk_barrier() and it didn't prevent the oops on unmount.
Here's what I don't quite get... I've bisected the problem down to the patch that switched NFS over to use d_automount. That doesn't really affect unmounting afaict and should really only be affecting things when the automount is triggered. So, it seems likely that that change is causing some sort of difference in the mounted filesystem. The question is what? I'll be happy to collect info from kernels with and without that patch to try and discern where the difference is -- any thoughts about what I should be collecting info on?
(In reply to comment #23)
> Here's what I don't quite get...
>
> I've bisected the problem down to the patch that switched NFS over to use
> d_automount. That doesn't really affect unmounting afaict and should really
> only be affecting things when the automount is triggered.
>
> So, it seems likely that that change is causing some sort of difference in
> the mounted filesystem. The question is what? I'll be happy to collect info
> from kernels with and without that patch to try and discern where the
> difference is -- any thoughts about what I should be collecting info on?

We should have also checked /proc/mounts in each case.
(In reply to comment #24)
> We should have also checked /proc/mounts in each case.

I got some clarification from Ian -- he was just concerned that it might have been mounted more than once. I verified that it wasn't.
I decided to strace claws-mail. The first thing it does to trigger an automount is an open on the directory:

18672 13:17:53.843597 open("/misc/salusa/scratch", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>

...while that's crunching away, a different thread does an lstat of a file in that directory (the one it intends to save):

18682 13:17:53.975947 lstat("/misc/salusa/scratch/[CapitalCityCigars] And so it begins!!", <unfinished ...>

...that comes back first:

18682 13:17:54.301130 <... lstat resumed> 0x7f7baea2a910) = -1 ENOENT (No such file or directory) <0.325145>

...then it does an lstat of the parent:

18682 13:17:54.303366 lstat("/misc/salusa/scratch", {st_dev=makedev(0, 42), st_ino=2, st_mode=S_IFDIR|S_ISVTX|0777, st_nlink=8, st_uid=99, st_gid=99, st_blksize=524288, st_blocks=8, st_size=4096, st_atime=2011/06/07-12:13:09, st_mtime=2011/06/07-12:13:09, st_ctime=2011/06/07-12:13:09}) = 0 <0.000390>

...a little later, the opendir returns:

18672 13:17:54.340699 <... open resumed> ) = 23 <0.497070>

...then it sets an inotify watch (probably via some gtk open dialog goop):

18668 13:17:54.341626 inotify_add_watch(21, "/misc/salusa/scratch", IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF|IN_UNMOUNT|IN_ONLYDIR) = 3 <0.000043>

...and then starts a readdir:

18682 13:17:54.342309 getdents(23, <unfinished ...>

...that comes back (I won't paste it as it's pretty big), and then it starts lstat'ing each dentry:

18682 13:17:54.347897 lstat("/misc/salusa/scratch/testfile", {st_dev=makedev(0, 42), st_ino=13, st_mode=S_IFREG|0644, st_nlink=1, st_uid=99, st_gid=99, st_blksize=524288, st_blocks=2097160, st_size=1073741824, st_atime=2011/05/19-10:56:52, st_mtime=2011/05/19-10:59:30, st_ctime=2011/05/19-10:59:30}) = 0 <0.000033>

...most of this is pretty straightforward stuff, but I wonder if the inotify_add_watch is important somehow, as that's somewhat unusual compared to command-line apps.
Created attachment 504121 [details]
debug patch -- instrument nfs_d_automount

Taking Ian's suggestion on IRC into account, I decided to instrument nfs_d_automount. I turned all of the dprintk's into printk's, had them print current->pid. I also added a printk of the string representation of the struct path.

It looks like we do have 2 concurrent nfs_d_automount requests going on here. They both return different vfsmounts, though. The second vfsmount does not show up in the mount table (as shown by crash's mount command).

Here's the snippet of dmesg when I did the reproducer to trigger a mount. Unfortunately, the dump_stack() got a little muddled, but it gives you an idea of what's going on...

[ 214.444300] 1670: --> nfs_d_automount()
[ 214.445701] Pid: 1670, comm: claws-mail Not tainted 2.6.39-1.fc16.x86_64.debug #1
[ 214.448353] Call Trace:
[ 214.449301]  [<ffffffffa01dc897>] nfs_d_automount+0x60/0x579 [nfs]
[ 214.451247] 1677: --> nfs_d_automount()
[ 214.451251] Pid: 1677, comm: claws-mail Not tainted 2.6.39-1.fc16.x86_64.debug #1
[ 214.451253] Call Trace:
[ 214.451279]  [<ffffffffa01dc897>] nfs_d_automount+0x60/0x579 [nfs]
[ 214.451334]  [<ffffffff814b9a15>] ? _raw_spin_unlock+0x28/0x2c
[ 214.451362]  [<ffffffff811455b4>] ? dput+0xdc/0xfc
[ 214.451373]  [<ffffffffa01cc347>] ? nfs_lookup_revalidate+0x227/0x366 [nfs]
[ 214.451377]  [<ffffffff8113c463>] follow_managed+0x12a/0x1ce
[ 214.451381]  [<ffffffff8113d7b8>] walk_component+0x22c/0x335
[ 214.451402]  [<ffffffff81207715>] ? security_inode_exec_permission+0x25/0x27
[ 214.451406]  [<ffffffff8113da60>] link_path_walk+0x19f/0x439
[ 214.451409]  [<ffffffff8113de1e>] path_lookupat+0x5a/0x2e1
[ 214.451432]  [<ffffffff81101170>] ? might_fault+0xa5/0xac
[ 214.451435]  [<ffffffff81101127>] ? might_fault+0x5c/0xac
[ 214.451438]  [<ffffffff8113c8ee>] ? getname_flags+0x31/0x1ca
[ 214.451441]  [<ffffffff8113ef4f>] do_path_lookup+0x28/0x96
[ 214.451444]  [<ffffffff8113f374>] user_path_at+0x59/0x96
[ 214.451462]  [<ffffffff810774b6>] ? up_read+0x28/0x2c
[ 214.451471]  [<ffffffff814bcf30>] ? do_page_fault+0x360/0x3b4
[ 214.451475]  [<ffffffff8113703f>] vfs_fstatat+0x44/0x6e
[ 214.451478]  [<ffffffff81101127>] ? might_fault+0x5c/0xac
[ 214.451481]  [<ffffffff81137087>] vfs_lstat+0x1e/0x20
[ 214.451484]  [<ffffffff811371d6>] sys_newlstat+0x1a/0x33
[ 214.451493]  [<ffffffff810870f3>] ? trace_hardirqs_on_caller+0x10b/0x12f
[ 214.451502]  [<ffffffff810a9e5f>] ? audit_syscall_entry+0x11c/0x148
[ 214.451523]  [<ffffffff81256e1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 214.451526]  [<ffffffff814c05c2>] system_call_fastpath+0x16/0x1b
[ 214.451532] nfs_d_automount(1677): enter
[ 214.451548] 1677: dentry=ffff88001005c540 path=/misc/salusa/scratch
[ 214.471100] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 214.471587] nfs_d_automount(1677): done, success
[ 214.471607] 1677: <-- nfs_d_automount() = ffff88001194d140
[ 214.511636]  [<ffffffff81078a0d>] ? sched_clock_local+0x12/0x75
[ 214.514001]  [<ffffffff81083764>] ? trace_hardirqs_off+0xd/0xf
[ 214.515589]  [<ffffffff81078bda>] ? local_clock+0x36/0x4d
[ 214.517049]  [<ffffffff81083e91>] ? lock_release_holdtime.part.8+0x6b/0x72
[ 214.518857]  [<ffffffff814b849f>] ? __mutex_unlock_slowpath+0x112/0x122
[ 214.521946]  [<ffffffff8113c463>] follow_managed+0x12a/0x1ce
[ 214.523395]  [<ffffffff8113d7b8>] walk_component+0x22c/0x335
[ 214.524896]  [<ffffffff8113d956>] ? link_path_walk+0x95/0x439
[ 214.526307]  [<ffffffff8113e4f0>] do_last+0x13f/0x5ad
[ 214.527571]  [<ffffffff8113f4fe>] path_openat+0xcb/0x31f
[ 214.528964]  [<ffffffff81078a0d>] ? sched_clock_local+0x12/0x75
[ 214.530742]  [<ffffffff8113f78a>] do_filp_open+0x38/0x86
[ 214.532133]  [<ffffffff814b9a15>] ? _raw_spin_unlock+0x28/0x2c
[ 214.533579]  [<ffffffff8114a164>] ? alloc_fd+0x17f/0x191
[ 214.534965]  [<ffffffff811327a0>] do_sys_open+0x6e/0x100
[ 214.536294]  [<ffffffff810a9e5f>] ? audit_syscall_entry+0x11c/0x148
[ 214.537889]  [<ffffffff81132852>] sys_open+0x20/0x22
[ 214.539129]  [<ffffffff814c05c2>] system_call_fastpath+0x16/0x1b
[ 214.540601] nfs_d_automount(1670): enter
[ 214.541958] 1670: dentry=ffff88001005c540 path=/misc/salusa/scratch
[ 214.553643] nfs_d_automount(1670): done, success
[ 214.555077] 1670: <-- nfs_d_automount() = ffff88001194d280
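For anyone wanting to recreate the instrumentation, a minimal sketch of the kind of debug printk used here might look like the following. This is hypothetical (the actual patch is in the attachment above); the helper name is made up, while current, d_path(), and dump_stack() are the real kernel interfaces:

/* Hypothetical sketch of the instrumentation described above; the real
 * debug patch is attached to this bug. Called at the top of
 * nfs_d_automount(), it prints the pid, the dentry, and the string
 * form of the struct path. */
static void nfs_automount_debug(struct path *path)
{
	char *page, *s;

	printk(KERN_INFO "%d: --> nfs_d_automount()\n", current->pid);
	dump_stack();

	page = (char *)__get_free_page(GFP_KERNEL);
	if (!page)
		return;
	s = d_path(path, page, PAGE_SIZE);
	if (!IS_ERR(s))
		printk(KERN_INFO "%d: dentry=%p path=%s\n",
		       current->pid, path->dentry, s);
	free_page((unsigned long)page);
}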
I've limited the reproducer down a little... I can reproduce this as long as I use a gtk/gnome dialog to trigger a mount. I don't actually need to save anything in the directory. I can just pull up a "browse" dialog, type in "/misc/salusa/scratch/" and hit return. That opens the directory and triggers the mount.
I hacked up a quick patch to make nfs_d_automount return NULL after the first automount attempt. The first time nfs_d_automount is called, it works normally; every subsequent time it just returns NULL (which follow_automount() treats as a mount collision, so the walk simply continues). This seems to work around the bug, lending some weight to the idea that the concurrent d_automount activity is somehow causing the issue.
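A sketch of what such a hack might look like (hypothetical -- this illustrates the idea, it is not the actual patch; nfs_do_automount here is a made-up name standing in for the original mount-setup logic):

/* Hypothetical sketch of the workaround hack: only the first call
 * performs the mount; later callers return NULL, which
 * follow_automount() treats as a mount collision and ignores. */
struct vfsmount *nfs_d_automount(struct path *path)
{
	static atomic_t automount_done = ATOMIC_INIT(0);

	if (atomic_xchg(&automount_done, 1))
		return NULL;	/* pretend a racing automount already won */

	return nfs_do_automount(path);	/* stand-in for the original logic */
}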
A little more printk debugging:

[ 450.447517] 1507: --> nfs_d_automount()
[ 450.449513] nfs_d_automount(1507): enter
[ 450.452785] 1512: --> nfs_d_automount()
[ 450.455700] 1507: dentry=ffff880012520a80 path=/misc/salusa/scratch
[ 450.471407] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 450.471963] nfs_d_automount(1507): done, success
[ 450.474253] 1507: <-- nfs_d_automount() = ffff880022765640
[ 450.475581] nfs_d_automount(1512): enter
[ 450.476725] 1512: dentry=ffff880012520a80 path=/misc/salusa/scratch
[ 450.480416] follow_automount(1507): finish_automount returned 0
[ 450.490699] nfs_d_automount(1512): done, success
[ 450.493240] 1512: <-- nfs_d_automount() = ffff8800114c2000
[ 450.496763] follow_automount(1512): finish_automount returned -16
[ 472.333182] BUG: Dentry ffff880012520930{i=2,n=} still in use (1) [unmount of nfs4 0:2a]

...so finish_automount is returning -EBUSY (-16) for the second automount. It seems somewhat likely that the cleanup of the second vfsmount is not completely correct in this situation.
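That -EBUSY is consistent with the "refuse the same filesystem on the same mount point" check in the mount code. Roughly -- a from-memory paraphrase of fs/namespace.c circa 2.6.39, not the verbatim source -- after lock_mount() transits to the vfsmount the first automount just attached, the second automount's do_add_mount() sees its own superblock already mounted there:

/* Approximate paraphrase of fs/namespace.c:do_add_mount(), ~2.6.39.
 * Flag sanitization and the check_mnt() test are elided. */
static int do_add_mount(struct vfsmount *newmnt, struct path *path,
			int mnt_flags)
{
	int err;

	err = lock_mount(path);	/* may transit path to an existing mount */
	if (err)
		return err;

	/* Refuse the same filesystem on the same mount point: this is
	 * the -EBUSY that the second racing automount gets. */
	err = -EBUSY;
	if (path->mnt->mnt_sb == newmnt->mnt_sb &&
	    path->mnt->mnt_root == path->dentry)
		goto unlock;

	err = graft_tree(newmnt, path);	/* splice the mount in */
unlock:
	unlock_mount(path);
	return err;
}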
...and a little more -- show the mnt_count. It's 2 for both mounts prior to calling finish_automount:

[ 486.108984] SELinux: initialized (dev 0:2a, type nfs4), uses genfs_contexts
[ 486.117197] mount.nfs4 used greatest stack depth: 2304 bytes left
[ 486.127215] 1494: --> nfs_d_automount()
[ 486.128641] nfs_d_automount(1494): enter
[ 486.130170] 1494: dentry=ffff8800125dc3f0 path=/misc/salusa/scratch
[ 486.139573] 1499: --> nfs_d_automount()
[ 486.143287] SELinux: initialized (dev 0:2b, type nfs4), uses genfs_contexts
[ 486.143841] nfs_d_automount(1494): done, success
[ 486.145655] 1494: <-- nfs_d_automount() = ffff880011941140
[ 486.147213] follow_automount(1494): mnt->mnt_count = 2
[ 486.148720] nfs_d_automount(1499): enter
[ 486.149828] 1499: dentry=ffff8800125dc3f0 path=/misc/salusa/scratch
[ 486.154517] follow_automount(1494): finish_automount returned 0
[ 486.168236] nfs_d_automount(1499): done, success
[ 486.170616] 1499: <-- nfs_d_automount() = ffff880011941280
[ 486.174350] follow_automount(1499): mnt->mnt_count = 2
[ 486.180294] follow_automount(1499): finish_automount returned -16
I can't for the life of me find where the extra reference is being picked up. I agree that the fsnotify system probably has a lot to do with this problem, but I can't see why.

The most obvious way to take fsnotify out of this is to not go through the vfsmount setup for a mount that essentially already exists. That avoids mntput() calling fsnotify_unmount_inodes(). Alternately, we need to consult someone with in-depth knowledge of the fsnotify system. Any ideas who that might be?

Anyway, this amounts to using a mutex in ->d_automount() to serialize calls to it, calling d_mountpoint() on the incoming path dentry after acquiring the mutex, and returning NULL if it is true. This should be all that's needed, since the mount would be complete by the time we exit ->d_automount().

I also believe that the CIFS d_automount patch suffers the same fault.
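A minimal sketch of the serialization proposed above (an illustration of the idea under the assumption a single per-module mutex is acceptable, not the eventual fix; nfs_do_automount is again a made-up stand-in for the existing mount setup):

/* Sketch of the proposed serialization; helper name is illustrative. */
static DEFINE_MUTEX(nfs_automount_mutex);

struct vfsmount *nfs_d_automount(struct path *path)
{
	struct vfsmount *mnt;

	mutex_lock(&nfs_automount_mutex);
	if (d_mountpoint(path->dentry)) {
		/* a racing automount beat us to it; NULL makes
		 * follow_automount() just cross the existing mount */
		mnt = NULL;
		goto out;
	}
	mnt = nfs_do_automount(path);	/* stand-in for the mount setup */
out:
	mutex_unlock(&nfs_automount_mutex);
	return mnt;
}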
(In reply to comment #32)
> I also believe that the CIFS d_automount patch suffers the same
> fault.

And quite probably nfs referral mounts as well.
I rolled up a small test program that can reproduce this now without a gui:

-----------------[snip]------------------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct stat buf;

	/* parent and child race to stat the same path, so two tasks
	   trigger the automount concurrently */
	fork();

	stat(argv[1], &buf);

	return 0;
}
-----------------[snip]------------------

...when I run this with /misc/salusa/scratch/foo as the first argument, then unmounting /misc/salusa triggers the panic. Ian, were you trying to reproduce this on a single-cpu box before? If so, then that might explain why you couldn't get it to happen.

With this, I think we can rule out inotify as the problem. The issue is something with simultaneous d_automounts running.
(In reply to comment #34)
> I rolled up a small test program that can reproduce this now without a gui:
>
> -----------------[snip]------------------
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
>         struct stat buf;
>
>         fork();
>
>         stat(argv[1], &buf);
>
>         return 0;
> }
> -----------------[snip]------------------

OK, I'll try this in a VM. I only have an F15 with a 2.6.38-based kernel, I think, but that shouldn't make a difference.

> ...when I run this with /misc/salusa/scratch/foo as the first argument, then
> unmounting /misc/salusa triggers the panic. Ian, were you trying to
> reproduce this on a single-cpu box before? If so, then that might explain
> why you couldn't get it to happen.

I use at least 2 cpus in vms, and I think I had 4 in the one I used to try and reproduce this.

> With this, I think we can rule out inotify as the problem. The issue is
> something with simultaneous d_automounts running.

It does look that way, yes. At least that will save me the time of trying to write an inotify-based reproducer.

Ian
...and with the above reproducer, I can reproduce this without autofs in the mix now. Here are the details:

The server (an f15 box named salusa) has a separate filesystem mounted on /scratch that is exported with (rw,insecure,no_root_squash). It has a file in it called "foo". The server does not declare an explicit fsid=0 export, so it fakes up a pseudoroot.

On the client, I mount the filesystem:

# mount -t nfs4 salusa:/scratch /mnt/salusa

...I'm then careful not to touch anything under /mnt/salusa. I then run the reproducer above with the following path:

# ./forkstat /mnt/salusa/scratch/foo

...and then:

# umount /mnt/salusa

...at that point, the BUG pops.
*** Bug 708099 has been marked as a duplicate of this bug. ***
...sorry, I made a mistake in comment #36. The mount command should read as:

# mount -t nfs4 salusa:/ /mnt/salusa

...everything else should be correct.
(In reply to comment #38)
> ...sorry, I made a mistake in comment #36. The mount command should read as:
>
> # mount -t nfs4 salusa:/ /mnt/salusa
>
> ...everything else should be correct.

Ha, I'm able to reproduce this now, and with only 2 cpus.
I tracked this down to lock_mount() releasing the caller's ref on path->mnt when it finds it has to transit to a new mountpoint returned by lookup_mnt(). The problem is that pathwalk may not be holding a ref on path->mnt if it's the same as nd->path.mnt, as it tries to save on mntget()/mntput() calls.
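To make the mechanism concrete, here is an approximate paraphrase (from memory, not the verbatim source; the cant_mount() check and error handling are elided) of fs/namespace.c:lock_mount() as it stood around 2.6.39. When a mount already exists on the mountpoint, it drops a reference on path->mnt via path_put() and retries on top of the existing mount -- and that is precisely the reference the racing pathwalk caller may never have taken. The patch attached in the following comments ("VFS: Fix vfsmount overput on simultaneous automount") addresses this overput:

/* Approximate paraphrase of fs/namespace.c:lock_mount(), ~2.6.39. */
static int lock_mount(struct path *path)
{
	struct vfsmount *mnt;
retry:
	mutex_lock(&path->dentry->d_inode->i_mutex);
	down_write(&namespace_sem);
	mnt = lookup_mnt(path);		/* something already mounted here? */
	if (likely(!mnt))
		return 0;		/* no: caller mounts on this path */

	/* Yes: transit to the existing mount and retry on top of it. */
	up_write(&namespace_sem);
	mutex_unlock(&path->dentry->d_inode->i_mutex);
	path_put(path);			/* <-- drops a ref on path->mnt that
					       the caller may not actually own */
	path->mnt = mnt;
	path->dentry = dget(mnt->mnt_root);
	goto retry;
}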
Created attachment 505077 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount

We believe this patch resolves the problem seen in this bug. Since Jeff has put so much effort into this already, I'll set about creating a test kernel package.
Created attachment 505085 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount (F15)

Oops, the original patch didn't apply cleanly. All the more reason to test it yourselves, please.
A kernel with the above patch has been built. Please give it a try and report back with results. It can be found at:

http://people.redhat.com/~ikent/kernel-2.6.38.8-32.bz708039.1.fc15
I have a similar (maybe?) bug that started with FC15. I run the CrashPlan client on an nfs automounted file system. The kernel above didn't fix it. I only have 1 CPU.

2.6.38.8-32.bz708039.1.fc15.i686 #1 SMP Thu Jun 16 17:51:40 UTC 2011 i686 i686 i386 GNU/Linux

[138413.271428] BUG: Dentry f02ad480{i=13f2d0,n=/} still in use (1) [unmount of autofs autofs]
[138413.274358] ------------[ cut here ]------------
[138413.275316] kernel BUG at fs/dcache.c:947!
[138413.275316] invalid opcode: 0000 [#1] SMP
[138413.275316] last sysfs file: /sys/devices/virtual/bdi/0:81/uevent
[138413.275316] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc p4_clockmod vfat fat snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm ppdev iTCO_wdt e100 iTCO_vendor_support parport_pc mii snd_timer snd microcode soundcore parport i2c_i801 snd_page_alloc serio_raw i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: mperf]
[138413.275316]
[138413.275316] Pid: 1885, comm: automount Not tainted 2.6.38.8-32.bz708039.1.fc15.i686 #1 Compaq Evo D510 CMT/07E8h
[138413.275316] EIP: 0060:[<c04f382d>] EFLAGS: 00010296 CPU: 0
[138413.275316] EIP is at shrink_dcache_for_umount_subtree+0xd0/0x162
[138413.275316] EAX: 00000065 EBX: f02ad480 ECX: 00000046 EDX: 00000000
[138413.275316] ESI: c093985f EDI: f282115c EBP: f43a9edc ESP: f43a9eac
[138413.275316]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[138413.275316] Process automount (pid: 1885, ti=f43a8000 task=f2bbd7f0 task.ti=f43a8000)
[138413.275316] Stack:
[138413.275316]  c092d11a f02ad480 0013f2d0 f02ad4a4 00000001 c093985f f282115c 00000000
[138413.275316]  f02ad4a4 f02ad480 f2821000 c07f9e64 f43a9eec c04f401b f2821000 c67ba780
[138413.275316]  f43a9f08 c04e5e3d f5b9a740 c67ba780 00000029 c67ba780 f409fa00 f43a9f14
[138413.275316] Call Trace:
[138413.275316]  [<c04f401b>] shrink_dcache_for_umount+0x3e/0x4a
[138413.275316]  [<c04e5e3d>] generic_shutdown_super+0x1e/0xb8
[138413.275316]  [<c04e5f48>] kill_anon_super+0x11/0x44
[138413.275316]  [<c04e5f99>] kill_litter_super+0x1e/0x21
[138413.275316]  [<c05812fe>] autofs4_kill_sb+0x35/0x39
[138413.275316]  [<c04e610a>] deactivate_locked_super+0x1f/0x40
[138413.275316]  [<c04e6a9b>] deactivate_super+0x2e/0x31
[138413.275316]  [<c04f8694>] mntput_no_expire+0xb5/0xb9
[138413.275316]  [<c04f8f32>] sys_umount+0x270/0x297
[138413.275316]  [<c04f8f77>] sys_oldumount+0x1e/0x20
[138413.275316]  [<c07d691c>] syscall_call+0x7/0xb
[138413.275316]  [<c07d0000>] ? audit_log_exit+0xbca/0xdc3
[138413.275316] Code: 03 8b 41 60 89 54 24 10 8b 55 f0 89 7c 24 18 89 74 24 14 89 5c 24 04 89 54 24 0c 89 44 24 08 c7 04 24 1a d1 92 c0 e8 2c b5 2d 00 <0f> 0b 8b 73 10 39 f3 75 0c 8d 43 68 31 f6 e8 24 2b 0e 00 eb 16
[138413.275316] EIP: [<c04f382d>] shrink_dcache_for_umount_subtree+0xd0/0x162 SS:ESP 0068:f43a9eac
[138413.418764] ---[ end trace 073c5e8debef6d67 ]---
(In reply to comment #44)
> I have a similar (maybe?) bug that started with FC15. I run the CrashPlan
> client on an nfs automounted file system. The kernel above didn't fix it.
> I only have 1 CPU.

That's not the same bug; see bug 698806.
Same issue here. Once the last open file under the mountpoint is closed, umount causes a kernel oops. It looks to me like something has been freed when it shouldn't have been. Note that umount reports "/cluster/gslap01: not mounted" when it most certainly was mounted:

[root@gslap02 ~]# killall -TERM lock_cleaner
[root@gslap02 ~]# lsof | grep /cluster/gslap01
lock_clea 7592 survey1 txt REG 0,23 1332384 95191157 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner
lock_clea 7592 survey1 1w  REG 0,23 3886    95191163 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner.errors
lock_clea 7592 survey1 2w  REG 0,23 3886    95191163 /cluster/gslap01/gs/bin/a.1.1.0/lock_cleaner.errors
lock_clea 7592 survey1 3u  REG 0,23 16018   94405062 /cluster/gslap01/gs/database/11_7med/mission_plan/db_lock_table.dat
lock_clea 7592 survey1 4u  REG 0,23 260     95191135 /cluster/gslap01/gs/bin/a.1.1.0/common_lock_table.dat
[root@gslap02 ~]#
[root@gslap02 ~]# lsof | grep /cluster/gslap01
[root@gslap02 ~]# umount /cluster/gslap01
umount: /cluster/gslap01: not mounted
[root@gslap02 ~]#

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: ------------[ cut here ]------------

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: invalid opcode: 0000 [#1] PREEMPT SMP

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:09:00.0/net/eth0/statistics/collisions

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Process umount.nfs4 (pid: 8895, ti=e0834000 task=e08b7540 task.ti=e0834000)

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Stack:

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Call Trace:

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: Code: 46 20 c7 45 ec 00 00 00 00 85 c0 74 06 8b 40 60 89 45 ec 8d 82 5c 01 00 00 50 51 53 57 ff 75 ec 56 68 18 26 7f c0 e8 fe d2 f3 ff <0f> 0b 83 c4 1c eb fe 8b 7e 10 8d 46 68 89 45 e4 39 fe 75 1f 8b

Message from syslogd@ at Mon Sep 5 10:54:52 2011 ...
gslap02 kernel: EIP: [<c050360a>] shrink_dcache_for_umount_subtree+0xd5/0x1a0 SS:ESP 0068:e0835e8c
I should add that the above bug occurs with kernel-ml 2.6.39-4.el5.elrepo.x86_64.
I'm also seeing another nfs4 issue in the logs, stuff like:

bad fileid, expected foo got bar

But it didn't look too bad -- the fileid/inode concerned was the root of the NFS4 mount, and I figured some non-critical code inside nfs somewhere was just seeing the inode of the mountpoint instead of the directory mounted on the mountpoint, or something along those lines. It didn't seem to immediately cause any problems, but it did puzzle me a bit.

Now, however, I think it's possible the bad fileid issue may be related to this more serious oops-on-umount bug. I am just speculating, as I have no knowledge of the source code, but the bad fileid errors I see ALWAYS involve the mountpoint where the nfs share is being mounted. So perhaps the code to handle bad fileids is a trigger for the dentry leak, and therefore the cause of the oops on umount?

In any case, please fix these dentry cache errors soon. It is disappointing when the NFSv4 lottery crashes servers and clients year after year, kernel after kernel. Do we really need to sacrifice stability and lose data just to add new features to nfsv4? Can we not fix 1 bug without creating 10 more? Is it time to declare nfs4 fatally flawed and impossible to implement reliably -- and then move it out of the kernel and into fuse?
It's all a bit too hard, isn't it.
The patch from comment #42 has been included in the 3.0 kernel release. F15 is using 2.6.40 now, which is 3.0 renamed. If this issue is still present with the latest F15 kernel update, please reopen the bug.
*** Bug 715374 has been marked as a duplicate of this bug. ***