abrt version: 2.0.1
cmdline: ro root=/dev/mapper/vg_starbuck-lv_root rd_LUKS_UUID=luks-9d65fb86-1ea5-4399-9e03-590df5d86a5a rd_LVM_LV=vg_starbuck/lv_root rd_LVM_LV=vg_starbuck/lv_swap rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us nouveau.modeset=0 rdblacklist=nouveau quiet
component: kernel
kernel_tainted: 129
kernel: 2.6.38.2-9.fc15.x86_64
reason: BUG: Dentry ffff880137e720c0{i=2d319,n=/} still in use (1) [unmount of autofs autofs]
architecture: x86_64
package: kernel
os_release: Fedora release 15 (Lovelock)
time: 1303416690

Text file: backtrace, 3061 bytes

comment
-----
This happens randomly when having a nfs4 filesystem automounted on /net
From what I read on random mailing lists this is an upstream bug.

event_log
-----
2011-04-21-22:12:44> The report was appended to /tmp/abrt.log
2011-04-21-22:24:24> Submitting oops report to http://submit.kerneloops.org/submitoops.php
2011-04-21-22:24:25 Kernel oops report was uploaded

reported_to
-----
file: /tmp/abrt.log
kerneloops: URL=http://submit.kerneloops.org/submitoops.php
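
For readers unfamiliar with the kernel_tainted field, the value is a bitmask. The sketch below decodes it; the table covers only a subset of the documented taint bits, and the function name decode_taint is made up for this illustration. For 129 it reports bit 0 (a proprietary module was loaded) and bit 7 (the kernel has already hit an oops or BUG).

```python
# Subset of the kernel's documented taint bits (illustrative, not complete).
TAINT_FLAGS = {
    0: "P: proprietary module was loaded",
    1: "F: module was force loaded",
    3: "R: module was force unloaded",
    4: "M: machine check exception occurred",
    5: "B: bad page referenced",
    6: "U: taint requested by userspace",
    7: "D: kernel died recently (oops or BUG)",
    9: "W: kernel issued a warning",
}


def decode_taint(value):
    """Return a description for every bit set in a kernel taint value."""
    flags = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            # Bits missing from the table above are reported by number only.
            flags.append(TAINT_FLAGS.get(bit, f"bit {bit} (not in this table)"))
        bit += 1
    return flags


if __name__ == "__main__":
    # 129 is the kernel_tainted value from the report above.
    for flag in decode_taint(129):
        print(flag)
```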
Created attachment 493982 File: backtrace
Yep, I've seen this personally too... My (hand-wavy) suspicion is that it's related to the RCU pathwalk patches that went into 2.6.38, but it may be something else entirely. What we could really use is a reliable reproducer for this. One thing that might help: when this occurs, collect the output of nfsstat -c. That might allow us to rule out some codepaths.
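
One possible way to catch the moment is to scan kernel log lines for the message from the original report and take the nfsstat -c snapshot as soon as it appears. This is only a sketch: the regular expression is written against the single message quoted in this bug, and the names BUG_RE, parse_dentry_bug and capture_nfsstat are made up for the example.

```python
import re
import subprocess

# Pattern modelled on the one "reason:" line in this report (an assumption;
# other kernels may word the message differently).
BUG_RE = re.compile(
    r"BUG: Dentry (?P<addr>[0-9a-f]+)\{i=(?P<ino>[0-9a-f]+),n=(?P<name>[^}]*)\}"
    r" still in use \((?P<count>-?\d+)\)"
    r" \[unmount of (?P<fstype>\S+) (?P<sid>[^\]]+)\]"
)


def parse_dentry_bug(line):
    """Return the fields of a 'Dentry ... still in use' line, or None."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["count"] = int(info["count"])  # leftover reference count
    return info


def capture_nfsstat():
    """Run 'nfsstat -c' and return its text output (needs nfs-utils installed)."""
    result = subprocess.run(["nfsstat", "-c"], capture_output=True, text=True)
    return result.stdout
```

Feeding it lines from dmesg or /var/log/messages and calling capture_nfsstat() on the first match would give a snapshot close to the failure.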
What do the autofs maps you are using look like?
(In reply to comment #3)
> What do the autofs maps you are using look like?

Hang on, you say you're using the hosts map. Do the hosts you are using have many exports?
(In reply to comment #2)
> Yep, I've seen this personally too...

I think there is more than one problem causing these.
I have two servers that I connect to with autofs: one F14 x86_64 install and one F14 i686 install (both always fully updated). It seems that this occurs more frequently with the i686 install. As requested, I've now attached the output of nfsstat -c, taken less than a minute after this error occurred.
Created attachment 494221 nfsstat
(In reply to comment #5)
> (In reply to comment #2)
> > Yep, I've seen this personally too...
>
> I think there is more than one problem causing these.

The first thing to do is to update with the autofs patches that went into 2.6.39-rc. I'll work on getting a kernel built with those. The downside is that, if this really isn't an autofs problem, these patches will probably hide the real bug. OTOH, I have a kernel.org bug, slightly different from this one, that these patches didn't fix, but they do fix the problem exposed by my autofs submount-test, which has a backtrace just like this.
I just checked and this nfsstat is related to an auto-unmount from the F14 i686 install, which is an nfs3 (!) export; only one filesystem is exported from this server. The other exports, from the F14 x86_64 install, which exports multiple filesystems, have almost no problems. I've attached two other abrt crash logs, both related to the last crash. abrt_log_20100422153545 is the log of the exact moment of the last crash. abrt_log_20100422154421 is the log of me doing 'umount -l /net' after a SIGKILL of the remaining automount process. This seems to be the only way to resolve this. The problem that remains is that I am no longer able to restart my F15 install; it just stalls. I'll do that now and make a note of what happens.
Created attachment 494222 abrt_log_20100422153545
Created attachment 494223 abrt_log_20100422154421
You may be right that this is due to a number of different problems. I didn't notice before that the original problem in this bug was due to unmounting an autofs mount. I've seen similar oopses when unmounting nfs4 mounts too: http://www.spinics.net/lists/linux-nfs/msg20232.html ...perhaps these problems are related? I'm still at a bit of a loss as to how best to attack this though.
Jeff, that seems very similar to what I'm experiencing. And sorry for not being clear from the start; I knew it had something to do with a timed unmount. I just wanted to create a placeholder and fill in the details as time passed. I've attached my /var/log/messages from the reboot. As you can see, there's nothing special in it about the reboot... I had to power off my system. Please ignore the SELinux errors; I'm still working on those, and until they are sorted (I might file bugs) I'm running permissive.
Created attachment 494228 crash_reboot_log
(In reply to comment #12)
> You may be right that this is due to a number of different problems. I didn't
> notice before that the original problem in this bug was due to unmounting an
> autofs mount. I've seen similar oopses when unmounting nfs4 mounts too:
>
> http://www.spinics.net/lists/linux-nfs/msg20232.html
>
> ...perhaps these problems are related? I'm still at a bit of a loss as to how
> best to attack this though.

There is definitely a possible dentry ref count leak in the autofs expire code in 2.6.38. The real problem is that it should be hard to trigger, but from the reports we get it seems that people are able to trigger it easily. Possibly the reason it is easy to trigger in some cases is that the rcu-walk series changed one of the traversals in the expire check code from a directory entry list traversal to a depth-first tree traversal, though I couldn't work out why that would make it happen more easily either. In any case, the autofs patches that are going into 2.6.39 fixed the problems I was able to force in testing. It will take a little while for me to dig them out and build a test kernel.
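
To make the failure mode concrete for readers, here is a toy model of a reference count leak during a depth-first walk. It is not the autofs4 code and is not derived from the patches; Node, get, put and both walk functions are hypothetical names. It only shows how an early return that skips the matching put leaves a count of 1 behind, which is the shape of the "still in use (1)" message.

```python
class Node:
    """Toy stand-in for a dentry: a name, a reference count, and children."""

    def __init__(self, name, busy=False):
        self.name = name
        self.busy = busy
        self.refs = 0
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def get(node):
    node.refs += 1
    return node


def put(node):
    node.refs -= 1


def find_busy_leaky(root):
    """Depth-first search for a busy node; leaks a reference on early return."""
    stack = [root]
    while stack:
        node = get(stack.pop())
        if node.busy:
            return node.name  # BUG: the reference taken above is never dropped
        stack.extend(node.children)
        put(node)
    return None


def find_busy_fixed(root):
    """Same walk, but every get() is paired with a put() on all paths."""
    stack = [root]
    while stack:
        node = get(stack.pop())
        try:
            if node.busy:
                return node.name
            stack.extend(node.children)
        finally:
            put(node)
    return None


def leaked(root):
    """List (name, refs) for nodes still referenced, as an unmount check would."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.refs:
            found.append((node.name, node.refs))
        stack.extend(node.children)
    return found


def build_tree():
    root = Node("/")
    host = root.add(Node("host"))
    host.add(Node("export", busy=True))
    return root
```

In this model the leak only shows up when the walk exits early, which is one reason such a bug can look random from the outside.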
Since I'm not going to get onto this until tomorrow I can post the patches so you can see what the changes are, at least.
Created attachment 494237 vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
Created attachment 494238 autofs4 - reinstate last used update on access
Created attachment 494239 autofs4 - fix dentry leak in autofs4_expire_direct()
Created attachment 494240 autofs4 - fix autofs4_expire_indirect() traversal
Created attachment 494241 autofs4 - fix d_manage() return on rcu-walk
Created attachment 494242 autofs4 - remove autofs4_lock
Created attachment 494243 autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()

This patch isn't a result of testing on 2.6.38; it is a contributed patch I had hanging around. For the sake of completeness with respect to what went into 2.6.39, I have included it here as well.
Clearly not all these patches are likely related to the problem here. But this wasn't the only problem that I found that was related to the unexpected concurrent merge of the rcu-walk and the vfs-automount series in 2.6.38. Ouch!
Sorry to take so long to build the test kernel. A kernel with the above patches can be found at: http://people.redhat.com/~ikent/kernel-2.6.38.3-18.bz698806.1.fc15 Please try this out and let me know how it goes.
No worries, I'm installing as I am writing this.
Hmm, I forgot that to really test I also need the devel packages for my nvidia driver.
I'll use nouveau for now. In general I don't like to change something while testing a problem.
Good news, everyone! The patched kernel seems to work for me, I've been testing and timed auto unmounts don't crash the kernel.
Reassigning to Ian since he's doing all the work here anyway :)
I've added Kyle McMartin to the cc list here. Kyle, if we aren't going to see 2.6.39 for F15 sometime soon we really should apply this patch series. Can you help please?
any update on this?
(In reply to comment #32)
> any update on this?

I was hoping Kyle would get around to adding these patches, but we need to wait a little anyway because it looks like there will be another patch going upstream shortly. See bug #719607 for more information.
so this problem is fixed by the #719607 patch and that will be part of some v3.X kernel update?
(In reply to comment #34)
> so this problem is fixed by the #719607 patch and that will be part of some
> v3.X kernel update?

The fix for bug 719607 will be in 2.6.40.3-2.
This should be fixed per comment #35.