Red Hat Bugzilla – Bug 177122
VFS: busy inodes after unmount
Last modified: 2007-11-30 17:07:22 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Description of problem:
We saw the following once on a Dell PowerEdge 1750 running 2.6.9-22.0.1.ELsmp.
An NFS mount seemed to disappear mysteriously, followed by:
Dec 31 13:23:01 ajak kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
About an hour and a half later we got the following oops:
Dec 31 14:53:26 ajak kernel: Unable to handle kernel paging request at virtual address 01081b17
Dec 31 14:53:26 ajak kernel: printing eip:
Dec 31 14:53:26 ajak kernel: c0170372
Dec 31 14:53:26 ajak kernel: *pde = 367b1001
Dec 31 14:53:26 ajak kernel: Oops: 0000 [#1]
Dec 31 14:53:26 ajak kernel: SMP
Dec 31 14:53:26 ajak kernel: Modules linked in: md5 ipv6 parport_pc lp parport nfs lockd autofs4 i2c_dev i2c_core sunrpc dm_mod button battery ac ohci_hcd tg3 floppy sg ext3 jbd mptscsih mptbase sd_mod scsi_mod
Dec 31 14:53:26 ajak kernel: CPU: 0
Dec 31 14:53:26 ajak kernel: EIP: 0060:[<c0170372>] Not tainted VLI
Dec 31 14:53:26 ajak kernel: EFLAGS: 00010206 (2.6.9-22.0.1.ELsmp)
Dec 31 14:53:26 ajak kernel: EIP is at iput+0x25/0x61
Dec 31 14:53:26 ajak kernel: eax: 01081b03 ebx: eec738ec ecx: f8a39bae edx: eec738ec
Dec 31 14:53:26 ajak kernel: esi: f6c2ce8c edi: f6c2ce94 ebp: 0000007e esp: f7db3eec
Dec 31 14:53:26 ajak kernel: ds: 007b es: 007b ss: 0068
Dec 31 14:53:26 ajak kernel: Process kswapd0 (pid: 52, threadinfo=f7db3000 task=f7de38b0)
Dec 31 14:53:26 ajak kernel: Stack: eec738ec c016df98 00000000 00000080 00000000 f7ffe9c0 c016e31f c0148630
Dec 31 14:53:26 ajak kernel: 00153100 00000000 00000002 00000000 0008f901 000000d0 00000020 c0327700
Dec 31 14:53:26 ajak kernel: 00000001 c0324f00 0000000c c01498bc c02cf250 0008f901 f7db3f9c 00000000
Dec 31 14:53:26 ajak kernel: Call Trace:
Dec 31 14:53:26 ajak kernel: [<c016df98>] prune_dcache+0x14b/0x19a
Dec 31 14:53:26 ajak kernel: [<c016e31f>] shrink_dcache_memory+0x14/0x2b
Dec 31 14:53:26 ajak kernel: [<c0148630>] shrink_slab+0xf8/0x161
Dec 31 14:53:26 ajak kernel: [<c01498bc>] balance_pgdat+0x1d2/0x2f8
Dec 31 14:53:26 ajak kernel: [<c02cf250>] schedule+0x844/0x87a
Dec 31 14:53:26 ajak kernel: [<c011fdf4>] prepare_to_wait+0x12/0x4c
Dec 31 14:53:26 ajak kernel: [<c0149aac>] kswapd+0xca/0xcc
Dec 31 14:53:26 ajak kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel: [<c02d0ed6>] ret_from_fork+0x6/0x14
Dec 31 14:53:27 ajak kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel: [<c01499e2>] kswapd+0x0/0xcc
Dec 31 14:53:27 ajak kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Dec 31 14:53:27 ajak kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 54 04 e3 88 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba f0 9d 32 c0 e8 72
Dec 31 14:53:27 ajak kernel: <0>Fatal exception: panic in 5 seconds
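A quick cross-check of the oops above, as a minimal sketch: the byte marked
<8b> in the Code: line starts the faulting instruction, and the bytes 8b 50 14
decode as a 32-bit load through eax with an 8-bit displacement. Adding that
displacement to the eax value in the register dump should give the faulting
virtual address. The helper names below are made up for illustration, the
decoder handles only this one instruction form, and the Code: line is
shortened to its tail. This only confirms that the bad pointer was in eax; it
does not show which structure was corrupted.

```python
import re

# Values copied from the oops above (Code: line shortened to its tail).
OOPS = """\
Unable to handle kernel paging request at virtual address 01081b17
eax: 01081b03 ebx: eec738ec ecx: f8a39bae edx: eec738ec
Code: 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2
"""

# x86 32-bit register numbering used in the ModR/M byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fault_address(text):
    """Return the faulting virtual address as an integer."""
    match = re.search(r"virtual address ([0-9a-f]{8})", text)
    return int(match.group(1), 16)


def registers(text):
    """Return {register name: value} from the register dump lines."""
    pairs = re.findall(r"\b(e[a-z]{2}): ([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Return the code bytes starting at the one marked <..>."""
    match = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})*)", text)
    return [int(b, 16) for b in [match.group(1)] + match.group(2).split()]


def decode_mov_disp8(code):
    """Decode 'mov r32, [r32 + disp8]' (opcode 8b, ModR/M mod=01, no SIB)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B or modrm >> 6 != 0b01 or modrm & 7 == 0b100:
        raise ValueError("not the simple mov form this sketch handles")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[(modrm >> 3) & 7], REG32[modrm & 7], disp


dest, base, disp = decode_mov_disp8(faulting_bytes(OOPS))
computed = registers(OOPS)[base] + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> {computed:08x}")
print("matches fault address:", computed == fault_address(OOPS))
```

If the computed address matches, the oops is consistent with iput()
dereferencing an invalid pointer held in eax rather than with a bad stack or
instruction pointer.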
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. We have many NFS partitions automounted.
2. One of them mysteriously disappeared.
3. The busy-inodes warning and, later, the oops above followed.
Actual Results: Machine is hung.
Happy to provide anything else you may need. Thanks!
Created attachment 122882
fix one source of busy inodes after umount
Do you have the means to port this patch to RHEL4 and give it a try?
Also, if you are a Red Hat customer, please be sure to go through your TAM or
https://enterprise.redhat.com/portal to ensure this gets the proper attention.
Hi. We hit this again, this time on a RHEL4 ES machine: same VFS message, same
oops. We are a Red Hat customer in the sense that we have enterprise
subscriptions, but I'm not sure that entitles us to the kind of support you are
talking about. Backporting this patch doesn't look trivial. Thanks for your time.
Please provide the information requested in the "Filing bug reports" section of
my people page:
I understand that it takes a long time to reproduce, so you can skip the part
about configuring debug logs. I do want to have a look at your maps, and I want
to know which mountpoint disappeared just before the problem happened.
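One way to capture which mountpoint disappeared, as a rough sketch: save the
contents of /proc/mounts periodically and compare consecutive snapshots. The
function names, server names, and mountpoints below are hypothetical, and the
snapshots are inline strings so the example runs anywhere. Note that autofs
also expires idle mounts in normal operation, so a missing entry is only a
candidate, not proof of the faulty unmount.

```python
def nfs_mounts(proc_mounts_text):
    """Return {mountpoint: device} for NFS entries in /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass
    """
    mounts = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            mounts[fields[1]] = fields[0]
    return mounts


def disappeared(before, after):
    """Return the NFS mountpoints present in 'before' but not in 'after'."""
    return sorted(set(nfs_mounts(before)) - set(nfs_mounts(after)))


# Hypothetical snapshots; on a real system read them from /proc/mounts.
BEFORE = """\
/dev/sda1 / ext3 rw 0 0
server1:/export/a /net/a nfs rw 0 0
server1:/export/b /net/b nfs rw 0 0
"""
AFTER = """\
/dev/sda1 / ext3 rw 0 0
server1:/export/a /net/a nfs rw 0 0
"""

print(disappeared(BEFORE, AFTER))
```

Logging the result with a timestamp would let it be lined up against the
"Busy inodes after unmount" message in the kernel log.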
Created attachment 123677
Here is the information you requested.
This machine does NAT for some machines behind it which also use
autofs/ypbind/.. and are running similar jobs. Same exact setup, except they
are RHEL4 WS machines. One of them died very close to this time with the same
error. Others were ok. Thanks for your help.
OK. After looking at your maps, I don't think the attached patch will fix your
problems. The reason is this: the patch addresses a case whereby unmounting an
autofs file system may leave busy inodes. In your case, it is the act of
unmounting an NFS file system that triggers the warning.
It really looks to me as though this is an NFS issue. I'm going to reassign the
bug to our NFS guru. I fully intended to put together a patch with some
debugging information, but I don't know exactly what would be useful
instrumentation at this point.
Steve, could you please give this a look?
I forgot to change the component to kernel.
Hi, I guess I'll re-open my original bug, since the duplicate bug fell silent.
We finally were able to update our system to RHEL4 U4 WS.
We saw the same trace again today. The kernel was definitely 2.6.9-42.ELsmp.
We might have to upgrade to U5 soon to fix an NFS cache bug we are also seeing,
so we should be able to test the latest and greatest kernel in a week or so.
If there is new information, please file a new bugzilla. This way, our
record keeping will stay straight.
*** This bug has been marked as a duplicate of 173843 ***
(In reply to comment #11)
> If there is new information, please file a new bugzilla. This way, our
> record keeping will stay straight.
> *** This bug has been marked as a duplicate of 173843 ***
I am reading through the description of 173843 and though the problem seems to
manifest itself with the same symptoms, I'm not sure the cause is the same. In
particular we are seeing this on NFS filesystems *only* with no GFS, ext3, ...
involved. So I didn't want to re-open the 173843 bug as it seems that problem
is fixed (or the submitter has gone quiet).
Our problem is not fixed and might be a *separate* NFS problem. So
re-opening this bug seems appropriate?
Whatever you think is best, let me know. I have no new information to report
other than it's still happening with the U4 kernel, which supposedly fixed the
other bug (173843). Thanks!
Looking through all the reports, it looks like this might be closer to 177357,
though it's not clear what filesystem the reporter is seeing that bug on. I'll
mark this as a duplicate of 177357, which still looks like an open issue.
*** This bug has been marked as a duplicate of 177357 ***