177122 – VFS: busy inodes after unmount

Keywords:
Status:	CLOSED DUPLICATE of bug 177357
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Peter Staubach
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	170416

Reported:	2006-01-06 15:13 UTC by Aaron Straus
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-06-19 18:08:47 UTC
Target Upstream Version:
Embargoed:

Attachments
fix one source of busy inodes after umount (3.21 KB, patch) 2006-01-06 16:55 UTC, Jeff Moyer
Here is the information you requested. (4.95 KB, text/plain) 2006-01-25 16:42 UTC, Aaron Straus

Description Aaron Straus 2006-01-06 15:13:50 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Description of problem:
We saw the following once on a Dell PowerEdge 1750 running 2.6.9-22.0.1.ELsmp.

An NFS mount seemed to mysteriously disappear, followed by:

Dec 31 13:23:01 ajak kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

About an hour and a half later we got the following oops:
Dec 31 14:53:26 ajak kernel: Unable to handle kernel paging request at virtual address 01081b17
Dec 31 14:53:26 ajak kernel:  printing eip:
Dec 31 14:53:26 ajak kernel: c0170372
Dec 31 14:53:26 ajak kernel: *pde = 367b1001
Dec 31 14:53:26 ajak kernel: Oops: 0000 [#1]
Dec 31 14:53:26 ajak kernel: SMP 
Dec 31 14:53:26 ajak kernel: Modules linked in: md5 ipv6 parport_pc lp parport nfs lockd autofs4 i2c_dev i2c_core sunrpc dm_mod button battery ac ohci_hcd tg3 floppy sg ext3 jbd mptscsih mptbase sd_mod scsi_mod
Dec 31 14:53:26 ajak kernel: CPU:    0
Dec 31 14:53:26 ajak kernel: EIP:    0060:[<c0170372>]    Not tainted VLI
Dec 31 14:53:26 ajak kernel: EFLAGS: 00010206   (2.6.9-22.0.1.ELsmp) 
Dec 31 14:53:26 ajak kernel: EIP is at iput+0x25/0x61
Dec 31 14:53:26 ajak kernel: eax: 01081b03   ebx: eec738ec   ecx: f8a39bae   edx: eec738ec
Dec 31 14:53:26 ajak kernel: esi: f6c2ce8c   edi: f6c2ce94   ebp: 0000007e   esp: f7db3eec
Dec 31 14:53:26 ajak kernel: ds: 007b   es: 007b   ss: 0068
Dec 31 14:53:26 ajak kernel: Process kswapd0 (pid: 52, threadinfo=f7db3000 task=f7de38b0)
Dec 31 14:53:26 ajak kernel: Stack: eec738ec c016df98 00000000 00000080 00000000 f7ffe9c0 c016e31f c0148630 
Dec 31 14:53:26 ajak kernel:        00153100 00000000 00000002 00000000 0008f901 000000d0 00000020 c0327700 
Dec 31 14:53:26 ajak kernel:        00000001 c0324f00 0000000c c01498bc c02cf250 0008f901 f7db3f9c 00000000 
Dec 31 14:53:26 ajak kernel: Call Trace:
Dec 31 14:53:26 ajak kernel:  [<c016df98>] prune_dcache+0x14b/0x19a
Dec 31 14:53:26 ajak kernel:  [<c016e31f>] shrink_dcache_memory+0x14/0x2b
Dec 31 14:53:26 ajak kernel:  [<c0148630>] shrink_slab+0xf8/0x161
Dec 31 14:53:26 ajak kernel:  [<c01498bc>] balance_pgdat+0x1d2/0x2f8
Dec 31 14:53:26 ajak kernel:  [<c02cf250>] schedule+0x844/0x87a
Dec 31 14:53:26 ajak kernel:  [<c011fdf4>] prepare_to_wait+0x12/0x4c
Dec 31 14:53:26 ajak kernel:  [<c0149aac>] kswapd+0xca/0xcc
Dec 31 14:53:26 ajak kernel:  [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel:  [<c02d0ed6>] ret_from_fork+0x6/0x14
Dec 31 14:53:27 ajak kernel:  [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel:  [<c01499e2>] kswapd+0x0/0xcc
Dec 31 14:53:27 ajak kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Dec 31 14:53:27 ajak kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 54 04 e3 88 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba f0 9d 32 c0 e8 72 
Dec 31 14:53:27 ajak kernel:  <0>Fatal exception: panic in 5 seconds


Version-Release number of selected component (if applicable):
autofs-4.1.3-155

How reproducible:
Didn't try

Steps to Reproduce:
1.  We have many NFS partitions automounted. 
2.  One of them mysteriously disappeared
3.  Followed by the problem
  

Actual Results:  Machine is hung.

Additional info:

Happy to provide anything else you may need.  Thanks!
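
For anyone checking other machines for the same pattern, here is a minimal sketch (not part of the original report) that scans syslog-style kernel lines for the "Busy inodes after unmount" warning and pairs it with the next oops logged by the same host. The function names are hypothetical, the year is an assumption because classic syslog timestamps omit it, and month names are assumed to be in English.

```python
import re
from datetime import datetime

WARNING = "VFS: Busy inodes after unmount"
OOPS = "Oops:"

# Classic syslog prefix: "Dec 31 13:23:01 host kernel: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s?(.*)$"
)


def parse_kernel_line(line, year):
    """Return (timestamp, host, message), or None if the line is not a kernel line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The year is supplied by the caller; syslog does not record it.
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def warning_to_oops(lines, year=2005):
    """List (host, warning_time, oops_time) for each warning followed by an oops.

    year=2005 is an assumption based on the report being filed in January 2006.
    """
    pending = {}   # host -> time of the first unmatched warning
    results = []
    for line in lines:
        parsed = parse_kernel_line(line, year)
        if parsed is None:
            continue
        stamp, host, msg = parsed
        if WARNING in msg:
            pending.setdefault(host, stamp)
        elif msg.startswith(OOPS) and host in pending:
            results.append((host, pending.pop(host), stamp))
    return results
```

Feeding it the lines of /var/log/messages from each client and subtracting the two timestamps in every result gives the delay between the warning and the crash. A warning with no later oops is simply not reported by this sketch.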

Comment 1 Jeff Moyer 2006-01-06 16:55:27 UTC

Created attachment 122882 [details]
fix one source of busy inodes after umount

Do you have the means to port this patch to RHEL4 and give it a try?

Also, if you are a Red Hat customer, please be sure to go through your TAM or
https://enterprise.redhat.com/portal to ensure this gets the proper attention.

Comment 2 Aaron Straus 2006-01-25 14:40:20 UTC

Hi.  We hit this again, this time on a RHEL4 ES machine.  Same VFS message.
Same oops.  We are a Red Hat customer in the sense that we have enterprise
subscriptions, but I'm not sure that entitles us to the kind of support you are
talking about.  It doesn't look trivial to backport this patch.  Thanks for
your time.

Comment 3 Jeff Moyer 2006-01-25 15:18:01 UTC

Please provide the information requested in the "Filing bug reports" section of
my people page:

  http://people.redhat.com/jmoyer

I understand that it takes a long time to reproduce, so you can skip the part
about configuring debug logs.  I do want to have a look at your maps, and I want
to know which mountpoint disappeared just before the problem happened.

Thanks!

Comment 4 Aaron Straus 2006-01-25 16:42:58 UTC

Created attachment 123677 [details]
Here is the information you requested.

This machine does NAT for some machines behind it which also use
autofs/ypbind/.. and are running similar jobs.  Same exact setup, except they
are RHEL4 WS machines.  One of them died very close to this time with the same
error.  Others were ok.  Thanks for your help.

Comment 5 Jeff Moyer 2006-01-27 18:32:34 UTC

OK.  After looking at your maps, I don't think the attached patch will fix your
problems.  The reason is this:  the patch addresses a case whereby unmounting an
autofs file system may leave busy inodes.  In your case, it is the act of
unmounting an NFS file system that triggers the warning.

It really looks to me as though this is an NFS issue.  I'm going to reassign the
bug to our NFS guru.  I fully intended to put together a patch with some
debugging information, but I don't know exactly what would be useful
instrumentation at this point.

Steve, could you please give this a look?

Thanks.

Comment 6 Jeff Moyer 2006-01-27 20:41:23 UTC

I forgot to change the component to kernel.

Comment 10 Aaron Straus 2007-06-19 18:06:33 UTC

Hi, I guess I'll re-open my original bug, since the duplicate bug fell silent.  

We were finally able to update our system to RHEL4 U4 WS.

We saw the same trace again today.  The kernel was definitely 2.6.9-42.ELsmp.

We might have to upgrade to U5 soon to fix an NFS cache bug we are also seeing,
so we should be able to test the latest and greatest kernel in a week or so.

Comment 11 Peter Staubach 2007-06-19 18:08:47 UTC

If there is new information, please file a new bugzilla.  This way, our
record keeping will stay straight.

*** This bug has been marked as a duplicate of 173843 ***

Comment 12 Aaron Straus 2007-06-19 18:19:02 UTC

(In reply to comment #11)
> If there is new information, please file a new bugzilla.  This way, our
> record keeping will stay straight.
> 
> *** This bug has been marked as a duplicate of 173843 ***

Hi,

  I am reading through the description of 173843 and though the problem seems to
manifest itself with the same symptoms, I'm not sure the cause is the same.  In
particular we are seeing this on NFS filesystems *only* with no GFS, ext3, ...
involved.   So I didn't want to re-open the 173843 bug as it seems that problem
is fixed (or the submitter has gone quiet).

  Our problem is not fixed and might be a *separate* NFS problem.   So
re-opening this bug seems appropriate?  

  Whatever you think is best, let me know.  I have no new information to report
other than it's still happening with the U4 kernel, which supposedly fixed the
other bug (173843).  Thanks!

Comment 13 Aaron Straus 2007-06-19 18:45:19 UTC

Looking through all the reports it looks like this might be closer to 177357. 
Though it's not clear what filesystem the reporter is seeing that bug on.   I'll
mark this as a duplicate of 177357, which still looks like an open issue.  

Thanks!

*** This bug has been marked as a duplicate of 177357 ***
