From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Description of problem:
We saw the following once on a Dell PowerEdge 1750 using 2.6.9-22.0.1.ELsmp. An NFS mount seemed to mysteriously disappear, followed by:

Dec 31 13:23:01 ajak kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...

Hours later we got the following oops:

Dec 31 14:53:26 ajak kernel: Unable to handle kernel paging request at virtual address 01081b17
Dec 31 14:53:26 ajak kernel: printing eip:
Dec 31 14:53:26 ajak kernel: c0170372
Dec 31 14:53:26 ajak kernel: *pde = 367b1001
Dec 31 14:53:26 ajak kernel: Oops: 0000 [#1]
Dec 31 14:53:26 ajak kernel: SMP
Dec 31 14:53:26 ajak kernel: Modules linked in: md5 ipv6 parport_pc lp parport nfs lockd autofs4 i2c_dev i2c_core sunrpc dm_mod button battery ac ohci_hcd tg3 floppy sg ext3 jbd mptscsih mptbase sd_mod scsi_mod
Dec 31 14:53:26 ajak kernel: CPU: 0
Dec 31 14:53:26 ajak kernel: EIP: 0060:[<c0170372>] Not tainted VLI
Dec 31 14:53:26 ajak kernel: EFLAGS: 00010206 (2.6.9-22.0.1.ELsmp)
Dec 31 14:53:26 ajak kernel: EIP is at iput+0x25/0x61
Dec 31 14:53:26 ajak kernel: eax: 01081b03 ebx: eec738ec ecx: f8a39bae edx: eec738ec
Dec 31 14:53:26 ajak kernel: esi: f6c2ce8c edi: f6c2ce94 ebp: 0000007e esp: f7db3eec
Dec 31 14:53:26 ajak kernel: ds: 007b es: 007b ss: 0068
Dec 31 14:53:26 ajak kernel: Process kswapd0 (pid: 52, threadinfo=f7db3000 task=f7de38b0)
Dec 31 14:53:26 ajak kernel: Stack: eec738ec c016df98 00000000 00000080 00000000 f7ffe9c0 c016e31f c0148630
Dec 31 14:53:26 ajak kernel: 00153100 00000000 00000002 00000000 0008f901 000000d0 00000020 c0327700
Dec 31 14:53:26 ajak kernel: 00000001 c0324f00 0000000c c01498bc c02cf250 0008f901 f7db3f9c 00000000
Dec 31 14:53:26 ajak kernel: Call Trace:
Dec 31 14:53:26 ajak kernel: [<c016df98>] prune_dcache+0x14b/0x19a
Dec 31 14:53:26 ajak kernel: [<c016e31f>] shrink_dcache_memory+0x14/0x2b
Dec 31 14:53:26 ajak kernel: [<c0148630>] shrink_slab+0xf8/0x161
Dec 31 14:53:26 ajak kernel: [<c01498bc>] balance_pgdat+0x1d2/0x2f8
Dec 31 14:53:26 ajak kernel: [<c02cf250>] schedule+0x844/0x87a
Dec 31 14:53:26 ajak kernel: [<c011fdf4>] prepare_to_wait+0x12/0x4c
Dec 31 14:53:26 ajak kernel: [<c0149aac>] kswapd+0xca/0xcc
Dec 31 14:53:26 ajak kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel: [<c02d0ed6>] ret_from_fork+0x6/0x14
Dec 31 14:53:27 ajak kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d
Dec 31 14:53:27 ajak kernel: [<c01499e2>] kswapd+0x0/0xcc
Dec 31 14:53:27 ajak kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Dec 31 14:53:27 ajak kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 54 04 e3 88 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba f0 9d 32 c0 e8 72
Dec 31 14:53:27 ajak kernel: <0>Fatal exception: panic in 5 seconds

Version-Release number of selected component (if applicable):
autofs-4.1.3-155

How reproducible:
Didn't try

Steps to Reproduce:
1. We have many NFS partitions automounted.
2. One of them mysteriously disappeared.
3. Followed by the problem.

Actual Results:
Machine is hung.

Additional info:
Happy to provide anything else you may need. Thanks!
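As a sanity check on the oops above, the sketch below confirms that the faulting address is simply eax plus the displacement of the instruction marked `<8b>` in the Code line. It is only an illustration: the function names are made up for this example, the three input lines are copied from the log with the syslog prefix removed and the Code line shortened, and the decoder handles just the one instruction form seen here (`8b` with an 8-bit displacement and no SIB byte).

```python
import re

# Lines copied from the oops (syslog prefix removed; Code line shortened to
# the bytes around the <..> marker).
OOPS_LINES = [
    "Unable to handle kernel paging request at virtual address 01081b17",
    "eax: 01081b03 ebx: eec738ec ecx: f8a39bae edx: eec738ec",
    "Code: 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2",
]

# 32-bit register order used by the x86 ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fault_address(lines):
    """Return the faulting virtual address reported by the oops."""
    for line in lines:
        match = re.search(r"virtual address ([0-9a-f]{8})", line)
        if match:
            return int(match.group(1), 16)
    raise ValueError("no fault address found")


def registers(lines):
    """Collect 'exx: value' pairs from the register dump lines."""
    regs = {}
    for line in lines:
        for name, value in re.findall(r"\b(e[a-z]{2}): *([0-9a-f]{8})\b", line):
            regs[name] = int(value, 16)
    return regs


def faulting_bytes(lines):
    """Return the code bytes starting at the one marked <..> (the EIP)."""
    for line in lines:
        match = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})*)", line)
        if match:
            rest = [int(b, 16) for b in match.group(2).split()]
            return [int(match.group(1), 16)] + rest
    raise ValueError("no <..> marker found in Code line")


def decode_mov_disp8(code):
    """Decode 'mov disp8(%base),%dest' (opcode 8b, mod=01, no SIB byte)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("not a simple mov with 8-bit displacement")
    if disp >= 0x80:  # the displacement is a signed byte
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


def effective_address(lines):
    """Compute the address the faulting load tried to read."""
    dest, base, disp = decode_mov_disp8(faulting_bytes(lines))
    return (registers(lines)[base] + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    dest, base, disp = decode_mov_disp8(faulting_bytes(OOPS_LINES))
    print(f"mov {disp:#x}(%{base}),%{dest}")
    print(f"computed {effective_address(OOPS_LINES):#010x}, "
          f"reported {fault_address(OOPS_LINES):#010x}")
```

If the two addresses agree, eax already held a value that is not a valid kernel address when the load at iput+0x25 ran. That narrows the fault to whatever pointer was loaded into eax just before; it does not by itself show which structure was stale.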
Created attachment 122882
fix one source of busy inodes after umount

Do you have the means to port this patch to RHEL4 and give it a try? Also, if you are a Red Hat customer, please be sure to go through your TAM or https://enterprise.redhat.com/portal to ensure this gets the proper attention.
Hi. We hit this again, this time on a RHEL4 ES machine. Same VFS message. Same oops. We are a Red Hat customer in the sense that we have enterprise subscriptions, but I'm not sure that entitles us to the kind of support you are talking about. It doesn't look trivial to backport this patch. Thanks for your time.
Please provide the information requested in the "Filing bug reports" section of my people page: http://people.redhat.com/jmoyer I understand that it takes a long time to reproduce, so you can skip the part about configuring debug logs. I do want to have a look at your maps, and I want to know which mountpoint disappeared just before the problem happened. Thanks!
Created attachment 123677
Here is the information you requested.

This machine does NAT for some machines behind it, which also use autofs, ypbind, etc. and are running similar jobs. They have exactly the same setup, except that they are RHEL4 WS machines. One of them died very close to this time with the same error. The others were OK. Thanks for your help.
OK. After looking at your maps, I don't think the attached patch will fix your problems. The reason is this: the patch addresses a case whereby unmounting an autofs file system may leave busy inodes. In your case, it is the act of unmounting an NFS file system that triggers the warning. It really looks to me as though this is an NFS issue. I'm going to reassign the bug to our NFS guru. I fully intended to put together a patch with some debugging information, but I don't know exactly what would be useful instrumentation at this point. Steve, could you please give this a look? Thanks.
I forgot to change the component to kernel.
Hi, I guess I'll re-open my original bug, since the duplicate bug fell silent. We were finally able to update our system to RHEL4 U4 WS. We saw the same trace again today. The kernel was definitely 2.6.9-42.ELsmp. We might have to upgrade to U5 soon to fix an NFS cache bug we are also seeing, so we should be able to test the latest kernel in a week or so.
If there is new information, please file a new bugzilla. This way, our record keeping will stay straight.

*** This bug has been marked as a duplicate of 173843 ***
(In reply to comment #11)
> If there is new information, please file a new bugzilla. This way, our
> record keeping will stay straight.
>
> *** This bug has been marked as a duplicate of 173843 ***

Hi, I am reading through the description of 173843, and although the problem shows the same symptoms, I'm not sure the cause is the same. In particular, we are seeing this on NFS filesystems *only*, with no GFS, ext3, ... involved. I didn't want to re-open bug 173843, as it seems that problem is fixed (or the submitter has gone quiet). Our problem is not fixed and might be a *separate* NFS problem, so re-opening this bug seems appropriate. Whatever you think is best, let me know. I have no new information to report other than that it's still happening with the U4 kernel, which supposedly fixed the other bug (173843). Thanks!
Looking through all the reports, this looks closer to 177357, though it's not clear which filesystem that reporter is seeing the bug on. I'll mark this as a duplicate of 177357, which still looks like an open issue. Thanks!

*** This bug has been marked as a duplicate of 177357 ***