From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.7.12-1.1.3.2
Description of problem:
An operational file server has crashed twice within a week with the same panic signature, included below. Nothing unusual is known to have been occurring when the machine crashed, except that both times the crash was preceded by a "Busy inodes" message, see below.
Jan 10 09:17:38 VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Jan 10 09:35:57 Unable to handle kernel paging request at virtual address 6668c79a
printing eip:
c01815f7
*pde = 00000000
Oops: 0000
qla2300_conf ide-cd cdrom nfsd nfs lockd usbserial lp parport mvfs vnode sunrpc autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci usbcore
CPU: 0
EIP: 0060:[<c01815f7>] Tainted: PF
EFLAGS: 00210206
EIP is at iput [kernel] 0x37 (2.4.21-37.ELsmp/i686)
eax: 6668c782 ebx: eb54bb00 ecx: eb54bb10 edx: f289f380
esi: 6668c782 edi: e50abc00 ebp: 00000f17 esp: f7f0ff6c
ds: 0068 es: 0068 ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: efb82100 c017e2c0 f8e5cae7 f289f398 f289f380 eb54bb00 c017e7ca eb54bb00
eb54bb00 c03a7b80 0000b7c7 00000000 00000040 c017eb98 00011103 00000000
c0157180 00000006 000001d0 00000014 00000000 00000000 0001768a 00000000
Call Trace: [<c017e2c0>] dput [kernel] 0x30 (0xf7f0ff70)
[<f8e5cae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017e7ca>] prune_dcache [kernel] 0x18a (0xf7f0ff84)
[<c017eb98>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c0157180>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c0157348>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c01572e0>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 9c c5 3a c0 8d
Kernel panic: Fatal exception
Rebooting in 60 seconds..
Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-37.EL
How reproducible:
Couldn't Reproduce
Steps to Reproduce:
No known way to reproduce crash
Additional info:
Created attachment 123010
Attached is the sysreport for the server. Also, here is the panic report from the previous crash. It is identical to the one we had today.
Jan 6 04:11:16 VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Jan 6 04:26:24 Unable to handle kernel paging request at virtual address 6668c79a
Jan 6 04:26:24 printing eip:
Jan 6 04:26:24 c0181257
Jan 6 04:26:24 *pde = 00000000
Jan 6 04:26:25 Oops: 0000
Jan 6 04:26:25 mvfs vnode nfsd nfs lockd sunrpc usbserial lp parport autofs4 e1000 floppy sg mi
Jan 6 04:26:25 crocode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 qla
Jan 6 04:26:25 CPU: 1
Jan 6 04:26:25 EIP: 0060:[<c0181257>] Tainted: PF
Jan 6 04:26:25 EFLAGS: 00010206
Jan 6 04:26:25 EIP is at iput [kernel] 0x37 (2.4.21-32.0.1.ELsmp/i686)
Jan 6 04:26:25 eax: 6668c782 ebx: f2665980 ecx: f2665990 edx: eea8f600
Jan 6 04:26:25 esi: 6668c782 edi: ee457400 ebp: 00010ae4 esp: f7f03f6c
Jan 6 04:26:25 ds: 0068 es: 0068 ss: 0068
Jan 6 04:26:25 Process kswapd (pid: 11, stackpage=f7f03000)
Jan 6 04:26:25 Stack: f1b89d00 c017df70 f8cb5ae7 eea8f618 eea8f600 f2665980 c017e47a f2665980
Jan 6 04:26:25 f2665980 c03a7b00 00007c49 00000000 00000040 c017e848 00011d2d 00000000
Jan 6 04:26:25 c0157000 00000006 000001d0 00000014 00000000 00000000 0000f8d8 00000000
Jan 6 04:26:25 Call Trace: [<c017df70>] dput [kernel] 0x30 (0xf7f03f70)
Jan 6 04:26:25 [<f8cb5ae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f03f74)
Jan 6 04:26:26 [<c017e47a>] prune_dcache [kernel] 0x18a (0xf7f03f84)
Jan 6 04:26:26 [<c017e848>] shrink_dcache_memory [kernel] 0x68 (0xf7f03fa0)
Jan 6 04:26:26 [<c0157000>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f03fac)
Jan 6 04:26:26 [<c01571c8>] kswapd [kernel] 0x68 (0xf7f03fd0)
Jan 6 04:26:26 [<c0157160>] kswapd [kernel] 0x0 (0xf7f03fe4)
Jan 6 04:26:26 [<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f03ff0)
Jan 6 04:26:26 Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c c5 3a c0 8d
Jan 6 04:26:26 Kernel panic: Fatal exception
Can this problem be reproduced without a tainted kernel?
We can't intentionally reproduce this error, period. I can only say that we have 2 servers with mvfs modules, and only one has crashed so far. Is there any reason to think it is related to the mvfs module?
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds.
> Is there any reason to think it is related to the mvfs module?
What filesystem umount caused this message?
No idea. Are you saying it was mvfs?
> Are you saying it was mvfs?
No, I'm just asking. Without a core dump there's no way of telling; and even with a core dump, it may still be impossible to tell, and would require a debug kernel. But when that "VFS Busy Inodes" message occurs, it means that one or more leftover inodes from the unmounted filesystem are hanging around, containing stale pointers, and eventually some other entity is going to run into them and cause a subsequent crash like the one you're seeing. The "self-destruct" message is letting you know to expect disaster in the near future.
Okay, we have started the netdump utility.
Can you tell me whether the immediately preceding unmount is the cause of the busy inodes error message? Or is there no such sequential relationship?
Yes -- if during an unmount the attempt to flush all of the inodes in that filesystem fails (which should never happen normally), you will get that message. Those "dangling" inodes remain on in-kernel inode lists that are later parsed and dealt with, at which time the stale (freed) super_block pointer in the inode is used. Depending upon what happened to the memory previously used by the freed super_block, these types of crashes will occur. This is from the kill_super() function, which is called during the umount system call:
        if (invalidate_inodes(sb)) {
                printk(KERN_ERR "VFS: Busy inodes after unmount. "
                        "Self-destruct in 5 seconds. Have a nice day...\n");
        }
Here are the umounts in the message file immediately preceding the "have a nice day" message for both crashes. They refer to /cfsad, which is an ext3 file system that just contains user home areas.
Jan 6 04:10:53 acnlin80 rpc.mountd: authenticated unmount request from acnlin22.pbn.bnl.gov:967 for /cfsad (/cfsad)
Jan 6 04:10:55 acnlin80 rpc.mountd: authenticated unmount request from acnlin43.pbn.bnl.gov:724 for /cfsad (/cfsad)
Jan 6 04:11:03 acnlin80 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Jan 10 09:17:11 acnlin80 rpc.mountd: authenticated unmount request from acnlin43.pbn.bnl.gov:626 for /cfsad (/cfsad)
Jan 10 09:17:31 acnlin80 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Ok, so we wait for a vmcore, since the oops messages don't give us anything to debug.
This latest problem is almost identical to one we reported last September: bug 167839, "kernel crashes with an Ooops". At the time we were told the problem was fixed in U6. We have upgraded our systems but, as you can see, the problem persists.
Agreed -- the NEEDINFO_REPORTER refers to the vmcore request in comment #12.
*** Bug 167839 has been marked as a duplicate of this bug. ***
We had the same problem today. I'm having problems attaching a core dump; it says it's too large. How do I get it to you? We also had the added complication that we couldn't reboot: we kept getting the following message. I'm not sure what its relationship is to the original crash.
Kernel panic: no init found. Try passing init= option to kernel.
Here is the oops:
Unable to handle kernel paging request at virtual address 6668c79a
printing eip:
c01815f7
*pde = 00000000
Oops: 0000
netconsole iptable_nat ip_conntrack iptable_filter ip_tables ide-cd cdrom nfsd nfs lockd usbserial lp parport mvfs vnode sunrpc autofs4 e1000 floppy sg microc
CPU: 2
EIP: 0060:[<c01815f7>] Tainted: PF
EFLAGS: 00010206
EIP is at iput [kernel] 0x37 (2.4.21-37.ELsmp/i686)
eax: 6668c782 ebx: e73e5480 ecx: e73e5490 edx: d8910880
esi: 6668c782 edi: e92b8400 ebp: 000077a4 esp: f7f0ff6c
ds: 0068 es: 0068 ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: c9d42180 c017e2c0 f8e5cae7 d8910898 d8910880 e73e5480 c017e7ca e73e5480
e73e5480 c03a7b80 00003d59 00000000 00000040 c017eb98 00011928 00000000
c0157180 00000006 000001d0 00000014 00000000 00000000 00007ae0 00000000
Call Trace: [<c017e2c0>] dput [kernel] 0x30 (0xf7f0ff70)
[<f8e5cae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017e7ca>] prune_dcache [kernel] 0x18a (0xf7f0ff84)
[<c017eb98>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c0157180>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c0157348>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c01572e0>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 9c c5 3a c0 8d
CPU#0 is frozen.
CPU#1 is frozen.
CPU#2 is executing netdump.
CPU#3 is frozen.
< netdump activated - performing handshake with the server. >
From User-Agent: XML-RPC
The file can be uploaded to our ftp server:
Hostname: enterprise.redhat.com
Note: All ftp users are anonymous. No password is required.
# How to access the ftp server
Note: This is one of many ways to reach the ftp server.
To upload file(s):
>lftp enterprise.redhat.com:/incoming
>put unique-filename
Or
>mput unique-filename1 unique-filename2 ... unique-filenameX
Let us know what the unique filename is, because the contents of the incoming directory are not viewable. Thanks.
This event sent from IssueTracker by kbaxley
issue 85922
In all probability, this is the same issue fixed in BZ #175216, presuming that the crash was preceded by "VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day..." The oops location is identical to several of those seen in #175216, where one or more inodes were left "dangling" after the faulty umount, and their inode->i_sb pointers contain stale references to the freed super_block -- which was subsequently re-allocated as something else. Later on, iput() is called on the inode, where the invalid super_block->s_op field is used, and the crash occurs when accessing op->put_inode:
void iput(struct inode *inode)
{
        if (inode) {
                struct super_block *sb = inode->i_sb;
                struct super_operations *op = NULL;
                if (inode->i_state == I_CLEAR)
                        BUG();
                if (sb && sb->s_op)
                        op = sb->s_op;
                if (op && op->put_inode)
                        op->put_inode(inode);
Yes, it was preceded by the "have a nice day" message. I can't seem to read this bug; it tells me I'm not authorized. How can I read about it? Was there a fix? What action is recommended? I'm assuming you then no longer need the core file? If you do, how do I get it to you?
> Yes, it was preceded by the "have a nice day" message.
> I can't seem to read this bug; it tells me I'm not authorized.
Ah, sorry, apparently that's a private bugzilla.
> How can I read about it?
You can't.
> Was there a fix?
Yes.
> What action is recommended?
There appears to be a hotfix kernel available that can be used prior to RHEL3-U8. Your SEG or TAM representative can help you with that.
> I'm assuming you then no longer need the core file?
It would be nice to confirm it, but probably not absolutely necessary.
> If you do, how do I get it to you?
As indicated in comment #21 above (or in the Issue Tracker).
A fix for this problem was committed to the RHEL3 U8 patch pool on 17-Feb-2006 (in kernel version 2.4.21-40.2.EL).
*** This bug has been marked as a duplicate of 175216 ***
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.
It seems the bug is still around even with hotfix kernel 2.4.21-40.2.ELsmp.
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Unable to handle kernel paging request at virtual address 5069c79a
printing eip:
c0182097
*pde = 00000000
Oops: 0000
soundcore ide-cd cdrom nfs nfsd lockd usbserial lp parport netconsole mvfs vnode sunrpc autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci
CPU: 0
EIP: 0060:[<c0182097>] Tainted: PF
EFLAGS: 00013206
EIP is at iput [kernel] 0x37 (2.4.21-40.2.ELsmp/i686)
eax: 5069c782 ebx: dd7de900 ecx: dd7de910 edx: cb7d8c00
esi: 5069c782 edi: cd7dd800 ebp: cd7dd800 esp: f7f0ff6c
ds: 0068 es: 0068 ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: 00000003 f7e25f98 f8e7aae7 cb7d8c18 cb7d8c00 dd7de900 c017f05a dd7de900
dd7de900 c03aac00 00003281 00000000 00000040 c017f568 0000eb19 00000000
c01577f0 00000006 000001d0 00000014 00000000 00000000 0000652d 00000000
Call Trace: [<f8e7aae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017f05a>] prune_dcache [kernel] 0x1ca (0xf7f0ff84)
[<c017f568>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c01577f0>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c01579b8>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c0157950>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c f6 3a c0 8d
CPU#0 is executing netdump.
CPU#1 is frozen.
CPU#2 is frozen.
CPU#3 is frozen.