Bug 177451 - Kernel panic : Unable to handle kernel paging request at virtual address 6668c79a
Status: CLOSED DUPLICATE of bug 175216
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
Blocks: RHEL3U8CanFix
Reported: 2006-01-10 20:08 UTC by Sev Binello
Modified: 2007-11-30 22:07 UTC (History)

Doc Type: Bug Fix
Last Closed: 2006-02-23 21:13:34 UTC


Attachments
Attached is the sysreport for the server (1.47 MB, text/plain)
2006-01-10 20:36 UTC, Sev Binello


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description Sev Binello 2006-01-10 20:08:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.7.12-1.1.3.2

Description of problem:
The operational file server has crashed twice within a week
with the same panic signature, included below.
Nothing unusual is known to have been occurring when the machine crashed,
except that both times the crash was preceded by a "Busy inodes" message; see below.

Jan 10 09:17:38 
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

 Jan 10 09:35:57 
Unable to handle kernel paging request at virtual address 6668c79a
printing eip:
c01815f7
*pde = 00000000
Oops: 0000
qla2300_conf ide-cd cdrom nfsd nfs lockd usbserial lp parport mvfs vnode sunrpc
autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci usbcore
CPU:    0
EIP:    0060:[<c01815f7>]    Tainted: PF
EFLAGS: 00210206

EIP is at iput [kernel] 0x37 (2.4.21-37.ELsmp/i686)
eax: 6668c782   ebx: eb54bb00   ecx: eb54bb10   edx: f289f380
esi: 6668c782   edi: e50abc00   ebp: 00000f17   esp: f7f0ff6c
ds: 0068   es: 0068   ss: 0068
 Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: efb82100 c017e2c0 f8e5cae7 f289f398 f289f380 eb54bb00 c017e7ca eb54bb00
       eb54bb00 c03a7b80 0000b7c7 00000000 00000040 c017eb98 00011103 00000000
       c0157180 00000006 000001d0 00000014 00000000 00000000 0001768a 00000000
Call Trace:   [<c017e2c0>] dput [kernel] 0x30 (0xf7f0ff70)
[<f8e5cae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017e7ca>] prune_dcache [kernel] 0x18a (0xf7f0ff84)
[<c017eb98>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c0157180>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c0157348>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c01572e0>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)

Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 9c c5 3a c0 8d
Kernel panic: Fatal exception

Rebooting in 60 seconds..


Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-37.EL

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
No known way to reproduce crash

Additional info:

Comment 1 Sev Binello 2006-01-10 20:36:01 UTC
Created attachment 123010
Attached is the sysreport for the server

Also included is the panic report from the previous crash.
It is identical to the one we had today.

Jan  6 04:11:16 VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Jan  6 04:26:24 Unable to handle kernel paging request at virtual address 6668c79a
Jan  6 04:26:24  printing eip:
Jan  6 04:26:24 c0181257
Jan  6 04:26:24 *pde = 00000000
Jan  6 04:26:25 Oops: 0000
Jan  6 04:26:25 mvfs vnode nfsd nfs lockd sunrpc usbserial lp parport autofs4 e1000 floppy sg microcode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 qla
Jan  6 04:26:25 CPU:    1
Jan  6 04:26:25 EIP:    0060:[<c0181257>]    Tainted: PF
Jan  6 04:26:25 EFLAGS: 00010206
Jan  6 04:26:25 EIP is at iput [kernel] 0x37 (2.4.21-32.0.1.ELsmp/i686)
Jan  6 04:26:25 eax: 6668c782   ebx: f2665980   ecx: f2665990   edx: eea8f600
Jan  6 04:26:25 esi: 6668c782   edi: ee457400   ebp: 00010ae4   esp: f7f03f6c
Jan  6 04:26:25 ds: 0068   es: 0068   ss: 0068
Jan  6 04:26:25 Process kswapd (pid: 11, stackpage=f7f03000)
Jan  6 04:26:25 Stack: f1b89d00 c017df70 f8cb5ae7 eea8f618 eea8f600 f2665980 c017e47a f2665980
Jan  6 04:26:25        f2665980 c03a7b00 00007c49 00000000 00000040 c017e848 00011d2d 00000000
Jan  6 04:26:25        c0157000 00000006 000001d0 00000014 00000000 00000000 0000f8d8 00000000
Jan  6 04:26:25 Call Trace:   [<c017df70>] dput [kernel] 0x30 (0xf7f03f70)
Jan  6 04:26:25 [<f8cb5ae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f03f74)
Jan  6 04:26:26 [<c017e47a>] prune_dcache [kernel] 0x18a (0xf7f03f84)
Jan  6 04:26:26 [<c017e848>] shrink_dcache_memory [kernel] 0x68 (0xf7f03fa0)
Jan  6 04:26:26 [<c0157000>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f03fac)
Jan  6 04:26:26 [<c01571c8>] kswapd [kernel] 0x68 (0xf7f03fd0)
Jan  6 04:26:26 [<c0157160>] kswapd [kernel] 0x0 (0xf7f03fe4)
Jan  6 04:26:26 [<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f03ff0)
Jan  6 04:26:26 Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c c5 3a c0 8d
Jan  6 04:26:26 Kernel panic: Fatal exception

Comment 2 Ernie Petrides 2006-01-10 23:35:07 UTC
Can this problem be reproduced without a tainted kernel?

Comment 3 Sev Binello 2006-01-11 15:16:51 UTC
We can't intentionally reproduce this error, period.
I can only say that we have 2 servers with mvfs modules,
and only one of them has crashed so far.
Is there any reason to think it is related to the mvfs module?

Comment 4 Dave Anderson 2006-01-11 15:29:11 UTC
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds.

> Is there any reason to think it is related to the mvfs module ?

What filesystem umount caused this message?




Comment 6 Sev Binello 2006-01-11 15:38:18 UTC
No idea.
Are you saying it was mvfs ?

Comment 7 Dave Anderson 2006-01-11 15:47:59 UTC
> Are you saying it was mvfs ?

No, I'm just asking.  Without a core dump there's no way of telling; and
even with a core dump, it still may also be impossible to tell, and would
require a debug kernel.

But when that "VFS Busy Inodes" message occurs, it means that there are
one or more leftover inode(s) from the unmounted filesystem hanging around,
containing stale pointers, and eventually some other entity is going to run
into them, and cause a subsequent crash like you're seeing.  

The "self-destruct" message is letting you know to expect disaster
in the near future.



Comment 8 Sev Binello 2006-01-11 16:20:49 UTC
Okay, we have started the netdump utility.

Comment 9 Sev Binello 2006-01-11 16:38:02 UTC
Can you tell me whether the immediately preceding unmount
is the cause of the busy inodes error message?
Or is there no such sequential relationship?

Comment 10 Dave Anderson 2006-01-11 16:54:44 UTC
Yes -- if, during an unmount, the attempt to flush all of the inodes in
that filesystem fails (which should never happen normally), you will
get that message.  Those "dangling" inodes remain on in-kernel inode
lists that are later walked and dealt with, at which time the stale
(freed) super_block pointer in the inode is used.  Depending upon what
has happened to the memory previously used by the freed super_block,
crashes of this type will occur.

This is from the kill_super() function, which is called during the
umount system call:

        if (invalidate_inodes(sb)) {
                printk(KERN_ERR "VFS: Busy inodes after unmount. "
                        "Self-destruct in 5 seconds.  Have a nice day...\n");
        }
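
The sequence described in this comment can be modeled in user space. The following is a minimal sketch, not kernel code: the toy_ structures, their layouts, and the helper names are all hypothetical, and the super_block memory is modeled by an ordinary struct that is overwritten rather than actually freed, so the sketch itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins for the kernel structures; the layouts are illustrative only. */
struct toy_super_block {
    uintptr_t s_op;                /* models the value of super_block->s_op */
};

struct toy_inode {
    struct toy_super_block *i_sb;  /* back-pointer to the owning super_block */
    int i_count;                   /* reference count; > 0 means still in use */
};

/* Models invalidate_inodes(): returns how many inodes are still in use. */
int toy_invalidate_inodes(const struct toy_inode *inodes, size_t n)
{
    int busy = 0;
    for (size_t i = 0; i < n; i++)
        if (inodes[i].i_count > 0)
            busy++;
    return busy;
}

/* Models the check in kill_super(): warn, but carry on with the unmount. */
int toy_kill_super(const struct toy_inode *inodes, size_t n)
{
    int busy = toy_invalidate_inodes(inodes, n);
    if (busy)
        printf("VFS: Busy inodes after unmount.\n");
    return busy;
}

/* Models a later, unrelated allocation that recycles the super_block memory. */
void toy_recycle(struct toy_super_block *slot, uintptr_t new_contents)
{
    slot->s_op = new_contents;
}

/* Models the first thing iput() does: follow inode->i_sb and read s_op. */
uintptr_t toy_read_s_op(const struct toy_inode *inode)
{
    assert(inode != NULL && inode->i_sb != NULL);
    return inode->i_sb->s_op;
}
```

Calling toy_kill_super() on an array in which one inode still has i_count > 0, then toy_recycle() with an arbitrary value, makes toy_read_s_op() on the leftover inode return that arbitrary value. In the kernel, the equivalent read yields a wild pointer that is then dereferenced.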



Comment 11 Sev Binello 2006-01-11 18:18:40 UTC
Here are the umounts in the message file immediately preceding the "have a nice
day" message for both crashes. They refer to /cfsad, which is an ext3 file system
that contains only user home areas.

Jan  6 04:10:53 acnlin80 rpc.mountd: authenticated unmount request from
acnlin22.pbn.bnl.gov:967 for /cfsad (/cfsad)
Jan  6 04:10:55 acnlin80 rpc.mountd: authenticated unmount request from
acnlin43.pbn.bnl.gov:724 for /cfsad (/cfsad)
Jan  6 04:11:03 acnlin80 kernel: VFS: Busy inodes after unmount. Self-destruct
in 5 seconds.  Have a nice day...

Jan 10 09:17:11 acnlin80 rpc.mountd: authenticated unmount request from
acnlin43.pbn.bnl.gov:626 for /cfsad (/cfsad)
Jan 10 09:17:31 acnlin80 kernel: VFS: Busy inodes after unmount. Self-destruct
in 5 seconds.  Have a nice day...

Comment 12 Dave Anderson 2006-01-11 18:38:13 UTC
Ok, so we wait for a vmcore, since the oops messages don't
give us anything to debug.  


Comment 16 Sev Binello 2006-01-19 20:18:33 UTC
This latest problem is almost identical to one we reported
last September, bug 167839 - kernel crashes with an Ooops.
At the time we were told the problem was fixed in U6.
We have upgraded our systems, but as you see the
problem persists.

Comment 17 Dave Anderson 2006-01-20 14:32:23 UTC
Agreed -- the NEEDINFO_REPORTER refers to the vmcore request
in comment #12.



Comment 18 Dave Anderson 2006-01-20 20:17:18 UTC
*** Bug 167839 has been marked as a duplicate of this bug. ***

Comment 19 Sev Binello 2006-02-23 00:21:10 UTC
We had the same problem today.
I'm having problems attaching a core dump:
the upload is rejected as too large. How do I get it to you?
We also had the added complication that we couldn't reboot;
we kept getting the following message.
We are not sure how it relates to the original crash.
 Kernel panic: no init found. Try passing init= option to kernel.

here is the oops...
Unable to handle kernel paging request at virtual address 6668c79a
 printing eip:
c01815f7
*pde = 00000000
Oops: 0000
netconsole iptable_nat ip_conntrack iptable_filter ip_tables ide-cd cdrom nfsd
nfs lockd usbserial lp parport mvfs vnode sunrpc autofs4 e1000 floppy sg microc
CPU:    2
EIP:    0060:[<c01815f7>]    Tainted: PF
EFLAGS: 00010206
 
EIP is at iput [kernel] 0x37 (2.4.21-37.ELsmp/i686)
eax: 6668c782   ebx: e73e5480   ecx: e73e5490   edx: d8910880
esi: 6668c782   edi: e92b8400   ebp: 000077a4   esp: f7f0ff6c
ds: 0068   es: 0068   ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: c9d42180 c017e2c0 f8e5cae7 d8910898 d8910880 e73e5480 c017e7ca e73e5480
       e73e5480 c03a7b80 00003d59 00000000 00000040 c017eb98 00011928 00000000
       c0157180 00000006 000001d0 00000014 00000000 00000000 00007ae0 00000000
Call Trace:   [<c017e2c0>] dput [kernel] 0x30 (0xf7f0ff70)
[<f8e5cae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017e7ca>] prune_dcache [kernel] 0x18a (0xf7f0ff84)
[<c017eb98>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c0157180>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c0157348>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c01572e0>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
                                                                               
                                                 
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 9c c5 3a c0 8d
 
CPU#0 is frozen.
CPU#1 is frozen.
CPU#2 is executing netdump.
CPU#3 is frozen.
< netdump activated - performing handshake with the server. >


Comment 21 Issue Tracker 2006-02-23 02:30:43 UTC

The file can be uploaded to our ftp server:

Hostname: enterprise.redhat.com

Note: All ftp users are anonymous. No password required.
# How to Access the ftp server

Note: The following is one of many ways to upload to the ftp server.
To upload file(s):
>lftp enterprise.redhat.com:/incoming
>put unique-filename
Or
>mput unique-filename1 unique-filename2 ... unique-filenameX


Let us know what the unique filename is, because the contents of the
incoming directory are not viewable.

Thanks.


This event sent from IssueTracker by kbaxley 
 issue 85922

Comment 22 Dave Anderson 2006-02-23 14:10:56 UTC
In all probability, this is the same issue fixed in BZ #175216,
presuming that the crash was preceded by "VFS: Busy inodes after
unmount. Self-destruct in 5 seconds.  Have a nice day..."

The oops location is identical to several of those seen in #175216,
where one or more inodes were left "dangling" after the faulty umount,
and their inode->i_sb pointers contain stale references to the freed
super_block -- which was subsequently re-allocated as something else.
Later on, the inode gets iput(), where the invalid super_block->s_op field
is used, and the crash occurs when accessing op->put_inode:

void iput(struct inode *inode)
{
        if (inode) {
                struct super_block *sb = inode->i_sb;
                struct super_operations *op = NULL;

                if (inode->i_state == I_CLEAR)
                        BUG();

                if (sb && sb->s_op)
                        op = sb->s_op;
                if (op && op->put_inode)
                        op->put_inode(inode);


Comment 23 Sev Binello 2006-02-23 15:12:39 UTC
Yes, it was preceded by the "have a nice day" message.
I can't seem to read that bug; it tells me I'm not authorized.
How can I read about it ?
Was there a fix ?
What action is recommended ?
I'm assuming you then no longer need the core file ?
If you do, how do I get it to you ?


Comment 24 Dave Anderson 2006-02-23 15:23:24 UTC
> Yes, it was preceded by the "have a nice day" message.
> I can't seem to read that bug; it tells me I'm not authorized.

Ah, sorry, apparently that's a private bugzilla.

> How can I read about it ?

You can't.

> Was there a fix ?

Yes.

> What action is recommended ?

There appears to be a hotfix kernel available that can be used prior to
RHEL3-U8.  Your SEG or TAM representative can help you with that.

> I'm assuming you then no longer need the core file ?

It would be nice to confirm it, but probably not absolutely necessary.

> If you do, how do I get it to you ?

As indicated in comment #21 above (or in the Issue Tracker).


Comment 26 Ernie Petrides 2006-02-23 21:13:34 UTC
A fix for this problem was committed to the RHEL3 U8 patch pool
on 17-Feb-2006 (in kernel version 2.4.21-40.2.EL).

*** This bug has been marked as a duplicate of 175216 ***

Comment 27 Ernie Petrides 2006-04-28 21:47:04 UTC
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.

Comment 28 Sev Binello 2006-05-09 14:44:47 UTC
The bug still seems to be present, even with the hotfix kernel 2.4.21-40.2.ELsmp.

VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

Unable to handle kernel paging request at virtual address 5069c79a
 printing eip:
c0182097
*pde = 00000000
Oops: 0000
soundcore ide-cd cdrom nfs nfsd lockd usbserial lp parport netconsole mvfs vnode
sunrpc autofs4 e1000 floppy sg microcode keybdev mousedev hid
input usb-uhci
CPU:    0
EIP:    0060:[<c0182097>]    Tainted: PF
EFLAGS: 00013206
 
EIP is at iput [kernel] 0x37 (2.4.21-40.2.ELsmp/i686)
eax: 5069c782   ebx: dd7de900   ecx: dd7de910   edx: cb7d8c00
esi: 5069c782   edi: cd7dd800   ebp: cd7dd800   esp: f7f0ff6c
ds: 0068   es: 0068   ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: 00000003 f7e25f98 f8e7aae7 cb7d8c18 cb7d8c00 dd7de900 c017f05a dd7de900
       dd7de900 c03aac00 00003281 00000000 00000040 c017f568 0000eb19 00000000
       c01577f0 00000006 000001d0 00000014 00000000 00000000 0000652d 00000000
Call Trace:   [<f8e7aae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017f05a>] prune_dcache [kernel] 0x1ca (0xf7f0ff84)
[<c017f568>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c01577f0>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c01579b8>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c0157950>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
 
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c f6 3a c0 8d
 
CPU#0 is executing netdump.
CPU#1 is frozen.
CPU#2 is frozen.
CPU#3 is frozen.



