Bug 167839 - kernel crashes with an Oops
Summary: kernel crashes with an Oops
Keywords:
Status: CLOSED DUPLICATE of bug 175216
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
Reported: 2005-09-08 19:04 UTC by Sev Binello
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-01-20 20:17:05 UTC
Target Upstream Version:
Embargoed:


Attachments
sysreport info (2.13 MB, application/x-bzip2)
2005-09-08 19:21 UTC, Sev Binello


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description Sev Binello 2005-09-08 19:04:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050719 Red Hat/1.7.10-1.1.3.1

Description of problem:
Kernel crashes with the following Oops info...

Sep  4 11:15:18 VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Sep  4 11:21:49 Unable to handle kernel paging request at virtual address a16cc79a
Sep  4 11:21:49  printing eip:
Sep  4 11:21:49 c0181257
Sep  4 11:21:49 *pde = 0804e000
Sep  4 11:21:49 Oops: 0000
Sep  4 11:21:49 ide-cd cdrom nfs nfsd lockd sunrpc usbserial lp parport autofs4 e1000 floppy sg
Sep  4 11:21:49 microcode keybdev mousedev hid input usb-uhci usbcore ext3 jbd raid1 qla2300 q
Sep  4 11:21:49 CPU:    1
Sep  4 11:21:49 EIP:    0060:[<c0181257>]    Not tainted
Sep  4 11:21:49 EFLAGS: 00010286
Sep  4 11:21:49 EIP is at iput [kernel] 0x37 (2.4.21-32.0.1.ELsmp/i686)
Sep  4 11:21:49 eax: a16cc782   ebx: e7428a80   ecx: e7428a90   edx: f3dce400
Sep  4 11:21:50 esi: a16cc782   edi: ea56dc00   ebp: 0000c9ba   esp: c4cbdf6c
Sep  4 11:21:50 ds: 0068   es: 0068   ss: 0068
Sep  4 11:21:50 Process kswapd (pid: 11, stackpage=c4cbd000)
Sep  4 11:21:50 Stack: caa77300 c017df70 f8cd4ae7 f3dce418 f3dce400 e7428a80 c017e47a e7428a80
Sep  4 11:21:50        e7428a80 c03a7b00 00000cfb 00000000 00000040 c017e848 000185a4 00000000
Sep  4 11:21:50        c0157000 00000006 000001d0 00000014 00000000 00000000 00001a61 00000000
Sep  4 11:21:50 Call Trace:   [<c017df70>] dput [kernel] 0x30 (0xc4cbdf70)
Sep  4 11:21:50 [<f8cd4ae7>] nfs_dentry_iput [nfs] 0x57 (0xc4cbdf74)
Sep  4 11:21:50 [<c017e47a>] prune_dcache [kernel] 0x18a (0xc4cbdf84)
Sep  4 11:21:50 [<c017e848>] shrink_dcache_memory [kernel] 0x68 (0xc4cbdfa0)
Sep  4 11:21:50 [<c0157000>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xc4cbdfac)
Sep  4 11:21:50 [<c01571c8>] kswapd [kernel] 0x68 (0xc4cbdfd0)
Sep  4 11:21:50 [<c0157160>] kswapd [kernel] 0x0 (0xc4cbdfe4)
Sep  4 11:21:50 [<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc4cbdff0)
Sep  4 11:21:51 Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c c5 3a c0 8d
Sep  4 11:21:51 Kernel panic: Fatal exception
Sep  4 11:22:51 Rebooting in 60 seconds..



Version-Release number of selected component (if applicable):
2.4.21-32.0.1.ELsmp

How reproducible:
Couldn't Reproduce


Additional info:

Problem seems similar to bug 167385, but that is with a 2.6 kernel.
No responses noted for that bug.
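
The register dump in the Oops above can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in the report: that the "Code:" line starts at the faulting EIP (as it normally does on 2.4 kernels), and that its first three bytes, 8b 46 18, decode as mov 0x18(%esi),%eax under standard IA-32 encoding. If both hold, the faulting address should equal esi + 0x18, and the start of iput should be EIP minus the 0x37 offset. The helper name decode_mov_disp8 is made up for illustration.

```python
# Values copied from the first Oops in this report.
EIP = 0xC0181257
EIP_OFFSET_IN_IPUT = 0x37        # from "EIP is at iput [kernel] 0x37"
REPORTED_FAULT = 0xA16CC79A      # from "virtual address a16cc79a"
REGS = {
    "eax": 0xA16CC782, "ebx": 0xE7428A80, "ecx": 0xE7428A90, "edx": 0xF3DCE400,
    "esi": 0xA16CC782, "edi": 0xEA56DC00, "ebp": 0x0000C9BA, "esp": 0xC4CBDF6C,
}
CODE = bytes.fromhex("8b4618")   # first three bytes of the "Code:" line

# IA-32 register numbering used in the ModR/M byte.
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Decode 'mov r32, [reg + disp8]' (opcode 0x8b, ModR/M mod=01, no SIB).

    Hypothetical helper: it handles only this one encoding and raises
    ValueError for anything else.
    """
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b01 or rm == 4:
        raise ValueError("not a simple mov r32, [reg + disp8]")
    if disp >= 0x80:             # sign-extend the 8-bit displacement
        disp -= 0x100
    return REG_NAMES[reg], REG_NAMES[rm], disp


dest, base, disp = decode_mov_disp8(CODE)
computed_fault = (REGS[base] + disp) & 0xFFFFFFFF
iput_start = EIP - EIP_OFFSET_IN_IPUT

print(f"instruction : mov {disp:#x}(%{base}),%{dest}")
print(f"computed    : {computed_fault:08x}")
print(f"reported    : {REPORTED_FAULT:08x}")
print(f"iput starts : {iput_start:08x}")
```

With the values from the first Oops, the computed address matches the reported one (a16cc79a), which is consistent with esi holding a bad pointer when iput dereferenced it. The second Oops in comment 13 follows the same pattern: 5069c782 + 0x18 = 5069c79a.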

Comment 1 Sev Binello 2005-09-08 19:21:42 UTC
Created attachment 118605
sysreport info

Comment 2 Larry Woodman 2005-09-09 13:08:26 UTC
This appears to be corruption of the inode cache.  Is this reproducible and if
so, is the customer willing to run a debug kernel with slab debugging enabled?

Larry Woodman


Comment 3 Sev Binello 2005-09-09 13:46:10 UTC
No, I can't intentionally reproduce it.

We are willing to assist.
Let me know what needs to be done,
and what the impact might be.
Keep in mind this is a production system,
and that we may have to run it in debug for a while
before another crash.
I don't know what "slab" debugging is.

Comment 4 Larry Woodman 2005-09-30 19:04:46 UTC
Sev, can you try to reproduce this problem with the RHEL3-U6 kernel?
We have multiple fixes in that kernel that could prevent inode cache 
corruption.

Larry Woodman


Comment 5 Sev Binello 2005-09-30 20:53:01 UTC
Well, I can't reproduce it even now.
But I guess this means we should upgrade.

Comment 8 Ernie Petrides 2005-10-10 21:52:48 UTC
A fix for this problem was committed to the RHEL3 U6 patch pool
on 13-May-2005 (in kernel version 2.4.21-32.4.EL).

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


*** This bug has been marked as a duplicate of 155289 ***

Comment 9 Sev Binello 2006-01-19 22:31:16 UTC
We have had a similar crash on a different server even after
going to U6. Please see bug 177451.

Comment 10 Dave Anderson 2006-01-20 20:17:05 UTC

*** This bug has been marked as a duplicate of 177451 ***

Comment 11 Ernie Petrides 2006-02-23 21:18:32 UTC
A fix for this problem was committed to the RHEL3 U8 patch pool
on 17-Feb-2006 (in kernel version 2.4.21-40.2.EL).


*** This bug has been marked as a duplicate of 175216 ***

Comment 12 Ernie Petrides 2006-04-28 21:50:41 UTC
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.

Comment 13 Sev Binello 2006-05-09 14:42:47 UTC
The bug still seems to be present, even with the hot fix kernel 2.4.21-40.2.ELsmp:

VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

Unable to handle kernel paging request at virtual address 5069c79a
 printing eip:
c0182097
*pde = 00000000
Oops: 0000
soundcore ide-cd cdrom nfs nfsd lockd usbserial lp parport netconsole mvfs vnode
sunrpc autofs4 e1000 floppy sg microcode keybdev mousedev hid
input usb-uhci
CPU:    0
EIP:    0060:[<c0182097>]    Tainted: PF
EFLAGS: 00013206
 
EIP is at iput [kernel] 0x37 (2.4.21-40.2.ELsmp/i686)
eax: 5069c782   ebx: dd7de900   ecx: dd7de910   edx: cb7d8c00
esi: 5069c782   edi: cd7dd800   ebp: cd7dd800   esp: f7f0ff6c
ds: 0068   es: 0068   ss: 0068
Process kswapd (pid: 11, stackpage=f7f0f000)
Stack: 00000003 f7e25f98 f8e7aae7 cb7d8c18 cb7d8c00 dd7de900 c017f05a dd7de900
       dd7de900 c03aac00 00003281 00000000 00000040 c017f568 0000eb19 00000000
       c01577f0 00000006 000001d0 00000014 00000000 00000000 0000652d 00000000
Call Trace:   [<f8e7aae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017f05a>] prune_dcache [kernel] 0x1ca (0xf7f0ff84)
[<c017f568>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c01577f0>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c01579b8>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c0157950>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
 
Code: 8b 46 18 85 c0 0f 85 d1 02 00 00 c7 44 24 04 1c f6 3a c0 8d
 
CPU#0 is executing netdump.
CPU#1 is frozen.
CPU#2 is frozen.
CPU#3 is frozen.
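
To compare this trace with the one in the original description, the frames can be reduced to (symbol, module) pairs. The sketch below is an illustration only: the regular expression is written against the 2.4-style trace lines quoted in this report, and the names FRAME_RE and parse_frames are hypothetical.

```python
import re

# Matches frames such as: [<c017df70>] dput [kernel] 0x30 (0xc4cbdf70)
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\s+\[(\w+)\]\s+0x([0-9a-f]+)")


def parse_frames(text):
    """Return the (symbol, module) pairs found in a 2.4-style call trace."""
    return [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(text)]


# Trace from the original description (kernel 2.4.21-32.0.1.ELsmp).
first_trace = """
Sep  4 11:21:50 Call Trace:   [<c017df70>] dput [kernel] 0x30 (0xc4cbdf70)
Sep  4 11:21:50 [<f8cd4ae7>] nfs_dentry_iput [nfs] 0x57 (0xc4cbdf74)
Sep  4 11:21:50 [<c017e47a>] prune_dcache [kernel] 0x18a (0xc4cbdf84)
Sep  4 11:21:50 [<c017e848>] shrink_dcache_memory [kernel] 0x68 (0xc4cbdfa0)
Sep  4 11:21:50 [<c0157000>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xc4cbdfac)
Sep  4 11:21:50 [<c01571c8>] kswapd [kernel] 0x68 (0xc4cbdfd0)
Sep  4 11:21:50 [<c0157160>] kswapd [kernel] 0x0 (0xc4cbdfe4)
Sep  4 11:21:50 [<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc4cbdff0)
"""

# Trace from comment 13 (kernel 2.4.21-40.2.ELsmp).
second_trace = """
Call Trace:   [<f8e7aae7>] nfs_dentry_iput [nfs] 0x57 (0xf7f0ff74)
[<c017f05a>] prune_dcache [kernel] 0x1ca (0xf7f0ff84)
[<c017f568>] shrink_dcache_memory [kernel] 0x68 (0xf7f0ffa0)
[<c01577f0>] do_try_to_free_pages_kswapd [kernel] 0x150 (0xf7f0ffac)
[<c01579b8>] kswapd [kernel] 0x68 (0xf7f0ffd0)
[<c0157950>] kswapd [kernel] 0x0 (0xf7f0ffe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf7f0fff0)
"""

first = parse_frames(first_trace)
second = parse_frames(second_trace)

# Frames present in only one of the two traces, ignoring addresses and offsets.
only_in_first = [frame for frame in first if frame not in second]
only_in_second = [frame for frame in second if frame not in first]

print("only in first trace :", only_in_first)
print("only in second trace:", only_in_second)
```

On the two traces quoted in this report, every frame from nfs_dentry_iput down to kernel_thread_helper is the same. The only differences are the extra dput entry in the first trace and the offset within prune_dcache (0x18a versus 0x1ca), which the comparison deliberately ignores.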


Comment 14 Dave Anderson 2006-05-09 17:38:48 UTC
What's tainting the kernel?

Comment 15 Sev Binello 2006-05-09 18:17:53 UTC
We have an IBM (Rational) ClearCase module installed.
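
For reference, the "Tainted: PF" string in the Oops from comment 13 can be decoded letter by letter. The mapping below follows the two taint letters as generally documented for 2.4 kernels (P for a module with a proprietary license, F for a force-loaded module); treat it as an assumption and check it against the kernel source for the exact build. The names TAINT_LETTERS and decode_taint are made up for illustration.

```python
import re

# Assumed meaning of the 2.4-era taint letters; not taken from this report.
TAINT_LETTERS = {
    "P": "a module with a proprietary (non-GPL) license was loaded",
    "F": "a module was force-loaded",
}


def decode_taint(eip_line):
    """Return descriptions for the taint letters on an Oops 'EIP:' line.

    An untainted kernel prints 'Not tainted', which yields an empty list.
    """
    match = re.search(r"Tainted: (\S+)", eip_line)
    if match is None:
        return []
    return [TAINT_LETTERS.get(c, f"unknown flag {c!r}") for c in match.group(1)]


for line in (
    "EIP:    0060:[<c0181257>]    Not tainted",   # original description
    "EIP:    0060:[<c0182097>]    Tainted: PF",   # comment 13
):
    print(line.split("]")[-1].strip(), "->", decode_taint(line))
```

Under that reading, both flags would be explained by a third-party module that is not GPL-licensed and was loaded with force.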

