Bug 175728 - Kernel panic. Server hangs and is totally unresponsive until a power cycle brings it back online.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 168429
 
Reported: 2005-12-14 12:22 UTC by Mark Monaghan
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 21:05:27 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2006:0132
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3
Last Updated: 2006-03-09 16:31:00 UTC

Description Mark Monaghan 2005-12-14 12:22:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Description of problem:
One of our developers reported that the system hung when she was trying to shut down the Oracle databases we have installed on the server. Searching the logfiles I came across the following error: kernel BUG at mm/prio_tree.c:527! (invalid operand: 0000 [#1])

I think the version of Oracle is 9i, the latest database release, with the 10g web interface loaded in front of this. As the developer is unavailable as I write this, I can't confirm the exact versions. All I know is that it's the latest available Oracle packages. No confirmation on how up to date the patches are either. Sorry.

What happens is that when the scripts are run to start cleanly shutting down the databases, the server drops the remote X console session. Attempts to reconnect fail, and there is no response from the console either. The only activity I could see was the caps lock and scroll lock lights flashing on and off about once per second. Searching for information on diagnostic lights for this type of server proved fruitless. I'm unsure whether this is the OS or the hardware warning of a problem.

Searching the logfiles turned up the error attached later in this report, so I have dropped back to the previous kernel release (kernel-smp-2.6.9-11.EL) to see if this resolves the problem.

Lastly, the system had been running for roughly 28 days since the last reboot and had not shown this problem before. I don't know whether the databases were stopped and started before last night. Since the last reboot, the only package added to the system was the Veritas Netbackup Enterprise client software (V4.6(FP6); no agents are installed that allow backups of live, running databases), a few days before this happened. I'm unsure whether it is related, but I do know that the backups are running fine, and another two standby RHEL ES4 servers with the smp-2.6.9-22.EL kernel are working fine with no problems, although they have no Oracle software installed on them at present.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-22.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Invoke shutdown script to close the Oracle databases running on the server.


Actual Results:  Have not attempted this yet as the developer knows the script command, and I don't. Will fill in more information here when I get the commands and attempt to recreate this on both versions of the kernel that I'm running on the server.

Expected Results:  Oracle databases should have shut down cleanly, allowing Veritas to back them up (no agent running to allow backup of live databases).

Additional info:

From the Kernel Log File:

Dec 13 18:01:58 itssoracleweb2 kernel: ------------[ cut here ]------------

Dec 13 18:01:58 itssoracleweb2 kernel: kernel BUG at mm/prio_tree.c:527!

Dec 13 18:01:58 itssoracleweb2 kernel: invalid operand: 0000 [#1]

Dec 13 18:01:58 itssoracleweb2 kernel: SMP 

Dec 13 18:01:58 itssoracleweb2 kernel: Modules linked in: 8021q sg cpqci(U) parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac md5 ipv6 ohci_hcd tg3 bcm5700(U) floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cciss aic7xxx sd_mod scsi_mod

Dec 13 18:01:58 itssoracleweb2 kernel: CPU:    2

Dec 13 18:01:58 itssoracleweb2 kernel: EIP:    0060:[<c0145093>]    Tainted: P      VLI

Dec 13 18:01:58 itssoracleweb2 kernel: EFLAGS: 00210206   (2.6.9-22.0.1.ELsmp) 

Dec 13 18:01:58 itssoracleweb2 kernel: EIP is at vma_prio_tree_add+0x10/0x95

Dec 13 18:01:58 itssoracleweb2 kernel: eax: d55b938c   ebx: d55b938c   ecx: 00000000   edx: 000001f3

Dec 13 18:01:58 itssoracleweb2 kernel: esi: ec69a90c   edi: b6f92000   ebp: d6e8dc80   esp: c9c79ee0

Dec 13 18:01:58 itssoracleweb2 kernel: ds: 007b   es: 007b   ss: 0068

Dec 13 18:01:58 itssoracleweb2 kernel: Process oracle (pid: 15520, threadinfo=c9c79000 task=e795a830)

Dec 13 18:01:58 itssoracleweb2 kernel: Stack: f01a9ddc d55b9124 c014e386 dc399c80 00000000 00000000 00000000 f7295380 

Dec 13 18:01:58 itssoracleweb2 kernel:        f72a015c f72a0140 00000000 b6ea2000 d55b938c d55b9124 b6ea2000 b6f92000 

Dec 13 18:01:58 itssoracleweb2 kernel:        c014f6f2 00000203 d55b938c d6e8dc80 b6ea2000 d55b9124 b6e92000 d6e8dc80 

Dec 13 18:01:58 itssoracleweb2 kernel: Call Trace:

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014e386>] vma_adjust+0x166/0x2d6

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014f6f2>] split_vma+0xa2/0xd2

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014f7f9>] do_munmap+0xd7/0x137

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014eaed>] do_mmap_pgoff+0x311/0x666

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c010b67f>] sys_mmap2+0x7e/0xaf

Dec 13 18:01:58 itssoracleweb2 kernel:  [<c02d0fb7>] syscall_call+0x7/0xb

Dec 13 18:01:58 itssoracleweb2 kernel: Code: ff ff 8d 4c 24 04 89 e2 89 d8 e8 fa fd ff ff 85 c0 74 d4 eb a2 8b 03 5a 59 5b c3 56 89 d6 8b 4e 4c 53 8b 50 4c 89 c3 39 ca 74 08 <0f> 0b 0f 02 04 4c 2e c0 8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 

Dec 13 18:01:58 itssoracleweb2 kernel:  <0>Fatal exception: panic in 5 seconds

Dec 13 18:01:59 itssoracleweb2 dbus: Can't send to audit system: USER_AVC pid=3360 uid=81 loginuid=-1 message=avc:  denied  { send_msg } for  scontext=user_u:system_r:unconfined_t tcontext=user_u:system_r:initrc_t tclass=dbus
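
For anyone triaging similar reports, the key fields of an oops like the one above can be pulled out of syslog automatically. The following is a minimal Python sketch, not an official tool: the function name `summarize_oops` and the regular expressions are assumptions written to match the line format seen in this log (syslog prefix, `kernel BUG at file:line!`, `EIP is at symbol+off/len`, and `[<addr>] symbol+off/len` trace frames). Other kernels or architectures may format these lines differently.

```python
import re

# Syslog prefix as it appears in this report, e.g.
# "Dec 13 18:01:58 itssoracleweb2 kernel: " (assumed format, not universal).
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s?")
BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!")
EIP_AT = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(lines):
    """Collect the BUG location, faulting function, and call trace symbols."""
    summary = {"bug": None, "eip": None, "trace": []}
    in_trace = False
    for raw in lines:
        # Skip blank lines and messages from other daemons (e.g. dbus).
        if not PREFIX.match(raw):
            continue
        msg = PREFIX.sub("", raw.rstrip("\n"))

        m = BUG_AT.search(msg)
        if m:
            summary["bug"] = (m.group(1), int(m.group(2)))
        m = EIP_AT.search(msg)
        if m:
            summary["eip"] = m.group(1)

        if msg.strip() == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            m = FRAME.search(msg)
            if m:
                summary["trace"].append(m.group(1))
            else:
                # First non-frame kernel line (e.g. "Code: ...") ends the trace.
                in_trace = False
    return summary


# A few lines copied from the log above, including a blank separator.
sample = [
    "Dec 13 18:01:58 itssoracleweb2 kernel: kernel BUG at mm/prio_tree.c:527!",
    "",
    "Dec 13 18:01:58 itssoracleweb2 kernel: EIP is at vma_prio_tree_add+0x10/0x95",
    "Dec 13 18:01:58 itssoracleweb2 kernel: Call Trace:",
    "Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014e386>] vma_adjust+0x166/0x2d6",
    "",
    "Dec 13 18:01:58 itssoracleweb2 kernel:  [<c014f6f2>] split_vma+0xa2/0xd2",
    "Dec 13 18:01:58 itssoracleweb2 kernel: Code: ff ff 8d 4c 24 04",
]
result = summarize_oops(sample)
print(result)
```

Feeding it the full log, for example via `summarize_oops(open(path))` with a path of your choosing, should make it easier to compare the BUG location and trace against other reports of the same failure.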

Comment 1 Jason Baron 2005-12-14 14:59:43 UTC
hi Mark,

we are well aware of this issue, and already have the fix in the U3 beta kernel.
I believe the U1 (-11) and U2 (-22) kernels will both exhibit this problem, although I
can't say whether one is more likely to hit it than the other. If this is a
persistent problem I would suggest running the U3 beta kernel,
http://people.redhat.com/~jbaron/rhel4, but note that it is a beta kernel that has yet
to go through a long test period.

thanks.

*** This bug has been marked as a duplicate of 171778 ***

Comment 2 Red Hat Bugzilla 2006-03-07 21:05:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html


