Bug 178372 - Kernel panic in latest stable kernel for RedHat AS4 w/Oracle 10gr2 and OCFS2
Summary: Kernel panic in latest stable kernel for RedHat AS4 w/Oracle 10gr2 and OCFS2
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jason Baron
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-01-19 21:21 UTC by Chris Naude
Modified: 2013-03-06 05:59 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-01-24 18:30:40 UTC
Target Upstream Version:
Embargoed:


Attachments
node one error log (140.86 KB, text/plain)
2006-01-19 21:21 UTC, Chris Naude
node two error log (3.26 KB, text/plain)
2006-01-19 21:22 UTC, Chris Naude

Description Chris Naude 2006-01-19 21:21:25 UTC
Description of problem:
I have Oracle 10gr2 installed on two nodes. They are clustered using OCFS2. The
kernel is running in 64-bit mode with version 1.0.9 of the OCFS2 module. When I
issue a shutdown -r on node 1, both systems panic. It seems that any time a node
drops out of the cluster, the other node panics as well. We have also opened a
ticket with Oracle.

Version-Release number of selected component (if applicable):
2.6.9-22.0.1.ELsmp  x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Install Red Hat AS4 x86_64 on Intel Xeon.
2. Install OCFS2 (1.0.9) from Oracle.
3. Install Oracle 10gr2.
4. Cluster the two nodes using a gigabit interconnect.
5. Remove one node from the cluster by rebooting it or dropping its connection.
  
Actual results:
One or both nodes immediately panic.

Expected results:
The untouched node should remain running without causing a kernel panic.

Additional info:

Comment 1 Chris Naude 2006-01-19 21:21:25 UTC
Created attachment 123455
node one error log

Comment 2 Chris Naude 2006-01-19 21:22:14 UTC
Created attachment 123456
node two error log

Comment 3 Chris Naude 2006-01-19 22:44:44 UTC
Upon further research, it looks like this may be an Oracle o2cb configuration
problem. But then again, it isn't quite behaving the way I expect self-fencing
to behave... I'll dig further and see what I can find out.

Comment 4 Jason Baron 2006-01-24 02:41:23 UTC
OK, thanks. I did think that there was a chance this was an Oracle issue, based
on the node 2 error log... I'm going to put this in NEEDINFO, pending an update
from you. Thanks.

Comment 5 Chris Naude 2006-01-24 18:30:40 UTC
I got this from Oracle's Bugzilla:

Quite a few bugs have been filed on this issue. See bug#630.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=630

So this looks like a 'feature' and not really a bug. ;) I've modified the
shutdown/reboot scripts so that the OCFS2 file systems are unmounted at the same
time the network is stopped. This solved the problem for us. It's a little
annoying that the default behavior of the latest ocfs2 RPMs is not quite right,
but I've made it work.
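
For anyone hitting the same panic: the workaround above depends on no OCFS2
mount still being active once the network goes away. The following is only a
minimal sketch of that check, not the actual script changes made here. It
assumes the standard /proc/mounts layout (device, mount point, file system
type, options); the function names and the sample paths /u02 and /u02/oradata
are hypothetical, chosen for illustration.

```python
def ocfs2_mount_points(mounts_text):
    """Return the mount points of ocfs2 entries in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, file system type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "ocfs2":
            points.append(fields[1])
    return points


def unmount_commands(mounts_text):
    """Build (but do not run) one umount command per mounted ocfs2 volume."""
    # Deepest paths first, so nested mounts are released before their parents.
    ordered = sorted(
        ocfs2_mount_points(mounts_text),
        key=lambda path: path.count("/"),
        reverse=True,
    )
    return [["umount", path] for path in ordered]


# Hypothetical sample input; on a real node, read the text from /proc/mounts.
SAMPLE = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/sdb1 /u02 ocfs2 rw,_netdev 0 0\n"
    "/dev/sdc1 /u02/oradata ocfs2 rw 0 0\n"
)

for command in unmount_commands(SAMPLE):
    print(" ".join(command))
```

The sketch only prints the commands. Running them, and hooking the result into
the shutdown/reboot scripts ahead of the network stop, is left to the local
init setup.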


