178372 – Kernel panic in latest stable kernel for Red Hat AS4 w/Oracle 10gR2 and OCFS2

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jason Baron
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-01-19 21:21 UTC by Chris Naude
Modified:	2013-03-06 05:59 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-01-24 18:30:40 UTC
Target Upstream Version:
Embargoed:

Attachments
node one error log (140.86 KB, text/plain) 2006-01-19 21:21 UTC, Chris Naude	no flags
node two error log (3.26 KB, text/plain) 2006-01-19 21:22 UTC, Chris Naude	no flags

Description Chris Naude 2006-01-19 21:21:25 UTC

Description of problem:
I have Oracle 10gR2 installed on two nodes, clustered using OCFS2. The kernel
is running in 64-bit mode with version 1.0.9 of the OCFS2 module. When I issue
shutdown -r on node 1, both systems panic. It seems that any time a node drops
out of the cluster, the other node panics as well. We have also opened a
ticket with Oracle.

Version-Release number of selected component (if applicable):
2.6.9-22.0.1.ELsmp  x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Install Red Hat AS4 x86_64 on Intel Xeon.
2. Install OCFS2 (1.0.9) from Oracle.
3. Install Oracle 10gR2.
4. Cluster the two nodes using a gigabit interconnect.
5. Remove one node from the cluster by rebooting it or dropping its connection.

Actual results:
One or both nodes immediately panic.

Expected results:
The untouched node should keep running without a kernel panic.

Additional info:

Comment 1 Chris Naude 2006-01-19 21:21:25 UTC

Created attachment 123455
node one error log

Comment 2 Chris Naude 2006-01-19 21:22:14 UTC

Created attachment 123456
node two error log

Comment 3 Chris Naude 2006-01-19 22:44:44 UTC

Upon further research, it looks like this may be an Oracle o2cb configuration
problem. But then again, it isn't behaving quite the way I expect self-fencing
to behave... I'll dig further and see what I can find out.

Comment 4 Jason Baron 2006-01-24 02:41:23 UTC

ok. thanks. I did think that there was a chance this was an Oracle issue based
on the node 2 error log... I'm going to put this in NEEDINFO, pending an update
from you. thanks.

Comment 5 Chris Naude 2006-01-24 18:30:40 UTC

I got this from Oracle's Bugzilla:

Quite a few bugs have been filed on this issue. See bug#630.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=630

So this looks like a 'feature' and not really a bug. ;) I've modified the
shutdown/reboot scripts so that the OCFS2 volumes are unmounted at the same time
the network is stopped. This solved the problem for us. It's a little annoying
that the default behavior of the latest ocfs2 RPMs is not quite right, but I've
made it work.
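
For anyone hitting the same panic, here is a rough sketch of the unmount step in Python. It is not the actual script change described above. It assumes the goal is for every OCFS2 volume to be unmounted while the interconnect is still up, and that it is called from a shutdown script before the network service stops. The function names ocfs2_mount_points and unmount_ocfs2 are hypothetical and are not part of the ocfs2 RPMs.

```python
import subprocess


def ocfs2_mount_points(mounts_text):
    """Return OCFS2 mount points found in /proc/mounts-style text.

    Longer paths come first so nested mounts are unmounted before parents.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "ocfs2":
            points.append(fields[1])
    return sorted(points, key=len, reverse=True)


def unmount_ocfs2(dry_run=True):
    """Unmount all OCFS2 volumes (hypothetical helper; prints only by default)."""
    with open("/proc/mounts") as f:
        points = ocfs2_mount_points(f.read())
    for point in points:
        cmd = ["umount", point]
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumption: run as root from the shutdown path, before the
            # network service is stopped.
            subprocess.run(cmd, check=True)
    return points
```

Calling unmount_ocfs2() with the default dry_run=True only prints the umount commands it would run, which makes it easy to check the ordering before wiring it into a shutdown script.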
