Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 124237

Summary:	Clumanager behaving differently under 2.4.21-15.ELhugemem kernel
Product:	[Retired] Red Hat Cluster Suite	Reporter:	Steve Pierce <spierce>
Component:	clumanager	Assignee:	Lon Hohberger <lhh>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	high	Docs Contact:
Priority:	medium
Version:	3	CC:	cluster-maint
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-07-15 20:32:32 UTC	Type:	---

Description Steve Pierce 2004-05-24 22:16:33 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
After upgrading the kernel from 2.4.21-9.0.3.ELhugemem to
2.4.21-15.ELhugemem, the behavior of cluster manager has changed.
Currently I have cluster manager managing an Oracle instance. Under the
2.4.21-9.0.3.ELhugemem kernel, the DBAs were able to stop and start
the managed Oracle instances through sqldba. After upgrading the
kernel to version 2.4.21-15.ELhugemem, the DBAs are still able to
shut down the databases, but when they try to restart them, the
machine fails over to the backup server.

Version-Release number of selected component (if applicable):
clumanager-1.2.9-1

How reproducible:
Always

Steps to Reproduce:
1. Manage an Oracle instance using clumanager
2. Shut down the database using sqldba
3. Restart the instance using sqldba
    

Actual Results:  The cluster manager failed the processes over to the
backup server.

Expected Results:  The database would start up.

Additional info:

The server configuration is as follows:
1) HP DL-740, 8 processors, 35 GB Memory
2) Storage: HP XP512 SAN, connected over fibre through Emulex LP-9002
Fibre Channel cards

The only thing logged in /var/log/messages on the failover server is:

May 23 19:33:38 prod2-rh clusvcmgrd[1057]: <crit> Invalid reply!
May 23 19:33:43 prod2-rh clusvcmgrd[1057]: <crit> Couldn't connect to member #0: Connection timed out
May 23 19:34:07 prod2-rh cluquorumd[1012]: <crit> STONITH: Data integrity may be compromised!
May

Comment 1 Lon Hohberger 2004-05-25 13:46:07 UTC

The 'Invalid Reply' is a red herring; generally it means the locks
timed out waiting for a response (typically due to slow I/O times). 
In the U2 version, this has been replaced with a <debug> level message
and it properly retries; simply upgrading to the latest erratum may
solve your problems.

If it's reproducible on the latest erratum, you should add this to
your /etc/syslog.conf:

local4.* /var/log/clumanager

and restart syslogd; then reproduce.  /var/log/messages doesn't
generally contain all of the cluster's log messages (if it did, it'd
grow really fast).
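
The syslog change above can be sketched as a small shell helper. This is
only an illustration: the function name add_cluster_log_rule is
hypothetical, and the restart command in the usage note assumes the stock
RHEL 3 init script is named "syslog". Check both against your system
before running anything as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of clumanager): append the local4 rule
# to a syslog configuration file, but only if no local4.* rule exists.
add_cluster_log_rule() {
    conf="${1:-/etc/syslog.conf}"
    rule='local4.* /var/log/clumanager'

    # Leave the file alone if a local4.* selector is already present,
    # so running the helper twice does not duplicate the rule.
    if grep -q '^local4\.\*' "$conf"; then
        return 0
    fi

    printf '%s\n' "$rule" >> "$conf"
}
```

Usage, as root (the service name is an assumption):
add_cluster_log_rule /etc/syslog.conf && service syslog restart
After that, reproduce the failover and collect /var/log/clumanager.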

You may want to consider buying some power switches.

Comment 2 Lon Hohberger 2004-06-23 18:54:23 UTC

Additionally, you may want to increase your membership failure
detection time by several seconds.  You'll want to file a ticket with
Red Hat Support as well:

http://www.redhat.com/apps/support/

It may be a simple matter of re-tuning your failover time.

Any additional information you could provide would be helpful,
specifically logs during reproduction after following the instructions
in the previous comment.

Comment 3 Suzanne Hillman 2004-07-15 20:32:32 UTC

This has been in NEEDINFO for a month. Closing. Please
reopen if there is additional information.

Comment 4 Lon Hohberger 2007-12-21 15:10:01 UTC

Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3.