Bug 124237 - Clumanager behaving differently under 2.4.21-15.ELhugemem kernel
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: clumanager
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Lon Hohberger
Reported: 2004-05-24 18:16 EDT by Steve Pierce
Modified: 2009-04-16 16:14 EDT
Doc Type: Bug Fix
Last Closed: 2004-07-15 16:32:32 EDT
Description Steve Pierce 2004-05-24 18:16:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
After upgrading the kernel from 2.4.21-9.0.3.ELhugemem to
2.4.21-15.ELhugemem, the behavior of cluster manager has changed.
I currently have cluster manager managing an Oracle instance. Under
the 2.4.21-9.0.3.ELhugemem kernel, the DBAs were able to stop and
start the managed Oracle instances through sqldba. After upgrading
the kernel to version 2.4.21-15.ELhugemem, the DBAs are still able to
shut down the databases, but when they try to restart them, the
machine fails over to the backup server.

Version-Release number of selected component (if applicable):
clumanager-1.2.9-1

How reproducible:
Always

Steps to Reproduce:
1. Manage an Oracle instance using clumanager
2. Shut down the database using sqldba
3. Restart the instance using sqldba
    

Actual Results:  Cluster manager failed the processes over to the
backup server.

Expected Results:  The database would start up.

Additional info:

The server configuration is as follows:
1) HP DL-740, 8 processors, 35 GB memory
2) Storage: HP XP512 SAN, connected over fibre through Emulex LP-9002
fibre channel cards

The only thing logged in /var/log/messages on the failover server is:

May 23 19:33:38 prod2-rh clusvcmgrd[1057]: <crit> Invalid reply!
May 23 19:33:43 prod2-rh clusvcmgrd[1057]: <crit> Couldn't connect to member #0: Connection timed out
May 23 19:34:07 prod2-rh cluquorumd[1012]: <crit> STONITH: Data integrity may be compromised!
May
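
For anyone triaging similar excerpts, here is a minimal sketch of pulling
the <crit> entries out of such a log. It assumes only the line layout
visible in the excerpt above (timestamp, host, daemon[pid], <level>,
message), with wrapped lines rejoined. The names ClusterLogEntry,
parse_line and entries_at_level are hypothetical helpers, not part of
clumanager.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpt in this report:
#   "May 23 19:33:38 prod2-rh clusvcmgrd[1057]: <crit> Invalid reply!"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<daemon>[\w-]+)\[(?P<pid>\d+)\]: "
    r"<(?P<level>\w+)> (?P<message>.*)$"
)


class ClusterLogEntry(NamedTuple):
    stamp: str
    host: str
    daemon: str
    pid: int
    level: str
    message: str


def parse_line(line: str) -> Optional[ClusterLogEntry]:
    """Parse one log line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return ClusterLogEntry(
        fields["stamp"],
        fields["host"],
        fields["daemon"],
        int(fields["pid"]),
        fields["level"],
        fields["message"],
    )


def entries_at_level(lines: Iterable[str], level: str) -> List[ClusterLogEntry]:
    """Return the parsed entries whose <level> tag equals `level`."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == level]
```

Calling entries_at_level(open("/var/log/messages"), "crit") would list the
critical cluster messages; lines that do not fit the layout, such as a
truncated trailing line, are skipped rather than guessed at.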
Comment 1 Lon Hohberger 2004-05-25 09:46:07 EDT
The 'Invalid Reply' is a red herring; generally it means the locks
timed out waiting for a response (typically due to slow I/O times). 
In the U2 version, this has been replaced with a <debug> level message
and it properly retries; simply upgrading to the latest erratum may
solve your problems.

If it's reproducible on the latest erratum, you should add this to
your /etc/syslog.conf:

local4.* /var/log/clumanager

and restart syslogd; then reproduce.  /var/log/messages doesn't
generally contain all of the cluster's log messages (if it did, it'd
grow really fast).
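
As a rough sketch of the syslog.conf change described above, the helper
below appends the local4 rule to a configuration file only if an
equivalent active line is not already there. The function name
ensure_syslog_rule and the idea of scripting the change are assumptions
for illustration; editing the file by hand works just as well.

```python
from pathlib import Path

# The rule quoted in this comment: send facility local4 to its own file.
RULE = "local4.* /var/log/clumanager"


def ensure_syslog_rule(conf_path, rule=RULE):
    """Append `rule` to `conf_path` unless an equivalent line exists.

    Lines are compared field by field, so differences in whitespace are
    ignored. Commented-out lines do not count as present.
    Returns True if the file was modified, False otherwise.
    """
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    wanted = rule.split()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        if stripped.split() == wanted:
            return False  # rule already active, nothing to do

    # Make sure the new rule starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + rule + "\n")
    return True
```

Running ensure_syslog_rule("/etc/syslog.conf") as root would make the
change; syslogd still has to be restarted afterwards before the problem
is reproduced. On RHEL 3 that is typically done with
"service syslog restart", though the init script name is an assumption
here.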

You may want to consider buying some power switches.
Comment 2 Lon Hohberger 2004-06-23 14:54:23 EDT
Additionally, you may want to increase your membership failure
detection time by several seconds.  You'll want to file a ticket with
Red Hat Support as well:

http://www.redhat.com/apps/support/

It may be a simple matter of re-tuning your failover time.

Any additional information you could provide would be helpful,
specifically logs during reproduction after following the instructions
in the previous comment.
Comment 3 Suzanne Hillman 2004-07-15 16:32:32 EDT
This bug has been in NEEDINFO for a month. Closing. Please
reopen if there is additional information.
Comment 4 Lon Hohberger 2007-12-21 10:10:01 EST
Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3.
