Bug 143555

Summary: Want redundant quorum partitions
Product: [Retired] Red Hat Cluster Suite
Reporter: Pietro Dania <p.dania>
Component: clumanager
Assignee: Rob Kenna <rkenna>
Status: CLOSED WONTFIX
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 3
CC: cluster-maint
Keywords: FutureFeature
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2005-09-29 18:19:23 UTC

Description Pietro Dania 2004-12-22 08:43:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) (KHTML, like Gecko)

Description of problem:
It would be nice to have redundant quorum partitions.
As of now, both are MANDATORY for cluster services to start.
But what if I have two geographically separated storage arrays, Oracle 10g with ASM (datafile mirroring), and want the cluster to keep running even if one of them is broken/unpowered/stolen-by-martians?
Say:
/dev/sda resides on array 1 in Rome
/dev/sdb resides on array 2 in Paris

If Rome gets hit by a meteor, the surviving cluster nodes will not start anymore.
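
For context, on RHCS 3 the quorum partitions are reached through raw device bindings set up at boot. A minimal /etc/sysconfig/rawdevices sketch for the layout above (the partition names are illustrative assumptions):

    # format: <raw device> <block device>
    # quorum partition on array 1 (Rome):
    /dev/raw/raw1 /dev/sda1
    # quorum partition on array 2 (Paris):
    /dev/raw/raw2 /dev/sdb1

Both bindings must resolve to reachable devices, or cluquorumd refuses to start, as the log below shows.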

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Make one of the quorum raw devices unavailable
2. service clumanager start
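
Concretely, the reproduction might look like this on one node (a sketch; releasing a binding by re-binding it to major/minor 0, per raw(8), and the device names are assumptions):

    raw /dev/raw/raw2 0 0       # release the raw2 binding to simulate a lost array
    service clumanager start    # the quorum daemon now fails as shown below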


Actual Results:  Starting Red Hat Cluster Manager...
Starting Quorum Daemon:
Message from syslogd@dbrhs2 at Wed Dec 22 08:33:46 2004 ...
dbrhs2 cluquorumd[6538]: <emerg> Not Starting Cluster Manager: Shared State Error: Bad file descriptor
Unable to open /dev/raw/raw2 read/write.
initSharedFD: unable to validate partition /dev/raw/raw2. Configuration error?
                                                           [FAILED]


Expected Results:  Starting Red Hat Cluster Manager...
Starting Quorum Daemon:                                    [  OK  ]


Additional info:

Comment 1 Lon Hohberger 2005-01-03 16:23:53 UTC
Are you intending to use back-end (e.g. storage-handled) data replication?

If so, simply mirror both quorum partitions as well.  Note that
disaster tolerance (e.g. site-disaster) doesn't currently work without
manual intervention.

That is, you can mirror all of the data (as long as the cluster
members' hostnames and service IP addresses can be the same in both
locations) and run the cluster in either place.  When site A fails, an
administrator must (a) prevent site A from restarting the cluster
services and (b) start the cluster services at site B.

In future releases of Cluster Suite, quorum partitions will not be
required.

Evaluating for future enhancement.

Comment 2 Pietro Dania 2005-01-04 12:26:12 UTC
No back-end data replication is available.
I don't care about the data, as it's managed by Oracle 10g Automatic
Storage Management (ASM).
I have 2 nodes and 2 arrays: node1 and array1 in site A, node2 and
array2 in site B:
sda -> quorum on array1 (raw1) 
sdb -> ASM data on array1 
sdc -> quorum on array2 (raw2) 
sdd -> ASM data on array2 
If either site fails (fire, blackout, meteor), the node in the other
site also fails (it can't reach one of the quorum partitions).
 
I've resolved it this way: 
sda -> Linux raid autodetect on array1 
sdb -> Linux raid autodetect on array1 
sdc -> ASM data on array1 
sdd -> Linux raid autodetect on array2 
sde -> Linux raid autodetect on array2 
sdf -> ASM data on array2 
md0 -> RAID1 (sda sdd) -> quorum partition (raw1) 
md1 -> RAID1 (sdb sde) -> quorum partition (raw2) 
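
For illustration, those mirrors could be created with mdadm roughly like this (a sketch; the partition names and the use of mdadm rather than raidtools are assumptions):

    # mirror each quorum partition across the two arrays
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdd1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sde1
    # then bind the raw devices to the mirrors instead of the plain partitions
    raw /dev/raw/raw1 /dev/md0
    raw /dev/raw/raw2 /dev/md1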
 
It seems to work, but md is not cluster aware; I am prone to
inconsistency between nodes :-(
Maybe using an md as raw protects me from inconsistency?
 
Thanks 

Comment 3 Lon Hohberger 2005-01-10 18:43:21 UTC
I don't think MD as raw will entirely fix this situation.  However, if
you're not actually sharing data (and the only shared data is the
shared clumanager partitions themselves), the chances of an
inconsistency should be very low (or nonexistent).  Clumanager only
shares small pieces of data, and those pieces are typically protected
by a lock.

(That said, running clumanager atop MD or software LVM partitions
is not supported.  What is needed for this to work properly is a
clustered LVM.)

Comment 4 Lon Hohberger 2005-09-29 18:19:23 UTC
Clustered LVM is available for RHEL4, and RHEL4 doesn't use shared partitions
for state information.

Since they're not required any longer, I'm closing this against RHCS3.