Red Hat Bugzilla – Bug 143555
Want redundant quorum partitions
Last modified: 2009-04-16 16:26:34 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) (KHTML, like Gecko)
Description of problem:
It would be nice to have redundant quorum partitions.
As of now, both are MANDATORY for cluster services to start.
But what if I have two geographically separated storage arrays and Oracle 10g with ASM (datafile mirroring), and I want the cluster to run even if one of them is broken/unpowered/stolen-by-martians?
/dev/sda resides on array 1 in Rome
/dev/sdb resides on array 2 in Paris
If Rome gets hit by a meteor, the surviving cluster nodes will not start anymore.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Make one of the quorum raw devices unavailable
2. service clumanager start
Actual Results: Starting Red Hat Cluster Manager...
Starting Quorum Daemon:
Message from syslogd@dbrhs2 at Wed Dec 22 08:33:46 2004 ...
dbrhs2 cluquorumd: <emerg> Not Starting Cluster Manager: Shared State Error: Bad file descriptor
Unable to open /dev/raw/raw2 read/write.
initSharedFD: unable to validate partition /dev/raw/raw2. Configuration error?
Expected Results: Starting Red Hat Cluster Manager...
Starting Quorum Daemon: [ OK ]
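(For reference, one way to carry out step 1 above on a RHEL3 node is to drop the quorum device's raw binding before starting clumanager. A rough sketch; it assumes the raw(8) convention that rebinding to major/minor 0 0 clears a mapping:)

# unbind the second quorum raw device (rebinding to 0 0 is assumed to clear it)
raw /dev/raw/raw2 0 0
# confirm the binding is gone
raw -qa
# starting the cluster should now fail as in the Actual Results above
service clumanager start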
Are you intending to use back-end (e.g. storage-handled) data replication?
If so, simply mirror both quorum partitions as well. Note that
disaster tolerance (e.g. site-disaster) doesn't currently work without
administrator intervention.
That is, you can mirror all of the data (as long as the cluster
members' hostnames and service IP addresses can be the same in both
locations) and run the cluster in either place. When site A fails, an
administrator must (a) prevent site A from restarting the cluster
services and (b) start the cluster services at site B.
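(A rough sketch of that administrator procedure using only the standard init scripts; the real steps at a given site also depend on how fencing and storage access are handled:)

# site A (failed site), once its nodes are reachable again:
chkconfig clumanager off     # keep clumanager from rejoining at boot
service clumanager stop

# site B (surviving site): bring the cluster and its services up
service clumanager start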
In future releases of Cluster Suite, quorum partitions will not be required.
Evaluating for future enhancement.
No back-end data replication is available.
I don't care about the data, as it's managed by Oracle 10g ASM (datafile mirroring).
I have 2 nodes and 2 arrays: node1 and array1 in site A and node2 and
array2 in site B:
sda -> quorum on array1 (raw1)
sdb -> ASM data on array1
sdc -> quorum on array2 (raw2)
sdd -> ASM data on array2
If either site fails (fire, blackout, meteor), the node in the other
site also fails (it can't reach one of the quorum partitions).
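(For reference, on RHEL3 this layout would normally be wired up through /etc/sysconfig/rawdevices; the partition suffixes below are assumptions, since the report only names whole disks:)

# /etc/sysconfig/rawdevices (sketch)
# quorum partition on array1 (site A)
/dev/raw/raw1 /dev/sda1
# quorum partition on array2 (site B)
/dev/raw/raw2 /dev/sdc1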
I've resolved it this way:
sda -> Linux raid autodetect on array1
sdb -> Linux raid autodetect on array1
sdc -> ASM data on array1
sdd -> Linux raid autodetect on array2
sde -> Linux raid autodetect on array2
sdf -> ASM data on array2
md0 -> RAID1 (sda sdd) -> quorum partition (raw1)
md1 -> RAID1 (sdb sde) -> quorum partition (raw2)
It seems to work, but md is not cluster-aware; I am prone to inconsistency between
the two nodes' views of the mirrors.
Maybe using an md device as raw protects me from inconsistency?
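(For reference, a minimal sketch of that md workaround on one node, assuming mdadm is used and the mirror members are the first partitions on each disk; as noted in the reply below, this is not a supported configuration:)

# mark the member partitions as "Linux raid autodetect" (type fd) with fdisk, then
# build the two mirrors, one leg on each array:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdd1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sde1

# bind the md devices as the quorum raw devices in /etc/sysconfig/rawdevices:
#   /dev/raw/raw1 /dev/md0
#   /dev/raw/raw2 /dev/md1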
I don't think MD as raw will entirely fix this situation. However, if
you're not actually sharing data (and the only shared data is the
shared clumanager partitions themselves), the chances of an
inconsistency should be very low (or nonexistent). Clumanager only
shares small pieces of data, and those pieces are typically protected
by a lock.
(That said, running clumanager on top of MD or software LVM partitions
is not supported. What is needed for this to work properly is a
cluster-aware volume manager.)
Clustered LVM is available for RHEL4, and RHEL4 doesn't use shared partitions
for state information.
Since they're not required any longer, I'm closing this against RHCS3.