Red Hat Bugzilla – Bug 143555
Want redundant quorum partitions
Last modified: 2009-04-16 16:26:34 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) (KHTML, like Gecko)
Description of problem:
It would be nice to have redundant quorum partitions.
As of now, both are MANDATORY for cluster services to start.
But what if I have two geographically separated storage arrays and Oracle 10g with ASM (datafile mirroring), and I want the cluster to run even if one of them is broken/unpowered/stolen-by-martians?
/dev/sda resides on array 1 in Rome
/dev/sdb resides on array 2 in Paris
If Rome gets hit by a meteor, the surviving cluster nodes will not start anymore.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Make one of the quorum raw devices unavailable
2. service clumanager start
Actual Results: Starting Red Hat Cluster Manager...
Starting Quorum Daemon:
Message from syslogd@dbrhs2 at Wed Dec 22 08:33:46 2004 ...
dbrhs2 cluquorumd: <emerg> Not Starting Cluster Manager: Shared State Error: Bad file descriptor
Unable to open /dev/raw/raw2 read/write.
initSharedFD: unable to validate partition /dev/raw/raw2. Configuration error?
Expected Results: Starting Red Hat Cluster Manager...
Starting Quorum Daemon: [ OK ]
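(For reference, one way to carry out step 1 above on a RHEL3 node is to drop the quorum device's raw binding before starting clumanager. A rough sketch; it assumes the raw(8) convention that rebinding to major/minor 0 0 clears a mapping:)

# unbind the second quorum raw device (rebinding to 0 0 is assumed to clear it)
raw /dev/raw/raw2 0 0
# confirm the binding is gone
raw -qa
# starting the cluster should now fail as in the Actual Results above
service clumanager start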
Are you intending to use back-end (e.g. storage-handled) data replication?
If so, simply mirror both quorum partitions as well. Note that
disaster tolerance (e.g. site-disaster) doesn't currently work without
administrator intervention.
That is, you can mirror all of the data (as long as the cluster
members' hostnames and service IP addresses can be the same in both
locations) and run the cluster in either place. When site A fails, an
administrator must (a) prevent site A from restarting the cluster
services and (b) start the cluster services at site B.
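(A rough sketch of that administrator procedure using only the standard init scripts; the real steps at a given site also depend on how fencing and storage access are handled:)

# site A (failed site), once its nodes are reachable again:
chkconfig clumanager off     # keep clumanager from rejoining at boot
service clumanager stop

# site B (surviving site): bring the cluster and its services up
service clumanager start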
In future releases of Cluster Suite, quorum partitions will not be required.
Evaluating for future enhancement.
No back-end data replication is available.
I don't care about the data, as it's managed by Oracle 10g ASM (datafile mirroring).
I have 2 nodes and 2 arrays: node1 and array1 in site A and node2 and
array2 in site B:
sda -> quorum on array1 (raw1)
sdb -> ASM data on array1
sdc -> quorum on array2 (raw2)
sdd -> ASM data on array2
If either site fails (fire, blackout, meteor), the node in the other
site also fails (it can't reach one of the quorum partitions).
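(For reference, on RHEL3 this layout would normally be wired up through /etc/sysconfig/rawdevices; the partition suffixes below are assumptions, since the report only names whole disks:)

# /etc/sysconfig/rawdevices (sketch)
# quorum partition on array1 (site A)
/dev/raw/raw1 /dev/sda1
# quorum partition on array2 (site B)
/dev/raw/raw2 /dev/sdc1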
I've resolved it this way:
sda -> Linux raid autodetect on array1
sdb -> Linux raid autodetect on array1
sdc -> ASM data on array1
sdd -> Linux raid autodetect on array2
sde -> Linux raid autodetect on array2
sdf -> ASM data on array2
md0 -> RAID1 (sda sdd) -> quorum partition (raw1)
md1 -> RAID1 (sdb sde) -> quorum partition (raw2)
It seems to work, but md is not cluster-aware; I am prone to inconsistency between
the two nodes' views of the mirrors.
Maybe using an md device as raw protects me from inconsistency?
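(For reference, a minimal sketch of that md workaround on one node, assuming mdadm is used and the mirror members are the first partitions on each disk; as noted in the reply below, this is not a supported configuration:)

# mark the member partitions as "Linux raid autodetect" (type fd) with fdisk, then
# build the two mirrors, one leg on each array:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdd1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sde1

# bind the md devices as the quorum raw devices in /etc/sysconfig/rawdevices:
#   /dev/raw/raw1 /dev/md0
#   /dev/raw/raw2 /dev/md1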
I don't think MD as raw will entirely fix this situation. However, if
you're not actually sharing data (and the only shared data is the
shared clumanager partitions themselves), the chances of an
inconsistency should be very low (or nonexistent). Clumanager only
shares small pieces of data, and those pieces are typically protected
by a lock.
(That said, running clumanager on top of MD or software LVM partitions
is not supported. What is needed for this to work properly is a
cluster-aware volume manager.)
Clustered LVM is available for RHEL4, and RHEL4 doesn't use shared partitions
for state information.
Since they're not required any longer, I'm closing this against RHCS3.