Description of problem: We use the priority option with our active/active cluster to ensure that our databases don't all start on the same node. We've noticed that when a node is fenced and then returns to the cluster, the databases are automatically redistributed based on the priorities in cluster.conf. The problem with this behavior is that if a node crashes, comes back up, crashes again, and comes back up again, the databases are needlessly failed over several times until someone logs into the cluster and stops it. We request an option to have the databases stay put when a failed node returns to the cluster.
This can currently be configured manually by specifying nofailback="1" as part of the failover domain configuration in cluster.conf, for example:

<failoverdomains>
  <failoverdomain name="all" ordered="1" nofailback="1">
    <failoverdomainnode name="molly" priority="2"/>
    <failoverdomainnode name="frederick" priority="1"/>
  </failoverdomain>
</failoverdomains>

Note that this option does not work when dealing with a service bound to a domain but currently running outside of its failover domain. For example, if a service was running on a node named "buster", it would currently move to either "molly" or "frederick" if one of those nodes came online, irrespective of the nofailback option.

The nofailback option is available for RHEL4 and RHEL5.
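To make the behavior described above concrete, here is a minimal toy model in Python. It is not rgmanager code: the FailoverDomain class and should_relocate function are hypothetical names invented for illustration. The model assumes that a lower priority number means a more preferred node, and it ignores other domain settings such as restricted domains.

```python
from dataclasses import dataclass


@dataclass
class FailoverDomain:
    # Hypothetical model: node name -> priority.
    # Assumption: a lower number means a more preferred node.
    priorities: dict
    ordered: bool = True
    nofailback: bool = False


def should_relocate(domain, current_node, joining_node):
    """Return True if this toy model would move the service to joining_node."""
    if joining_node not in domain.priorities:
        # The node coming online is not a member of the domain.
        return False
    if current_node not in domain.priorities:
        # The service is running outside its domain, so it moves to a
        # member regardless of nofailback (the exception noted above).
        return True
    if not domain.ordered or domain.nofailback:
        # Unordered domain, or failback disabled: the service stays put.
        return False
    # Ordered domain with failback: move only to a more preferred node.
    return domain.priorities[joining_node] < domain.priorities[current_node]


domain = FailoverDomain({"molly": 2, "frederick": 1}, ordered=True, nofailback=True)
print(should_relocate(domain, "molly", "frederick"))   # False: nofailback keeps it on molly
print(should_relocate(domain, "buster", "molly"))      # True: buster is outside the domain

domain.nofailback = False
print(should_relocate(domain, "molly", "frederick"))   # True: frederick is preferred
```

The second call reproduces the "buster" case: nofailback only applies once the service is already running on a member of its failover domain.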
(In reply to comment #2)
> The nofailback option is available for RHEL4 and RHEL5

Thanks. Just to be clear: this feature exists but has been undocumented to date. I assume we will be updating the documentation to cover this feature, correct?
re: comment #3. Yes, we will be adding this to the documentation. - Rob
Fix will be backported from RHEL5.
Removing dependency from 182423.
Adding the feature described in bug #333181 introduced a regression that causes an exception to be thrown when updating or adding a failover domain. This makes it impossible to create or update failover domains via conga.
Fix verified in 0.11.1-4.el4. The GUI option in the Failover Domain subsection of the cluster tab creates the cluster.conf option described above.
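One way to confirm that the option was written is to parse the configuration and list the failover domains with nofailback set. The following is a rough sketch using only the Python standard library. The helper name domains_with_nofailback is made up for this example, and the sample XML is the fragment from the earlier comment rather than a complete cluster.conf.

```python
import xml.etree.ElementTree as ET

# Sample fragment taken from the earlier comment; a real cluster.conf
# contains many more elements around <failoverdomains>.
SAMPLE = """
<failoverdomains>
  <failoverdomain name="all" ordered="1" nofailback="1">
    <failoverdomainnode name="molly" priority="2"/>
    <failoverdomainnode name="frederick" priority="1"/>
  </failoverdomain>
</failoverdomains>
"""


def domains_with_nofailback(xml_text):
    """Return the names of failover domains that set nofailback="1"."""
    root = ET.fromstring(xml_text)
    return [
        dom.get("name")
        for dom in root.iter("failoverdomain")
        if dom.get("nofailback") == "1"
    ]


print(domains_with_nofailback(SAMPLE))  # ['all']
```

For a real check, the contents of the cluster's configuration file would be passed in place of SAMPLE.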
This needs to be documented in RHEL 4 Cluster_Administration. Likewise for RHEL 5 Cluster_Administration. Because the RHEL 5 bug for this is closed, I have opened a doc bug for it: bug #450777. Will port change to RHEL 4 doc.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0798.html