From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041215 Firefox/1.0 Red Hat/1.0-12.EL4

Description of problem:
RFE: Provide the ability for a service to have exclusive ownership of the node it is running on. If the service shares the nodes of a cluster with other services, allow it to be configured so that, once it is running on a node, no other service can use that node as a resource. Nodes with no services (i.e. backups) can be used as failover nodes for any service, but once an exclusive service is running on such a node, that node should no longer be available. If no non-exclusive node exists when a failure occurs, the system will not be able to fail over.

Version-Release number of selected component (if applicable):

How reproducible: Didn't try

Additional info:
Idea #1: Part of this could be accomplished by simply adding a "least-services" automatic failover policy to failover domains. So:

dom1 {node1, {node6, node7}}
dom2 {node2, {node6, node7}}
dom3 {node3, {node6, node7}}
dom4 {node4, {node6, node7}}
dom5 {node5, {node6, node7}}

The idea here is that if node1 fails, the least-service-loaded of {node6, node7} will take over the service. Suppose it takes node6. If node2 then fails, the service in dom2 will fail over to node7. However, this would not prevent a third failover from occurring (i.e. two services would end up running on either node6 or node7).

Idea #2: Adding an 'exclusive' flag to resource groups (services) which says "Run on a node only if that node has no other services running" is another option, and fairly simple. Combined with a restricted failover domain, this would have the desired effect, I think: "Run on node1. If node1 fails, move to node6 or node7, but only if no other services are running on the target. If node6 and node7 both have services, stop and wait..." It would require some work to make the service start again in the case where it was stopped (nothing available) and node6 or node7 later becomes available.
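To illustrate why a "least-services" policy alone does not give exclusivity, here is a minimal Python sketch. It is not taken from any cluster code: the function `pick_least_loaded`, the service names `svc1` to `svc3`, and the tie-breaking by list order are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def pick_least_loaded(candidates: List[str], load: Dict[str, int]) -> Optional[str]:
    """Return the candidate node running the fewest services.

    Hypothetical helper: ties are broken by the order of `candidates`.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda node: load.get(node, 0))


# Shared backup nodes from the failover domains above; both start empty.
backups = ["node6", "node7"]
load = {"node6": 0, "node7": 0}

# Three primaries fail in turn; each service moves to the least-loaded backup.
placements = []
for service in ["svc1", "svc2", "svc3"]:
    target = pick_least_loaded(backups, load)
    load[target] += 1
    placements.append((service, target))

for service, target in placements:
    print(f"{service} -> {target}")
```

The first two failovers land on different backups, but the third is still placed, so one backup ends up running two services. Preventing that is what the 'exclusive' flag in Idea #2 is for.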
Implemented as follows:

A resource group may be tagged as 'exclusive'. This means that:

(1) No resource groups will automatically fail over to a node running an exclusive resource group. For instance, if there are two exclusive resource groups on a 2-node cluster and one of the nodes fails, one of the resource groups loses availability.

(2) If no empty nodes are available, the resource group is placed in the 'stopped' state until a node becomes available.

Most users would NOT want this option.

Manual specification overrides this behavior. In a pinch, an administrator may start an exclusive resource group on a chosen node regardless of the resource group's exclusive flag or any resource groups that node may be running.
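The placement rules described in this comment could be sketched roughly as follows. This is an illustrative Python model only, not the actual implementation: the `Group` class, the `choose_node` function, and its `manual_target` parameter are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a resource group."""
    name: str
    exclusive: bool = False


def choose_node(
    group: Group,
    candidates: List[str],
    running: Dict[str, List[Group]],
    manual_target: Optional[str] = None,
) -> Optional[str]:
    """Pick a node for `group`, or return None to mean 'stopped'."""
    # Manual specification overrides the exclusive rules entirely.
    if manual_target is not None:
        return manual_target
    for node in candidates:
        hosted = running.get(node, [])
        # Rule (1): nothing automatically fails over to a node that is
        # already running an exclusive resource group.
        if any(g.exclusive for g in hosted):
            continue
        # An exclusive group itself only accepts an empty node.
        if group.exclusive and hosted:
            continue
        return node
    # Rule (2): no suitable node, so the group stays stopped for now.
    return None


# Two exclusive groups on a 2-node cluster; node1 (running "a") has failed.
a = Group("a", exclusive=True)
b = Group("b", exclusive=True)
running = {"node2": [b]}

automatic = choose_node(a, ["node2"], running)
manual = choose_node(a, ["node2"], running, manual_target="node2")
print(automatic, manual)
```

In this model the automatic attempt finds no node, so "a" would stay stopped until a node frees up, while the manual request is honored despite "b" already running on node2.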
This will need a checkbox in the GUI for the "exclusive" attribute for the "resourcegroup" element.